Large Language Models for Norwegian Contexts
By Svein Arne Brygfjeld, Special Advisor on AI, the National Library of Norway

Preconditions and Opportunities
Large language models and the opportunities they bring have been a popular public talking point ever since OpenAI launched ChatGPT. The technology has quickly gained traction in many areas of society, both in Norway and internationally. A key question is whether Norway, as a nation, can afford to be entirely dependent on major tech companies located in completely different parts of the world.
Large language models reflect not only language but also society. The models absorb many characteristics from their training data and acquire properties that make them applicable in many fields.
A double-edged sword
The way models absorb their training data is also a double-edged sword. If the models are primarily trained on data that reflect English-speaking cultures, they will naturally perform in ways that mirror those cultures. Additionally, international services appear to adapt to local or national political conditions, as exemplified by DeepSeek’s way of covering Chinese political issues. Training large language models on content that sufficiently reflects the Norwegian languages and Norwegian culture is becoming increasingly important for Norway as a nation.
Norway is relatively digitized as a country. This means that, in principle, there is abundant access to digital content that can be used as training data for language models. We also have institutions entrusted with a special societal responsibility for such content, such as the National Library of Norway (Nasjonalbiblioteket), which by law is tasked with collecting and preserving published material. The National Library is also in a unique position internationally due to its digitization program. So far, this program has resulted in the digitization of nearly all published books in Norway, along with large portions of newspapers and journals published in print. Thus, the foundation for establishing a robust data platform for training language models is quite strong in Norway, especially considering the prevalence of the Norwegian languages.
Limitations
However, as is generally the case, there are limitations regarding the right to use content for training. In most cases, it is not possible to use content from more recent literature and newspapers to train language models without infringing the rights of the rights-holders, typically authors, publishers, and media companies.
With a view to possibly establishing a compensation scheme for using copyrighted content to train language models, the government asked the National Library in late 2023 to carry out a research project to explore how beneficial such content might be. Along with the Language Technology Group (LTG) at the University of Oslo, NorwAI, and Sigma2, the National Library conducted a research project called Mimir in the first half of 2024. During this short period, the project created new training datasets based on the Library’s collection and data harvested from the Internet, developed new evaluation data for Norwegian language and Norwegian contexts, trained a total of 17 large language models, evaluated all of them using the same methodology, and summarized and documented the work. The research groups at NorwAI and the University of Oslo provided substantial voluntary resources in what should be viewed as a national effort.
Negotiations
Rights-holders, both from the newspaper and literary sectors, were able to closely follow the project. Through meetings during the project’s execution, they could monitor and to some extent influence the process. This was an important measure to prepare for any subsequent negotiations. In collaboration with the rights-holders’ organizations, and based on Mimir’s results, the National Library established principles and a shared understanding during autumn 2024 for a potential agreement on using copyrighted content in training language models. This effort also aims to better facilitate further research in this area.
This work contributed to the Norwegian Parliament (Stortinget), in its 2025 national budget, asking the National Library to provide language models for use in Norwegian society. Initially, NOK 20 million has been allocated for organizational and internal computing resources at the National Library, and NOK 20 million for computing resources through Sigma2. Compared with the budgets of the major international tech giants, these amounts are small, but in a Norwegian national context, the funding could be an effective measure to ensure access to large language models tailored to Norwegian needs.
Additionally, based on experiences from Mimir and the processes with the rights-holders, the government has asked the National Library to negotiate with rights-holders concerning the use of copyrighted content. The Library expects these negotiations to take place during the first few months of 2025. If successful agreements are reached and the Parliament provides funding, it will create entirely new opportunities for developing language models adapted to Norwegian contexts that reflect our current era.
The Sami challenge
Both the work on making language models available to the public and the negotiations over using content for training include the Sami languages used in Norway. Including them is strongly advocated at the political level, yet it presents unique technological challenges. The amount of digital content available for training Sami language models is small, and for some Sami languages, it is insufficient. These challenges illustrate the need for close collaboration between the National Library, as the provider of language models to Norwegian society, and strong research communities in the field.
Based on the practical experiences from their collaboration, the National Library, together with NorwAI and UiO/LTG, has taken the initiative to apply for funding to establish a research center for language modeling. Several other institutions, both nationally and internationally, support the application to the Research Council of Norway. The application also spans fields such as law, education, and Sami languages.
Norway has a long and commendable tradition of cooperation. The initiatives we now see in the field of language models are very positive, and they point toward robust development in Norway. Together with recent news about language models that require less computational power for both training and use, Norway as a nation should be well equipped to develop and employ language models at the highest international level.
2025-03-12