NorLLM
SFI NorwAI – the Norwegian Research Center for AI Innovation – is working intensively to develop generative language models that can benefit Norwegian society. Generative language models gained widespread attention in 2023 after the international breakthrough of the large language models from OpenAI. In this section, we summarize the current status of NorwAI’s work with NorLLM (Norwegian Large Language Models).
Our key messages are as follows:
- Norway needs control over its own generative language models, built on Norwegian data and values.
- We have a well-functioning system for collecting and managing published content for use in large language models.
- Lack of computational resources hinders both training and operation of large language models in Norway.
- There is a need for structures and mechanisms to ensure that training data, fine-tuning data, and alignment methods are consistent with Norwegian values and support open models.
- NorwAI and its partners have the necessary expertise and experience, and aim to develop Norwegian language models for the benefit of Norwegian society.
Access the NorLLM models
The models are available for testing to representatives of organizations based in the Nordic countries and to students at Nordic universities. Apply for access to the NorLLM models on Hugging Face: https://huggingface.co/NorwAI
Technical inquiries
Inquiries about language models in general
National launch of the next-generation NorLLM models
On May 15th, NorwAI will present and launch the next generation of its NorLLM models. In addition, a group of partners, cooperating companies, and organizations will present their projects and plans for using the models.
Language models are taking off
NorwAI's language model activities received national attention in 2023. Interest has continued into 2024, including visits from high-profile politicians.
Four models built - four new ones in the pipeline
NorwAI has built four distinct Norwegian generative language models. During the winter of 2023/2024, four additional models are being developed, and they will be made available in spring 2024. Collectively, these eight models represent steps toward NorwAI’s ambition to build a comprehensive generative base model for general use, with approximately 40 billion parameters, by the end of 2024.
Lessons learned about language models
Working with language models is bringing to light interesting issues related to transparency, copyright, sustainability, values and norms, and language variants.
Requirements for Large Language Models
To build an environment for training and operating commercially available Norwegian language models, you must have access to resources:
The demand for Norwegian models
NorwAI has been approached by several public organizations and private enterprises seeking an alternative to international models.
These entities have primarily raised two concerns regarding existing commercial models:
(i) the handling of sensitive and copyrighted data
(ii) the lack of quality in Norwegian language generation.
The Language Council of Norway on domain-competent generative language technology
Åse Wetås, Director of the Language Council of Norway, discusses the importance of developing language models that can handle specialized terminology and language for professional use across various societal sectors.
How can (Norw)AI protect personal data?
Protecting personal information is challenging with complex AI models that are hungry for data. NorwAI’s pledge to provide an individualized AI experience that provably respects privacy concerns is therefore more important than ever.
By Anders Løland, Research director, Norwegian Computing Center (NR)
Harmful behavior in language models
The responses from a language model reflect the data that goes into its training set. If the training data is incomplete, the model will combine words based on statistical probabilities and construct sentences that may be both plausible and grammatically correct but have little to do with reality.
The project “MIMIR” on copyrighted content
At the end of 2023, an initiative emerged that brought together the three most active environments in Norway with expertise in language models for closer collaboration. The “Mimir” project united the National Library of Norway, the University of Oslo, and NorwAI in a joint effort.
TrustLLM - An EU project as an answer to generative AI hallucinations
The last two years have seen the rise of generative AI. Many models provide useful functions but tend to make up facts and respond with too much confidence. How can that risk be mitigated?
In November 2023, a consortium with partners from Norway, Germany, Sweden, Iceland, Denmark, and the Netherlands kicked off TrustLLM, a Horizon Europe-funded project to develop open, trustworthy, and sustainable Large Language Models (LLMs).