research, evaluation of Nordic language capabilities
Primary Use Cases:
Research on LLMs in Nordic languages, Validation of model capabilities
Limitations:
Bias, Safety issues, Generation diversity, Hallucination, Possibility of harmful or inappropriate content
Supported Languages
da (proficient), sv (proficient), no (proficient), en (proficient), is (proficient)
Training Details
Data Sources:
Litteraturbanken, The Pile, Diva, PubMed, ArXiv, CodeParrot, Familjeliv, Flashback, Parlai, Pushshift.io Reddit dataset, English Math dataset from DeepMind, Swedish Math dataset, OPUS, Movie scripts, Natural Instructions, P3, Norwegian Colossal Corpus, Danish Gigaword, Icelandic Gigaword, Common Crawl, LES, Multilingual C4, OSCAR, Open Web Text, Various public Swedish website scrapes, JobTech/Arbetsförmedlingen, Wikipedia
Data Volume:
1.1 TB of UTF-8 encoded text, comprising 660M documents and 320B tokens in total
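As a rough sanity check on these figures, the averages they imply can be worked out directly. The sketch below is illustrative only: it assumes "TB" means 10^12 bytes and takes the three quoted totals at face value. The function name `corpus_ratios` is hypothetical and does not come from the model card.

```python
# Totals quoted in the model card. Assumption: 1 TB = 10**12 bytes.
TOTAL_BYTES = 1.1e12
TOTAL_DOCUMENTS = 660e6
TOTAL_TOKENS = 320e9


def corpus_ratios(total_bytes, total_documents, total_tokens):
    """Return the average ratios implied by the corpus totals."""
    return {
        "tokens_per_document": total_tokens / total_documents,
        "bytes_per_token": total_bytes / total_tokens,
        "bytes_per_document": total_bytes / total_documents,
    }


ratios = corpus_ratios(TOTAL_BYTES, TOTAL_DOCUMENTS, TOTAL_TOKENS)
for name, value in ratios.items():
    print(f"{name}: {value:.2f}")
```

Under these assumptions the corpus averages roughly 485 tokens per document and about 3.4 bytes of UTF-8 text per token.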
Methodology:
Pretrained with a causal language modeling (CLM) objective, using the NeMo Megatron GPT implementation.
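To make the CLM objective concrete: at each position the model scores every vocabulary item, and the loss is the average negative log-probability assigned to the token that actually comes next. The following is a minimal pure-Python sketch of that loss. It is not the NeMo Megatron code; the function name `causal_lm_loss` and the list-of-lists logits layout are assumptions made for illustration.

```python
import math


def causal_lm_loss(logits, token_ids):
    """Average next-token cross-entropy for one sequence.

    logits[t] holds unnormalized scores over the vocabulary produced after
    the model has seen token_ids[: t + 1]; the target is token_ids[t + 1].
    """
    if len(token_ids) < 2:
        raise ValueError("need at least two tokens to form a prediction target")

    total = 0.0
    steps = len(token_ids) - 1
    for t in range(steps):
        scores = logits[t]
        target = token_ids[t + 1]
        # Log-sum-exp with the max subtracted for numerical stability.
        max_score = max(scores)
        log_norm = max_score + math.log(sum(math.exp(s - max_score) for s in scores))
        # Negative log-probability of the true next token.
        total += log_norm - scores[target]
    return total / steps


# Hypothetical toy input: vocabulary of 4, uniform scores at every position.
toy_tokens = [0, 1, 2]
toy_logits = [[0.0, 0.0, 0.0, 0.0] for _ in toy_tokens]
toy_loss = causal_lm_loss(toy_logits, toy_tokens)
```

With uniform scores the model is guessing at random, so the loss equals the natural log of the vocabulary size (here log 4). Training lowers it by shifting probability toward the observed next tokens.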