GPT-SW3 1.3B is an open-source language model published by AI-Sweden-Models. Features: 1.3B parameters, VRAM: 5.5 GB, License: Apache-2.0, LLM Explorer Score: 0.1, ARC: 30.4, HellaSwag: 50.4, MMLU: 26.1, GSM8K: 0.1.
GPT-SW3 is a collection of large decoder-only pretrained transformer language models trained on a dataset containing 320B tokens in Swedish, Norwegian, Danish, Icelandic, English, and programming code.
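The listed VRAM figure is close to what the weights alone would occupy at 32-bit precision. The page does not state which precision the figure assumes, so the following is only a back-of-the-envelope check: the parameter count of exactly 1.3 billion, the bytes-per-parameter values, and the helper name `weight_memory_gb` are all illustrative assumptions. Activations and framework overhead are ignored.

```python
def weight_memory_gb(n_params: int, bytes_per_param: int) -> float:
    """Rough memory needed to hold the weights only, in gigabytes (10^9 bytes)."""
    return n_params * bytes_per_param / 1e9


# Assumption: the "1.3B" label means exactly 1.3 billion parameters.
N_PARAMS = 1_300_000_000

fp32_gb = weight_memory_gb(N_PARAMS, 4)  # 32-bit floats, 4 bytes each
fp16_gb = weight_memory_gb(N_PARAMS, 2)  # 16-bit floats, 2 bytes each

print(f"fp32 weights: {fp32_gb:.1f} GB")
print(f"fp16 weights: {fp16_gb:.1f} GB")
```

Under these assumptions the 32-bit estimate comes out slightly below the listed 5.5 GB, which would be consistent with a small allowance for overhead, though the page does not confirm how its figure was derived.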
Supported Languages
Danish (da), Swedish (sv), Norwegian (no), English (en), Icelandic (is)
Training Details
Data Sources:
Books, Litteraturbanken, The Pile, DiVA, The Pile: PubMed, The Pile: ArXiv, Code Parrot: GitHub code, Familjeliv, Flashback, Datasets collected through ParlAI, Pushshift.io Reddit, English Math dataset generated with code from DeepMind, Swedish Math dataset, Summarization data
Data Volume:
320B tokens
Methodology:
Causal language modeling (CLM) objective utilizing the NeMo Megatron GPT implementation
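To make the causal language modeling objective concrete, here is a minimal dependency-free sketch of the loss it minimizes: the average cross-entropy of predicting each next token from the positions before it. This is not the NeMo Megatron implementation; the function name `causal_lm_loss` and the toy inputs are assumptions made for illustration.

```python
import math


def causal_lm_loss(logits, token_ids):
    """Mean next-token cross-entropy.

    logits[t] holds the model's vocabulary scores at position t, and is
    scored against token_ids[t + 1], the token that actually follows.
    """
    steps = len(token_ids) - 1  # the last token has no successor to predict
    total = 0.0
    for t in range(steps):
        scores = logits[t]
        target = token_ids[t + 1]
        # Numerically stable log-sum-exp for the softmax normalizer.
        m = max(scores)
        log_z = m + math.log(sum(math.exp(s - m) for s in scores))
        total += log_z - scores[target]  # negative log-probability of target
    return total / steps


# Toy example (hypothetical values): vocabulary of 4 tokens, sequence of 3.
tokens = [0, 1, 2]
uniform = [[0.0, 0.0, 0.0, 0.0] for _ in tokens]
confident = [[0.0, 5.0, 0.0, 0.0], [0.0, 0.0, 5.0, 0.0], [0.0, 0.0, 0.0, 0.0]]

print(causal_lm_loss(uniform, tokens))
print(causal_lm_loss(confident, tokens))
```

With uniform scores the loss equals the log of the vocabulary size, and it falls as the scores concentrate on the tokens that actually come next.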