| Model Type | |
| --- | --- |
| **Use Cases** | |
| Areas | |
| Applications | summarization, text generation, chatbot |
| Limitations | Not suitable for production use without an adequate assessment of risks |
| Additional Notes | The model requires further fine-tuning for specific use cases and includes biases representative of web data |
| Supported Languages | en (English), de (German), es (Spanish), fr (French), it (Italian), nl (Dutch), pl (Polish), pt (Portuguese), ro (Romanian), cs (Czech), sv (Swedish) |
| Training Details | |
| --- | --- |
| Data Sources | RefinedWeb, RefinedWeb-English, RefinedWeb-Europe (cs, de, es, fr, it, nl, pl, pt, ro, sv), high-quality technical data, code data, and conversational data extracted from public sources |
| Data Volume | |
| Methodology | Four-stage training strategy |
| Context Length | 8192 tokens |
| Training Time | |
| Hardware Used | |
| Model Architecture | Adapted from GPT-3, with rotary positional embeddings, multiquery attention, and FlashAttention-2 |
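
The card names the architectural techniques but does not show them; as a concrete illustration, here is a minimal NumPy sketch of rotary positional embeddings (RoPE). The function name and the `(seq_len, dim)` layout are assumptions for illustration only, not this model's actual implementation:

```python
import numpy as np

def rotary_embed(x, base=10000.0):
    """Apply rotary positional embeddings (RoPE) to a sequence of vectors.

    x: array of shape (seq_len, dim) with dim even. Each channel pair
    (2i, 2i+1) is rotated by an angle position * base**(-2i/dim).
    """
    seq_len, dim = x.shape
    half = dim // 2
    # One rotation frequency per channel pair
    inv_freq = base ** (-np.arange(half) * 2.0 / dim)
    # Rotation angle for every (position, pair) combination
    angles = np.outer(np.arange(seq_len), inv_freq)  # (seq_len, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, 0::2], x[:, 1::2]  # even / odd channels form the pairs
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out
```

Because each pair is a pure rotation, vector norms are preserved and the dot product between rotated queries and keys depends only on their relative positions, which is what makes RoPE attractive for attention.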
| Input Output | |
| --- | --- |
| Input Format | Token-based input with a context length of up to 8192 tokens |
| Accepted Modalities | |
| Output Format | |
| Performance Tips | Fine-tuning is recommended for specific tasks |
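
With a fixed 8192-token window, the prompt and any generated tokens share one budget, so callers typically trim long inputs before inference. A minimal sketch of that bookkeeping, assuming token ids arrive as a plain Python list; the `fit_to_context` helper is hypothetical, not part of any model API:

```python
CONTEXT_LENGTH = 8192  # maximum tokens the model accepts, per the card above

def fit_to_context(token_ids, max_new_tokens=0, context_length=CONTEXT_LENGTH):
    """Trim a token-id sequence so prompt + generation fits the window.

    Keeps the most recent tokens, the usual choice for chat-style prompts
    where the latest turns matter most.
    """
    budget = context_length - max_new_tokens
    if budget <= 0:
        raise ValueError("max_new_tokens leaves no room for the prompt")
    return token_ids[-budget:] if len(token_ids) > budget else token_ids
```

Reserving `max_new_tokens` up front avoids the common failure mode where a maximum-length prompt leaves the model no room to generate.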