| Attribute | Details |
|---|---|
| **Model Type** | |
| **Use Cases** | |
| Areas | Research, commercial applications |
| Applications | Natural language processing, coding, mathematics, chatbots |
| Primary Use Cases | Generating long texts, understanding structured data, multilingual text processing |
| **Supported Languages** | Multilingual support across 29 languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, Arabic, and more |
| **Training Details** | |
| Data Sources | Various sources described in the technical report |
| Methodology | Pretraining and post-training |
| Context Length | |
| Model Architecture | Transformer with RoPE, SwiGLU, RMSNorm, and attention QKV bias |
| **Input / Output** | |
| Input Format | Chat-based structured prompts |
| Accepted Modalities | |
| Output Format | Generated text following the prompt schema, up to 8,192 tokens |
| Performance Tips | Use vLLM for processing long texts; ensure RoPE scaling is configured correctly for long contexts |
|
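The "chat-based structured prompts" input format can be sketched as a list of role/content messages, a convention common to instruction-tuned models. The ChatML-style delimiters and the helper below are illustrative assumptions, not taken from this model card; in practice the tokenizer's own chat template should be used.

```python
# Illustrative sketch of a chat-structured prompt. The role/content
# message list is a widely used convention; the <|im_start|>/<|im_end|>
# delimiters are an assumed ChatML-style template, shown for inspection
# only. Production code would call the tokenizer's apply_chat_template().
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize the key points of this report."},
]

def render(messages: list[dict]) -> str:
    """Render messages into a single assumed-ChatML prompt string."""
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>"
        for m in messages
    ]
    # Leave the assistant turn open so the model generates the reply.
    parts.append("<|im_start|>assistant\n")
    return "\n".join(parts)

print(render(messages))
```

The open trailing `assistant` turn is the usual trick for signaling the model to continue generating rather than echo the conversation.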
|
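The RoPE-scaling tip can be made concrete with a small sketch. For architectures like this one (RoPE, SwiGLU, RMSNorm), long-context serving is typically enabled by adding a `rope_scaling` entry to the model's `config.json` before launching vLLM. The field names, the `yarn` scheme, and the factor of 4.0 below are assumptions to be checked against the model's own documentation; `enable_rope_scaling` is a hypothetical helper, not a vLLM API.

```python
import json

# Hypothetical helper: patch a model config dict with an assumed
# YaRN-style rope_scaling entry so a server such as vLLM can extend
# the context window. Field names and values are assumptions; verify
# them against the model's documentation before use.
def enable_rope_scaling(config: dict, factor: float = 4.0,
                        original_max: int = 32768) -> dict:
    patched = dict(config)
    patched["rope_scaling"] = {
        "type": "yarn",                              # assumed scaling scheme
        "factor": factor,                            # context-length multiplier
        "original_max_position_embeddings": original_max,
    }
    return patched

# Example: write the patched config back out for the serving stack to read.
config = {"max_position_embeddings": 32768}
print(json.dumps(enable_rope_scaling(config), indent=2))
```

With a static scaling entry like this, the model always sees the scaled rope parameters, so short-context quality can degrade slightly; some stacks therefore recommend enabling scaling only when inputs actually exceed the native context length.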