| Field | Details |
| --- | --- |
| Model Type | |
| **Use Cases** | |
| Areas | |
| Applications | Code reasoning, code fixing, and real-world applications such as code agents |
| Primary Use Cases | Coding enhancements, mathematics, and general competencies |
| Limitations | A static YaRN configuration can reduce performance on shorter texts |
| Considerations | Adding the `rope_scaling` configuration is advised only when long-context processing is needed |
| Supported Languages | |
| **Training Details** | |
| Data Sources | Source code, text-code grounding data, synthetic data |
| Data Volume | |
| Methodology | Pretraining and post-training |
| Context Length | |
| Model Architecture | Transformer with RoPE, SwiGLU, RMSNorm, and attention QKV bias |
| **Input Output** | |
| Performance Tips | To enable YaRN, add the corresponding configuration to `config.json` |
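
The table says that YaRN is enabled through `config.json` and that `rope_scaling` should be added only for long-context processing, because a static YaRN setup can hurt shorter inputs. A minimal sketch of that decision follows. The table does not give concrete values, so the native context length (32768), the scaling factor (4.0), the key names inside `rope_scaling`, and the helper names `with_yarn` and `patch_config_file` are all assumptions for illustration; check the model's own documentation for the real settings.

```python
import json
from pathlib import Path

# Assumed values, not taken from the table above; replace them with the
# numbers given in the model's documentation.
ASSUMED_NATIVE_CONTEXT = 32768
ASSUMED_YARN_FACTOR = 4.0


def with_yarn(config: dict, expected_max_tokens: int,
              native_context: int = ASSUMED_NATIVE_CONTEXT,
              factor: float = ASSUMED_YARN_FACTOR) -> dict:
    """Return a copy of `config` with `rope_scaling` set only for long inputs.

    Hypothetical helper: the key layout below follows a common
    Hugging Face-style config and is an assumption.
    """
    updated = dict(config)  # shallow copy, so the caller's dict is untouched
    if expected_max_tokens > native_context:
        updated["rope_scaling"] = {
            "type": "yarn",
            "factor": factor,
            "original_max_position_embeddings": native_context,
        }
    else:
        # Short inputs: drop any static YaRN setting to avoid the
        # short-text performance impact noted under Limitations.
        updated.pop("rope_scaling", None)
    return updated


def patch_config_file(path: str, expected_max_tokens: int) -> None:
    """Read a config.json, apply `with_yarn`, and write it back in place."""
    config_path = Path(path)
    config = json.loads(config_path.read_text(encoding="utf-8"))
    patched = with_yarn(config, expected_max_tokens)
    config_path.write_text(json.dumps(patched, indent=2) + "\n", encoding="utf-8")
```

Calling `patch_config_file("config.json", expected_max_tokens=100_000)` would add the block, while a call with a small token count would remove it again. Keeping the choice in one function makes it easy to maintain separate configurations for short-context and long-context deployments.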