Model Type | |
Use Cases |
Areas: | |
Applications: | Code reasoning, code fixing, and real-world applications such as code agents
Primary Use Cases: | Coding, mathematics, and general competencies
Limitations: | A static YaRN configuration applies the same scaling factor regardless of input length, which may reduce performance on shorter texts
Considerations: | Add the `rope_scaling` configuration only when long-context processing is required
Supported Languages | |
Training Details |
Data Sources: | Source code, text-code grounding data, and synthetic data
Data Volume: | |
Methodology: | Pretraining and post-training
Context Length: | |
Model Architecture: | Transformer with RoPE, SwiGLU, RMSNorm, and attention QKV bias
Input Output |
Performance Tips: | To enable YaRN for long inputs, add a `rope_scaling` entry to `config.json`
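The YaRN tip above can be sketched as a small helper that adds or removes a `rope_scaling` entry in a model's `config.json`. This is a minimal sketch, not the model's official tooling: the helper names (`enable_yarn`, `disable_yarn`, `patch_config_file`) are hypothetical, and the default `factor` and `original_max_position_embeddings` values are assumptions that should be checked against the documentation for the specific model.

```python
import json
from pathlib import Path


def enable_yarn(config: dict,
                factor: float = 4.0,
                original_max_position_embeddings: int = 32768) -> dict:
    """Return a copy of `config` with a YaRN `rope_scaling` entry.

    The default values are illustrative assumptions, not values taken
    from this document.
    """
    updated = dict(config)  # shallow copy so the caller's dict is untouched
    updated["rope_scaling"] = {
        "factor": factor,
        "original_max_position_embeddings": original_max_position_embeddings,
        "type": "yarn",
    }
    return updated


def disable_yarn(config: dict) -> dict:
    """Return a copy of `config` without `rope_scaling` (short-text use)."""
    updated = dict(config)
    updated.pop("rope_scaling", None)
    return updated


def patch_config_file(path: str, **yarn_kwargs) -> None:
    """Read a config.json, add the YaRN entry, and write it back in place."""
    config_path = Path(path)
    config = json.loads(config_path.read_text(encoding="utf-8"))
    patched = enable_yarn(config, **yarn_kwargs)
    config_path.write_text(json.dumps(patched, indent=2) + "\n", encoding="utf-8")
```

Because a static configuration scales every input the same way, keeping a `disable_yarn` counterpart makes it easy to switch back when the workload consists mostly of short texts.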