| Section | Field | Details |
| --- | --- | --- |
| Model Type | | |
| Use Cases | Areas | research, commercial applications |
| | Applications | code generation, code reasoning, code fixing, code agents |
| | Primary Use Cases | coding capabilities, mathematics, general competencies |
| | Limitations | not recommended for conversations |
| | Considerations | post-training or task-specific tuning is recommended for certain applications |
| Additional Notes | | The model's architecture includes RoPE, SwiGLU, RMSNorm, and Attention QKV bias (see the sketch below this table). |
| Supported Languages | | |
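For readers unfamiliar with the components listed under Additional Notes, here is a minimal PyTorch sketch of RMSNorm, a SwiGLU feed-forward block, and biased QKV projections. The class names, dimensions, and module layout are illustrative assumptions, not the model's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class RMSNorm(nn.Module):
    """Root-mean-square norm: rescales by the RMS of the activations
    instead of centering and scaling as LayerNorm does."""

    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * (x * rms)


class SwiGLU(nn.Module):
    """SiLU-gated feed-forward block (the 'SwiGLU' MLP variant)."""

    def __init__(self, dim: int, hidden_dim: int):
        super().__init__()
        self.gate = nn.Linear(dim, hidden_dim, bias=False)
        self.up = nn.Linear(dim, hidden_dim, bias=False)
        self.down = nn.Linear(hidden_dim, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down(F.silu(self.gate(x)) * self.up(x))


# "Attention QKV bias" means the query/key/value projections carry a
# bias term; the hidden size of 1024 is a placeholder, not the model's.
dim = 1024
q_proj = nn.Linear(dim, dim, bias=True)
k_proj = nn.Linear(dim, dim, bias=True)
v_proj = nn.Linear(dim, dim, bias=True)

x = torch.randn(2, 16, dim)          # (batch, sequence, hidden)
h = RMSNorm(dim)(x)                  # pre-attention normalization
q, k, v = q_proj(h), k_proj(h), v_proj(h)
```

RoPE is omitted here for brevity; it rotates the query and key vectors position-wise before attention is computed.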
| Section | Field | Details |
| --- | --- | --- |
| Training Details | Data Sources | source code, text-code grounding, synthetic data |
| | Data Volume | |
| | Methodology | transformer architecture with RoPE, SwiGLU, RMSNorm, and Attention QKV bias |
| | Context Length | 128K tokens |
| | Model Architecture | transformers with RoPE, SwiGLU, RMSNorm, and Attention QKV bias |
| Section | Field | Details |
| --- | --- | --- |
| Input Output | Input Format | Supports up to 128K tokens of context. |
| | Accepted Modalities | |
| | Output Format | |
| | Performance Tips | Use `rope_scaling` to handle long contexts optimally (see the sketch below this table). |
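The `rope_scaling` tip can be applied at load time with Hugging Face Transformers, since keyword arguments that match config fields override the checkpoint's configuration. A minimal sketch, assuming a Transformers-compatible checkpoint and YaRN-style scaling; the model ID, scaling type, factor, and native context length are placeholders to be replaced with the values from the actual model card:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical model ID; substitute the real checkpoint name.
model_id = "org/code-model"

tokenizer = AutoTokenizer.from_pretrained(model_id)

# Enable RoPE scaling here instead of editing config.json. The type,
# factor, and native context length below are assumptions; use the
# values the checkpoint was validated with.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    rope_scaling={
        "type": "yarn",
        "factor": 4.0,
        "original_max_position_embeddings": 32768,
    },
)
```

Static scaling of this kind can affect quality on shorter inputs, so it is typically enabled only when prompts actually approach the 128K-token limit.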