| Model Type | |
| --- | --- |

| Use Cases | |
| --- | --- |
| Areas | research, commercial applications |
| Applications | code generation, code reasoning, code fixing, code agents |
| Primary Use Cases | coding capabilities, mathematics, general competencies |
| Limitations | Not recommended for conversational (chat) use |
| Considerations | Post-training or task-specific fine-tuning is recommended for certain applications |
| Additional Notes | |
| --- | --- |
| | The model's architecture includes RoPE, SwiGLU, RMSNorm, and attention QKV bias. |

| Supported Languages | |
| --- | --- |
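The architecture notes above name RoPE (rotary position embeddings) as one of the model's components. Below is a minimal, self-contained sketch of the core RoPE operation — rotating consecutive pairs of vector components by a position-dependent angle — for illustration only; the function names are ours and this is not the model's actual implementation.

```python
import math

def rope_frequencies(head_dim, base=10000.0):
    # Inverse frequencies theta_i = base^(-2i/d) for i in [0, d/2),
    # following the standard RoPE formulation.
    return [base ** (-2 * i / head_dim) for i in range(head_dim // 2)]

def apply_rope(vec, position, base=10000.0):
    # Rotate each consecutive pair (x_{2i}, x_{2i+1}) by angle position * theta_i.
    # Rotation preserves the norm of each pair, so attention dot products
    # depend only on relative positions.
    freqs = rope_frequencies(len(vec), base)
    out = []
    for i, theta in enumerate(freqs):
        x, y = vec[2 * i], vec[2 * i + 1]
        angle = position * theta
        c, s = math.cos(angle), math.sin(angle)
        out.extend([x * c - y * s, x * s + y * c])
    return out
```

At position 0 every rotation angle is zero, so the vector passes through unchanged; at later positions the low-index pairs rotate fastest, encoding position in the phase of each pair.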
| Training Details | |
| --- | --- |
| Data Sources | source code, text-code grounding, synthetic data |
| Data Volume | |
| Methodology | transformer architecture with RoPE, SwiGLU, RMSNorm, and attention QKV bias |
| Context Length | up to 128K tokens |
| Model Architecture | transformers with RoPE, SwiGLU, RMSNorm, and attention QKV bias |
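The architecture rows above also list SwiGLU and RMSNorm. As a rough illustration of what those two building blocks compute, here is a plain-Python sketch using list-based vectors; shapes and names are simplified assumptions, not the model's code.

```python
import math

def rms_norm(x, weight, eps=1e-6):
    # RMSNorm: scale by the reciprocal root-mean-square of the vector.
    # Unlike LayerNorm, no mean is subtracted.
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * v / rms for w, v in zip(weight, x)]

def silu(v):
    # SiLU (swish) activation: v * sigmoid(v).
    return v / (1.0 + math.exp(-v))

def swiglu(x, w_gate, w_up):
    # SwiGLU feed-forward gate: SiLU(x @ W_gate) * (x @ W_up), elementwise.
    # Weight matrices are given as lists of columns for simplicity.
    gate = [silu(sum(xi * wij for xi, wij in zip(x, col))) for col in w_gate]
    up = [sum(xi * wij for xi, wij in zip(x, col)) for col in w_up]
    return [g * u for g, u in zip(gate, up)]
```

In a real transformer block these operate on tensors, with RMSNorm applied before attention and the feed-forward layer, and SwiGLU forming the feed-forward layer's gated activation.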
| Input Output | |
| --- | --- |
| Input Format | Supports up to 128K tokens of context |
| Accepted Modalities | |
| Output Format | |
| Performance Tips | Use `rope_scaling` for handling long contexts optimally |
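For context on the `rope_scaling` tip: in the Hugging Face Transformers convention, RoPE scaling is enabled by adding a `rope_scaling` entry to the model's `config.json`. The fragment below is a sketch of what such an entry might look like with YaRN-style scaling; the specific values (`factor`, `original_max_position_embeddings`) are illustrative assumptions and should be taken from the model's own documentation.

```json
{
  "rope_scaling": {
    "type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 32768
  }
}
```

Here `factor` is the ratio between the target context length and the length the model was trained at, so scaling should only be enabled when inputs actually exceed the native context window.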