| Model Type | causal language model, code generation |
|
| Use Cases |
| Areas: | real-world applications, code agents |
| Applications: | code generation, code reasoning, code fixing |
| Limitations: | not recommended for conversation without post-training |
|
|
| Additional Notes | The model supports long contexts of up to 128k tokens, using YaRN for length extrapolation (see the configuration sketch below). |
|
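YaRN extrapolation of the kind noted above is typically switched on through the model's RoPE scaling configuration. Below is a minimal sketch using the Hugging Face transformers config API; the model path is hypothetical, and the scaling factor and native window length are illustrative assumptions, not values stated in this card.

```python
# Minimal sketch: enabling YaRN-style RoPE scaling via a Hugging Face
# transformers config. The key names ("rope_scaling", "yarn") follow the
# transformers convention; the factor and base length below are assumed
# for illustration (32k native window extrapolated 4x to 128k).
from transformers import AutoConfig

config = AutoConfig.from_pretrained("path/to/model")  # hypothetical path

config.rope_scaling = {
    "type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
}
config.max_position_embeddings = 131072  # 128k-token window

config.save_pretrained("path/to/model-128k")  # hypothetical output path
```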
| Supported Languages | |
| Training Details |
| Data Sources: | source code, text-code grounding, synthetic data |
| Data Volume: | |
| Methodology: | Pretraining & Post-training |
| Context Length: | up to 128k tokens (via YaRN; see Additional Notes) |
| Hardware Used: | |
| Model Architecture: | transformers with RoPE, SwiGLU, RMSNorm, attention QKV bias, and tied word embeddings (sketched below) |
|
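Two of the components named under Model Architecture, RMSNorm and SwiGLU, are compact enough to sketch. The PyTorch code below is an illustrative reference implementation under assumed dimensions and names, not the model's actual source.

```python
# Illustrative PyTorch sketch of RMSNorm and a SwiGLU feed-forward block,
# two components listed under Model Architecture. Shapes and class names
# are assumptions for exposition only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    """Root-mean-square layer norm: rescales by the RMS, no mean subtraction."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = x.pow(2).mean(dim=-1, keepdim=True).add(self.eps).rsqrt()
        return x * rms * self.weight

class SwiGLU(nn.Module):
    """SwiGLU feed-forward block: a SiLU-gated linear unit."""
    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.gate = nn.Linear(dim, hidden, bias=False)
        self.up = nn.Linear(dim, hidden, bias=False)
        self.down = nn.Linear(hidden, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down(F.silu(self.gate(x)) * self.up(x))
```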
|
| Input Output |
| Accepted Modalities: | |
| Performance Tips: | vLLM is recommended for deployment; for long contexts, configure rope_scaling accordingly (see the deployment sketch below). |
|
|
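As a concrete illustration of the deployment tip, the sketch below serves a long-context model with vLLM. The model path is hypothetical; the rope_scaling argument mirrors the YaRN configuration above, and its key names are assumptions that may vary across vLLM versions. Passing it is only needed when the scaling is not already baked into the model's config.json.

```python
# Hedged sketch of long-context deployment with vLLM. Paths and scaling
# values are illustrative assumptions, not values from this model card.
from vllm import LLM, SamplingParams

llm = LLM(
    model="path/to/model-128k",  # hypothetical path
    max_model_len=131072,        # 128k-token window
    rope_scaling={               # omit if already set in config.json
        "rope_type": "yarn",
        "factor": 4.0,
        "original_max_position_embeddings": 32768,
    },
)

params = SamplingParams(temperature=0.2, max_tokens=256)
outputs = llm.generate(["def quicksort(arr):"], params)
print(outputs[0].outputs[0].text)
```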