| Field | Value |
| --- | --- |
| Model Type | causal language model, code generation |
| **Use Cases** | |
| Areas | real-world applications, code agents |
| Applications | code generation, code reasoning, code fixing |
| Limitations | not recommended for conversation without post-training |
| Additional Notes | Supports long contexts up to 128k tokens, using YaRN for length extrapolation. |
| Supported Languages | |
| **Training Details** | |
| Data Sources | source code, text-code grounding, synthetic data |
| Data Volume | |
| Methodology | pretraining & post-training |
| Context Length | up to 128k tokens (with YaRN) |
| Hardware Used | |
| Model Architecture | transformer with RoPE, SwiGLU, RMSNorm, attention QKV bias, and tied word embeddings |
| **Input/Output** | |
| Accepted Modalities | |
| Performance Tips | vLLM is recommended for deployment; configure `rope_scaling` appropriately when using long contexts. |
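Since the notes above mention YaRN length extrapolation and a `rope_scaling` setting for long contexts, here is a minimal sketch of what such a configuration entry typically looks like. The exact keys and factor are assumptions following the common Hugging Face `config.json` convention, not values confirmed by this card; verify them against the specific model's documentation before use.

```python
# Hypothetical rope_scaling entry, as commonly added to a model's
# config.json to enable YaRN length extrapolation beyond the native
# context window. Key names and the factor value are illustrative.
rope_scaling = {
    "type": "yarn",
    "factor": 4.0,  # e.g. 32k native context * 4 ≈ 128k effective
    "original_max_position_embeddings": 32768,
}

def effective_context(cfg: dict) -> int:
    """Effective context length implied by a YaRN rope_scaling entry."""
    return int(cfg["original_max_position_embeddings"] * cfg["factor"])

print(effective_context(rope_scaling))  # 131072, i.e. ~128k tokens
```

A static `rope_scaling` like this applies the scaling factor even to short inputs, which can slightly degrade short-context quality; some serving stacks therefore recommend enabling it only when long inputs are actually expected.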