| Model Type | | Mixture-of-Experts (MoE) code language model |
|
| Use Cases |
| Areas: | | code-specific tasks, math and reasoning, expanded programming-language support |
|
| Applications: | | AI code assistance, software development, research in code intelligence |
|
| Primary Use Cases: | | Code completion, code insertion, chatbot assistance for coding queries (see the example below) |
|
| Limitations: | | Optimal performance requires the specified hardware; compatibility with supported inference APIs is required |
|
|
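As a minimal sketch of the chatbot-assistance use case above, the snippet below loads an instruct checkpoint with Hugging Face Transformers and answers a coding query. The model ID, precision, and generation settings are illustrative assumptions, not values taken from this card.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed checkpoint name; substitute the published instruct model.
model_id = "deepseek-ai/DeepSeek-Coder-V2-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # BF16, matching the hardware note below
    device_map="auto",           # shard across the available GPUs
    trust_remote_code=True,
)

# Chatbot assistance for a coding query (primary use case above).
messages = [{"role": "user", "content": "Write a quicksort function in Python."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0][input_ids.shape[1]:], skip_special_tokens=True))
```
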
| Additional Notes | | Supported programming languages expanded from 86 to 338. Commercial use is permitted. |
|
| Supported Languages | | 338 programming languages (extended from 86); high proficiency in code-specific tasks |
|
| Training Details |
| Data Sources: | | Intermediate checkpoint of DeepSeek-V2, further pre-trained on an additional 6 trillion tokens |
|
| Data Volume: | |
| Methodology: | | DeepSeekMoE framework; mixture-of-experts mechanism for enhanced coding and reasoning |
|
| Context Length: | |
| Hardware Used: | | Inference in BF16 format requires 8×80 GB GPUs |
|
| Model Architecture: | | Mixture-of-Experts; only a subset of parameters is active per token (see the sketch below) |
|
|
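As a rough illustration of the Mixture-of-Experts architecture referenced above, and not the DeepSeekMoE implementation itself, the toy layer below routes each token to its top-k experts so that only a fraction of the layer's parameters is active per token.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoELayer(nn.Module):
    """Toy top-k mixture-of-experts layer (illustrative only)."""

    def __init__(self, dim: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(dim, num_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: [num_tokens, dim]. Route each token to its top-k experts;
        # only those experts' parameters participate in the forward pass.
        scores = self.router(x)                          # [num_tokens, num_experts]
        weights, idx = scores.topk(self.top_k, dim=-1)   # k experts per token
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot:slot + 1] * expert(x[mask])
        return out

layer = TinyMoELayer(dim=64)
tokens = torch.randn(10, 64)   # 10 tokens with hidden size 64
print(layer(tokens).shape)     # torch.Size([10, 64])
```
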
| Input / Output |
| Input Format: | |
| Accepted Modalities: | |
| Output Format: | | Model-generated text responses |
|
| Performance Tips: | | Use the Hugging Face (HF) Transformers or vLLM frameworks for optimal inference (see the sketch below). |
|
|
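A minimal sketch of the vLLM path mentioned in the performance tips, assuming a hypothetical model ID and an 8-GPU node matching the BF16 hardware note; adjust `tensor_parallel_size` and `dtype` to the actual deployment.

```python
from vllm import LLM, SamplingParams

# Assumed checkpoint name and GPU count; not taken from this card.
llm = LLM(
    model="deepseek-ai/DeepSeek-Coder-V2-Instruct",
    tensor_parallel_size=8,   # spread the weights over 8 GPUs
    dtype="bfloat16",         # BF16 inference, per the hardware note
    trust_remote_code=True,
)

params = SamplingParams(temperature=0.2, max_tokens=256)
prompts = ["Complete this Python function:\ndef is_prime(n: int) -> bool:"]
outputs = llm.generate(prompts, params)
print(outputs[0].outputs[0].text)
```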