| Model Type | Mixture-of-Experts, Language Model |
|
| Use Cases |
| Areas: | Research, Commercial Applications, Chatbots |
| Applications: | Language Understanding, Code Generation, Translation, Economical AI Applications |
| Primary Use Cases: | Text Generation, Conversational AI, Code Classification |
| Limitations: | Reduced performance on low-resource languages and contexts; computationally demanding to run |
| Considerations: | Run on recommended hardware for best efficiency. |
|
|
| Additional Notes | The model pairs a large total parameter count with sparse Mixture-of-Experts computation, activating only a subset of parameters per token to deliver high performance at lower cost. |
|
| Supported Languages | English (Advanced), Chinese (Advanced), Code (Intermediate) |
|
| Training Details |
| Data Sources: | |
| Data Volume: | |
| Methodology: | Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL); a minimal SFT sketch follows below |
| Context Length: | |
| Model Architecture: | Multi-head Latent Attention and DeepSeekMoE architecture; an expert-routing sketch follows below |
|
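To make the Methodology row concrete, here is a minimal sketch of the SFT objective: next-token cross-entropy on instruction-response pairs, with prompt tokens masked out of the loss. The HF-style `.logits` interface and the `-100` ignore-index convention are illustrative assumptions; this is not DeepSeek-AI's actual training code.

```python
# Sketch of supervised fine-tuning (SFT) loss; interfaces are assumed.
import torch
import torch.nn.functional as F

def sft_loss(model, input_ids: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """labels mirror input_ids but hold -100 at prompt positions, so
    only response tokens contribute to the loss."""
    logits = model(input_ids).logits                  # (batch, seq, vocab)
    return F.cross_entropy(
        logits[:, :-1].reshape(-1, logits.size(-1)),  # predict token t+1 ...
        labels[:, 1:].reshape(-1),                    # ... from position t
        ignore_index=-100,                            # skip masked prompt tokens
    )
```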
|
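The Model Architecture row references DeepSeekMoE. Below is a minimal, generic sketch of top-k expert routing, the core Mixture-of-Experts mechanism: a learned gate scores the experts for each token, and only the top-k experts actually run. The expert count, `top_k`, and expert MLP shape here are illustrative assumptions; DeepSeekMoE's actual design differs (it additionally uses fine-grained and shared experts).

```python
# Generic top-k Mixture-of-Experts routing sketch (not DeepSeekMoE's
# exact configuration; sizes below are assumptions for illustration).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoELayer(nn.Module):
    def __init__(self, d_model: int, n_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(d_model, n_experts, bias=False)  # router
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, 4 * d_model),
                nn.GELU(),
                nn.Linear(4 * d_model, d_model),
            )
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model). Each token is routed to its top-k experts.
        scores = F.softmax(self.gate(x), dim=-1)               # (tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)         # per-token expert choice
        weights = weights / weights.sum(dim=-1, keepdim=True)  # renormalize gate weights
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e                          # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, k, None] * expert(x[mask])
        return out
```

Only the selected experts' parameters are exercised per token, which is why total parameter count can grow far faster than per-token compute.
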
| Safety Evaluation |
| Methodologies: | Benchmarks, Comparison Tests, Open-ended Evaluation |
| Findings: | Effective performance on both language and coding benchmarks |
| Risk Categories: | Fairness, Bias, Misinformation |
| Ethical Considerations: | Addressed in the Responsible AI Considerations below |
|
|
| Responsible AI Considerations |
| Fairness: | Benchmarks evaluate performance across languages and use cases. |
| Transparency: | Performance and architecture details are publicly shared. |
| Accountability: | DeepSeek-AI is accountable for the model's outputs. |
| Mitigation Strategies: | Regular updates and evaluation for fairness and bias. |
|
|
| Input Output |
| Input Format: | Text input; prompts for chat |
| Accepted Modalities: | Text |
| Output Format: | Generated text with coherent structure |
| Performance Tips: | Use GPU-optimized inference libraries and reduced-precision weights for best throughput; a usage sketch follows below. |
|
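Below is a minimal text-in / text-out usage sketch with Hugging Face transformers, illustrating the chat-prompt input format and the reduced-precision GPU tip above. The repository ID is a placeholder, not a confirmed checkpoint name; substitute the actual model repo.

```python
# Chat usage sketch; the model_id below is a placeholder assumption.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/<model-checkpoint>"  # placeholder repo ID
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # reduced precision for GPU efficiency
    device_map="auto",           # spread layers across available GPUs
    trust_remote_code=True,
)

# Chat-style prompting: the tokenizer's chat template formats the turns.
messages = [{"role": "user", "content": "Write a haiku about autumn."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```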
|
| Release Notes |
| Version: | |
| Date: | |
| Notes: | Introduces the Mixture-of-Experts architecture, with enhanced efficiency and reduced training costs. |
|
|
|