| Model Type | | Chat model, Text generation |
|
| Use Cases |
| Areas: | | Chat applications, Creative content generation |
|
| Applications: | | Commercial applications, Research, Educational tools |
|
| Primary Use Cases: | | Chatbots, Virtual assistants, Story generation |
|
| Limitations: | | Potential for hallucination, May produce inconsistent outputs |
|
| Considerations: | | Adjust generation parameters (e.g., temperature, top_p, top_k) to tune output qualities; see the sketch under Performance Tips below. |
|
|
| Additional Notes | | Yi models do not use Llama's weights directly; their own datasets and training infrastructure underscore Yi's independent development. |
|
| Supported Languages | | English (Fluent), Chinese (Fluent) |
|
| Training Details |
| Data Sources: | | Multilingual corpora |

| Data Volume: | | 3T tokens |
| Methodology: | | Transformer-based architecture |
|
| Context Length: | |
| Training Time: | |
| Hardware Used: | | NVIDIA A800 (80 GB) and RTX 4090 GPUs |
|
| Model Architecture: | | Based on Llama's architecture |
|
|
| Responsible AI Considerations |
| Fairness: | | Addressed during model development. |
|
| Transparency: | | Standard Transformer architecture; detailed in tech report. |
|
| Accountability: | |
| Mitigation Strategies: | | Supervised fine-tuning (SFT) to improve response accuracy. |
|
|
| Input Output |
| Input Format: | | Interactive conversational prompts (chat format) |
|
| Accepted Modalities: | | Text |
| Output Format: | | Text responses or follow-ups |
|
| Performance Tips: | | Calibrate temperature, top_p, top_k settings for desired response diversity. |
|
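A minimal sketch of calibrating these sampling settings through Hugging Face `transformers`; the checkpoint id `01-ai/Yi-6B-Chat` and the parameter values are illustrative assumptions, not tuned recommendations.

```python
# Sketch only: checkpoint id and sampling values are assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "01-ai/Yi-6B-Chat"  # assumed checkpoint for illustration
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Interactive prompt conversation: a list of chat turns.
messages = [{"role": "user", "content": "Write a two-sentence story about a lighthouse."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Lower temperature narrows the token distribution (more deterministic);
# higher top_p / top_k admit more candidates (more diverse).
output_ids = model.generate(
    input_ids,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    top_k=40,
)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

Raising temperature and top_p favors diverse, creative responses (e.g., story generation); lowering them favors consistent, factual ones.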
|
| Release Notes |
| Version: | |
| Date: | |
| Notes: | | Initial open-source release of the chat model, supporting both 4-bit and 8-bit quantization (see the loading sketch after these notes). |
|
| Version: | |
| Date: | |
| Notes: | | Improved performance on coding, math, and reasoning, with longer-context capabilities. |
|
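A minimal sketch of the 4-bit/8-bit loading mentioned in the release notes, assuming the `bitsandbytes` integration in `transformers`; the checkpoint id is an illustrative assumption.

```python
# Sketch only: quantized loading via the bitsandbytes integration;
# the checkpoint id is an assumption, not a confirmed release artifact.
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "01-ai/Yi-6B-Chat"  # assumed checkpoint for illustration

# 4-bit quantization; swap in load_in_8bit=True for the 8-bit variant.
quant_config = BitsAndBytesConfig(load_in_4bit=True)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",
)
```

4-bit loading roughly quarters the weight memory footprint relative to fp16, at some cost in output quality; 8-bit sits between the two.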
|
|