| Model Type | | multilingual, text generation |
|
| Use Cases |
| Areas: | | research, commercial applications |
|
| Considerations: | | Ensure compliance with applicable safety measures and legal requirements. |
|
|
| Additional Notes | | Supports multi-turn dialogue, with optimizations for masked-loss training. |
|
| Supported Languages | | Chinese (high), English (high) |
|
| Training Details |
| Data Sources: | | High-quality Chinese and English data |
|
| Data Volume: | |
| Methodology: | |
| Hardware Used: | | 4x A100 40GB GPUs, with DeepSpeed |
|
| Model Architecture: | | Decoder-only, using rotary position embeddings (RoPE) and SwiGLU activation |
|
|
| Input Output | |
| Release Notes |
| Version: | |
| Date: | |
| Notes: | | Release of the 52B chat model. |
|
|
|
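
The notes above mention multi-turn dialogue with masked-loss training but give no detail. A common reading, assumed here and not confirmed by this card, is that a whole conversation is packed into one sequence and the loss is computed only on assistant tokens. The sketch below illustrates that idea in plain Python. The names `build_labels`, `masked_nll`, and `IGNORE_INDEX` are hypothetical, and the log-probabilities are assumed to be already aligned with the labels (the usual next-token shift is taken as done).

```python
import math

# Label value marking positions excluded from the loss. -100 is the default
# ignore_index in PyTorch's CrossEntropyLoss; using it here is an assumption.
IGNORE_INDEX = -100


def build_labels(turns):
    """Flatten (role, token_ids) turns into input ids and loss labels.

    Tokens from non-assistant turns are kept as input context but get
    IGNORE_INDEX as their label, so they contribute nothing to the loss.
    """
    input_ids, labels = [], []
    for role, ids in turns:
        input_ids.extend(ids)
        if role == "assistant":
            labels.extend(ids)
        else:
            labels.extend([IGNORE_INDEX] * len(ids))
    return input_ids, labels


def masked_nll(log_probs, labels):
    """Average negative log-likelihood over unmasked positions only.

    log_probs[t][v] is the log-probability assigned to token v as the
    target at position t (assumed already aligned with labels[t]).
    """
    total, count = 0.0, 0
    for position_log_probs, label in zip(log_probs, labels):
        if label == IGNORE_INDEX:
            continue  # user/system token: skipped
        total -= position_log_probs[label]
        count += 1
    return total / count if count else 0.0


# Toy two-round conversation over a 4-token vocabulary (ids 0..3).
turns = [
    ("user", [1, 2]),
    ("assistant", [3, 0]),
    ("user", [1]),
    ("assistant", [2, 0]),
]
input_ids, labels = build_labels(turns)

# A uniform distribution over the vocabulary at every position.
uniform = [[math.log(0.25)] * 4 for _ in input_ids]
loss = masked_nll(uniform, labels)
```

With this layout every assistant reply in the conversation is supervised in a single forward pass, instead of splitting the dialogue into one training sample per round.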