| Model Type | text generation, bilingual, mixture of experts |

| Use Cases |
| Areas: | Research, Commercial Applications |
| Applications: | Text Generation, Bilingual Applications |
| Primary Use Cases: | |
| Limitations: | No moderation mechanisms implemented |
| Considerations: | Community engagement to enable moderation for deployment in sensitive environments |
|
|
| Additional Notes | Trained with the EfficientScale training methodology. |
|
| Supported Languages | en (proficient), zh (proficient) |
|
| Training Details |
| Data Sources: | RedPajama-Data-V2, falcon-refinedweb, C4, Pile, WuDaoCorporaText, ChineseWebText |
| Data Volume: | |
| Methodology: | EfficientScale training pipeline with Scale-Up and Scale-Out phases |
| Context Length: | |
| Model Architecture: | |
|
| Input Output |
| Input Format: | Textual input in English or Chinese |
| Accepted Modalities: | |
| Output Format: | |
|
| Release Notes |
| Version: | |
| Notes: | Initial dense model training |
| Version: | |
| Notes: | Expanded dense model training |
| Version: | |
| Notes: | Final model with Mixture of Experts |
|
|
|