Model Type: text generation, bilingual, mixture of experts
|
Use Cases
  Areas: Research, Commercial Applications
  Applications: Text Generation, Bilingual Applications
  Primary Use Cases:
  Limitations: No moderation mechanisms implemented
  Considerations: The community is encouraged to implement moderation mechanisms before deployment in sensitive environments
|
|
Additional Notes: Trained using the EfficientScale training methodology.
|
Supported Languages: en (proficient), zh (proficient)
|
Training Details
  Data Sources: RedPajama-Data-V2, falcon-refinedweb, C4, Pile, WuDaoCorporaText, ChineseWebText
  Data Volume:
  Methodology: EfficientScale training pipeline with Scale-Up and Scale-Out phases
  Context Length:
  Model Architecture:
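The card does not describe EfficientScale beyond naming its two phases, but the release sequence below (small dense model, expanded dense model, final Mixture of Experts) suggests the overall shape of the pipeline. The following is a toy sketch under that assumption; the function names, the row-tiling expansion, and the noisy expert copying are illustrative stand-ins, not the published method:

```python
import numpy as np

def scale_up(w: np.ndarray, new_dim: int) -> np.ndarray:
    """Toy 'Scale-Up': widen a dense weight matrix by tiling its rows,
    so the larger model can start from the smaller model's weights.
    (Illustrative only; the actual expansion scheme is not in the card.)"""
    reps = -(-new_dim // w.shape[0])            # ceiling division
    return np.tile(w, (reps, 1))[:new_dim, :]

def scale_out(w: np.ndarray, num_experts: int, noise: float = 0.01) -> list:
    """Toy 'Scale-Out': turn one trained dense layer into several experts
    by copying its weights and adding small noise to break symmetry."""
    rng = np.random.default_rng(0)
    return [w + noise * rng.standard_normal(w.shape) for _ in range(num_experts)]

# Mirrors the release sequence: small dense -> expanded dense -> MoE.
dense_small = np.ones((4, 8))
dense_large = scale_up(dense_small, new_dim=16)   # expanded dense model
experts = scale_out(dense_large, num_experts=4)   # Mixture of Experts
```

The point of the sketch is only the ordering: each phase initializes a larger or sparser model from the previous phase's trained weights rather than from scratch.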
|
Input/Output
  Input Format: Textual input in English or Chinese
  Accepted Modalities:
  Output Format:
|
Release Notes
  Version:
    Notes: Initial dense model training
  Version:
    Notes: Expanded dense model training
  Version:
    Notes: Final model with Mixture of Experts
|