| Model Type | Large Language Model, Text Generation |
|
| Use Cases |
| Areas: | Research, Commercial applications |
| Applications: | General English-language tasks, Coding tasks |
| Primary Use Cases: | Few-turn question answering |
| Limitations: | Not tested for non-English proficiency, No multimodal capabilities |
| Considerations: | Safety evaluation is recommended for specific applications |
|
|
| Supported Languages | English (General Text Processing) |
|
| Training Details |
| Data Sources: | Text and code |
| Data Volume: | 12T tokens |
| Methodology: | Curriculum learning; the data mix was changed during training |
| Context Length: | 32768 tokens |
| Hardware Used: | Databricks infrastructure |
| Model Architecture: | Fine-grained MoE with 16 experts (see the sketch below) |
|
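As a rough illustration of the fine-grained mixture-of-experts architecture noted above, the sketch below routes each token to a few of 16 small experts and mixes their outputs. This is not the model's actual implementation: the hidden sizes and the number of active experts per token (`top_k`) are illustrative assumptions, only the expert count comes from this card.

```python
# Toy fine-grained MoE layer: a router scores 16 small experts per token and only
# the top-k are evaluated. Sizes and top_k are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMoELayer(nn.Module):
    def __init__(self, d_model=512, d_ff=1024, num_experts=16, top_k=4):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):  # x: (tokens, d_model)
        scores = F.softmax(self.router(x), dim=-1)               # routing probabilities
        weights, idx = scores.topk(self.top_k, dim=-1)           # keep top-k experts per token
        weights = weights / weights.sum(dim=-1, keepdim=True)    # renormalize kept weights
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e                         # tokens sent to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

x = torch.randn(8, 512)
print(ToyMoELayer()(x).shape)  # torch.Size([8, 512])
```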
| Input Output |
| Input Format: | Text, up to 32768 tokens |
| Accepted Modalities: | Text |
| Output Format: | Text |
| Performance Tips: | Using FlashAttention-2 is recommended for faster inference (see the loading sketch below). |
|
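A minimal sketch of how the FlashAttention-2 tip above might be applied when loading the model with Hugging Face transformers. The checkpoint name is a placeholder (it is not given in this card), and the `flash-attn` package plus a supported GPU are assumed.

```python
# Minimal loading sketch. Assumptions: placeholder checkpoint name, flash-attn
# installed, and a GPU that supports FlashAttention-2.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "org/model-name"  # placeholder; substitute the actual checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,               # half precision to keep memory manageable
    attn_implementation="flash_attention_2",  # enables the faster attention kernels
    device_map="auto",
)

# Text in, text out: generate a short completion from a plain-text prompt.
inputs = tokenizer("What is a mixture-of-experts model?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```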
|
| Release Notes |
| Version: | |
| Notes: | Instruction finetuned, mixture-of-experts model (see the chat-template sketch below) |
|
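Since the card describes an instruction-finetuned model aimed at few-turn question answering, the sketch below shows one way a short multi-turn exchange could be formatted with the tokenizer's chat template. The checkpoint name is a placeholder, and the example assumes the tokenizer ships a chat template.

```python
# Few-turn prompt formatting via the tokenizer's chat template.
# Assumes a placeholder checkpoint whose tokenizer defines a chat template.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("org/model-name")  # placeholder

messages = [
    {"role": "user", "content": "Summarize what curriculum learning means in pretraining."},
    {"role": "assistant", "content": "Ordering or re-weighting the training data mix over time."},
    {"role": "user", "content": "Why might the data mix change during training?"},
]

# Render the conversation into a single prompt string ready for generation.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
```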
|
|