| Model Type | |

| Use Cases | |
| --- | --- |
| Areas: | |
| Primary Use Cases: | Memory/compute-constrained environments; latency-bound scenarios; strong reasoning (code, math, logic) |
| Limitations: | Languages other than English will see worse performance; possible reinforcement of stereotypes |
| Considerations: | Common limitations of language models should be considered. |
| Additional Notes | This is a static model trained on an offline dataset with a cutoff date of October 2023. Future versions may be released. |
| Supported Languages | |

| Training Details | |
| --- | --- |
| Data Sources: | Synthetic data; filtered publicly available websites |
| Data Volume: | |
| Methodology: | Fine-tuned with supervised fine-tuning (SFT) and direct preference optimization (DPO); a minimal DPO sketch follows this table |
| Context Length: | |
| Training Time: | |
| Hardware Used: | |
| Model Architecture: | Dense decoder-only Transformer with alternating dense and blocksparse attention; see the attention-mask sketch after this table |
|
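The DPO stage named in the methodology row can be summarized by its training objective. The following is a minimal PyTorch sketch of the DPO loss, not the model's actual training code; the tensor names and the `beta` value are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """DPO objective: train the policy to prefer the chosen response
    over the rejected one by a wider margin than a frozen reference
    model does. Each *_logps tensor holds per-sequence summed token
    log-probabilities; beta controls drift from the reference."""
    policy_logratios = policy_chosen_logps - policy_rejected_logps
    ref_logratios = ref_chosen_logps - ref_rejected_logps
    # Negative log-sigmoid of the scaled margin, averaged over the batch.
    return -F.logsigmoid(beta * (policy_logratios - ref_logratios)).mean()
```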
|
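The architecture row mentions alternating dense and blocksparse attention. The card does not specify the sparsity pattern, so the sketch below assumes a simple block-local window purely for illustration: even layers get a dense causal mask, odd layers a blocksparse one.

```python
import torch

def blocksparse_causal_mask(seq_len: int, block: int = 64,
                            local_blocks: int = 4) -> torch.Tensor:
    """Boolean mask where each query attends causally, but only to keys
    in its own block and the preceding local_blocks - 1 blocks. The
    block size and window are assumptions, not the model's values."""
    idx = torch.arange(seq_len)
    causal = idx.unsqueeze(1) >= idx.unsqueeze(0)       # token-level causality
    q_blk = (idx // block).unsqueeze(1)                 # query block indices
    k_blk = (idx // block).unsqueeze(0)                 # key block indices
    return causal & (q_blk - k_blk < local_blocks)      # drop distant blocks

def layer_mask(layer_idx: int, seq_len: int) -> torch.Tensor:
    """Alternate dense causal attention with the blocksparse variant."""
    if layer_idx % 2 == 0:
        idx = torch.arange(seq_len)
        return idx.unsqueeze(1) >= idx.unsqueeze(0)     # dense causal
    return blocksparse_causal_mask(seq_len)
```
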
| Responsible AI Considerations | |
| --- | --- |
| Fairness: | Potential bias arising from how groups are represented in the training data. |
| Mitigation Strategies: | Safety measures applied during post-training. |
|
|
| Input/Output | |
| --- | --- |
| Input Format: | |
| Accepted Modalities: | |
| Output Format: | |
|
| Release Notes | |
| --- | --- |
| Version: | |
| Notes: | Model weights are released (see the loading sketch below). |
|
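Since the release note says the weights are available, they would typically be loaded through the Hugging Face transformers API. A minimal sketch follows; the repository ID is a placeholder, not the model's actual checkpoint name.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repository ID; substitute the actual released checkpoint.
checkpoint = "your-org/your-model"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

# The card lists reasoning (code, math, logic) among primary use cases.
inputs = tokenizer("Question: what is 12 * 7? Answer:", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```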
|
|