**Model Type**

**Use Cases**

| Field | Details |
|---|---|
| Areas | |
| Primary Use Cases | Memory/compute-constrained environments; latency-bound scenarios; strong reasoning (code, math, logic) |
| Limitations | Worse performance in languages other than English; possible reinforcement of stereotypes |
| Considerations | The common limitations of language models should be kept in mind. |

**Additional Notes**

This is a static model trained on an offline dataset with a cutoff date of October 2023. Future versions may be released.

**Supported Languages**
**Training Details**

| Field | Details |
|---|---|
| Data Sources | Synthetic data; filtered publicly available websites |
| Data Volume | |
| Methodology | Fine-tuned with supervised fine-tuning (SFT) and Direct Preference Optimization (DPO); the standard DPO objective is restated below this table |
| Context Length | |
| Training Time | |
| Hardware Used | |
| Model Architecture | Dense decoder-only Transformer with alternating dense and blocksparse attention layers; see the sketch after the objective below |
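For reference, DPO (Rafailov et al., 2023) fine-tunes the policy directly on preference pairs against a frozen reference model. The standard objective is reproduced here for context; the card does not specify the hyperparameters (such as $\beta$) used for this model:

$$
\mathcal{L}_{\text{DPO}}(\pi_\theta;\pi_{\text{ref}}) = -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}\!\left[\log\sigma\!\left(\beta\log\frac{\pi_\theta(y_w\mid x)}{\pi_{\text{ref}}(y_w\mid x)} - \beta\log\frac{\pi_\theta(y_l\mid x)}{\pi_{\text{ref}}(y_l\mid x)}\right)\right]
$$

Here $y_w$ and $y_l$ are the preferred and rejected completions for prompt $x$, $\pi_{\text{ref}}$ is typically the SFT checkpoint, and $\beta$ scales the implicit KL penalty against the reference.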
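The architecture row notes alternating dense and blocksparse attention. As a minimal illustrative sketch only (the card does not specify the sparsity pattern; the block size, local-block count, and even/odd layer alternation below are assumptions), here is one way per-layer causal attention masks could alternate between a dense pattern and a block-local sparse pattern:

```python
import torch

def dense_causal_mask(seq_len: int) -> torch.Tensor:
    # Standard causal mask: query i may attend to every key j <= i.
    return torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))

def blocksparse_causal_mask(seq_len: int, block_size: int, local_blocks: int) -> torch.Tensor:
    # Block-local causal mask: each query attends only within its own block
    # and the (local_blocks - 1) blocks preceding it. Real blocksparse kernels
    # additionally skip the masked-out blocks to save memory and compute.
    mask = torch.zeros(seq_len, seq_len, dtype=torch.bool)
    for q in range(seq_len):
        first_block = max(0, q // block_size - local_blocks + 1)
        lo = first_block * block_size
        mask[q, lo : q + 1] = True  # causal within the allowed blocks
    return mask

def per_layer_masks(num_layers: int, seq_len: int, block_size: int = 64, local_blocks: int = 2):
    # Alternate dense / blocksparse by layer parity (an assumed pattern).
    return [
        dense_causal_mask(seq_len) if layer % 2 == 0
        else blocksparse_causal_mask(seq_len, block_size, local_blocks)
        for layer in range(num_layers)
    ]

if __name__ == "__main__":
    for i, m in enumerate(per_layer_masks(num_layers=4, seq_len=8, block_size=2, local_blocks=2)):
        print(f"layer {i} ({'dense' if i % 2 == 0 else 'blocksparse'}):\n{m.int()}\n")
```

The appeal of interleaving sparse layers is that they cap how many key/value positions each query touches, which lines up with the memory/compute-constrained and latency-bound use cases listed above.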
**Responsible AI Considerations**

| Field | Details |
|---|---|
| Fairness | Potential bias stemming from how groups are represented in the training data. |
| Mitigation Strategies | Safety-focused post-training. |
**Input/Output**

| Field | Details |
|---|---|
| Input Format | |
| Accepted Modalities | |
| Output Format | |
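The input/output fields above are left blank in this card. As a generic sketch only, this is how a decoder-only text model is commonly loaded and queried with the Hugging Face `transformers` API; `org/model-name` is a placeholder, not this model's actual repository ID:

```python
# Generic causal-LM inference sketch with Hugging Face transformers.
# "org/model-name" is a placeholder, NOT this model's repository ID.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "org/model-name"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "Q: What is 12 * 7?\nA:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```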
**Release Notes**

| Field | Details |
|---|---|
| Version | |
| Notes | Model weights are released. |