| Model Type | |
| Use Cases |
| Areas: | Research, commercial applications |
| Primary Use Cases: | Memory/compute-constrained environments, latency-bound scenarios, strong reasoning tasks |
| Limitations: | Not specifically designed or evaluated for all downstream purposes. |
| Considerations: | Adherence to applicable laws and regulations is required. |
| Additional Notes | This is a static model trained on an offline dataset with a cutoff date of October 2023. Future versions may improve upon it. |
|
| Supported Languages | |
| Training Details |
| Data Sources: | Publicly available documents, newly created synthetic data, high-quality chat-format supervised data |
| Data Volume: | |
| Methodology: | Supervised fine-tuning, Direct Preference Optimization |
| Context Length: | |
| Training Time: | |
| Hardware Used: | |
| Model Architecture: | Dense decoder-only Transformer |
|
|
| Responsible AI Considerations |
| Fairness: | Models can over- or under-represent groups, erase representation of some groups, or reinforce stereotypes. |
| Transparency: | The model may generate inappropriate or offensive content. |
| Accountability: | Developers are responsible for ensuring that use of the model complies with applicable laws and regulations. |
| Mitigation Strategies: | Use safety classifiers or implement custom safety solutions. |
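A "custom safety solution" can be as simple as screening both the prompt and the completion before returning them. The sketch below is purely illustrative: the blocklist terms, function names, and placeholder `generate_fn` are hypothetical and not part of this model card; a real deployment would use a trained safety classifier rather than a lexical check.

```python
# Hypothetical custom safety filter wrapped around text generation.
# The blocklist and all names here are illustrative only.
BLOCKLIST = {"bad_term_1", "bad_term_2"}

def is_safe(text: str) -> bool:
    """Crude lexical check; real systems would use a trained classifier."""
    lowered = text.lower()
    return not any(term in lowered for term in BLOCKLIST)

def guarded_generate(generate_fn, prompt: str) -> str:
    """Screen the prompt first, then the model's completion."""
    if not is_safe(prompt):
        return "[request declined by safety filter]"
    completion = generate_fn(prompt)
    if not is_safe(completion):
        return "[response withheld by safety filter]"
    return completion
```

The same wrapper shape works regardless of how the underlying `generate_fn` is implemented, so the filter can be swapped for a stronger classifier without touching generation code.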
|
|
| Input Output |
| Input Format: | |
| Accepted Modalities: | |
| Output Format: | Generated text in response to input |
| Performance Tips: | For GPUs that do not support FlashAttention, call AutoModelForCausalLM.from_pretrained() with attn_implementation="eager" to fall back to the eager attention path. |
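A minimal sketch of that loading call. The checkpoint name is a placeholder (this card does not state one), and the `from_pretrained` call is commented out so the snippet does not require downloading weights:

```python
# Hedged sketch: selecting the eager attention path for GPUs
# where FlashAttention is unavailable. The model ID is a placeholder.
# from transformers import AutoModelForCausalLM

load_kwargs = {
    "attn_implementation": "eager",  # fall back to eager attention
    "torch_dtype": "auto",           # let transformers pick the dtype
}

# model = AutoModelForCausalLM.from_pretrained("<model-id>", **load_kwargs)
print(load_kwargs["attn_implementation"])
```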
|
|
| Release Notes |
| Version: | |
| Notes: | Improvements in long-context understanding, instruction following, and reasoning capability. |
|
|
|