**Model Type:**

**Additional Notes:** The model is susceptible to jailbreak attacks and may generate inaccurate or biased content. Strong output-validation controls are recommended.
## Training Details

**Data Sources:** Open-source instruction datasets and internally collected synthetic datasets.

**Methodology:** Supervised fine-tuning and direct preference optimization.

**Training Time:** Between September 4, 2024 and November 10, 2024.

**Model Architecture:** Hybrid-head architecture combining standard attention heads with Mamba heads; uses Grouped-Query Attention (GQA) and Rotary Position Embeddings (RoPE).
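The architecture lists Grouped-Query Attention among its components. As a rough illustrative sketch only (not this model's actual implementation), GQA lets several query heads share one key/value head, shrinking the KV cache:

```python
import numpy as np

def gqa_attention(q, k, v, n_kv_heads):
    """Grouped-Query Attention sketch: several query heads share one K/V head.
    q: (n_q_heads, seq_len, d); k, v: (n_kv_heads, seq_len, d)."""
    n_q_heads, seq_len, d = q.shape
    group_size = n_q_heads // n_kv_heads
    # Each K/V head is reused by `group_size` query heads, so the KV cache
    # is smaller than full multi-head attention by the same factor.
    k = np.repeat(k, group_size, axis=0)
    v = np.repeat(v, group_size, axis=0)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v  # (n_q_heads, seq_len, d)

rng = np.random.default_rng(0)
out = gqa_attention(rng.normal(size=(8, 4, 16)),   # 8 query heads
                    rng.normal(size=(2, 4, 16)),   # 2 shared K heads
                    rng.normal(size=(2, 4, 16)),   # 2 shared V heads
                    n_kv_heads=2)
```

Here 8 query heads attend using only 2 key/value heads; the hybrid-head and Mamba components of the actual model are not shown.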
|
|
## Responsible AI Considerations

**Mitigation Strategies:** Developers should work with their internal model team to ensure this model meets the requirements of the relevant industry and use case, and to address unforeseen product misuse.
|
|
## Input / Output

**Accepted Modalities:**

**Performance Tips:** During generation, the batch size must be 1, as the current implementation does not fully support padding of Meta tokens together with sliding-window attention (SWA).
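Given the batch-size-1 constraint, multi-prompt workloads can simply loop over prompts rather than padding them into one batch. A minimal sketch, where `generate` stands in for any hypothetical single-prompt generation callable (not a specific API of this model):

```python
def generate_sequentially(generate, prompts):
    """Respect an effective batch size of 1: call the single-prompt
    `generate` function once per prompt instead of padding prompts
    into a single batch."""
    return [generate(prompt) for prompt in prompts]

# Stand-in for a real single-prompt generation call.
outputs = generate_sequentially(lambda p: p + "!", ["hello", "world"])
```

This trades throughput for correctness until padded batches of Meta tokens with SWA are supported.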
|
|