| Model Type | |
| --- | --- |

| Use Cases | |
| --- | --- |
| Areas | |
| Applications | |
| Primary Use Cases | Instruction-tuned models for assistant-like tasks |
| Limitations | Use in languages other than English, as laid out in the Acceptable Use Policy |
| Considerations | Developers may fine-tune the models for non-English languages, provided they adhere to the license policy |
| Supported Languages | |
| --- | --- |

| Training Details | |
| --- | --- |
| Data Sources | |
| Data Volume | 15 trillion tokens of pretraining data |
| Methodology | Progressive training on increasing context lengths; NTK-aware interpolation to initialize the RoPE theta |
| Context Length | |
| Training Time | |
| Hardware Used | Crusoe Energy high-performance L40S GPU cluster; Meta's Research SuperCluster (H100-80GB GPUs) |
| Model Architecture | Optimized transformer architecture using NTK-aware interpolation and RoPE theta optimization |
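The NTK-aware RoPE-theta initialization mentioned in the methodology can be sketched numerically. The rule below is the commonly used NTK-aware scaling heuristic, `theta' = theta * s^(d/(d-2))`; the concrete base value, head dimension, and scale factor in the example are illustrative assumptions, not figures taken from this model card.

```python
# Sketch of NTK-aware scaling of the RoPE base ("theta") so a model
# pretrained at one context length can be initialized for a longer one.
# Heuristic: theta_new = theta * s ** (d / (d - 2)), where s is the
# context-length scale factor and d is the attention head dimension.

def ntk_scaled_rope_base(base: float, head_dim: int, scale: float) -> float:
    """Return the adjusted RoPE base for a context extended by `scale`x."""
    return base * scale ** (head_dim / (head_dim - 2))

def rope_inv_freqs(base: float, head_dim: int) -> list:
    """Per-pair inverse frequencies used by rotary position embeddings."""
    return [base ** (-2 * i / head_dim) for i in range(head_dim // 2)]

# Illustrative numbers (NOT from the model card): a base of 500,000 and
# 128-dim heads, extending context by 8x.
new_base = ntk_scaled_rope_base(500_000.0, head_dim=128, scale=8.0)
inv_freqs = rope_inv_freqs(new_base, head_dim=128)
```

Raising the base stretches the lowest rotary frequencies, which is what lets positions far beyond the pretraining window remain distinguishable.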
| Safety Evaluation | |
| --- | --- |
| Methodologies | Red teaming, adversarial testing |
| Findings | Residual risks are minimized, with a focus on limiting false refusals while maintaining model helpfulness |
| Risk Categories | Cybersecurity risks, child safety risks, CBRNE hazards |
| Ethical Considerations | Transparency, rapid feedback loops, community collaboration on safety |
| Responsible AI Considerations | |
| --- | --- |
| Fairness | Model designed to be helpful and unbiased across different use cases |
| Transparency | Open approach, with community feedback driving improvements in safety and efficiency |
| Accountability | Meta ensures accountability through its detailed Responsible Use Guide and community engagement |
| Mitigation Strategies | Deployment of the Meta Llama Guard 2 and Code Shield safeguards |
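The mitigation strategy above amounts to a moderation gate: prompts and completions pass through a safety classifier before anything reaches the user. The sketch below shows that pattern only; `classify` is a hypothetical stand-in for a real classifier such as Llama Guard 2, not an actual API.

```python
# Minimal sketch of a moderation gate in the spirit of deploying
# Llama Guard 2 / Code Shield in front of a chat model. The classifier
# here is a toy stand-in; a real deployment would call the safety model.

def classify(text: str) -> str:
    # Stand-in classifier: flags a toy blocklist term. Hypothetical logic,
    # NOT how Llama Guard 2 actually decides.
    return "unsafe" if "FORBIDDEN" in text else "safe"

def guarded_generate(prompt: str, generate) -> str:
    """Run `generate` only if both the prompt and its completion pass the gate."""
    if classify(prompt) != "safe":
        return "[blocked: unsafe prompt]"
    completion = generate(prompt)
    if classify(completion) != "safe":
        return "[blocked: unsafe completion]"
    return completion

# Usage with a dummy generator:
reply = guarded_generate("hello", lambda p: p.upper())  # → "HELLO"
```

Checking both directions (input and output) is the point: a benign prompt can still elicit an unsafe completion.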
| Input Output | |
| --- | --- |
| Input Format | |
| Accepted Modalities | |
| Output Format | |
| Performance Tips | Structure inputs to take advantage of the model's long-context capability |
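One concrete way to act on the performance tip above is middle-truncation: when a token sequence exceeds the context window, keep its beginning and end, since both often carry the most salient information. The helper below is an illustrative sketch, not part of the model's documented API.

```python
# Hedged sketch: fit an over-long token sequence into a fixed context
# budget by truncating from the middle, preserving both ends.

def fit_to_context(tokens: list, max_len: int) -> list:
    """Return `tokens` unchanged if it fits, else keep the head and tail."""
    if len(tokens) <= max_len:
        return tokens
    head = max_len // 2
    tail = max_len - head
    return tokens[:head] + tokens[-tail:]

# Example: squeeze a 100-token sequence into a 10-token budget.
shortened = fit_to_context(list(range(100)), 10)
```

With a genuinely long-context model the budget is large enough that truncation is rarely needed, but a guard like this keeps inputs from silently overflowing the window.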
| Release Notes | |
| --- | --- |
| Version | |
| Date | |
| Notes | Extended context; improved training efficiency with long contexts |