| Model Type | |
| --- | --- |

| Use Cases | |
| --- | --- |
| Areas | chatbot, instruction following |
| Applications | chat and instruction-based interactions |
| Primary Use Cases | ready-to-use chat/instruct model (a usage sketch follows this table) |
| Limitations | The model was trained mostly on English data and may not generalize well to other languages. |
| Considerations | Develop guardrails and take appropriate precautions for any production use. |
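Since this is a ready-to-use chat/instruct model, it can be loaded directly with the Hugging Face `transformers` pipeline. Below is a minimal sketch; the model ID is a placeholder assumption, not stated in this card, so substitute the actual checkpoint.

```python
# Minimal chat/instruct usage sketch with the Hugging Face transformers
# pipeline. MODEL_ID is a placeholder assumption -- substitute the actual
# checkpoint for this card.
import torch
from transformers import pipeline

MODEL_ID = "your-org/your-instruct-model"  # hypothetical ID

generator = pipeline(
    "text-generation",
    model=MODEL_ID,
    torch_dtype=torch.bfloat16,  # halves memory versus float32
    device_map="auto",           # spread layers across available devices
)

prompt = "Summarize the benefits of multiquery attention in two sentences."
out = generator(prompt, max_new_tokens=128, do_sample=True, top_k=10)
print(out[0]["generated_text"])
```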
| Additional Notes |
| --- |
| Instruct model; not ideal for further fine-tuning. The architecture is optimized for inference, featuring FlashAttention and multiquery attention. |

| Supported Languages |
| --- |
| English (primary), French (secondary) |
| Training Details | |
| --- | --- |
| Data Sources | Baize instruction dataset, RefinedWeb |
| Data Volume | 150M tokens from Baize mixed with 5% RefinedWeb data |
| Methodology | Fine-tuned on a mixture of chat data (Baize) with 5% RefinedWeb data (a data-mixing sketch follows this table) |
| Context Length | |
| Hardware Used | 64 A100 40GB GPUs on AWS SageMaker |
| Model Architecture | Causal decoder-only, adapted from the GPT-3 architecture with rotary positional embeddings, multiquery attention, FlashAttention, and a single layer norm feeding parallel attention/MLP blocks (an architecture sketch follows below) |
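The Methodology row describes a roughly 95% Baize / 5% RefinedWeb mixture. Here is a minimal sketch of building such a mixture with Hugging Face `datasets`; the local file names and the `text` column are assumptions for illustration, and example-level sampling only approximates the token-level 5% stated above.

```python
# A minimal sketch of a 95% Baize / 5% RefinedWeb fine-tuning mixture.
# The file names and the "text" column are assumptions, not from this card.
from datasets import load_dataset, interleave_datasets

baize = load_dataset("json", data_files="baize_chat.jsonl", split="train")
refinedweb = load_dataset("json", data_files="refinedweb_sample.jsonl", split="train")

# Reduce both sources to a single shared "text" column so schemas match.
baize = baize.select_columns(["text"])
refinedweb = refinedweb.select_columns(["text"])

# Sample ~95% of examples from Baize and ~5% from RefinedWeb.
mixture = interleave_datasets(
    [baize, refinedweb],
    probabilities=[0.95, 0.05],
    seed=42,
)
print(mixture[0]["text"])
```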
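To make the Model Architecture row concrete, below is a compact PyTorch sketch of one decoder block with multiquery attention and a single layer norm feeding attention and MLP in parallel. All dimensions are illustrative, rotary embeddings are only marked by a comment, and `scaled_dot_product_attention` stands in for FlashAttention (PyTorch dispatches to a FlashAttention kernel when one is available).

```python
# Sketch of the decoder block described above: one pre-layer-norm feeding
# attention and MLP in parallel, with multiquery attention (a single shared
# key/value head). Dimensions are illustrative, not the model's actual config.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiQueryAttention(nn.Module):
    """Attention with many query heads but one shared K/V head."""
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.n_heads, self.head_dim = n_heads, d_model // n_heads
        self.q_proj = nn.Linear(d_model, d_model, bias=False)
        # Multiquery: K and V are projected once, not once per head.
        self.kv_proj = nn.Linear(d_model, 2 * self.head_dim, bias=False)
        self.out_proj = nn.Linear(d_model, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, _ = x.shape
        q = self.q_proj(x).view(B, T, self.n_heads, self.head_dim).transpose(1, 2)
        k, v = self.kv_proj(x).split(self.head_dim, dim=-1)
        # Shared K/V head, broadcast across all query heads.
        k = k.view(B, T, 1, self.head_dim).transpose(1, 2).expand(-1, self.n_heads, -1, -1)
        v = v.view(B, T, 1, self.head_dim).transpose(1, 2).expand(-1, self.n_heads, -1, -1)
        # (Rotary embeddings would be applied to q and k here.)
        # Dispatches to a FlashAttention kernel when available.
        y = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.out_proj(y.transpose(1, 2).reshape(B, T, -1))

class ParallelDecoderBlock(nn.Module):
    """A single layer norm whose output feeds attention and MLP in parallel."""
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.attn = MultiQueryAttention(d_model, n_heads)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.norm(x)
        return x + self.attn(h) + self.mlp(h)  # parallel attention/MLP

x = torch.randn(2, 16, 512)                   # (batch, sequence, d_model)
print(ParallelDecoderBlock(512, 8)(x).shape)  # torch.Size([2, 16, 512])
```

A key property of the multiquery design is the much smaller K/V cache at inference time: the cache scales with one head rather than `n_heads`, which is why the card describes the architecture as optimized for inference.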
| Input Output | |
| --- | --- |