| Field | Details |
| --- | --- |
| Model Type | |
| Use Cases: Areas | research, multilingual tasks, code tasks |
| Use Cases: Limitations | |
| Use Cases: Considerations | The model should be deployed with appropriate guardrails in environments that require moderated outputs. |
| Additional Notes | Trained jointly by Mistral AI and NVIDIA; a drop-in replacement for Mistral 7B. |
| Supported Languages | en (English), fr (French), de (German), es (Spanish), it (Italian), pt (Portuguese), ru (Russian), zh (Chinese), ja (Japanese) |
| Training Details: Methodology | Trained on a large proportion of multilingual and code data |
| Training Details: Context Length | |
| Training Details: Model Architecture | Transformer with 40 layers; model dim: 5,120; head dim: 128; hidden dim: 14,336; activation: SwiGLU; attention heads: 32; kv-heads: 8 (GQA); vocabulary size: 128k (2^17); rotary embeddings (theta = 1M) |
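The architecture hyperparameters above can be collected into a small configuration sketch. The class and field names here are illustrative, not from any official library; the values are those listed in the card, and the derived widths simply follow from its arithmetic:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ModelConfig:
    # Hyperparameters as listed in the model card (names are illustrative).
    n_layers: int = 40
    dim: int = 5_120            # model (embedding) dimension
    head_dim: int = 128         # dimension per attention head
    hidden_dim: int = 14_336    # feed-forward inner dimension (SwiGLU)
    n_heads: int = 32           # query heads
    n_kv_heads: int = 8         # key/value heads (grouped-query attention)
    vocab_size: int = 2**17     # 128k tokens
    rope_theta: float = 1_000_000.0  # rotary embedding base

cfg = ModelConfig()

# With GQA, each key/value head is shared by a group of query heads:
group_size = cfg.n_heads // cfg.n_kv_heads      # 32 // 8 = 4

# Projection widths implied by the listed numbers. Note that the query
# width (32 * 128 = 4,096) differs from the model dim (5,120).
q_width = cfg.n_heads * cfg.head_dim            # 4096
kv_width = cfg.n_kv_heads * cfg.head_dim        # 1024
```

Grouping 32 query heads over 8 kv-heads shrinks the key/value cache by a factor of 4 relative to full multi-head attention, which is the usual motivation for GQA.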
|
|
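The "rotary embeddings (theta = 1M)" entry refers to RoPE. A minimal sketch of the standard formulation follows; the function names are illustrative. Raising theta from the common 10,000 to 1,000,000 lowers the smallest rotation frequencies, which is a standard way to make positions remain distinguishable over longer contexts:

```python
import math

def rope_frequencies(head_dim: int, theta: float = 1_000_000.0) -> list[float]:
    # Standard RoPE: one frequency per pair of dimensions,
    # freq_i = theta ** (-2i / head_dim) for i in 0 .. head_dim/2 - 1.
    return [theta ** (-2.0 * i / head_dim) for i in range(head_dim // 2)]

def rotate_pair(x: float, y: float, pos: int, freq: float) -> tuple[float, float]:
    # Rotate one (x, y) dimension pair by the angle pos * freq.
    angle = pos * freq
    c, s = math.cos(angle), math.sin(angle)
    return (x * c - y * s, x * s + y * c)

# With head dim 128 (as in the table), each head gets 64 frequencies,
# decreasing geometrically from 1.0 down toward 1/theta.
freqs = rope_frequencies(128)
```

Because the rotation is applied per position to queries and keys, their dot product depends only on the relative distance between tokens, which is what makes RoPE a relative position encoding.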