| Field | Value |
|---|---|
| Model Type | |
| Use Cases | |
| Additional Notes | Pretrained base model with no moderation mechanisms. Mistral Nemo should be run at a low sampling temperature; the recommended value is 0.3. |
| Supported Languages | English (en), French (fr), German (de), Spanish (es), Italian (it), Portuguese (pt), Russian (ru), Chinese (zh), Japanese (ja) |
| **Training Details** | |
| Data Sources | Multilingual and code data |
| Context Length | |
| Model Architecture | Transformer: 40 layers, model dimension 5,120, 32 attention heads, head dimension 128, feed-forward hidden dimension 14,336, SwiGLU activation, 128k-token vocabulary, rotary position embeddings |
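The recommended temperature of 0.3 applies to the usual temperature-scaled sampling step, where logits are divided by the temperature before the softmax. The sketch below is a generic, plain-Python illustration of that step, not Mistral's inference code; the function names and the toy logits are hypothetical.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn raw logits into probabilities after dividing them by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max before exponentiating for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_with_temperature(logits, temperature=0.3, rng=random):
    """Draw one token index from the temperature-scaled distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


if __name__ == "__main__":
    toy_logits = [2.0, 1.0, 0.5]  # hypothetical logits for a 3-token vocabulary
    for t in (1.0, 0.3):
        print(t, [round(p, 3) for p in softmax_with_temperature(toy_logits, t)])
    print("sampled index:", sample_with_temperature(toy_logits, 0.3))
```

With these toy logits, the probability of the top token rises from roughly 0.63 at temperature 1.0 to roughly 0.96 at 0.3. A lower temperature therefore concentrates sampling on the highest-scoring tokens.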
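The architecture figures support some quick shape and size arithmetic. The sketch below collects them in a hypothetical `NemoArchSketch` dataclass. It is not an official configuration class. Two assumptions apply: the feed-forward hidden dimension is taken as 14,336, and "128k vocabulary" is read as 2**17 = 131,072 tokens. Key/value projections are omitted because the number of key/value heads is not listed here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NemoArchSketch:
    """Hypothetical container for the listed architecture numbers (not an official API)."""
    n_layers: int = 40
    d_model: int = 5120
    n_heads: int = 32
    head_dim: int = 128
    ffn_hidden: int = 14336      # assumption: feed-forward hidden dimension of 14,336
    vocab_size: int = 2 ** 17    # assumption: "128k" read as 2**17 = 131,072

    @property
    def attn_width(self) -> int:
        # Width of all query heads concatenated: 32 * 128 = 4096, narrower than d_model.
        return self.n_heads * self.head_dim

    def q_and_o_params_per_layer(self) -> int:
        # Query (d_model -> attn_width) and output (attn_width -> d_model) projections,
        # assuming bias-free weight matrices.
        return 2 * self.d_model * self.attn_width

    def swiglu_params_per_layer(self) -> int:
        # SwiGLU uses three bias-free matrices: gate and up (d_model x hidden)
        # and down (hidden x d_model).
        return 3 * self.d_model * self.ffn_hidden

    def total_swiglu_params(self) -> int:
        return self.n_layers * self.swiglu_params_per_layer()

    def embedding_params(self) -> int:
        # Input token embedding table only.
        return self.vocab_size * self.d_model


if __name__ == "__main__":
    cfg = NemoArchSketch()
    print("query width:", cfg.attn_width)
    print("Q+O params per layer:", cfg.q_and_o_params_per_layer())
    print("SwiGLU params per layer:", cfg.swiglu_params_per_layer())
    print("SwiGLU params, all layers:", cfg.total_swiglu_params())
    print("input embedding params:", cfg.embedding_params())
```

Under these assumptions the feed-forward blocks hold about 8.8 billion weights and the input embedding about 0.67 billion. Because 32 heads of dimension 128 give a query width of 4,096 rather than 5,120, the attention projections here are rectangular rather than square.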
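The SwiGLU activation named in the architecture row is usually applied in a gated feed-forward block: FFN(x) = W_down(SiLU(W_gate x) ⊙ W_up x), where SiLU(z) = z · sigmoid(z). The pure-Python sketch below shows that generic formulation on tiny matrices. The helper names are hypothetical, and the code is not taken from Mistral's implementation.

```python
import math
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply a (rows x cols) matrix by a vector of length cols."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def silu(z: float) -> float:
    """SiLU (Swish-1): z * sigmoid(z), written to avoid overflow for large |z|."""
    if z >= 0:
        return z / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return z * ez / (1.0 + ez)


def swiglu_ffn(x: Vector, w_gate: Matrix, w_up: Matrix, w_down: Matrix) -> Vector:
    """Gated feed-forward block: W_down(SiLU(W_gate x) * (W_up x))."""
    gate = [silu(g) for g in matvec(w_gate, x)]       # d_model -> hidden, then SiLU
    up = matvec(w_up, x)                              # d_model -> hidden, linear branch
    hidden = [g * u for g, u in zip(gate, up)]        # element-wise gating
    return matvec(w_down, hidden)                     # hidden -> d_model


if __name__ == "__main__":
    # Toy sizes: d_model = 2, hidden = 3 (the real model uses 5,120 and a much larger hidden size).
    x = [1.0, -0.5]
    w_gate = [[0.2, 0.1], [-0.3, 0.4], [0.5, 0.0]]
    w_up = [[0.1, 0.2], [0.3, -0.1], [0.0, 0.6]]
    w_down = [[0.1, 0.2, 0.3], [-0.2, 0.1, 0.0]]
    print(swiglu_ffn(x, w_gate, w_up, w_down))
```

The gate branch passes through SiLU, the up branch stays linear, and their element-wise product is projected back to the model dimension. This structure is why a SwiGLU block carries three weight matrices instead of two.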