Model Type | |
Use Cases | |
Additional Notes | Pretrained base model with no moderation mechanisms. Mistral Nemo requires lower sampling temperatures than previous Mistral models; the recommended temperature is 0.3. |
|
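To illustrate why a lower temperature matters, here is a minimal sketch of temperature-scaled softmax in plain Python. It is not Mistral's sampling code; the function name and the example logits are hypothetical and chosen only for illustration. Dividing the logits by a temperature below 1 sharpens the distribution, so the most likely token is picked more often.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw logits into probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for three candidate tokens (illustrative values only).
example_logits = [2.0, 1.0, 0.5]

probs_default = softmax_with_temperature(example_logits, 1.0)
probs_recommended = softmax_with_temperature(example_logits, 0.3)

print("T=1.0:", [round(p, 3) for p in probs_default])
print("T=0.3:", [round(p, 3) for p in probs_recommended])
```

With these made-up logits, the top token's probability rises from roughly 0.63 at temperature 1.0 to roughly 0.96 at temperature 0.3, which is the kind of sharpening the recommendation relies on.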
Supported Languages | English (en), French (fr), German (de), Spanish (es), Italian (it), Portuguese (pt), Russian (ru), Chinese (zh), Japanese (ja) |
|
Training Details |
Data Sources: | Multilingual and code data |
|
Context Length: | |
Model Architecture: | Transformer - 40 layers, model dimension 5,120, 32 attention heads, head dimension 128, hidden (feed-forward) dimension 14,336, SwiGLU activation, 128k vocabulary, rotary embeddings |
|
|
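One detail in the architecture row is easy to miss: 32 heads at a head dimension of 128 give an attention width of 4,096, which is smaller than the model dimension of 5,120. The sketch below does that arithmetic. It assumes bias-free query and output projections that map between the model dimension and the attention width; the key and value projections are left out because the table does not state the number of key/value heads. The function names are hypothetical.

```python
# Values taken from the architecture row of the table.
NUM_LAYERS = 40
MODEL_DIM = 5120
NUM_HEADS = 32
HEAD_DIM = 128


def attention_width(num_heads, head_dim):
    """Total width of the concatenated attention heads."""
    return num_heads * head_dim


def query_output_projection_params(model_dim, num_heads, head_dim):
    """Weights in one layer's query and output projections.

    Assumes no bias terms: the query projection maps model_dim to the
    attention width, and the output projection maps it back.
    """
    width = attention_width(num_heads, head_dim)
    return 2 * model_dim * width


width = attention_width(NUM_HEADS, HEAD_DIM)
per_layer = query_output_projection_params(MODEL_DIM, NUM_HEADS, HEAD_DIM)

print("attention width:", width)
print("query + output weights per layer:", per_layer)
print("across all layers:", per_layer * NUM_LAYERS)
```

Under these assumptions the attention width is 4,096, so the query and output projections are rectangular (5,120 by 4,096) rather than square.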