| Field | Details |
|---|---|
| Model Type | |
| Use Cases | |
| Considerations | Fine-tuning is recommended for specific tasks. |
| Additional Notes | This checkpoint is the 'raw' pre-trained model and has not been tuned to a more specific task; in most cases it should be fine-tuned before use (see the loading sketch after this table). |
| Supported Languages | |
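Since the card stresses that this is the un-tuned base checkpoint, a minimal loading sketch may help illustrate the intended workflow. The Hub repo id below is an assumption inferred from the model name in the release notes; substitute the actual repository where the checkpoint is published.

```python
# Minimal sketch: load the raw pre-trained checkpoint and run a quick
# sanity-check generation. The repo id is an assumption based on the
# model name; replace it with the actual Hub repository. Because this
# is the un-tuned base model, fine-tune it before downstream use.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "BEE-spoke-data/smol_llama-101M-GQA"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

inputs = tokenizer("The tiny model was trained to", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=32, do_sample=False)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```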
**Training Details**

| Field | Details |
|---|---|
| Data Sources | JeanKaddour/minipile, pszemraj/simple_wikipedia_LM, BEE-spoke-data/wikipedia-20230901.en-deduped, mattymchen/refinedweb-3m (see the corpus sketch below) |
| Methodology | |
| Context Length | |
| Training Time | |
| Hardware Used | |
| Model Architecture | 768 hidden size, 6 layers, GQA with 24 attention heads and 8 key-value heads (see the config sketch below) |
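The data sources are listed only as Hub dataset ids. As a rough illustration, the snippet below streams one record from each corpus to inspect its fields; the split names are an assumption, and the card does not state the actual sampling mixture or preprocessing.

```python
# Illustrative only: stream one record from each listed pre-training corpus
# to see what fields it exposes. Split names ("train") are an assumption;
# the actual mixture and preprocessing are not stated in the card.
from datasets import load_dataset

sources = [
    "JeanKaddour/minipile",
    "pszemraj/simple_wikipedia_LM",
    "BEE-spoke-data/wikipedia-20230901.en-deduped",
    "mattymchen/refinedweb-3m",
]

for name in sources:
    ds = load_dataset(name, split="train", streaming=True)
    first = next(iter(ds))
    print(name, "->", list(first.keys()))
```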
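The architecture row maps naturally onto a Llama-style configuration. The sketch below encodes the stated dimensions; the vocabulary size, intermediate (MLP) size, and context length are illustrative assumptions, not values from the card.

```python
# Sketch of the stated architecture as a Llama-style config:
# 768 hidden size, 6 layers, grouped-query attention with 24 query
# heads and 8 key-value heads. vocab_size, intermediate_size and
# max_position_embeddings are assumptions, not taken from the card.
from transformers import LlamaConfig, LlamaForCausalLM

config = LlamaConfig(
    hidden_size=768,                # from the card
    num_hidden_layers=6,            # from the card
    num_attention_heads=24,         # query heads, from the card
    num_key_value_heads=8,          # KV heads -> GQA (3 query heads per KV head)
    intermediate_size=3072,         # assumption (4x hidden size)
    vocab_size=32000,               # assumption
    max_position_embeddings=1024,   # assumption; the card leaves context length blank
)

model = LlamaForCausalLM(config)
print(f"~{model.num_parameters() / 1e6:.0f}M parameters")  # rough check against the 101M figure
```

With these assumed values the parameter count lands near the quoted ~101M, but the exact figure depends on the real tokenizer vocabulary and MLP width.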
**Release Notes**

| Field | Details |
|---|---|
| Version | |
| Date | |
| Notes | smol_llama-101M-GQA (First version): a small 101M-parameter decoder model. |