| Model Type | |
| Use Cases | |
| --- | --- |
| Areas | On-device computing, Instruction following |
| Applications | Summarization, Text rewriting, Function calling |
| Primary Use Cases | English content generation, Instruction following |
| Limitations | Models primarily understand and generate English; content may not be factually accurate or logically consistent; biases inherent to the training data are present. |
| Considerations | Models are assistive tools and should not be used as definitive information sources. |
|
|
| Additional Notes | Memory footprint of the 135M model is 723.56 MB when loaded (see the check below). |
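The footprint quoted above can be checked with the Transformers `get_memory_footprint()` helper, as in the minimal sketch below. The checkpoint ID is a placeholder (not taken from this card), and the exact figure depends on the dtype the weights are loaded in.

```python
from transformers import AutoModelForCausalLM

# Placeholder checkpoint ID; replace with the actual 135M model repository.
model_id = "org/model-135M"

# Loaded with the checkpoint's default dtype; passing a lower-precision
# torch_dtype (e.g. bfloat16) would roughly halve the reported footprint.
model = AutoModelForCausalLM.from_pretrained(model_id)

# get_memory_footprint() returns the bytes used by parameters and buffers.
print(f"Memory footprint: {model.get_memory_footprint() / 1e6:.2f} MB")
```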
|
| Supported Languages | English |
| Training Details | |
| --- | --- |
| Data Sources | FineWeb-Edu, DCLM, The Stack, UltraFeedback |
| Data Volume | |
| Methodology | Direct Preference Optimization (DPO), Supervised Fine-tuning (SFT); see the loss sketch below |
| Context Length | |
| Hardware Used | |
| Model Architecture | |
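DPO trains the policy directly on preference pairs (for example, from UltraFeedback) against a frozen reference model, with no separate reward model. The snippet below is a minimal PyTorch sketch of the standard DPO objective, not the training code behind this card; the `beta` value is a common default rather than one documented here.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Direct Preference Optimization loss.

    Each tensor holds per-example log-probabilities of the chosen or rejected
    response, summed over response tokens, shape (batch,). `beta` controls how
    far the policy is allowed to drift from the reference model.
    """
    policy_logratios = policy_chosen_logps - policy_rejected_logps
    ref_logratios = ref_chosen_logps - ref_rejected_logps
    # Implied reward margin: positive when the policy prefers the chosen
    # response more strongly than the reference model does.
    margin = policy_logratios - ref_logratios
    return -F.logsigmoid(beta * margin).mean()
```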
|
| Input / Output | |
| --- | --- |
| Input Format | Token sequences encoded with a tokenizer |
| Accepted Modalities | |
| Output Format | Generated token sequences |
| Performance Tips | Use multiple GPUs and a reduced-precision dtype (e.g., bfloat16) for optimal performance; see the sketch below |
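The rows above describe a standard causal-LM generation loop. The sketch below shows one way to wire it up with Transformers, assuming a Hugging Face checkpoint (the ID is a placeholder): the prompt is encoded into token IDs, the model is loaded in bfloat16 with `device_map="auto"` so layers are spread across available GPUs, and the generated token sequence is decoded back to text.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder checkpoint ID; replace with the actual model repository.
model_id = "org/model-135M"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# bfloat16 halves memory versus float32; device_map="auto" places layers on
# whatever GPUs are visible (requires accelerate) or falls back to CPU.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Input format: a token sequence produced by the tokenizer.
inputs = tokenizer("Rewrite this sentence more concisely: ...", return_tensors="pt")
inputs = {k: v.to(model.device) for k, v in inputs.items()}

# Output format: a generated token sequence, decoded back into text.
output_ids = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```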
|
|