| Model Type | |
| Use Cases |
| Areas: | |
| Limitations: | | No safety guarantees for outputs |
| Considerations: | | Users should conduct thorough safety testing and implement appropriate filtering. |
| Additional Notes | | Available as pre-trained and instruction-tuned models ranging from 270M to 3B parameters. The release includes data preparation, training, fine-tuning, and evaluation procedures, along with checkpoints and training logs. |
| Training Details |
| Data Sources: | | RefinedWeb, deduplicated PILE, subset of RedPajama, subset of Dolma v1.6 |
| Data Volume: | |
| Methodology: | | Layer-wise scaling strategy that allocates parameters non-uniformly across the transformer layers |
| Input Output |
| Input Format: | |
| Accepted Modalities: | |
| Output Format: | |
| Performance Tips: | | Tune the batch size to the available memory and enable token speculation (speculative decoding) for faster generation. |
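The layer-wise scaling methodology listed under Training Details can be illustrated with a minimal sketch. It assumes that each layer's attention width and feed-forward width are obtained by linearly interpolating two multipliers from the first layer to the last. The function name `layerwise_scaling`, the `alpha` and `beta` ranges, and the example dimensions are all hypothetical choices for illustration, not values taken from the model's released configuration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class LayerConfig:
    num_heads: int  # attention heads in this layer
    ffn_dim: int    # hidden width of this layer's feed-forward block


def layerwise_scaling(
    num_layers: int,
    d_model: int,
    head_dim: int,
    alpha: Tuple[float, float] = (0.5, 1.0),  # assumed attention-width multiplier range
    beta: Tuple[float, float] = (1.0, 4.0),   # assumed FFN-width multiplier range
) -> List[LayerConfig]:
    """Give each layer its own width instead of one uniform width for all layers."""
    configs = []
    for i in range(num_layers):
        # Position of layer i between the first (0.0) and last (1.0) layer.
        t = i / (num_layers - 1) if num_layers > 1 else 0.0
        a = alpha[0] + (alpha[1] - alpha[0]) * t
        b = beta[0] + (beta[1] - beta[0]) * t
        num_heads = max(1, round(a * d_model / head_dim))
        ffn_dim = round(b * d_model)
        configs.append(LayerConfig(num_heads=num_heads, ffn_dim=ffn_dim))
    return configs


if __name__ == "__main__":
    for idx, cfg in enumerate(layerwise_scaling(num_layers=4, d_model=1024, head_dim=64)):
        print(idx, cfg)
```

Under these assumptions, early layers get fewer heads and narrower feed-forward blocks while later layers get more, so the same parameter budget is spread unevenly across depth rather than uniformly.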