| Model Type | | Transformer-based Language Model |
|
| Use Cases |
| Areas: | |
| Primary Use Cases: | | Behavior and functionality research on large language models (see the loading sketch below) |
|
| Limitations: | | Not suitable for human-facing deployment, translation, or generating text in languages other than English |
|
| Considerations: | | Conduct risk and bias assessments when using the model in downstream applications. |
|
|
| Additional Notes | | Pythia-410M is not fine-tuned for downstream applications such as commercial chatbots. |
|
| Supported Languages | | en (Primary language - English) |
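
For the research use case above, a minimal loading sketch may help. It assumes the Hugging Face `transformers` library and the `EleutherAI/pythia-410m` repository, where intermediate training checkpoints are published as git revisions; the `step3000` revision shown here is illustrative.

```python
# Minimal sketch: loading Pythia-410M for model-behavior research, optionally at
# an intermediate training checkpoint. Assumes the Hugging Face `transformers`
# package and the `EleutherAI/pythia-410m` repository; the "step3000" revision
# name is illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "EleutherAI/pythia-410m"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)  # final checkpoint

# Intermediate checkpoints are exposed as revisions of the same repository,
# which is what enables studies of behavior across training.
early_model = AutoModelForCausalLM.from_pretrained(model_name, revision="step3000")
```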
|
| Training Details |
| Data Sources: | | The Pile |
| Data Volume: | | Approximately 300B tokens (143,000 steps × a 2M-token batch) |
| Methodology: | | Trained with a uniform batch size of 2M tokens using Flash Attention. The learning rate schedule decayed to a minimum of 0.1× the maximum learning rate (sketched below). |
|
| Training Time: | | 143,000 steps at a batch size of 2M tokens |
|
| Model Architecture: | |
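
The methodology above fixes only the floor of the learning-rate schedule (0.1× the maximum). The sketch below fills in a common shape, linear warmup followed by cosine decay, so the decay behavior is concrete; the warmup length and peak learning rate are illustrative assumptions, not values taken from this card.

```python
import math

def lr_at_step(step, max_lr=3e-4, total_steps=143_000,
               warmup_steps=1_430, min_ratio=0.1):
    """Linear warmup, then cosine decay to min_ratio * max_lr.

    The 0.1x floor and the 143,000-step horizon come from this card; the
    cosine shape, warmup length, and peak learning rate are assumptions.
    """
    min_lr = min_ratio * max_lr
    if step < warmup_steps:
        return max_lr * step / max(1, warmup_steps)
    progress = min(1.0, (step - warmup_steps) / max(1, total_steps - warmup_steps))
    return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

print(lr_at_step(1_430), lr_at_step(143_000))  # peak LR, then 0.1 * peak LR
```

Under these assumptions the schedule reaches its 0.1× floor exactly at step 143,000, the end of training.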
|
| Responsible AI Considerations |
| Fairness: | | Biases regarding gender, religion, and race in the training data are documented in Section 6 of the Pile paper. |
|
| Transparency: | | Model outputs should not be relied upon for factual accuracy. |
|
| Accountability: | | Users are responsible for evaluating generated outputs and for informing audiences that the content is model-generated. |
|
| Mitigation Strategies: | | Implement risk and bias assessments when using the model in downstream applications. |
|
|
| Input Output |
| Input Format: | | Free-form text prompt |
| Accepted Modalities: | | Text |
| Output Format: | | Generated text continuation of the prompt (see the generation sketch below) |
| Performance Tips: | | Always evaluate the outputs for factual accuracy and potential biases. |
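
To make the text-in / text-out contract above concrete, here is a minimal generation sketch. It assumes the Hugging Face `transformers` package; the prompt and sampling settings are illustrative only.

```python
# Minimal text-in / text-out sketch for Pythia-410M.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-410m")
model = AutoModelForCausalLM.from_pretrained("EleutherAI/pythia-410m")

inputs = tokenizer("The Pythia suite was designed to study", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=40, do_sample=True, temperature=0.8)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
# Output is raw, unfiltered model text: review it for factual accuracy and bias.
```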
|
|
| Release Notes |
| Date: | |
| Notes: | | Pythia models were renamed and parameter counts adjusted for clarity. |
|
| Version: | |
| Notes: | | Early version with hyperparameter discrepancies. |
|
|
|