| Model Type | Transformer-based Language Model |
|
| Use Cases | |
| --- | --- |
| Areas | Scientific Research, Interpretability Research |
| Applications | Research on the behavior, functionality, and limitations of large language models |
| Primary Use Cases | Controlled scientific experiments (see the sketch after this table) |
| Limitations | Not suitable for human-facing interactions; English-only, so unsuitable for generating text in other languages; not fine-tuned for genre prose or commercial chatbots |
| Considerations | Conduct a risk and bias assessment if fine-tuning; evaluate risks before deployment |
|
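One way to run such controlled experiments is to compare the same model at different training stages. A minimal sketch, assuming the suite is published on the Hugging Face Hub as `EleutherAI/pythia-160m` with intermediate checkpoints exposed as revisions named like `step3000` (neither identifier is stated in this card):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "EleutherAI/pythia-160m"  # assumed Hub identifier

tokenizer = AutoTokenizer.from_pretrained(MODEL)

# Final trained model.
final = AutoModelForCausalLM.from_pretrained(MODEL)

# Hypothetical early-training checkpoint, assuming revisions named "stepN".
early = AutoModelForCausalLM.from_pretrained(MODEL, revision="step3000")
```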
|
| Additional Notes | Pythia model suite renamed in January 2023 for clarity |
|
| Supported Languages | English |
| Training Details | |
| --- | --- |
| Data Sources | The Pile, 22 diverse sources including arXiv, CommonCrawl, Project Gutenberg, YouTube subtitles, and GitHub |
| Data Volume | |
| Model Architecture | |
|
| Responsible AI Considerations | |
| --- | --- |
| Fairness | Documented biases with regard to gender, religion, and race (per the Pile paper) |
|
|
| Input Output | |
| --- | --- |
| Input Format | String of text for next-token prediction (see the example after this table) |
| Accepted Modalities | Text |
| Output Format | String (one token at a time) |
|
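A minimal sketch of the string-in, token-out interface described above, assuming the model is loadable via Hugging Face transformers under the identifier `EleutherAI/pythia-160m`:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "EleutherAI/pythia-160m"  # assumed Hub identifier

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL)
model.eval()

# Input format: a plain string of text.
inputs = tokenizer("The Pile is a dataset of", return_tensors="pt")

# Output format: one token at a time; take the most likely next token.
with torch.no_grad():
    logits = model(**inputs).logits
next_id = logits[0, -1].argmax().item()
print(tokenizer.decode(next_id))
```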
|
| Release Notes | |
| --- | --- |
| Version | |
| Date | |
| Notes | Pythia-160M retrained to address hyperparameter discrepancies |
|
|
|