| Model Type | Auto-regressive transformer architecture |
|
| Use Cases |
| Primary Use Cases | Research on large language models; exploring applications such as question answering and reading comprehension; evaluating and mitigating biases; determining the capabilities and limitations of models |

| Limitations | The base model is not suitable for downstream applications without further risk evaluation and mitigation |
|
|
| Supported Languages | English (high proficiency); the training data includes 20 languages, but the model mainly supports English |
|
| Training Details |
| Data Sources | CCNet, C4, GitHub, Wikipedia, Books, ArXiv, Stack Exchange |

| Data Volume | 1T tokens, with different sampling breakdowns for different model sizes |

| Model Architecture | Auto-regressive transformer |
|
| Responsible AI Considerations |
| Fairness | The model reflects biases present in its web sources. Evaluated bias categories include gender, religion, race, sexual orientation, age, nationality, disability, physical appearance, and socioeconomic status |

| Transparency | The model was trained on web-sourced data, which may contain biased and harmful content |

| Accountability | Raise questions or comments via the GitHub repository |

| Mitigation Strategies | Training data was filtered based on proximity to Wikipedia text, using a Kneser-Ney language model and a fastText linear classifier |
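The perplexity-based half of that filtering step can be sketched as follows. This is a minimal illustration only, not the actual pipeline: it substitutes a toy add-one-smoothed bigram model for the Kneser-Ney language model, omits the fastText classifier entirely, and every function name and threshold here is hypothetical.

```python
import math
from collections import Counter

def train_bigram_lm(reference_text):
    """Train a toy bigram LM with add-one smoothing on reference text.

    Hypothetical stand-in for the Kneser-Ney model trained on Wikipedia
    text; simplified for illustration.
    """
    tokens = reference_text.lower().split()
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    vocab_size = len(unigrams)
    return unigrams, bigrams, vocab_size

def perplexity(text, lm):
    """Perplexity of `text` under the toy bigram LM (lower = more similar)."""
    unigrams, bigrams, vocab_size = lm
    tokens = text.lower().split()
    if len(tokens) < 2:
        return float("inf")
    log_prob = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        # Add-one smoothed conditional probability P(cur | prev).
        num = bigrams[(prev, cur)] + 1
        den = unigrams[prev] + vocab_size
        log_prob += math.log(num / den)
    n = len(tokens) - 1
    return math.exp(-log_prob / n)

def filter_pages(pages, lm, threshold):
    """Keep pages whose perplexity against the reference LM is low,
    i.e. pages that 'look like' the reference (Wikipedia-style) text."""
    return [p for p in pages if perplexity(p, lm) <= threshold]

# Toy data: the reference corpus and threshold are made up for this sketch.
reference = "the quick brown fox jumps over the lazy dog the fox runs"
lm = train_bigram_lm(reference)
pages = ["the quick brown fox", "zzz qqq xxx yyy"]
kept = filter_pages(pages, lm, threshold=8.0)  # keeps only the first page
```

In the real pipeline this perplexity score is combined with a fastText linear classifier that predicts whether a page resembles text cited by Wikipedia; pages scoring poorly on either signal are dropped.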
|
|