| Model Type | auto-regressive language model |
| --- | --- |
| **Use Cases** | |
| Areas | |
| Applications | large language models exploration |
| Primary Use Cases | question answering, natural language understanding, reading comprehension |
| Limitations | not trained with human feedback; may generate toxic or offensive content |
| **Supported Languages** | en (high proficiency); bg, ca, cs, da, de, es, fr, hr, hu, it, nl, pl, pt, ro, ru, sl, sr, sv, uk (low proficiency) |
| **Training Details** | |
| Data Sources | CCNet, C4, GitHub, Wikipedia, Books, ArXiv, Stack Exchange |
| Model Architecture | |
| **Responsible AI Considerations** | |
| Fairness | The model is not intended to inform decisions about matters central to human life. Data from the Web was filtered based on its proximity to Wikipedia text and references. |
| Mitigation Strategies | Web data was filtered based on its proximity to Wikipedia using a Kneser-Ney language model and a fastText linear classifier; a sketch of this kind of filter appears below the table. |

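The mitigation strategy names two concrete components: a Kneser-Ney language model and a fastText linear classifier. Below is a minimal sketch of how such a Wikipedia-proximity filter could be assembled with the `fasttext` and `kenlm` Python packages; the file names (`wiki_vs_web.train.txt`, `wikipedia.arpa`), labels, and thresholds are illustrative assumptions, not values taken from this card.

```python
# Hypothetical Wikipedia-proximity filter in the spirit of the mitigation
# strategy above. Paths, labels, and thresholds are illustrative assumptions.
import fasttext  # linear text classifier (pip install fasttext)
import kenlm     # Kneser-Ney n-gram LM bindings (pip install kenlm)

# A fastText classifier trained to separate Wikipedia-like text from generic
# web text; the training file holds one "__label__<name> <text>" per line.
classifier = fasttext.train_supervised(input="wiki_vs_web.train.txt")

# A Kneser-Ney n-gram model estimated on Wikipedia (e.g. with KenLM's lmplz)
# and saved in ARPA format; lower perplexity means closer to the reference text.
wiki_lm = kenlm.Model("wikipedia.arpa")

def keep_document(text: str,
                  min_wiki_prob: float = 0.5,
                  max_perplexity: float = 1000.0) -> bool:
    """Keep a web document only if it looks sufficiently Wikipedia-like."""
    # fastText's predict() rejects newlines, so flatten the document first.
    labels, probs = classifier.predict(text.replace("\n", " "))
    looks_like_wiki = labels[0] == "__label__wiki" and probs[0] >= min_wiki_prob
    fluent_enough = wiki_lm.perplexity(text) <= max_perplexity
    return looks_like_wiki and fluent_enough

# Example: score a single candidate document.
print(keep_document("The Eiffel Tower is a wrought-iron lattice tower in Paris."))
```

Pairing a perplexity gate with a classifier gate mirrors the two tools the card names; in practice both thresholds would be tuned on held-out data rather than fixed as above.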