| Model Type | Transformer-based, autoregressive language model |

| Use Cases |
| Areas: | |
| Primary Use Cases: | Feature extraction, downstream task learning (see the sketch below) |
| Limitations: | Not intended for direct deployment; outputs may contain biases and inaccuracies |

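As a rough illustration of the feature-extraction use case, the sketch below pulls per-token hidden states from the model via the Hugging Face `transformers` library. The checkpoint identifier `EleutherAI/gpt-neox-20b` is an assumption (substitute the actual model ID), and loading a model of this size requires substantial memory.

```python
# Minimal feature-extraction sketch using Hugging Face transformers.
# The checkpoint identifier below is an assumption; substitute the real model ID.
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "EleutherAI/gpt-neox-20b"  # assumed identifier
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)
model.eval()

inputs = tokenizer("An example sentence to embed.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# One hidden vector per input token: shape (batch, seq_len, hidden_size).
# Pool over tokens to get a single feature vector for the whole text.
token_features = outputs.last_hidden_state
sentence_feature = token_features.mean(dim=1)
print(token_features.shape, sentence_feature.shape)
```

Mean pooling is only one option; for autoregressive models the final token's hidden state is also a common choice, since it is the only position that attends to the entire sequence.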
| Additional Notes | The training data was not deduplicated before training, which may affect the integrity of the model's output. |

| Supported Languages | |

| Training Details |
| Data Sources: | The Pile (see the Pile paper) |
| Data Volume: | |
| Methodology: | Autoregressive training using the GPT-NeoX library (see the sketch below) |
| Context Length: | |
| Training Time: | Approx. 150,000 steps with 1,538 sequences per step |
| Model Architecture: | Similar to GPT-3; architectural specifics are available in the paper. |

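The methodology row above refers to standard autoregressive (next-token) training; at 150,000 steps with 1,538 sequences per step, that amounts to roughly 230 million training sequences. The snippet below is a generic sketch of the causal language-modeling loss, not code from the GPT-NeoX library itself.

```python
# Generic sketch of the autoregressive (next-token) training objective.
# Illustrates the shape of the loss only; this is not GPT-NeoX library code.
import torch
import torch.nn.functional as F

def causal_lm_loss(logits: torch.Tensor, input_ids: torch.Tensor) -> torch.Tensor:
    """Cross-entropy between the prediction at position t and the token at t + 1."""
    shift_logits = logits[:, :-1, :]   # predictions for positions 0 .. T-2
    shift_labels = input_ids[:, 1:]    # targets are the next tokens
    return F.cross_entropy(
        shift_logits.reshape(-1, shift_logits.size(-1)),
        shift_labels.reshape(-1),
    )

# Toy example: batch of 2 sequences, length 8, vocabulary of 100 tokens.
logits = torch.randn(2, 8, 100)
input_ids = torch.randint(0, 100, (2, 8))
print(causal_lm_loss(logits, input_ids))
```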
| Safety Evaluation |
| Risk Categories: | |
| Ethical Considerations: | The model may reflect biases related to gender, religion, and race, as discussed in the Pile paper. |

| Responsible AI Considerations |
| Fairness: | See the Pile paper for a discussion of biases in the training data |

| Input Output |
| Input Format: | Plain-text prompts (text input only) |
| Accepted Modalities: | |
| Output Format: | Autoregressive next-token predictions (see the example below) |
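To make the input-output contract concrete, here is an illustrative prompt-to-continuation example: text goes in, and sampled next-token predictions are decoded back to text. As above, the checkpoint identifier is an assumption.

```python
# Illustrative text-in, text-out generation example.
# The checkpoint identifier below is an assumption; substitute the real model ID.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "EleutherAI/gpt-neox-20b"  # assumed identifier
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "The GPT-NeoX library is"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids
output_ids = model.generate(
    input_ids,
    max_new_tokens=40,  # length of the sampled continuation
    do_sample=True,     # sample rather than greedy decode
    top_p=0.9,          # nucleus sampling
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```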