Model Type | |
Use Cases |
Areas: | |
Applications: | NLP tasks, Large language model research |
|
Primary Use Cases: | Instruct or chat model use |
|
Limitations: | Limited non-English language capabilities; biases reflecting the training corpora |
|
Considerations: | Implement appropriate guardrails for production use. |
|
|
Additional Notes | Initial evaluation shows a deterioration in arithmetic and stable performance on common-sense and reasoning tasks. |
|
Supported Languages | Primary: en, fr, de, es, it. Additional: pt, pl, nl, ro, cz, sv |
|
Training Details |
Data Sources: | Anthropic/hh-rlhf, OpenAssistant/oasst1, databricks/databricks-dolly-15k, NatInstV2, momentum-internal |
|
Methodology: | Reinforcement Learning from Human Feedback (RLHF) |
|
Training Time: | |
Hardware Used: | |
Model Architecture: | Causal decoder-only; value network initialized from reward model |
|
|
Input Output |
Input Format: | |
Accepted Modalities: | |
Output Format: | |
Performance Tips: | Use prefix for chat mode. |
|
|
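The language tiers listed under Supported Languages can be encoded as a small lookup, for example to decide when extra guardrails or a fallback are needed for non-primary languages. This is only a minimal sketch: the function name `language_tier` and the tier labels are hypothetical, and the codes are copied verbatim from the card (including `cz`, which is presumably Czech, although the ISO 639-1 code for Czech is `cs`).

```python
# Language codes copied verbatim from the "Supported Languages" row of the card.
# Note: "cz" is kept exactly as listed; it is assumed to mean Czech.
PRIMARY_LANGUAGES = {"en", "fr", "de", "es", "it"}
ADDITIONAL_LANGUAGES = {"pt", "pl", "nl", "ro", "cz", "sv"}


def language_tier(code: str) -> str:
    """Return 'primary', 'additional', or 'unsupported' for a language code.

    Hypothetical helper: the tier names are illustrative and not part of the card.
    """
    normalized = code.strip().lower()
    if normalized in PRIMARY_LANGUAGES:
        return "primary"
    if normalized in ADDITIONAL_LANGUAGES:
        return "additional"
    return "unsupported"


if __name__ == "__main__":
    for candidate in ("en", "PL", "ja"):
        print(candidate, language_tier(candidate))
```

A caller could route requests in an "unsupported" language to a different model or attach a warning, in line with the card's note about limited non-English capabilities.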