Model Type | instruction tuned, RLHF tuned, chat model |
|
Use Cases |
Limitations: | The model can produce problematic outputs (especially when prompted to do so). |
|
|
Supported Languages | |
Training Details |
Data Sources: | HuggingFaceH4/ultrafeedback_binarized, allenai/tulu-v2-sft-mixture, openbmb/UltraFeedback |
|
Methodology: | Direct Preference Optimization (DPO) |
|
|
Safety Evaluation |
Ethical Considerations: | The Tulu models have not been aligned to generate safe completions within the RLHF phase or deployed with in-the-loop filtering of responses like ChatGPT, so the model can produce problematic outputs (especially when prompted to do so). The size and composition of the corpus used to train the base Llama 2 models are also unknown. |
|
|
Input Output |
Input Format: | <|user|> Your message here!
<|assistant|> |
|
Performance Tips: | Make sure to include a newline after `<|assistant|>`; omitting it can noticeably affect generation quality. |
|
|
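The input format and the newline tip above can be sketched as a small prompt builder. This is a minimal illustration, not an official helper: `build_prompt` is a hypothetical name, and it assumes each tag sits on its own line, with the user message on the line between them. The card itself only states that a newline must follow `<|assistant|>`.

```python
USER_TAG = "<|user|>"
ASSISTANT_TAG = "<|assistant|>"


def build_prompt(message: str) -> str:
    """Wrap a single user message in the chat format shown in this card.

    Assumption: tags and message are separated by newlines. The trailing
    newline after the assistant tag is the one the card calls out.
    """
    return f"{USER_TAG}\n{message.strip()}\n{ASSISTANT_TAG}\n"


prompt = build_prompt("Your message here!")
print(repr(prompt))
```

Building the string in one place makes it harder to drop the final newline by accident, for example when a caller strips whitespace from the prompt before sending it to the model.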