| Model Type | reward_model, evaluation, reranking, instruction |
|
| Use Cases |
| Areas | research, commercial applications |
| Applications | LLM evaluation, decoding enhancement, instruction alignment |
| Primary Use Cases | ranking output candidates, enhancing decoding processes, aligning models with RLHF methods |
|
|
| Supported Languages | |
| Training Details |
| Data Sources | openai/summarize_from_feedback, openai/webgpt_comparisons, Dahoas/synthetic-instruct-gptj-pairwise, Anthropic/hh-rlhf, lmsys/chatbot_arena_conversations, openbmb/UltraFeedback |
| Methodology | Pairwise comparison approach with bidirectional attention |
| Context Length | |
| Hardware Used | |
| Model Architecture | Pairwise comparison through bidirectional attention |
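The data sources listed above are preference datasets, which supply a prompt together with a preferred and a dispreferred response. A minimal sketch of turning such records into the (instruction, preferred, dispreferred) tuples a pairwise comparison model trains on — assuming a simplified dict layout with `prompt`, `chosen`, and `rejected` fields, since each listed dataset has its own schema:

```python
# Sketch: building pairwise training tuples from preference data.
# The record layout ("prompt", "chosen", "rejected") is an assumed
# simplification; the actual field names vary per dataset.

def to_pairwise_examples(records):
    """Yield (instruction, preferred, dispreferred) tuples,
    the training format for a pairwise comparison model."""
    for rec in records:
        yield (rec["prompt"], rec["chosen"], rec["rejected"])

# Toy record for illustration only.
toy_data = [
    {
        "prompt": "Explain photosynthesis.",
        "chosen": "Plants convert light into chemical energy...",
        "rejected": "It is a thing plants do.",
    },
]
pairs = list(to_pairwise_examples(toy_data))
```

During training, the model sees both responses for the same instruction and learns to score the chosen one higher, which is what later enables reranking and RLHF-style alignment.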
|
|
| Input Output |
| Input Format | Instruction and a pair of output candidates |
| Accepted Modalities | |
| Output Format | |
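Since the model takes an instruction plus a pair of output candidates, ranking more than two candidates requires composing pairwise comparisons. The sketch below shows one common scheme, round-robin scoring; the `compare` function is a hypothetical stand-in (a toy length heuristic) for the model's actual forward pass, which would score the pair jointly with bidirectional attention:

```python
# Sketch: ranking N candidates with a pairwise comparator.
from itertools import combinations

def compare(instruction: str, a: str, b: str) -> float:
    """Hypothetical stub: positive if `a` is preferred over `b`.
    A real pairwise reward model would return a learned
    preference logit for the (instruction, a, b) triple."""
    # Toy heuristic for illustration only: prefer the longer answer.
    return float(len(a) - len(b))

def rank_candidates(instruction: str, candidates: list[str]) -> list[str]:
    """Round-robin pairwise ranking: each candidate's score is its
    total preference margin against every other candidate."""
    scores = {c: 0.0 for c in candidates}
    for a, b in combinations(candidates, 2):
        margin = compare(instruction, a, b)
        scores[a] += margin
        scores[b] -= margin
    return sorted(candidates, key=lambda c: scores[c], reverse=True)

ranked = rank_candidates(
    "Summarize the article.",
    ["Short.", "A medium-length summary.", "A longer, more detailed summary."],
)
best = ranked[0]  # top-ranked candidate under the toy comparator
```

Round-robin uses O(N²) comparisons; for larger candidate sets, tournament-style or sampled comparisons are cheaper trade-offs.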
|