PairRM by llm-blender


arXiv: 2112.09332, 2306.02561
Tags: DeBERTa, English, reward model, reranking, evaluation, instruction, RLHF, Safetensors
Model Card on HF 🤗: https://huggingface.co/llm-blender/PairRM

PairRM Benchmarks

Benchmark scores indicate how the model compares to the reference models: Anthropic Claude 3.5 Sonnet ("so35"), GPT-4o ("gpt4o"), or GPT-4 ("gpt4").
PairRM (llm-blender/PairRM)

PairRM Parameters and Internals

Model Type 
reward_model, evaluation, reranking, instruction
Use Cases 
Areas:
research, commercial applications
Applications:
LLM evaluation, decoding enhancement, instruction alignment
Primary Use Cases:
ranking output candidates, enhancing decoding processes, aligning models with RLHF methods
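PairRM judges a pair of candidate outputs rather than scoring each one in isolation. The sketch below shows how such a pairwise comparator can drive best-of-n candidate selection; the comparator here is a stand-in stub for illustration (a real pipeline would query the PairRM model instead):

```python
from typing import Callable, List

def best_of_n(instruction: str,
              candidates: List[str],
              compare: Callable[[str, str, str], float]) -> str:
    """Pick the best candidate via successive pairwise comparisons.

    `compare(instruction, a, b)` returns a positive value when
    candidate `a` is preferred over `b`, and a negative value otherwise.
    """
    best = candidates[0]
    for challenger in candidates[1:]:
        # Keep whichever candidate wins the head-to-head comparison.
        if compare(instruction, challenger, best) > 0:
            best = challenger
    return best

# Hypothetical comparator stub: prefer the longer answer.
# A real deployment would call PairRM here instead.
toy_compare = lambda _ins, a, b: len(a) - len(b)

winner = best_of_n("Explain RLHF.",
                   ["RLHF.",
                    "RLHF tunes a model with human preference data."],
                   toy_compare)
print(winner)  # → RLHF tunes a model with human preference data.
```

With n candidates this tournament-style pass needs only n-1 comparisons, which is why pairwise reward models are a cheap way to enhance decoding.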
Supported Languages 
en (proficient)
Training Details 
Data Sources:
openai/summarize_from_feedback, openai/webgpt_comparisons, Dahoas/synthetic-instruct-gptj-pairwise, Anthropic/hh-rlhf, lmsys/chatbot_arena_conversations, openbmb/UltraFeedback
Methodology:
Pairwise comparison approach with bidirectional attention
Context Length:
2048
Hardware Used:
super-efficient hardware
Model Architecture:
Pairwise comparison through bidirectional attention
Input Output 
Input Format:
Instruction and a pair of output candidates
Accepted Modalities:
text
Output Format:
Score for each candidate
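Since the model emits one score per candidate from pairwise inputs, a full ranking can be built with a round-robin over all pairs. A minimal sketch, again using a hypothetical stand-in comparator rather than the real model:

```python
from itertools import combinations
from typing import Callable, Dict, List

def rank_candidates(instruction: str,
                    candidates: List[str],
                    compare: Callable[[str, str, str], float]) -> Dict[str, float]:
    """Round-robin scoring: compare every pair once and credit each
    pairwise winner with +1, yielding one score per candidate."""
    scores = {c: 0.0 for c in candidates}
    for a, b in combinations(candidates, 2):
        if compare(instruction, a, b) > 0:
            scores[a] += 1.0
        else:
            scores[b] += 1.0
    return scores

# Hypothetical comparator stub; a real pipeline would query PairRM.
toy_compare = lambda _ins, a, b: len(a) - len(b)

scores = rank_candidates("Summarize the article.",
                         ["short", "a bit longer", "the longest answer here"],
                         toy_compare)
```

Sorting candidates by these scores gives the reranked output order; note the round-robin costs n·(n-1)/2 model calls for n candidates.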
LLM Name: PairRM
Repository 🤗: https://huggingface.co/llm-blender/PairRM
Model Name: microsoft/deberta-v3-large
Model Size: 436m
Required VRAM: 1.7 GB
Updated: 2025-08-18
Maintainer: llm-blender
Model Type: deberta
Instruction-Based: Yes
Model Files: 1.7 GB, 0.0 GB
Supported Languages: en
Model Architecture: AutoModel
License: mit
Tokenizer Class: DebertaV2Tokenizer
Padding Token: [PAD]

Best Alternatives to PairRM

| Best Alternatives | Context / RAM | Downloads | Likes |
|---|---|---|---|
| Autotrain Umberto Proclama | 0K / 0.9 GB | 5 | 0 |
| Mamba Python | 0K / 2 GB | 13 | 0 |
| ...hi 3 Mini 4K Instruct Ct2 Int8 | 0K / 3.8 GB | 1 | 1 |
| ...l 8x7B Instruct V0.1 Llamafile | 0K / GB | 813 | 18 |
| CSUMLM | 0K / GB | 3 | 1 |
| ...hin 2.5 Mixtral 8x7b Llamafile | 0K / GB | 160 | 4 |
| Instruct GPT J | 0K / 0 GB | 0 | 26 |
| Vigogne Bloom 7b1 Instruct | 0K / 0.1 GB | 0 | 4 |
| ...a Instruction Fine Tune French | 0K / 0 GB | 0 | 4 |
| MiniMaid L2 | 0K / 0 GB | 2 | 2 |
Note: green Score (e.g. "73.2") means that the model is better than llm-blender/PairRM.



Original data from Hugging Face, OpenCompass, and various public Git repositories.
Release v20241124