XVERSE 65B 2 by xverse


Tags: Arxiv:2005.14165, Arxiv:2112.11446, Arxiv:2201.11990, Arxiv:2203.15556, Arxiv:2204.02311, Arxiv:2211.05100, Arxiv:2302.13971, Autotrain compatible, Custom code, Pytorch, Region:us, Sharded, Xverse
Model Card on HF 🤗: https://huggingface.co/xverse/XVERSE-65B-2

XVERSE 65B 2 Benchmarks

[Benchmark chart: nn.n% scores comparing XVERSE 65B 2 (xverse/XVERSE-65B-2) against the reference models Anthropic Sonnet 3.5 ("so35"), GPT-4o ("gpt4o"), and GPT-4 ("gpt4").]

XVERSE 65B 2 Parameters and Internals

Model Type: Decoder-only Transformer, Multilingual

Use Cases
Areas: Research, Commercial applications
Applications: Multilingual processing tasks
Primary Use Cases: Multi-round dialogues, Knowledge QA, Summarization
Limitations: May produce inaccurate or biased content
Considerations: Conduct safety tests before deployment.

Supported Languages
en (54.91%), zh (31.09%), ja (3.22%), ru (3.15%), de (1.52%), es (0.91%), fr (0.73%), pl (0.48%), it (0.36%), pt (0.34%), cs (0.27%), uk (0.24%), tr (0.23%), nl (0.20%), hu (0.19%), ko (0.18%), sv (0.15%), fi (0.14%), el (0.14%), id (0.13%), vi (0.13%), ar (0.12%), ro (0.11%), th (0.10%), bg (0.10%), da (0.09%), sk (0.08%), mr (0.08%), hi (0.07%), no (0.07%), lt (0.05%), sl (0.05%), et (0.04%), lv (0.03%), sr (0.03%), ta (0.03%), kk (0.02%)

Training Details
Data Sources: Web Pages, Code, Encyclopedia, Books, Academic Papers, QA
Data Volume: 3.2 trillion tokens
Methodology: Continual Pre-Training
Context Length: 16,384 tokens (see the sketch after this section)
Hardware Used: 8×A800 80 GB GPUs
Model Architecture: Decoder-only standard Transformer

Input Output
Accepted Modalities: Text

Release Notes
XVERSE-65B-2 (2023/12/08): Continual pre-training; enhanced capabilities in mathematics and coding.
XVERSE-65B (2023/11/06): Base model release.
2023/11/24: Updated pre-training data information.
2023/11/29: Updated model architecture information.
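
As a practical note on that 16,384-token context window, the sketch below shows one way to keep a long prompt within it on the tokenizer side. The repo id and window size come from this page; the headroom reserved for generation is an arbitrary assumption.

```python
# Minimal sketch: truncating a long prompt to the 16,384-token context
# window listed under Training Details. Assumes the transformers package.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("xverse/XVERSE-65B-2")
MAX_CONTEXT = 16384          # context length listed on this page
RESERVED_FOR_OUTPUT = 512    # assumed headroom for generated tokens

prompt = "..."  # e.g. a long document followed by "Summarize the text above."
ids = tokenizer(prompt)["input_ids"]
if len(ids) > MAX_CONTEXT - RESERVED_FOR_OUTPUT:
    # Keep the most recent tokens; a real pipeline might chunk or summarize
    # in stages instead of dropping the head of the document.
    ids = ids[-(MAX_CONTEXT - RESERVED_FOR_OUTPUT):]
```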
LLM Name: XVERSE 65B 2
Repository 🤗: https://huggingface.co/xverse/XVERSE-65B-2
Model Size: 65b
Required VRAM: 134.6 GB
Updated: 2025-09-23
Maintainer: xverse
Model Type: xverse
Model Files: 4.9 GB (1-of-28), 5.3 GB (2-of-28), 5.3 GB (3-of-28), 5.3 GB (4-of-28), 5.0 GB (5-of-28), 4.9 GB (6-of-28), 4.9 GB (7-of-28), 4.9 GB (8-of-28), 4.9 GB (9-of-28), 4.9 GB (10-of-28), 4.9 GB (11-of-28), 4.9 GB (12-of-28), 4.9 GB (13-of-28), 4.9 GB (14-of-28), 4.9 GB (15-of-28), 4.9 GB (16-of-28), 4.9 GB (17-of-28), 4.9 GB (18-of-28), 4.9 GB (19-of-28), 4.9 GB (20-of-28), 4.9 GB (21-of-28), 4.9 GB (22-of-28), 4.9 GB (23-of-28), 4.9 GB (24-of-28), 4.9 GB (25-of-28), 4.9 GB (26-of-28), 4.3 GB (27-of-28), 1.6 GB (28-of-28)
Model Architecture: XverseForCausalLM
License: apache-2.0
Context Length: 16384
Model Max Length: 16384
Transformers Version: 4.30.2
Tokenizer Class: PreTrainedTokenizerFast
Vocabulary Size: 100534
Torch Data Type: bfloat16
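
Given the fields above (bfloat16 weights, custom modeling code behind XverseForCausalLM, and 28 sharded files totaling roughly 134.6 GB), a minimal loading sketch with Hugging Face transformers might look like the following; the prompt is only a placeholder.

```python
# Minimal sketch, assuming transformers + accelerate are installed and
# enough GPU memory is available for the ~134.6 GB of bfloat16 weights.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "xverse/XVERSE-65B-2"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the listed Torch Data Type
    device_map="auto",           # spreads the 28 shards across visible GPUs
    trust_remote_code=True,      # the repo ships custom XverseForCausalLM code
)
model.eval()

prompt = "..."  # placeholder; XVERSE-65B-2 is a base model, so plain-text completion
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Because the full-precision checkpoint needs about 134.6 GB of VRAM, device_map="auto" in practice implies a multi-GPU node; single-GPU users typically reach for a quantized variant instead.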

Best Alternatives to XVERSE 65B 2

Best Alternatives             Context / RAM    Downloads  Likes
XVERSE 65B Chat               16K / 132.8 GB   127        13
XVERSE 65B                    16K / 133.9 GB   163        8
XVERSE 65B Chat GPTQ Int4     8K / 37 GB       17         1
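
For single-node setups, the roughly 37 GB GPTQ Int4 variant in the table above is the lighter option. A hypothetical loading sketch follows; the repo id is an assumption (not confirmed by this page), and GPTQ loading through transformers additionally requires the optimum and auto-gptq packages.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repo id for the "XVERSE 65B Chat GPTQ Int4" row above; verify on HF.
quant_id = "xverse/XVERSE-65B-Chat-GPTQ-Int4"

tokenizer = AutoTokenizer.from_pretrained(quant_id)
model = AutoModelForCausalLM.from_pretrained(
    quant_id,
    device_map="auto",       # ~37 GB of Int4 weights, per the table above
    trust_remote_code=True,  # XVERSE repos ship custom modeling code
)
```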

Original data from HuggingFace, OpenCompass and various public git repos.
Release v20241124