XVERSE 65B by xverse


  Arxiv:2005.14165   Arxiv:2112.11446   Arxiv:2201.11990   Arxiv:2203.15556   Arxiv:2204.02311   Arxiv:2211.05100   Arxiv:2302.13971   Autotrain compatible   Custom code   Pytorch   Region:us   Sharded   Xverse
Model Card on HF 🤗: https://huggingface.co/xverse/XVERSE-65B


XVERSE 65B Parameters and Internals

Model Type 
multilingual, large language model
Use Cases 
Areas:
academic research, commercial use
Applications:
multilingual tasks, text generation, dialogue, summarization
Primary Use Cases:
Chinese question answering, English question answering, language comprehension, commonsense questions, logical reasoning, math problem solving, coding
Limitations:
may produce inaccurate, biased, or offensive content
Considerations:
Developers should conduct safety tests before deployment.
Supported Languages (share of pre-training data, %)
en (54.91), zh (31.09), ru (3.15), ja (3.22), de (1.52), es (0.91), fr (0.73), pl (0.48), it (0.36), pt (0.34), nl (0.20), cs (0.27), sv (0.15), ko (0.18), fi (0.14), ar (0.12), ro (0.11), bg (0.10), th (0.10), da (0.09), hu (0.19), no (0.07), hi (0.07), iw (0.06), fa (0.07), sl (0.05), et (0.04), lv (0.03), sk (0.08), ms (0.05), ca (0.06), sr (0.03), tr (0.23), uk (0.24), id (0.13), mr (0.08), lt (0.05), kk (0.02), ta (0.03)
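The percentages above can be turned into rough per-language token counts. The sketch below parses the list and scales it by the 2.6 trillion tokens given under Training Details. It assumes each figure is a percentage of pre-training tokens (the card does not say whether the shares are measured in tokens or in documents), and the helper names `parse_shares` and `estimated_tokens` are hypothetical.

```python
import re

# Language shares as listed on the card (percent), copied verbatim.
RAW_SHARES = (
    "en (54.91), zh (31.09), ru (3.15), ja (3.22), de (1.52), es (0.91), "
    "fr (0.73), pl (0.48), it (0.36), pt (0.34), nl (0.20), cs (0.27), "
    "sv (0.15), ko (0.18), fi (0.14), ar (0.12), ro (0.11), bg (0.10), "
    "th (0.10), da (0.09), hu (0.19), no (0.07), hi (0.07), iw (0.06), "
    "fa (0.07), sl (0.05), et (0.04), lv (0.03), sk (0.08), ms (0.05), "
    "ca (0.06), sr (0.03), tr (0.23), uk (0.24), id (0.13), mr (0.08), "
    "lt (0.05), kk (0.02), ta (0.03)"
)

TOTAL_TOKENS = 2.6e12  # "2.6 trillion tokens" from Training Details


def parse_shares(raw: str) -> dict[str, float]:
    """Turn 'en (54.91), zh (31.09), ...' into {'en': 54.91, 'zh': 31.09, ...}."""
    return {code: float(pct) for code, pct in re.findall(r"([a-z]{2}) \(([\d.]+)\)", raw)}


def estimated_tokens(shares: dict[str, float], total: float = TOTAL_TOKENS) -> dict[str, float]:
    # Assumption: each share is a percentage of pre-training *tokens*.
    return {code: total * pct / 100 for code, pct in shares.items()}


shares = parse_shares(RAW_SHARES)
tokens = estimated_tokens(shares)

# Print the three largest languages and the sum of all listed shares.
for code in sorted(shares, key=shares.get, reverse=True)[:3]:
    print(f"{code}: {shares[code]:.2f}% ~ {tokens[code] / 1e9:,.0f}B tokens")
print(f"listed total: {sum(shares.values()):.2f}%")
```

The 39 listed shares add up to 99.75% rather than 100%, so per-language estimates of this kind are approximate at best.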
Training Details 
Data Sources:
web pages, code, encyclopedia, books, academic papers, QA, other
Data Volume:
2.6 trillion tokens
Methodology:
FlashAttention2, 3D parallelism with virtual pipeline
Context Length:
16K (16,384 tokens)
Hardware Used:
A800 80 GB GPUs, 1,500 GB memory for training
Model Architecture:
Decoder-only Transformer
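The Methodology entry names 3D parallelism with a virtual pipeline. As a minimal sketch of the idea, not XVERSE's training code: every GPU sits at one coordinate along the data, tensor, and pipeline axes, and with a virtual (interleaved) pipeline each pipeline stage holds several non-contiguous chunks of layers instead of one contiguous block. The parallel degrees, the 16-layer toy model, and the names `ParallelConfig` and `layers_for_stage` are hypothetical, since the card does not state the configuration actually used; the round-robin chunk placement follows the Megatron-LM interleaved schedule, as an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParallelConfig:
    # Hypothetical degrees for illustration; not XVERSE's actual settings.
    data: int
    tensor: int
    pipeline: int
    virtual_chunks: int  # model chunks held by each pipeline stage

    @property
    def world_size(self) -> int:
        # 3D parallelism: each GPU has exactly one coordinate on each axis.
        return self.data * self.tensor * self.pipeline


def layers_for_stage(num_layers: int, cfg: ParallelConfig, stage: int) -> list[list[int]]:
    """Layer indices held by one pipeline stage, one list per virtual chunk.

    The layers are cut into pipeline * virtual_chunks contiguous chunks and
    dealt round-robin, so stage s holds chunks s, s + p, s + 2p, ...
    """
    if not 0 <= stage < cfg.pipeline:
        raise ValueError("stage out of range")
    num_chunks = cfg.pipeline * cfg.virtual_chunks
    if num_layers % num_chunks:
        raise ValueError("num_layers must split evenly into pipeline * virtual_chunks chunks")
    per_chunk = num_layers // num_chunks
    chunks = []
    for v in range(cfg.virtual_chunks):
        chunk_id = v * cfg.pipeline + stage  # round-robin placement
        start = chunk_id * per_chunk
        chunks.append(list(range(start, start + per_chunk)))
    return chunks


cfg = ParallelConfig(data=4, tensor=8, pipeline=4, virtual_chunks=2)
print("GPUs:", cfg.world_size)
for s in range(cfg.pipeline):
    print(f"stage {s}:", layers_for_stage(16, cfg, s))
```

Splitting each stage's layers into several smaller chunks lets the pipeline fill and drain in smaller steps, which shrinks the idle "bubble" at the cost of more point-to-point communication between stages.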
Input Output 
Input Format:
tokenized input using BPE with vocabulary size 100,534
Accepted Modalities:
text
Output Format:
text
Performance Tips:
Use bfloat16 for better fine-tuning performance
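One common reason to prefer bfloat16 is that it keeps float32's 8-bit exponent, so its range matches float32, while float16 tops out at 65,504; the price is a 7-bit mantissa. The sketch below emulates the conversion in pure Python by keeping the upper 16 bits of the float32 bit pattern with round-to-nearest-even. It only illustrates the number format, assumes finite inputs within float32 range, and the function name `to_bfloat16` is hypothetical; frameworks such as PyTorch perform this conversion natively.

```python
import struct


def to_bfloat16(x: float) -> float:
    """Round x to the nearest bfloat16 value and return it as a Python float.

    bfloat16 = float32's sign bit + 8 exponent bits + the top 7 mantissa bits,
    i.e. the upper 16 bits of the float32 bit pattern.
    Sketch only: assumes x is finite and within float32 range (no NaN/inf handling).
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]  # float32 bit pattern
    # Round to nearest, ties to even, before dropping the low 16 bits.
    bits += 0x7FFF + ((bits >> 16) & 1)
    upper = (bits >> 16) & 0xFFFF
    return struct.unpack("<f", struct.pack("<I", upper << 16))[0]


# Small differences are lost, but large values (beyond float16's 65,504) stay finite.
for value in (1.001, 1.01, 70000.0, 1e30):
    print(f"{value!r} -> {to_bfloat16(value)!r}")
```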
Release Notes 
Version:
2023/11/29
Date:
2023-11-29
Notes:
Updated the model architecture and additional pre-training data information.
Version:
2023/11/24
Date:
2023-11-24
Notes:
Updated the pre-training data information.
Version:
2023/11/06
Date:
2023-11-06
Notes:
Released the XVERSE-65B base model.
LLM Name: XVERSE 65B
Repository 🤗: https://huggingface.co/xverse/XVERSE-65B
Model Size: 65B
Required VRAM: 133.9 GB
Updated: 2025-09-23
Maintainer: xverse
Model Type: xverse
Model Files: 28 shards; 1-of-28: 3.8 GB; 2-of-28 to 27-of-28: 4.9 GB each; 28-of-28: 2.7 GB
Model Architecture: XverseForCausalLM
License: apache-2.0
Context Length: 16384
Model Max Length: 16384
Transformers Version: 4.30.2
Tokenizer Class: PreTrainedTokenizerFast
Vocabulary Size: 100534
Torch Data Type: bfloat16
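The Required VRAM figure matches the sum of the shard sizes under Model Files, and dividing it by the 2 bytes a bfloat16 weight occupies gives a rough parameter count. The sketch assumes GB means 10^9 bytes and that the shards hold only bf16 weights; the card states neither. Memory needed at inference time will be higher, since activations and the KV cache also take space.

```python
SHARD_SIZES_GB = [3.8] + [4.9] * 26 + [2.7]  # shards 1..28 as listed under Model Files
BYTES_PER_PARAM_BF16 = 2  # bfloat16 stores each weight in 16 bits

# Assumption: "GB" means 10**9 bytes and the shards contain only bf16 weights.
total_gb = round(sum(SHARD_SIZES_GB), 1)
implied_params = total_gb * 1e9 / BYTES_PER_PARAM_BF16
nominal_gb = 65e9 * BYTES_PER_PARAM_BF16 / 1e9  # 65B parameters x 2 bytes

print(f"shards: {len(SHARD_SIZES_GB)}, total: {total_gb} GB")
print(f"implied parameters: {implied_params / 1e9:.2f}B")
print(f"65B x 2 bytes: {nominal_gb:.0f} GB")
```

This gives 133.9 GB in total and roughly 67 billion implied parameters, close to the nominal 65B size.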

Best Alternatives to XVERSE 65B

Best Alternatives | Context / RAM | Downloads, Likes
XVERSE 65B Chat | 16K / 132.8 GB | 12713
XVERSE 65B 2 | 16K / 134.6 GB | 1610
XVERSE 65B Chat GPTQ Int4 | 8K / 37 GB | 171


Original data from HuggingFace, OpenCompass and various public git repos.
Release v20241124