Llama 2 7B Chat Hf FP8 by neuralmagic

Tags: Autotrain compatible, Conversational, Endpoints compatible, Fp8, Llama, Region:us, Safetensors, Sharded, Tensorflow, Vllm


Llama 2 7B Chat Hf FP8 Parameters and Internals

Model Type 
Chatbot, Text Generation
Use Cases 
Areas:
Commercial, Research
Applications:
Assistant-like chat
Primary Use Cases:
English Language Chatbots
Limitations:
Out-of-scope: use in any manner that violates applicable laws or regulations; use in languages other than English.
Additional Notes 
FP8 quantization reduces the number of bits per parameter from 16 to 8, roughly halving disk size and GPU memory requirements.
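As a rough illustration of the efficiency claim, the sketch below estimates weight storage at 16 and 8 bits per parameter. It assumes the nominal 7 billion parameter count implied by the model name and ignores quantization scales and any layers kept in higher precision, so the numbers are approximate. The function name is hypothetical.

```python
def weight_footprint_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate weight storage in GB (10**9 bytes).

    Ignores quantization scales and any layers left in higher precision.
    """
    return num_params * bits_per_param / 8 / 1e9


# Assumption: nominal parameter count taken from the "7B" in the model name.
NOMINAL_PARAMS = 7e9

fp16_gb = weight_footprint_gb(NOMINAL_PARAMS, 16)  # original float16 weights
fp8_gb = weight_footprint_gb(NOMINAL_PARAMS, 8)    # FP8-quantized weights

print(f"float16: ~{fp16_gb:.1f} GB, FP8: ~{fp8_gb:.1f} GB")
```

The FP8 estimate lines up with the two model files listed on this page (5.0 GB and 2.0 GB).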
Supported Languages 
English (Proficient)
Training Details 
Data Sources:
ultrachat calibration samples
Methodology:
FP8 Quantization using AutoFP8
Context Length:
4096
Hardware Used:
GPU (vLLM >= 0.5.0)
Model Architecture:
Llama-2-7b-chat-hf
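The role of the calibration samples can be sketched in a few lines. This is not AutoFP8's code; it is a simplified, pure-Python illustration that assumes the E4M3 variant of FP8 (largest finite value 448) and a single per-tensor scale derived from the largest magnitude seen during calibration. All function names are hypothetical, and subnormal values are ignored for brevity.

```python
import math

E4M3_MAX = 448.0  # largest finite value representable in FP8 E4M3


def calibrate_scale(samples):
    """Per-tensor scale that maps the largest observed magnitude to E4M3_MAX."""
    amax = max(abs(x) for x in samples)
    return amax / E4M3_MAX if amax > 0 else 1.0


def round_to_e4m3(x):
    """Simulate E4M3 rounding: clamp to range, keep 4 significant bits.

    Simplification: subnormals and the minimum exponent are not modeled.
    """
    x = max(-E4M3_MAX, min(E4M3_MAX, x))
    if x == 0.0:
        return 0.0
    mantissa, exponent = math.frexp(x)  # x = mantissa * 2**exponent, 0.5 <= |mantissa| < 1
    return math.ldexp(round(mantissa * 16) / 16, exponent)


def fake_quantize(values, scale):
    """Quantize to simulated FP8 and immediately dequantize."""
    return [round_to_e4m3(v / scale) * scale for v in values]


# Hypothetical calibration values standing in for real activations or weights.
calibration = [0.5, -2.24, 1.0, 0.03]
scale = calibrate_scale(calibration)
restored = fake_quantize(calibration, scale)
```

With four significant bits, each restored value stays within about 1/16 of the original magnitude, which is the precision trade-off made in exchange for the smaller footprint.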
Input Output 
Input Format:
Text
Accepted Modalities:
text
Output Format:
Text
Performance Tips:
Sufficient GPU memory and a properly configured vLLM backend improve performance.
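A minimal sketch of offline inference with vLLM follows, assuming vLLM 0.5.0 or later is installed and a suitable GPU is available. The `max_model_len` value mirrors the 4096-token context length listed above. The sampling settings and the `generate` helper are illustrative assumptions, not recommendations from the model's maintainers.

```python
MODEL_ID = "neuralmagic/Llama-2-7b-chat-hf-FP8"

# Engine arguments; max_model_len mirrors the context length on this page.
ENGINE_ARGS = {"model": MODEL_ID, "max_model_len": 4096}


def generate(prompts, max_tokens=128):
    """Run offline generation with vLLM and return one string per prompt.

    Requires a GPU and an installed vLLM; the import is done lazily so this
    module can be loaded on machines without vLLM.
    """
    from vllm import LLM, SamplingParams

    llm = LLM(**ENGINE_ARGS)
    # Illustrative sampling settings (assumption, tune for your use case).
    params = SamplingParams(temperature=0.7, max_tokens=max_tokens)
    outputs = llm.generate(prompts, params)
    return [out.outputs[0].text for out in outputs]
```

A call such as `generate(["Hello, who are you?"])` loads the model once and returns a list with one completion per prompt.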
Release Notes 
Version:
1.0
Date:
6/26/2024
Notes:
Quantized to FP8 for efficiency. Initial release of the model for assistant-like chat in English.
LLM Name: Llama 2 7B Chat Hf FP8
Repository: https://huggingface.co/neuralmagic/Llama-2-7b-chat-hf-FP8
Model Size: 7b
Required VRAM: 7 GB
Updated: 2025-04-08
Maintainer: neuralmagic
Model Type: llama
Model Files: 5.0 GB (1-of-2), 2.0 GB (2-of-2)
Model Architecture: LlamaForCausalLM
License: llama2
Context Length: 4096
Model Max Length: 4096
Transformers Version: 4.41.2
Tokenizer Class: LlamaTokenizer
Vocabulary Size: 32000
Torch Data Type: float16


Original data from HuggingFace, OpenCompass and various public git repos.
Release v20241124