Falcon 40B Instruct GPTQ Inference Endpoints by philschmid


Falcon 40B Instruct GPTQ Inference Endpoints is an open-source language model by philschmid. Features: 40B parameters, 22.5 GB VRAM required, apache-2.0 license, quantized (GPTQ), instruction-based. LLM Explorer Score: 0.08.

  Arxiv:1911.02150   Arxiv:2005.14165   Arxiv:2104.09864   Arxiv:2205.14135   4bit   Custom code Dataset:tiiuae/falcon-refinedw...   En   Endpoints compatible   Gptq   Instruct   Quantized   Refinedweb   Region:us

Falcon 40B Instruct GPTQ Inference Endpoints Benchmarks

Falcon 40B Instruct GPTQ Inference Endpoints (philschmid/falcon-40b-instruct-GPTQ-inference-endpoints)

Falcon 40B Instruct GPTQ Inference Endpoints Parameters and Internals

Model Type 
Causal decoder-only
Use Cases 
Areas:
Chatbots, Instruction-based tasks
Primary Use Cases:
Chat datasets
Limitations:
Limited generalization to non-English languages
Considerations:
Guardrails recommended for production use
Additional Notes 
Experimental GPTQ model; support is limited.
Supported Languages 
English (primarily trained), French (also supported)
Training Details 
Data Sources:
Baize, RefinedWeb
Data Volume:
150M tokens
Methodology:
finetuned
Context Length:
2048
Hardware Used:
64 A100 40GB GPUs
Model Architecture:
Adapted from GPT-3, causal decoder-only with rotary positional embeddings and FlashAttention
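The rotary positional embeddings mentioned above encode position by rotating each (even, odd) feature pair of the query and key vectors through a position-dependent angle. A minimal pure-Python sketch of that rotation (illustrative only, not the model's actual implementation; `dim=64` and `base=10000.0` are conventional defaults, not confirmed for this checkpoint):

```python
import math

def rotate_pair(x0, x1, pos, dim_index, dim=64, base=10000.0):
    """Rotate one (even, odd) feature pair by a position-dependent angle.

    theta = pos / base**(2*dim_index/dim). Because a rotation preserves
    the pair's norm, attention dot products between rotated queries and
    keys depend only on their relative positions.
    """
    theta = pos / (base ** (2 * dim_index / dim))
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    return (x0 * cos_t - x1 * sin_t, x0 * sin_t + x1 * cos_t)
```

At position 0 the angle is zero, so the pair is unchanged; larger positions and lower `dim_index` values rotate faster.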
Input Output 
Accepted Modalities:
text
Performance Tips:
Generation is slow: expect roughly 0.7 tokens/s.
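Because the repo ships custom modelling code (the `RWForCausalLM` architecture and the "Custom code" tag above), loading requires `trust_remote_code=True`. A minimal sketch, assuming the `auto-gptq` package is installed; the User/Assistant prompt template is an assumption for illustration, not an official format:

```python
def build_instruct_prompt(instruction: str) -> str:
    """Format a single-turn prompt. The User/Assistant template is an
    assumption; the repo does not document an official one."""
    return f"User: {instruction}\nAssistant:"

def load_falcon_gptq(
    repo_id: str = "philschmid/falcon-40b-instruct-GPTQ-inference-endpoints",
):
    """Load the 4-bit checkpoint (~22.5 GB download; needs a ~24 GB GPU).

    Imports are deferred so the prompt helper above stays dependency-free.
    trust_remote_code=True is required because the repo ships custom
    modelling code (RWForCausalLM).
    """
    from auto_gptq import AutoGPTQForCausalLM  # assumed installed
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoGPTQForCausalLM.from_quantized(
        repo_id, device="cuda:0", trust_remote_code=True
    )
    return model, tokenizer
```

At ~0.7 tokens/s, a 200-token response takes around five minutes, so batch or offline use is the realistic deployment mode.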
LLM Name: Falcon 40B Instruct GPTQ Inference Endpoints
Repository: https://huggingface.co/philschmid/falcon-40b-instruct-GPTQ-inference-endpoints
Model Size: 40b
Required VRAM: 22.5 GB
Updated: 2026-03-29
Maintainer: philschmid
Model Type: RefinedWeb
Instruction-Based: Yes
Model Files: 22.5 GB
Supported Languages: en
GPTQ Quantization: Yes
Quantization Type: gptq|4bit
Model Architecture: RWForCausalLM
License: apache-2.0
Model Max Length: 2048
Transformers Version: 4.29.2
Is Biased: 0
Tokenizer Class: PreTrainedTokenizerFast
Vocabulary Size: 65024
Torch Data Type: bfloat16
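The 22.5 GB footprint is consistent with 4-bit GPTQ weights for a ~40B-parameter model. A back-of-the-envelope check (group size 128 and fp16 scales are typical GPTQ choices, assumed here rather than confirmed for this repo):

```python
def gptq_weight_gib(n_params: float, bits: int = 4, group_size: int = 128) -> float:
    """Rough GPTQ checkpoint size in GiB: `bits` per weight plus one fp16
    scale per `group_size` weights (packed zero-points ignored)."""
    weight_bytes = n_params * bits / 8
    scale_bytes = (n_params / group_size) * 2  # one fp16 scale per group
    return (weight_bytes + scale_bytes) / 2**30
```

40e9 parameters at 4 bits come out near 19 GiB; the listed 22.5 GB plausibly also covers layers commonly left unquantized (embeddings, layer norms) stored in bf16.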

Best Alternatives to Falcon 40B Instruct GPTQ Inference Endpoints

Best Alternatives | Context / RAM | Downloads / Likes
Falcon 40B Instruct GPTQ | 0K / 22.5 GB | 321197
Falcon 40B Instruct GPTQ | 0K / 22.5 GB | 41
Falcon 40B Instruct 8bit | 0K / 41.8 GB | 146
...alcon 40B Instruct W4 G128 AWQ | 0K / 22.3 GB | 72
H2ogpt Oig Oasst1 Falcon 40B | 0K / 82.5 GB | 1106



Original data from HuggingFace, OpenCompass and various public git repos.