DeepSeek MoE 4X8B R1 Distill Llama 3.1 Mad Scientist 24B by DavidAU: Benchmarks, Features and Detailed Analysis

Tags: Autotrain compatible, Conversational, Endpoints compatible, Merge, Mergekit, Mixtral, Moe, Region:us, Safetensors, Sharded, Tensorflow

Model Card on HF 🤗: https://huggingface.co/DavidAU/DeepSeek-MOE-4X8B-R1-Distill-Llama-3.1-Mad-Scientist-24B

DeepSeek MoE 4X8B R1 Distill Llama 3.1 Mad Scientist 24B Benchmarks

MMLU Pro: 21.88

GPQA: 11.63

MUSR: 11.79

BBH: 25.61

IFEval: 34.36 vs 88 (so35) ^-61%

MATH Lvl 5: 7.55

LLME Score: 0.25647

^nn.n%: how the model's score compares to the score of a reference model, one of Anthropic Sonnet 3.5 ("so35"), GPT-4o ("gpt4o") or GPT-4 ("gpt4").
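
The ^ figure on the IFEval line is consistent with a plain relative difference between the model's score and the reference score. The sketch below reproduces it under that assumption: the formula (score - reference) / reference is inferred from the numbers on this page, not documented by LLM Explorer, and the function name `relative_gap_pct` is hypothetical.

```python
def relative_gap_pct(score: float, reference: float) -> float:
    """Return how far `score` sits from `reference`, as a percentage of `reference`."""
    if reference == 0:
        raise ValueError("reference score must be non-zero")
    return (score - reference) / reference * 100.0


# IFEval: this model (34.36) against the Sonnet 3.5 reference (88) listed above.
ifeval_gap = relative_gap_pct(34.36, 88.0)
print(f"IFEval gap vs so35: {ifeval_gap:.0f}%")  # prints "IFEval gap vs so35: -61%"
```

A negative value means the model scores below the reference; here it reaches roughly 39% of the reference score on IFEval.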

DeepSeek MoE 4X8B R1 Distill Llama 3.1 Mad Scientist 24B Parameters and Internals

LLM Name	DeepSeek MoE 4X8B R1 Distill Llama 3.1 Mad Scientist 24B
Repository 🤗	https://huggingface.co/DavidAU/DeepSeek-MOE-4X8B-R1-Distill-Llama-3.1-Mad-Scientist-24B
Model Size	24.9B
Required VRAM	50.1 GB
Updated	2025-09-23
Maintainer	DavidAU
Model Type	mixtral
Model Files	4.9 GB: 1-of-11, 5.0 GB: 2-of-11, 4.9 GB: 3-of-11, 5.0 GB: 4-of-11, 5.0 GB: 5-of-11, 4.9 GB: 6-of-11, 5.0 GB: 7-of-11, 5.0 GB: 8-of-11, 4.9 GB: 9-of-11, 4.4 GB: 10-of-11, 1.1 GB: 11-of-11
Model Architecture	MixtralForCausalLM
Context Length	131072
Model Max Length	131072
Transformers Version	4.46.2
Tokenizer Class	LlamaTokenizer
Padding Token	<｜begin▁of▁sentence｜>
Vocabulary Size	128256
Torch Data Type	bfloat16
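
The Required VRAM figure can be cross-checked against the other rows of this table. The sketch below sums the shard sizes from the Model Files row and compares the total with a parameter-count estimate for bfloat16. It assumes 1 GB means 1e9 bytes and that the figure covers weights only (no KV cache or activations); neither assumption is stated on this page, and the variable names are illustrative.

```python
# Shard sizes in GB, copied from the "Model Files" row (11 shards).
shard_sizes_gb = [4.9, 5.0, 4.9, 5.0, 5.0, 4.9, 5.0, 5.0, 4.9, 4.4, 1.1]

# Total on-disk size of the weights, rounded to one decimal like the table.
total_gb = round(sum(shard_sizes_gb), 1)

# Independent estimate: parameter count times bytes per parameter.
# bfloat16 stores each parameter in 2 bytes; 1 GB is taken as 1e9 bytes (assumption).
params = 24.9e9
bytes_per_param = 2
estimate_gb = params * bytes_per_param / 1e9

# Context length expressed in units of 1024 tokens.
context_k = 131072 // 1024

print(f"sum of shards: {total_gb} GB")             # prints "sum of shards: 50.1 GB"
print(f"params x 2 bytes: {estimate_gb:.1f} GB")   # prints "params x 2 bytes: 49.8 GB"
print(f"context length: {context_k}K tokens")      # prints "context length: 128K tokens"
```

The shard total matches the 50.1 GB in the Required VRAM row, and the small gap to the 49.8 GB estimate is within what rounding of the 24.9B parameter count would explain. Actual memory use at inference time will be higher once the KV cache for long contexts is included.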

Best Alternatives to DeepSeek MoE 4X8B R1 Distill Llama 3.1 Mad Scientist 24B

Best Alternatives	Context / RAM	Downloads	Likes
L3.1 ClaudeMaid 4x8B	128K / 50.1 GB	8	7
L3.1 MoE 4x8B V0.1	128K / 50.1 GB	13	3
L3.1 MoE 4x8B V0.2	128K / 50.1 GB	1	2
... Multi Tier Deep Reasoning 32B	128K / 50.1 GB	2	3
Llama Salad 4x8B V3	8K / 50.1 GB	7	6
...x8B Dark Planet Rebel FURY 25B	8K / 50.1 GB	9	1
...oE 4x8B Dark Planet Rising 25B	8K / 50.1 GB	13	0
L3 MoE 4X8B Grand Horror 25B	8K / 50.1 GB	5	0
OpenCrystal V4 L3 4x8B	8K / 50 GB	5	2
L3 SnowStorm V1.15 4x8B B	8K / 49.9 GB	1	11

Email us: info@extractum.io.

Original data from HuggingFace, OpenCompass and various public git repos.

Release v20241124
