Gemma 2B 10M by mustafaaljadery


Tags: arXiv:1901.02860 (Transformer-XL) · arXiv:2404.07143 (Infini-attention) · Endpoints compatible · Region: us · Safetensors · Sharded · Tensorflow

Gemma 2B 10M Benchmarks

nn.n% — how the model compares to the reference models: Anthropic Claude 3.5 Sonnet ("so35"), GPT-4o ("gpt4o"), or GPT-4 ("gpt4").
Gemma 2B 10M (mustafaaljadery/gemma-2B-10M)

Gemma 2B 10M Parameters and Internals

Model Type 
causal language model
Additional Notes 
This is a very early checkpoint of the model, trained for only 200 steps. The implementation features native inference optimized for CUDA.
Training Details 
Methodology:
Our approach splits attention into local attention blocks, as outlined in the Infini-attention paper. We then apply recurrence across these local blocks to arrive at global attention over the full 10M-token context. Much of the inspiration for this design comes from the Transformer-XL paper. (A minimal sketch of the idea follows below.)
Context Length:
10,000,000 tokens
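
The recurrence step can be pictured with a short sketch. The following is a minimal, self-contained Python illustration of segment-level recurrence over local attention blocks in the spirit of Infini-attention and Transformer-XL. The tensor shapes, the (elu+1) feature map, and the memory update rule are simplifying assumptions for illustration, not the repository's actual implementation.

import torch
import torch.nn.functional as F

def local_attention(q, k, v):
    # Standard scaled dot-product attention within one local block.
    scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)
    return F.softmax(scores, dim=-1) @ v

def recurrent_global_attention(q, k, v, segment_len=2048):
    # q, k, v: (seq_len, d). Process the sequence in local blocks,
    # carrying a compressive memory (linear-attention style) between
    # blocks so information propagates across the whole context.
    d = q.size(-1)
    memory = torch.zeros(d, d)   # accumulated key-value associations
    norm = torch.zeros(d)        # running normalizer for memory reads
    outputs = []
    for start in range(0, q.size(0), segment_len):
        qs, ks, vs = (t[start:start + segment_len] for t in (q, k, v))
        local = local_attention(qs, ks, vs)
        # Read from memory with (elu+1) feature-mapped queries.
        sigma_q = F.elu(qs) + 1.0
        mem_out = (sigma_q @ memory) / (sigma_q @ norm + 1e-6).unsqueeze(-1)
        outputs.append(local + mem_out)  # combine local and recurrent paths
        # Write this block's keys/values into memory for later blocks.
        sigma_k = F.elu(ks) + 1.0
        memory = memory + sigma_k.transpose(0, 1) @ vs
        norm = norm + sigma_k.sum(dim=0)
    return torch.cat(outputs, dim=0)

if __name__ == "__main__":
    q, k, v = (torch.randn(8192, 64) for _ in range(3))
    out = recurrent_global_attention(q, k, v)
    print(out.shape)  # torch.Size([8192, 64])

Because the memory has a fixed (d, d) size, each block costs the same regardless of how far into the sequence it sits, which is what makes very long contexts tractable.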
Input Output 
Input Format:
The prompt is set directly in the repository's main.py (see the loading sketch after the details table below).
Accepted Modalities:
text
Output Format:
Text
LLM Name: Gemma 2B 10M
Repository: 🤗 https://huggingface.co/mustafaaljadery/gemma-2B-10M
Model Size: 2b
Required VRAM: 10 GB
Updated: 2025-08-18
Maintainer: mustafaaljadery
Model Type: gemma
Model Files: 4.9 GB (1-of-3), 5.0 GB (2-of-3), 0.1 GB (3-of-3)
Model Architecture: GemmaForCausalLM
License: mit
Context Length: 8192
Model Max Length: 8192
Transformers Version: 4.40.0.dev0
Tokenizer Class: GemmaTokenizer
Padding Token: <pad>
Vocabulary Size: 256000
Torch Data Type: float32
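
Given the details above, here is a minimal loading sketch using the standard Hugging Face Transformers API. Note that plain Transformers loading exercises only the standard 8192-token Gemma path recorded in the config; the 10M-token recurrence lives in the repository's own main.py. The prompt string and generation settings below are illustrative, not taken from the repo.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "mustafaaljadery/gemma-2B-10M"
tokenizer = AutoTokenizer.from_pretrained(repo)  # resolves to GemmaTokenizer
model = AutoModelForCausalLM.from_pretrained(
    repo,
    torch_dtype=torch.float32,  # matches the float32 weights (~10 GB VRAM)
)

prompt = "Summarize the plot of Hamlet in one sentence."
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))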

Best Alternatives to Gemma 2B 10M

Best Alternatives | Context / RAM | Downloads | Likes
Gemma 1.1 2B It | 8K / 5.1 GB | 189555 | 165
Pandas Tutor Gemma 2B | 8K / 5.1 GB | 56 | 1
Caca Tinny 2B V3 | 8K / 5.1 GB | 30 | 1
Codegemma 2B | 8K / 5.1 GB | 9852 | 84
Gemma Qlora Customer Support | 8K / 5.1 GB | 7 | 0
Gemma Ko 1.1 2B It | 8K / 5.1 GB | 1039 | 1
... 2B Finetuned Sft Navarasa 2.0 | 8K / 10 GB | 1028 | 26
EMO 2B | 8K / 5.1 GB | 2049 | 2
Gemma 2B Orpo | 8K / 5.1 GB | 480 | 28
Octopus V2 | 8K / 5.1 GB | 1387 | 885
Note: a green score (e.g. "73.2") means the model outperforms mustafaaljadery/gemma-2B-10M.



Original data from HuggingFace, OpenCompass and various public git repos.
Release v20241124