Gpt2 NoLN by apollo-research


Tags: arXiv:2409.13710, Autotrain compatible, Endpoints compatible, Gpt2, Region:us, Safetensors


Gpt2 NoLN Parameters and Internals

Model Type: GPT2LMHeadModel
Additional Notes: To fully remove all LayerNorms, replace the 'ln_1' and 'ln_2' modules with identities, and modify 'ln_f' with adjustments to the unembed matrix and bias.
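The 'ln_f' adjustment mentioned in the note can be illustrated with a small sketch. It assumes that, once normalization is disabled, 'ln_f' reduces to an elementwise affine map y = gamma * x + beta; that assumption, along with the names `fold_affine_into_unembed`, `W_U`, `b_U`, `gamma`, and `beta`, is ours and is not taken from the model repository. Under it, the affine map can be folded into the unembed layer so that the logits are unchanged:

```python
# Sketch only: fold an elementwise affine map into a linear unembed layer.
# Assumption (not from the source): with normalization disabled, ln_f acts as
#   y = gamma * x + beta
# so that logits = W_U @ y + b_U can be rewritten as W_folded @ x + b_folded.

def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def fold_affine_into_unembed(W_U, b_U, gamma, beta):
    """Return (W_folded, b_folded) with
    W_folded = W_U * gamma (column-wise scaling) and b_folded = W_U @ beta + b_U."""
    W_folded = [[w * g for w, g in zip(row, gamma)] for row in W_U]
    b_folded = [b + sum(w * be for w, be in zip(row, beta))
                for row, b in zip(W_U, b_U)]
    return W_folded, b_folded

# Toy values: vocabulary of 2 tokens, hidden size 3 (hypothetical numbers).
W_U = [[1.0, 2.0, -1.0], [0.5, 0.0, 3.0]]
b_U = [0.0, 0.0]
gamma = [2.0, 0.5, 1.0]
beta = [0.1, -0.2, 0.3]
x = [1.0, -2.0, 4.0]

# Original path: apply the affine map, then the unembed layer.
affine_out = [g * xi + be for g, xi, be in zip(gamma, x, beta)]
logits_original = [l + b for l, b in zip(matvec(W_U, affine_out), b_U)]

# Folded path: no affine map, adjusted unembed matrix and bias.
W_folded, b_folded = fold_affine_into_unembed(W_U, b_U, gamma, beta)
logits_folded = [l + b for l, b in zip(matvec(W_folded, x), b_folded)]

max_diff = max(abs(a - b) for a, b in zip(logits_original, logits_folded))
print("max logit difference:", max_diff)
```

In the stock GPT2LMHeadModel the output projection has no bias and shares its weights with the token embedding, so applying a fold like this to a real checkpoint would likely require untying those weights and adding a bias term first.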
Training Details
Data Sources: OpenWebText
Data Volume: ~500M tokens
Methodology: Fine-tuning with gradual LayerNorm disabling
Context Length: 1024
Release Notes
Version v2: Trained for 1000 iterations in a single training run
Version v1: Trained for 900 iterations, with multiple interruptions, LayerNorm modifications, and resume steps
LLM Name: Gpt2 NoLN
Repository: https://huggingface.co/apollo-research/gpt2_noLN
Model Size: 124.4M parameters
Required VRAM: 0.5 GB
Updated: 2025-09-23
Maintainer: apollo-research
Model Type: gpt2
Model Files: 0.5 GB
Model Architecture: GPT2LMHeadModel
Transformers Version: 4.42.4
Vocabulary Size: 50257
Torch Data Type: float32
Activation Function: gelu_new
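The 0.5 GB figures above are consistent with a simple back-of-the-envelope estimate from the listed parameter count and data type. The sketch below covers the weights only; activations, KV cache, and framework overhead are not included, and the variable names are illustrative.

```python
# Rough weight-memory estimate from the listed model size and dtype.
PARAM_COUNT = 124.4e6    # "Model Size" from the listing
BYTES_PER_PARAM = 4      # float32 stores each parameter in 4 bytes

weight_bytes = PARAM_COUNT * BYTES_PER_PARAM
weight_gb = weight_bytes / 1e9        # decimal gigabytes
weight_gib = weight_bytes / 2**30     # binary gibibytes

print(f"{weight_gb:.2f} GB ({weight_gib:.2f} GiB)")
```

Either unit rounds to roughly 0.5, which matches both the model file size and the required VRAM reported in the listing.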

Best Alternatives to Gpt2 NoLN

Best Alternatives | Context / RAM | Downloads Likes
Phrase To Story Generator | 0K / 0.5 GB | 510
Gpt2 Hoodie Final | 0K / 0.5 GB | 70
Autotrain Be6vh G5hv9 | 0K / 0.5 GB | 50
OCRonos Vintage | 0K / 0.2 GB | 1854280
Gpt2 Sft | 0K / 0.5 GB | 100
ArshGpt | 0K / 0.5 GB | 512
Gpt2 Coconut Gsm From Cot7 | 0K / 0.5 GB | 80
MindMate | 0K / 0.5 GB | 51
MindMate V1 | 0K / 0.5 GB | 41
Solacia | 0K / 0.5 GB | 23

Original data from HuggingFace, OpenCompass and various public git repos.
Release v20241124