Analysis of feature dynamics and emergence in real-world language models.
Additional Notes
The model was trained on the roneneldan/TinyStories dataset and generates consistent English text.
Training Details
Data Sources:
roneneldan/TinyStories
Methodology:
Inspired by the 21M-parameter one-layer GPT-Neo model from the TinyStories paper. The model was trained to reproduce those results and to collect high-frequency checkpoints for further analysis.
Training Time:
~2 hours on a single H100
Hardware Used:
single H100
Model Architecture:
Single-layer Mistral model with hidden size 512 and MLP intermediate size 1024.