Japanese CC-100, Japanese C4, Japanese OSCAR, The Pile, Wikipedia, and a rinna-curated Japanese dataset
Data Volume:
80B tokens
Model Architecture:
A 26-layer, 2304-hidden-size transformer-based language model
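A quick way to verify these architecture figures is to read them from the published model configuration. The sketch below assumes the Hugging Face transformers library and the rinna/gemma-2-baku-2b Hub ID:

```python
# Minimal sketch: confirm layer count and hidden size from the model config.
# Assumes the transformers library; this fetches only config.json, not weights.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("rinna/gemma-2-baku-2b")
print(config.num_hidden_layers, config.hidden_size)  # expected: 26 2304
```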
Input / Output
Accepted Modalities:
text
Output Format:
generated text
Performance Tips:
It is recommended to use eager attention when performing batch inference under bfloat16 precision: Gemma 2 yields NaN values for padded sequences when the default attention implementation is used with bfloat16.
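The sketch below shows one way to apply this tip, assuming the transformers library, PyTorch with bfloat16 support, and the rinna/gemma-2-baku-2b checkpoint on the Hugging Face Hub; the prompts are illustrative only:

```python
# Minimal sketch: batch inference in bfloat16 with eager attention,
# which avoids the NaN issue Gemma 2 exhibits on padded sequences.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "rinna/gemma-2-baku-2b"  # assumed Hub ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    attn_implementation="eager",  # override the default attention implementation
)

# Left padding keeps each prompt's final token adjacent to the generated text.
tokenizer.padding_side = "left"
prompts = ["西田幾多郎は、", "夏目漱石は、"]  # illustrative Japanese prompts
inputs = tokenizer(prompts, return_tensors="pt", padding=True)
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```

Passing attn_implementation="eager" replaces the default scaled-dot-product attention path, which is where the bfloat16 NaN issue surfaces for padded batches.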