Zamba is a pretrained base model and does not include moderation mechanisms. It is not fine-tuned for chat, so expect reduced performance in chat applications.
Training Details
Data Sources:
Open web datasets: 1T tokens for the initial pretraining phase, followed by 50B high-quality tokens for a second phase
Data Volume:
1T tokens for initial pretraining
Methodology:
Next-token prediction
Model Architecture:
Hybrid SSM: a backbone of Mamba blocks with a single shared transformer block interspersed every 6 blocks
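For illustration, here is a minimal PyTorch sketch of that interleaving pattern. It is a schematic, not the actual Zamba implementation: `MambaBlockStub` is a hypothetical stand-in for a real Mamba (selective SSM) block, the shared block is a plain `nn.TransformerEncoderLayer`, and all sizes are arbitrary. The key point is that one transformer block's parameters are reused at every insertion point.

```python
import torch
import torch.nn as nn

class MambaBlockStub(nn.Module):
    """Stand-in for a Mamba (selective SSM) block; a real model would use
    an actual Mamba implementation, e.g. from the `mamba-ssm` package."""
    def __init__(self, d_model: int):
        super().__init__()
        self.proj = nn.Linear(d_model, d_model)

    def forward(self, x):
        return x + self.proj(x)  # residual connection, as in the real block

class HybridBackbone(nn.Module):
    def __init__(self, n_layers=24, d_model=512, n_heads=8, every=6):
        super().__init__()
        self.mamba_blocks = nn.ModuleList(
            [MambaBlockStub(d_model) for _ in range(n_layers)]
        )
        # A single transformer block whose parameters are REUSED at every
        # insertion point -- the "shared" part of the design.
        self.shared_block = nn.TransformerEncoderLayer(
            d_model, n_heads, batch_first=True
        )
        self.every = every

    def forward(self, x):
        for i, mamba in enumerate(self.mamba_blocks):
            if i % self.every == 0:
                x = self.shared_block(x)  # same weights at each call
            x = mamba(x)
        return x

# Smoke test on a (batch, seq, d_model) input
out = HybridBackbone()(torch.randn(2, 16, 512))
print(out.shape)  # torch.Size([2, 16, 512])
```

Because the transformer block is shared, the attention layers add far fewer parameters than a stack of distinct transformer layers would.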
Input / Output
Input Format:
Text input
Accepted Modalities:
Text
Output Format:
Text output
Performance Tips:
Running optimized Mamba implementations on a CUDA device is recommended for best performance.
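As a rough sketch, the snippet below shows one way to load and run the model with Hugging Face transformers. It assumes the checkpoint id `Zyphra/Zamba-7B-v1`, a transformers version with Zamba support, and that the optimized kernels (the `mamba-ssm` and `causal-conv1d` packages) are installed; without them, Mamba-based models typically fall back to a slower pure-PyTorch path.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Checkpoint id is an assumption; substitute the repo you are using.
model_id = "Zyphra/Zamba-7B-v1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision keeps memory manageable
    device_map="cuda",           # CUDA device recommended for the fused kernels
)

inputs = tokenizer("The hybrid SSM architecture", return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Since this is a base model, prompts should be framed as text to continue rather than as chat-style instructions.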