The model may amplify biases and return toxic responses.
The model may generate inaccurate or socially undesirable text.
The model is susceptible to jailbreak attacks.
Additional Notes
Meta tokens, a set of learnable tokens, are prepended to every prompt to improve efficacy. The model shares KV cache between 2 layers and between heads in a single layer. 90% of attention layers are sliding window attention.
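The meta-token mechanism can be illustrated with a minimal sketch. The module name, token count, and tensor shapes below are assumptions for illustration only; they show how a set of learnable embeddings might be prepended to every embedded prompt before the attention and SSM paths run.

```python
import torch
import torch.nn as nn

class MetaTokenPrepender(nn.Module):
    """Illustrative sketch: prepend learnable meta tokens to every prompt."""

    def __init__(self, num_meta_tokens: int = 128, hidden_size: int = 1600):
        super().__init__()
        # Learnable embeddings shared across all prompts
        # (count and initialization are assumptions, not the released values).
        self.meta_tokens = nn.Parameter(torch.randn(num_meta_tokens, hidden_size) * 0.02)

    def forward(self, prompt_embeds: torch.Tensor) -> torch.Tensor:
        # prompt_embeds: (batch, seq_len, hidden_size)
        batch = prompt_embeds.size(0)
        meta = self.meta_tokens.unsqueeze(0).expand(batch, -1, -1)
        # The model then attends over [meta tokens ; prompt tokens].
        return torch.cat([meta, prompt_embeds], dim=1)
```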
Training Details
Training Time:
September 1, 2024 - November 10, 2024
Model Architecture:
Hymba-1.5B-Base has a model embedding size of 1600, 25 attention heads, an MLP intermediate dimension of 5504, 32 layers in total, and 16 SSM states. Three of the attention layers use full attention, while the remaining layers use sliding window attention. Each attention layer in Hymba combines standard attention heads and Mamba heads in parallel. Additionally, it uses Grouped-Query Attention (GQA) and Rotary Position Embeddings (RoPE).
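The hyperparameters above can be collected into a short configuration sketch. The field names here are illustrative assumptions, not the model's actual configuration schema:

```python
# Illustrative summary of the architecture values stated above;
# key names are assumptions, not the released config fields.
hymba_1p5b_base_config = {
    "hidden_size": 1600,             # model embedding size
    "num_attention_heads": 25,
    "intermediate_size": 5504,       # MLP intermediate dimension
    "num_hidden_layers": 32,
    "ssm_state_size": 16,
    "num_full_attention_layers": 3,  # remaining layers use sliding window attention
    "attention_style": "hybrid",     # standard attention heads + Mamba heads in parallel
    "use_gqa": True,                 # Grouped-Query Attention
    "position_embedding": "rope",    # Rotary Position Embeddings
}
```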
Responsible AI Considerations
Accountability:
Developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.
Mitigation Strategies:
Strong output validation controls are recommended to handle security and safety risks.
Input/Output
Performance Tips:
The batch size needs to be 1 during generation due to current implementation limitations.
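For example, a minimal generation call with a single prompt (batch size 1) might look like the sketch below. It assumes the model is loaded from the nvidia/Hymba-1.5B-Base repository and that trust_remote_code=True is required for the custom Hymba architecture.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "nvidia/Hymba-1.5B-Base"
# trust_remote_code=True is assumed to be needed for the custom architecture.
tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(repo, trust_remote_code=True).eval()

# Keep the batch size at 1: encode a single prompt per generate() call.
inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```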