Joint Attention, Mamba, Generative Text Model, Dense Model, Mixture-of-Experts
Use Cases
Areas:
Research, Commercial applications
Applications:
Fine-tuning for chat/instruct versions
Primary Use Cases:
Foundation layer for training and developing custom solutions
Limitations:
Did not undergo any alignment for instruct/chat interactions
Considerations:
Guardrails should be added for responsible and safe use.
Additional Notes
Jamba is the first production-scale Mamba implementation: a state-of-the-art hybrid SSM-Transformer LLM.
Training Details
Methodology:
Joint Attention and Mamba
Context Length:
256,000 tokens
Model Architecture:
Hybrid SSM-Transformer LLM
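For intuition only, the toy sketch below shows what a hybrid SSM-Transformer stack looks like: state-space (Mamba-style) layers interleaved with standard attention layers. The `MambaBlock` placeholder, the `HybridStack` wrapper, and the one-attention-layer-per-four ratio are illustrative assumptions, not Jamba's published layer layout.

```python
# Toy sketch of a hybrid SSM-Transformer stack (illustrative only).
# MambaBlock is a stand-in placeholder, not a real selective state-space layer,
# and the 1-attention-per-4-layers ratio is an assumption for the example.
import torch.nn as nn


class MambaBlock(nn.Module):
    """Placeholder for a real Mamba (selective SSM) layer."""

    def __init__(self, d_model: int):
        super().__init__()
        self.mixer = nn.Linear(d_model, d_model)  # stand-in for the SSM scan

    def forward(self, x):
        return x + self.mixer(x)  # residual connection, as in the real block


class AttentionBlock(nn.Module):
    def __init__(self, d_model: int, n_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, x):
        out, _ = self.attn(x, x, x, need_weights=False)
        return x + out


class HybridStack(nn.Module):
    """Interleave attention layers among Mamba-style layers."""

    def __init__(self, d_model: int = 512, n_layers: int = 8, attn_every: int = 4):
        super().__init__()
        self.layers = nn.ModuleList(
            [
                AttentionBlock(d_model) if (i + 1) % attn_every == 0 else MambaBlock(d_model)
                for i in range(n_layers)
            ]
        )

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x
```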
Responsible AI Considerations
Mitigation Strategies:
The model does not include built-in safety moderation mechanisms or guardrails; deployers should add their own.
Input Output
Input Format:
Text prompts should include the 'BOS' token for evaluation.
Accepted Modalities:
Text
Output Format:
Generated text
Performance Tips:
The model can be loaded in BF16/FP16 via `torch_dtype` for better performance, and FlashAttention2 can be enabled via `attn_implementation`. Quantization with bitsandbytes is supported to fit longer sequences in memory.
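A minimal loading sketch along those lines, assuming a `transformers` version with Jamba support and using the `TechxGenus/Jamba-v0.1-9B` checkpoint named on this page; adjust the dtype and attention settings to your hardware.

```python
# Minimal sketch: load the checkpoint in BF16 with FlashAttention2 enabled.
# Assumes a recent `transformers` release with Jamba support and that
# flash-attn is installed; otherwise drop the attn_implementation argument.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TechxGenus/Jamba-v0.1-9B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,                # BF16/FP16 for lower memory use
    attn_implementation="flash_attention_2",   # enable FlashAttention2
    device_map="auto",
)

# Prompts should include the BOS token; add_special_tokens=True prepends it
# when the tokenizer defines one.
inputs = tokenizer(
    "A hybrid SSM-Transformer model is",
    return_tensors="pt",
    add_special_tokens=True,
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

To fit longer sequences on limited GPU memory, the BF16 settings can instead be swapped for a bitsandbytes quantization config, e.g. `BitsAndBytesConfig(load_in_8bit=True)` passed via `quantization_config` in `from_pretrained`.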
Release Notes
Version:
v0.1
Notes:
Dense version without Mixture-of-Experts, created by extracting the weights of the first expert.
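As a rough illustration of that conversion idea (not the actual script used for this release), a sketch of keeping only the first expert of a Mixture-of-Experts block might look like the following; the attribute names (`experts`, etc.) are hypothetical placeholders rather than Jamba's real module names.

```python
# Hedged sketch: turn a Mixture-of-Experts feed-forward block into a dense one
# by keeping only the first expert. The `experts` attribute and the MoE
# detection below are illustrative placeholders, not Jamba's real structure.
import torch.nn as nn


def densify_moe_block(moe_block: nn.Module) -> nn.Module:
    """Return the first expert's feed-forward network as the dense replacement."""
    first_expert = moe_block.experts[0]  # hypothetical: list of per-expert FFNs
    return first_expert                  # router and remaining experts are dropped


def densify_model(model: nn.Module) -> nn.Module:
    """Recursively swap every MoE block in the model for its first expert."""
    for name, module in list(model.named_children()):
        if hasattr(module, "experts"):   # crude MoE detection, for the sketch only
            setattr(model, name, densify_moe_block(module))
        else:
            densify_model(module)
    return model
```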