The model uses LLaMA architecture for causal language modeling.
Training Details
Methodology:
The model is a fine-tuned version that surpasses the original model on several evaluations.
Model Architecture:
LLaMA (LlamaForCausalLM) with 32 LlamaDecoderLayer layers, each consisting of a LlamaAttention block (uses Linear, RotaryEmbedding) and a LlamaMLP block (uses Linear, SiLUActivation).
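To make the LlamaMLP part of the description more concrete, here is a minimal pure-Python sketch of a SiLU-gated feed-forward block as it is commonly structured in LLaMA-style models: down(silu(gate(x)) * up(x)). The helper names (silu, linear, gated_mlp), the bias-free linear layers, and the toy weights are assumptions made for illustration; they are not taken from this model's code or checkpoint.

```python
import math


def silu(x: float) -> float:
    """SiLU (also called swish): x * sigmoid(x). Not hardened against overflow."""
    return x / (1.0 + math.exp(-x))


def linear(weight, x):
    """Bias-free linear layer: weight is a list of rows, x is a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


def gated_mlp(x, w_gate, w_up, w_down):
    """Gated feed-forward block: down(silu(gate(x)) * up(x))."""
    gate = [silu(g) for g in linear(w_gate, x)]
    up = linear(w_up, x)
    hidden = [g * u for g, u in zip(gate, up)]
    return linear(w_down, hidden)


# Toy sizes chosen only for illustration: hidden size 2, intermediate size 3.
x = [1.0, -2.0]
w_gate = [[0.5, 0.0], [0.0, 0.5], [1.0, 1.0]]
w_up = [[1.0, 0.0], [0.0, 1.0], [1.0, -1.0]]
w_down = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

y = gated_mlp(x, w_gate, w_up, w_down)
print(y)
```

In the real model each decoder layer applies this kind of block to every token position after the attention block; the sketch only shows the arithmetic for a single vector.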
Input Output
Input Format:
<|prompt|>Your input~~<|answer|>
Output Format:
Generated text corresponding to the prompt.
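The input format above can be assembled with a small helper. This is a sketch only: build_prompt and extract_answer are hypothetical names, and the "~~" separator is copied verbatim from the format line, so it may be an extraction artifact rather than the literal separator the model expects (check the model's tokenizer configuration before relying on it).

```python
PROMPT_TAG = "<|prompt|>"
ANSWER_TAG = "<|answer|>"
# Copied verbatim from the format line; treat as an assumption, not a confirmed token.
SEPARATOR = "~~"


def build_prompt(user_input: str, separator: str = SEPARATOR) -> str:
    """Wrap user text in the prompt/answer tags shown in the input format."""
    return f"{PROMPT_TAG}{user_input}{separator}{ANSWER_TAG}"


def extract_answer(generated: str) -> str:
    """Return the text after the last answer tag, or the whole string if the tag is absent."""
    _, found, tail = generated.rpartition(ANSWER_TAG)
    return tail.strip() if found else generated.strip()


prompt = build_prompt("Why is drinking water healthy?")
print(prompt)
```

Passing a different separator argument lets the same helper be reused if the separator turns out to differ from the one shown here.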
Performance Tips:
Make sure the transformers, accelerate, and torch packages are installed and set up appropriately.
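One way to check the setup before loading the model is to ask Python which of the three packages are installed. The sketch below uses only the standard library (importlib.metadata, Python 3.8+); the function name installed_versions is hypothetical, and no minimum versions are checked because the page does not state any.

```python
from importlib import metadata

# Distribution names as published on PyPI.
REQUIRED = ("transformers", "accelerate", "torch")


def installed_versions(packages=REQUIRED):
    """Map each package name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions().items():
    status = version if version is not None else "NOT INSTALLED"
    print(f"{name}: {status}")
```

Any package reported as missing can then be installed with the usual package manager before attempting to load the model.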
Note: a score shown in green (e.g. "73.2") means that the model scores better than CobraMamba/mamba-gpt-3b-v2.