MPT-7B-8k uses a modified transformer architecture optimized for efficient training and inference, with ALiBi attention biases for handling long inputs.
Context Length:
8192
Training Time:
9.5 days
Hardware Used:
440 A100-40GB GPUs
Model Architecture:
Decoder-only transformer with modifications such as FlashAttention, ALiBi, and the removal of learned positional embeddings.
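Dropping learned positional embeddings works because ALiBi encodes position as a linear, distance-proportional penalty added directly to the attention scores, which is also what lets the model handle long inputs. A minimal standalone sketch of that bias computation (illustrative only, not taken from the MPT source) could look like:

```python
import torch

def alibi_slopes(n_heads: int) -> torch.Tensor:
    """Per-head slopes as a geometric sequence, as in the ALiBi paper.
    Assumes n_heads is a power of two (true for MPT-7B's 32 heads)."""
    start = 2 ** (-8.0 / n_heads)
    return torch.tensor([start ** (i + 1) for i in range(n_heads)])

def alibi_bias(n_heads: int, seq_len: int) -> torch.Tensor:
    """Additive attention bias of shape (n_heads, seq_len, seq_len).

    Each head penalizes attention to distant past tokens linearly;
    future positions are left at zero since the causal mask removes them."""
    slopes = alibi_slopes(n_heads)                 # (n_heads,)
    pos = torch.arange(seq_len)
    rel = (pos[None, :] - pos[:, None]).tril()     # rel[i, j] = j - i for j <= i, else 0
    return slopes[:, None, None] * rel[None, :, :]

# Example: head 0 has bias 0 on the diagonal and grows more negative
# the further back a key token is from the query token.
print(alibi_bias(n_heads=32, seq_len=6)[0])
```

This bias is simply added to the query-key attention scores before the softmax, alongside the causal mask.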
Safety Evaluation
Ethical Considerations:
MPT-7B-8k can produce factually incorrect, lewd, biased or offensive outputs. It should not be used for human-facing interactions without further guardrails and user consent.
Responsible AI Considerations
Fairness:
Model may have biases inherited from training data.
Transparency:
Pretraining data was openly available, preprocessed to remove unsuitable content.
Accountability:
Responsibility of MosaicML.
Mitigation Strategies:
Guardrails recommended before deployment.
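One lightweight form such a guardrail can take is a filtering wrapper around generation. The sketch below is a hypothetical illustration, not a MosaicML-provided component; in practice `is_unsafe` would call a dedicated safety classifier or moderation API.

```python
def is_unsafe(text: str) -> bool:
    """Hypothetical moderation check (placeholder term list for illustration)."""
    blocked_terms = {"example_blocked_term"}
    return any(term in text.lower() for term in blocked_terms)

def guarded_generate(generate_fn, prompt: str, refusal: str = "[response withheld]") -> str:
    """Wrap a raw generation function so unsafe prompts or outputs are never returned."""
    if is_unsafe(prompt):
        return refusal
    output = generate_fn(prompt)
    return refusal if is_unsafe(output) else output
```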
Input Output
Input Format:
Text sequences, up to 8k tokens
Accepted Modalities:
Text
Output Format:
Generated text
Performance Tips:
Use optimized attention implementations such as FlashAttention and run the model in bfloat16 precision on GPUs.
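As a concrete illustration of those tips, the sketch below loads the model in bfloat16 and requests an optimized attention kernel through the custom MPT config. The config keys (`attn_config`, `attn_impl`, `init_device`) follow the custom modeling code shipped in the MosaicML Hugging Face repos and should be verified against the current model card before use.

```python
import torch
import transformers

name = "mosaicml/mpt-7b-8k"

# MPT repos ship custom modeling code, so trust_remote_code is required.
config = transformers.AutoConfig.from_pretrained(name, trust_remote_code=True)
config.attn_config["attn_impl"] = "triton"   # FlashAttention-style optimized kernel
config.init_device = "cuda:0"                # initialize weights directly on the GPU

model = transformers.AutoModelForCausalLM.from_pretrained(
    name,
    config=config,
    torch_dtype=torch.bfloat16,              # bfloat16 precision, as recommended above
    trust_remote_code=True,
)
tokenizer = transformers.AutoTokenizer.from_pretrained(name)

inputs = tokenizer("MosaicML is", return_tensors="pt").to("cuda:0")
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

Prompts plus generated tokens should stay within the 8192-token context window noted under Input Format.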