| Model Type | | decoder-style transformer, LLM |
|
| Use Cases |
| Areas: | | research, commercial applications |
|
| Applications: | | text generation, long-form instruction following, dialogue generation |
|
| Primary Use Cases: | | finetuning for specific applications |
|
| Limitations: | | Not intended for deployment without finetuning; can produce factually incorrect output. |
|
| Considerations: | | Efforts were made to clean the pretraining data; however, outputs may still be offensive or biased. |
|
|
| Additional Notes | | This model builds on MPT-7B, extending it to longer sequences and adding significant efficiency improvements. |
|
| Supported Languages | |
| Training Details |
| Data Sources: | | mc4, c4, togethercomputer/RedPajama-Data-1T, bigcode/the-stack, allenai/s2orc |
|
| Data Volume: | |
| Methodology: | | MPT-7B-8k uses a modified transformer architecture optimized for efficient training and inference, with ALiBi in place of positional embeddings to handle long inputs (a sketch of the ALiBi bias appears below this table). |
|
| Context Length: | | 8k tokens |
| Training Time: | |
| Hardware Used: | |
| Model Architecture: | | Decoder-only transformer with modifications such as FlashAttention, ALiBi, and the elimination of positional embeddings. |
|
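The methodology and architecture rows above both cite ALiBi (Attention with Linear Biases), which replaces learned positional embeddings with a distance-dependent penalty on attention logits. The following is a minimal sketch of that bias, assuming a power-of-two head count as in the original ALiBi recipe; `build_alibi_bias` is a hypothetical helper for illustration, not code from the released model.

```python
import torch

def build_alibi_bias(n_heads: int, seq_len: int) -> torch.Tensor:
    """Sketch of the ALiBi additive attention bias (Press et al.).

    Each head penalizes attention logits linearly in query-key
    distance; head-specific slopes form the geometric sequence
    2^-1, 2^-2, ... (assumes n_heads is a power of two).
    """
    slopes = torch.tensor(
        [2.0 ** (-8.0 * (h + 1) / n_heads) for h in range(n_heads)]
    )
    pos = torch.arange(seq_len)
    # distance[i, j] = i - j for keys at or before query position i;
    # positions after i are left at zero and handled by the causal mask.
    distance = (pos.view(-1, 1) - pos.view(1, -1)).clamp(min=0)
    # Shape (n_heads, seq_len, seq_len), added to logits before softmax.
    return -slopes.view(-1, 1, 1) * distance
```

Because the bias depends only on relative distance, the model needs no positional embedding table, which is what lets it handle inputs longer than a fixed embedding table would allow.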
|
| Safety Evaluation |
| Ethical Considerations: | | MPT-7B-8k can produce factually incorrect, lewd, biased, or offensive outputs. It should not be used for human-facing interactions without further guardrails and user consent. |
|
|
| Responsible AI Considerations |
| Fairness: | | Model may have biases inherited from training data. |
|
| Transparency: | | Pretraining data was openly available and was preprocessed to remove unsuitable content. |
|
| Accountability: | | Responsibility of MosaicML. |
|
| Mitigation Strategies: | | Guardrails recommended before deployment. |
|
|
| Input Output |
| Input Format: | | Text sequences of up to 8k tokens |
|
| Accepted Modalities: | | Text |
| Output Format: | | Generated text |
| Performance Tips: | | Use an optimized attention implementation such as FlashAttention and run the model in bfloat16 precision on GPUs (see the loading sketch below). |
|
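As a concrete illustration of these tips, below is a minimal, non-authoritative loading sketch in the style used for MosaicML's MPT models on the Hugging Face Hub. The `mosaicml/mpt-7b-8k` checkpoint name, the `attn_config['attn_impl'] = 'triton'` FlashAttention-style option, and the reuse of the EleutherAI/gpt-neox-20b tokenizer are assumptions carried over from that model family's published usage.

```python
import torch
import transformers

name = 'mosaicml/mpt-7b-8k'  # assumed Hub identifier for this model

# MPT checkpoints ship custom modeling code, so trust_remote_code is required.
config = transformers.AutoConfig.from_pretrained(name, trust_remote_code=True)
config.attn_config['attn_impl'] = 'triton'  # assumed FlashAttention-style kernel
config.init_device = 'cuda:0'               # materialize weights directly on GPU

model = transformers.AutoModelForCausalLM.from_pretrained(
    name,
    config=config,
    torch_dtype=torch.bfloat16,  # bfloat16 precision, per the tips above
    trust_remote_code=True,
)

# Assumed: MPT models reuse the EleutherAI/gpt-neox-20b tokenizer.
tokenizer = transformers.AutoTokenizer.from_pretrained('EleutherAI/gpt-neox-20b')
inputs = tokenizer('Once upon a time,', return_tensors='pt').to('cuda:0')
assert inputs['input_ids'].shape[1] <= 8192  # stay within the 8k context window

with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```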
|
| Release Notes |
| Version: | |
| Date: | |
| Notes: | | Initial release of MPT-7B-8k. |
|
|
|