| Model Type | chatbot, dialogue generation |
|
| Use Cases |
| Areas: | |
| Limitations: | can produce factually incorrect output; may generate lewd, biased, or offensive outputs |
|
| Additional Notes | This model requires `trust_remote_code=True` to be passed to `from_pretrained`, since it uses a custom MPT model architecture. |
|
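A minimal loading sketch for the note above. The checkpoint ID `mosaicml/mpt-7b-chat` is an assumption; substitute the model this card describes:

```python
import transformers

# trust_remote_code=True is required because the MPT architecture is
# implemented as custom code shipped with the checkpoint rather than
# built into the transformers library.
model = transformers.AutoModelForCausalLM.from_pretrained(
    "mosaicml/mpt-7b-chat",  # assumed checkpoint ID
    trust_remote_code=True,
)
tokenizer = transformers.AutoTokenizer.from_pretrained("mosaicml/mpt-7b-chat")
```
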
| Supported Languages | |
| Training Details |
| Data Sources: | jeffwan/sharegpt_vicuna, Hello-SimpleAI/HC3, tatsu-lab/alpaca, Anthropic/hh-rlhf, victor123/evol_instruct_70k |
|
| Context Length: | |
| Hardware Used: | 8× A100-80GB and 32× A100-40GB GPUs |
|
| Model Architecture: | Modified decoder-only transformer |
|
| Input/Output |
| Accepted Modalities: | |
| Output Format: | |
| Performance Tips: | To use the optimized Triton implementation of FlashAttention, load the model on GPU with `attn_impl='triton'` and bfloat16 precision. |
|
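A hedged sketch of the tip above, following the loading pattern MPT checkpoints document. The checkpoint ID and the `config.attn_config['attn_impl']` field are assumptions based on the standard MPT config layout:

```python
import torch
import transformers

name = "mosaicml/mpt-7b-chat"  # assumed checkpoint ID

config = transformers.AutoConfig.from_pretrained(name, trust_remote_code=True)
config.attn_config["attn_impl"] = "triton"  # select the Triton FlashAttention kernel (assumed MPT config field)
config.init_device = "cuda:0"               # initialize weights directly on the GPU

model = transformers.AutoModelForCausalLM.from_pretrained(
    name,
    config=config,
    torch_dtype=torch.bfloat16,  # the Triton kernel expects bfloat16 precision
    trust_remote_code=True,      # MPT ships custom model code
)
```

Loading in bfloat16 also halves memory use relative to float32, which is why the tip pairs it with the GPU-only Triton kernel.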
|