| Model Type | falcon, causal decoder-only |
|---|---|

| Use Cases | |
|---|---|
| Areas | |
| Primary Use Cases | Ready-to-use chat/instruct model |
| Limitations | May not generalize well to non-English languages; trained on web data, which may carry biases and stereotypes |
| Considerations | Users should develop appropriate guardrails for any production use |

| Additional Notes | Falcon-180B-Chat is fine-tuned on a mixture of datasets using Gigatron, a custom distributed training codebase |
|---|---|

| Supported Languages | en (primary); de, es, fr (supported) |
|---|---|

| Training Details | |
|---|---|
| Data Sources | UltraChat, Platypus, Airoboros |
| Methodology | The data was tokenized with the Falcon tokenizer (a minimal tokenization sketch follows this table) |
| Context Length | |
| Hardware Used | 4,096 A100 40GB GPUs on AWS SageMaker (P4d instances) |
| Model Architecture | Causal decoder-only model with rotary position embeddings, multi-query attention, and FlashAttention (an illustrative multi-query attention sketch follows this table) |

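The Methodology row only states that the training data was tokenized with the Falcon tokenizer. Below is a minimal sketch of that step using the `transformers` `AutoTokenizer`; the repo id `tiiuae/falcon-180B-chat` and the sample sentence are assumptions for illustration, not details taken from this card.

```python
# Minimal sketch: tokenizing text with the Falcon tokenizer via transformers.
# The repo id below is an assumption; any Falcon checkpoint that ships the
# same tokenizer files would behave identically.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("tiiuae/falcon-180B-chat")

sample = "Falcon models are causal decoder-only transformers."  # hypothetical sample text
encoded = tokenizer(sample, return_tensors="pt")

print(encoded["input_ids"].shape)                  # (1, sequence_length)
print(tokenizer.decode(encoded["input_ids"][0]))   # round-trips to the original text
```
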
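The Model Architecture row mentions multi-query attention, in which all query heads attend over a single shared key/value head, which keeps the inference-time KV cache small. The following is a self-contained toy sketch of that idea in PyTorch; the shapes are toy values, and rotary embeddings and FlashAttention are omitted, so this is not Falcon's actual implementation.

```python
# Illustrative sketch of multi-query attention: many query heads, one shared
# key/value head. Shapes are toy values, not Falcon-180B's real configuration.
import torch

batch, seq_len, n_heads, head_dim = 2, 8, 4, 16
hidden = n_heads * head_dim

x = torch.randn(batch, seq_len, hidden)

# Separate projections: full-width for queries, a single head for keys/values.
q_proj = torch.nn.Linear(hidden, n_heads * head_dim)
k_proj = torch.nn.Linear(hidden, head_dim)  # one key head shared by all query heads
v_proj = torch.nn.Linear(hidden, head_dim)  # one value head shared by all query heads

q = q_proj(x).view(batch, seq_len, n_heads, head_dim).transpose(1, 2)  # (B, H, T, D)
k = k_proj(x).unsqueeze(1)  # (B, 1, T, D), broadcast across all query heads
v = v_proj(x).unsqueeze(1)

scores = q @ k.transpose(-2, -1) / head_dim ** 0.5                     # (B, H, T, T)
causal_mask = torch.triu(torch.full((seq_len, seq_len), float("-inf")), diagonal=1)
attn = torch.softmax(scores + causal_mask, dim=-1)
out = (attn @ v).transpose(1, 2).reshape(batch, seq_len, hidden)
print(out.shape)  # torch.Size([2, 8, 64])
```
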
| Input / Output | |
|---|---|
| Input Format | `User: {prompt} Assistant: {response}` (a hedged generation sketch follows this table) |
| Accepted Modalities | |
| Output Format | |

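The input format above is a plain-text conversation template. A hedged sketch of applying it for generation with `transformers` follows; the repo id, dtype, and sampling parameters are assumptions chosen for illustration, not settings prescribed by this card.

```python
# Sketch of formatting a prompt in the "User: ... Assistant:" template and
# generating a reply. Repo id and generation settings are assumptions.
# Note: the 180B weights need substantial multi-GPU memory; this sketch only
# illustrates the prompt template and the generate() call.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/falcon-180B-chat"  # assumed Hugging Face repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

prompt = "User: Explain rotary position embeddings in one sentence.\nAssistant:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

output_ids = model.generate(
    **inputs,
    max_new_tokens=128,
    do_sample=True,
    top_p=0.9,
    temperature=0.7,
    pad_token_id=tokenizer.eos_token_id,
)

# Strip the prompt tokens so only the assistant's reply is printed.
reply = tokenizer.decode(
    output_ids[0, inputs["input_ids"].shape[-1]:], skip_special_tokens=True
)
print(reply)
```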