| Model Type | |
|---|---|
| Use Cases | |
| Areas | research, commercial applications |
| Applications | foundational base model for application-specific fine-tuning |
| Limitations | May exhibit unreliable, unsafe, or other undesirable behaviors that must be corrected prior to deployment. Pre-training data may have contained offensive or inappropriate content. |
| Supported Languages | en (intermediate), de (intermediate), es (intermediate), fr (intermediate), it (intermediate), nl (intermediate), pt (intermediate) |
| Training Details | |
|---|---|
| Data Sources | tiiuae/falcon-refinedweb, togethercomputer/RedPajama-Data-1T, uonlp/CulturaX, CarperAI/pilev2-dev, bigcode/starcoderdata, DataProvenanceInitiative/Commercially-Verified-Licenses (see the loading sketch below) |
| Data Volume | |
| Methodology | Pre-trained on diverse multilingual and code datasets for two epochs |
| Context Length | |
| Hardware Used | 512 NVIDIA A100 40GB GPUs (AWS P4d instances) |
| Model Architecture | Decoder-only transformer similar to the LLaMA architecture, with modifications |
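The entries under Data Sources are public Hugging Face dataset identifiers. As a minimal sketch of how one of them can be inspected (not part of the original card), the snippet below streams tiiuae/falcon-refinedweb with the `datasets` library; the `train` split and the `content` text column are assumptions about that particular dataset, and the other listed sources use their own schemas.

```python
from datasets import load_dataset

# Stream one of the listed pre-training sources without downloading it in full.
# Assumptions: a "train" split and a "content" text column, as exposed by
# tiiuae/falcon-refinedweb; the other listed datasets use different column names.
refinedweb = load_dataset("tiiuae/falcon-refinedweb", split="train", streaming=True)

for i, example in enumerate(refinedweb):
    print(example["content"][:200])  # preview the first 200 characters of a document
    if i == 2:  # look at three examples only
        break
```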
| Input / Output | |
|---|---|
| Input Format | Prompts in tokenized form (see the generation sketch below) |
| Accepted Modalities | |
| Output Format | |
| Performance Tips | Fine-tuning the base model is recommended for downstream tasks. |
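Because the card specifies prompts in tokenized form and positions the model as a base for fine-tuning, a minimal generation sketch with the `transformers` library follows. The checkpoint name is a placeholder for this card's actual model identifier, and the dtype and generation settings are illustrative assumptions rather than recommendations from the card.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder: replace with the checkpoint name published for this model card.
MODEL_ID = "organization/base-model"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,  # assumption; pick a dtype your hardware supports
    device_map="auto",
)

# Tokenize the prompt (the card's expected input format), then generate.
prompt = "The capital of the Netherlands is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

As the Performance Tips row notes, raw completions from the base model are only a starting point; for downstream tasks the card recommends fine-tuning (for example, supervised fine-tuning or a parameter-efficient method) before deployment.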