| Attribute | Details |
|---|---|
| Supported Languages | en (English), zh (Chinese), id (Indonesian), ms (Malay), tl (Filipino), my (Burmese), vi (Vietnamese), th (Thai), lo (Lao), km (Khmer), ta (Tamil) |
| **Training Details** | |
| Data Sources | RefinedWeb (English); mC4 (Chinese, Indonesian, Malay, Filipino, Burmese, Vietnamese, Thai, Lao, Khmer, Tamil); WangChanBERTa (Thai); The Stack (Python, JavaScript, Shell, SQL, Markdown); RedPajama (StackExchange, ArXiv) |
| Data Volume | |
| Methodology | Pretrained and instruction-tuned for the Southeast Asian (SEA) region |
| Context Length | |
| Training Time | |
| Hardware Used | AWS EC2 p4d.24xlarge instances (NVIDIA A100 40GB GPUs) |
| Model Architecture | |