Model Type | text-generation, trilingual |
|
Use Cases |
Areas: | research, multilingual applications |
|
Limitations: | max_seq_length = 2048, float16, vocab size: 150 016 |
|
|
Supported Languages | hu (native), en (native), zh (native) |
|
Training Details |
Data Sources: | Hungarian corpus, English corpus, GitHub, Chinese corpus |
|
Data Volume: | Hungarian: 41.5B words (314 GB), English: 61.9B words (391 GB), Chinese: 98.7B Chinese characters (340 GB), GitHub: 6 million documents (33 GB) |
|
Methodology: | Trained with EleutherAI's GPT-NeoX |
|
Context Length: | 2048 tokens (per max_seq_length above) |
Model Architecture: | |
|
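
The limits listed above (max_seq_length = 2048, vocab size 150 016) can be checked on the caller's side before a prompt is sent to the model. The following is a minimal sketch in plain Python, not part of the model's own tooling: the function name `fit_to_context` is hypothetical, and keeping the most recent tokens when truncating is an assumption about what a caller would want, not documented behavior.

```python
# Limits taken from the "Limitations" row of the table above.
MAX_SEQ_LENGTH = 2048
VOCAB_SIZE = 150_016


def fit_to_context(token_ids, max_len=MAX_SEQ_LENGTH, vocab_size=VOCAB_SIZE):
    """Validate token ids and truncate them to the context window.

    Hypothetical helper: raises ValueError for ids outside the vocabulary,
    then keeps only the last `max_len` ids (left truncation, an assumption).
    """
    ids = list(token_ids)
    for position, token_id in enumerate(ids):
        if not 0 <= token_id < vocab_size:
            raise ValueError(
                f"token id {token_id} at position {position} "
                f"is outside the vocabulary of size {vocab_size}"
            )
    # Keep the tail of the sequence so the most recent context survives.
    return ids[-max_len:]


if __name__ == "__main__":
    prompt_ids = list(range(3000))  # stand-in for real tokenizer output
    trimmed = fit_to_context(prompt_ids)
    print(len(trimmed))
```

Left truncation suits generation, where the end of the prompt matters most. A caller who needs to preserve the beginning instead would slice with `ids[:max_len]`.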