| Training Details | |
| --- | --- |
| Data Sources | Korean blog posts, Korean news dataset, Modu Corpus, Korean patent dataset, Korean Q&A dataset, KcBERT dataset, Korean fiction dataset, Korean online comments, Korean Wikipedia, ClovaCall, Naver Sentiment Movie Corpus, Korean hate speech dataset, OpenSubtitles, AI Hub datasets for various tasks, Standard Korean Language Dictionary |
| Data Volume | 863 GB (1.2 TB before processing) |
| Methodology | Trained for 167 billion tokens over 301,000 steps using the GPT-NeoX framework with a cross-entropy language-modeling loss (see the worked throughput check below the table) |
| Context Length | |
| Hardware Used | |
| Model Architecture | 40 transformer layers; model dimension 5120; feed-forward dimension 20480; 40 attention heads of dimension 128; Rotary Position Embedding applied to 64 of the 128 dimensions per head (see the parameter sketch below the table) |
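As a quick consistency check on the Methodology row, the two reported totals imply the average number of tokens consumed per optimizer step:

$$
\frac{167 \times 10^{9}\ \text{tokens}}{301{,}000\ \text{steps}} \approx 5.55 \times 10^{5}\ \text{tokens per step}
$$

If the context length were 2048 tokens (an assumption; that field above is blank), this would correspond to an effective batch of roughly 271 sequences per step.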
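The Model Architecture row is enough to sanity-check the model's scale. Below is a minimal back-of-the-envelope parameter count under those dimensions; `VOCAB_SIZE` and the untied output projection are assumptions, since neither appears in the table:

```python
# Back-of-the-envelope parameter count from the Model Architecture row.
# VOCAB_SIZE is a hypothetical value (not stated in the table), so the
# embedding term and the total are indicative only.

N_LAYERS = 40        # transformer layers
D_MODEL = 5120       # model (hidden) dimension
D_FF = 20480         # feed-forward dimension
N_HEADS = 40         # attention heads
D_HEAD = 128         # dimension per head
ROTARY_DIMS = 64     # RoPE covers 64 of the 128 dims per head (documentation only)
VOCAB_SIZE = 30_000  # assumption, not from the table

assert N_HEADS * D_HEAD == D_MODEL

# Per-layer weights: Q, K, V, and output projections plus the two
# feed-forward matrices; biases and LayerNorm parameters are small
# and omitted here.
attn_params = 4 * D_MODEL * D_MODEL
ffn_params = 2 * D_MODEL * D_FF
per_layer = attn_params + ffn_params

# Input embedding plus an untied output projection (assumption).
embed_params = 2 * VOCAB_SIZE * D_MODEL

total = N_LAYERS * per_layer + embed_params
print(f"~{total / 1e9:.1f}B parameters")  # ~12.9B with these assumptions
```

With these assumptions the estimate lands at roughly 12.9B parameters, dominated by the 40 identical transformer blocks rather than the embedding matrices.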
|
|