| Training Details | |
| --- | --- |
| Data Sources | Korean blog posts, Korean news dataset, Modu Corpus, Korean patent dataset, Korean Q&A dataset, KcBERT dataset, Korean fiction dataset, Korean online comments, Korean Wikipedia, ClovaCall, Naver Sentiment Movie Corpus, Korean hate speech dataset, OpenSubtitles, AI Hub various-task datasets, Standard Korean Language Dictionary |
| Data Volume | 863 GB (1.2 TB before processing) |
| Methodology | Trained on 167 billion tokens over 301,000 steps using the GPT-NeoX framework with a cross-entropy loss (see the token-rate sketch below) |
| Context Length | |
| Hardware Used | |
| Model Architecture | 40 transformer layers; model dimension 5,120; feed-forward dimension 20,480; 40 attention heads of dimension 128; Rotary Position Embedding applied to 64 of the 128 head dimensions (see the parameter-count and RoPE sketches following the table) |
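
As a quick sanity check on the training figures in the Methodology row, the average number of tokens consumed per optimizer step follows directly from the two stated totals. The batch composition (sequences per step and sequence length) is not stated in the table, so only the token rate is derived here:

```python
# Average tokens per optimizer step, from the figures in the table above.
total_tokens = 167e9      # 167 billion training tokens
total_steps = 301_000     # optimizer steps

tokens_per_step = total_tokens / total_steps
print(f"{tokens_per_step:,.0f} tokens per step")  # ~554,817 tokens per step
```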
|
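The architecture row also supports a rough parameter-count estimate. The sketch below assumes standard GPT-style blocks (fused Q/K/V plus output projection for attention, a two-matrix feed-forward) and omits embeddings, biases, and layer norms, since the vocabulary size is not listed in the table; it is a back-of-the-envelope check, not the authors' accounting:

```python
# Rough transformer-block parameter count from the Model Architecture row.
n_layers, d_model, d_ff = 40, 5120, 20480

attn_params = 4 * d_model * d_model   # Q, K, V, and output projections
ffn_params = 2 * d_model * d_ff       # up- and down-projections
per_layer = attn_params + ffn_params
total = n_layers * per_layer

print(f"per layer: {per_layer/1e6:.1f}M, blocks total: {total/1e9:.2f}B")
# per layer: 314.6M, blocks total: 12.58B (embeddings excluded)
```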
|
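The "Rotary Position Embedding applied to 64 dimensions" entry refers to partial rotary embedding: only the first 64 of each head's 128 dimensions are rotated by position-dependent angles, while the rest pass through unchanged. The sketch below is a minimal illustration with hypothetical names; the exact pairing of rotated dimensions varies by implementation, and an interleaved-pair convention is assumed here:

```python
import numpy as np

def rotary_embedding(x, rotary_dim=64, base=10000):
    """Apply RoPE to the first `rotary_dim` dims of x.

    x: array of shape (seq_len, head_dim). The remaining
    head_dim - rotary_dim dimensions are passed through unchanged.
    """
    seq_len, head_dim = x.shape
    # One frequency per rotated dimension pair.
    inv_freq = 1.0 / (base ** (np.arange(0, rotary_dim, 2) / rotary_dim))
    angles = np.outer(np.arange(seq_len), inv_freq)   # (seq_len, rotary_dim/2)
    cos, sin = np.cos(angles), np.sin(angles)

    x_rot, x_pass = x[:, :rotary_dim], x[:, rotary_dim:]
    x1, x2 = x_rot[:, 0::2], x_rot[:, 1::2]           # interleaved pairs
    rotated = np.empty_like(x_rot)
    rotated[:, 0::2] = x1 * cos - x2 * sin
    rotated[:, 1::2] = x1 * sin + x2 * cos
    return np.concatenate([rotated, x_pass], axis=-1)

q = np.random.randn(16, 128)   # one head: 16 positions, head dim 128
q_rope = rotary_embedding(q)   # positions encoded in the first 64 dims only
```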