Model Type | Auto-regressive, decoder-only causal language model (causal LM) |
Use Cases |
Areas: | Open-source community use and chat-like applications |
Limitations: | Generated responses may contain biases or toxic content |
Considerations: | Do not treat model outputs as substitutes for human judgment or as sources of truth. Please use responsibly. |
Additional Notes | Uses version 1 (v1) of the novelai-tokenizer, which handles both Japanese and English text effectively. The EleutherAI Polyglot-JA team and Stable Community Japan contributed significantly to collecting the training data. |
Supported Languages | |
Training Details |
Data Sources: | Japanese translation of the Databricks Dolly-15k dataset; Japanese translation of a subset of the Anthropic HH dataset; Wikinews subset of the izumi-lab/llm-japanese-dataset |
Context Length: | |
Model Architecture: | |
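To illustrate what the Model Type entry "auto-regressive, decoder-only, causal-lm" means, here is a minimal, hypothetical Python sketch. It is not this model's implementation. `causal_mask` shows which earlier positions each token may attend to, and `generate` shows the auto-regressive loop: each new token is predicted only from the tokens before it and is then appended to the input. The names `causal_mask`, `generate`, and `toy_next_token` are illustrative assumptions. The toy predictor stands in for the real neural network, which would produce a probability distribution over the vocabulary.

```python
from typing import Callable, List


def causal_mask(n: int) -> List[List[bool]]:
    """mask[i][j] is True when position i may attend to position j.

    In a causal (decoder-only) LM, a token sees itself and earlier tokens only.
    """
    return [[j <= i for j in range(n)] for i in range(n)]


def generate(next_token: Callable[[List[int]], int], prompt: List[int],
             max_new_tokens: int, eos_id: int) -> List[int]:
    """Auto-regressive decoding loop (hypothetical sketch).

    Each step feeds the full prefix (prompt + tokens generated so far) to the
    predictor, appends its output, and stops early at the end-of-sequence id.
    """
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tok = next_token(tokens)  # predictor sees only the prefix
        tokens.append(tok)
        if tok == eos_id:
            break
    return tokens


def toy_next_token(prefix: List[int]) -> int:
    """Hypothetical stand-in for a real model: predicts the last id + 1."""
    return prefix[-1] + 1


if __name__ == "__main__":
    print(causal_mask(3))
    print(generate(toy_next_token, [1, 2], max_new_tokens=5, eos_id=5))
```

Running the script prints the lower-triangular mask `[[True, False, False], [True, True, False], [True, True, True]]` and the sequence `[1, 2, 3, 4, 5]`, where generation stops once the end-of-sequence id `5` is produced.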