| Model Type | Autoregressive large language model based on the LLaMA-2 architecture |
| Use Cases |
| Areas: | Research, commercial applications |
|
| Limitations: | The models are not tuned to ensure that outputs align with human intent and safety considerations. |
|
|
| Additional Notes | The models are continually pre-trained and instruction-tuned, with an emphasis on Japanese language capabilities. |
|
| Supported Languages | Japanese, English |
| Details: | The Swallow models have undergone continual pre-training with the addition of Japanese language data. |
|
| Training Details |
| Data Sources: | Japanese Wikipedia, RefinedWeb, Swallow Corpus, The Pile |

| Methodology: | Supervised fine-tuning (SFT) and instruction tuning using Anthropic HH-RLHF, Databricks Dolly 15k, and the OpenAssistant Conversations Dataset; a dataset-loading sketch follows this table. |

| Model Architecture: | Refer to the LLaMA-2 technical report for details on the model architecture. |
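
The sketch below shows one way to pull the three instruction-tuning corpora named above from the Hugging Face Hub with the `datasets` library. The Hub IDs and the Dolly formatting helper are assumptions for illustration, not the exact snapshots or preprocessing used to train Swallow.

```python
from datasets import load_dataset

# Hub IDs for the three corpora named in the Methodology row; these are the
# public releases and may differ from the snapshots used for Swallow.
hh_rlhf = load_dataset("Anthropic/hh-rlhf", split="train")
dolly = load_dataset("databricks/databricks-dolly-15k", split="train")
oasst = load_dataset("OpenAssistant/oasst1", split="train")

def to_sft_example(record):
    # Dolly records carry "instruction", "context", and "response" fields;
    # fold the optional context into the prompt to get a plain SFT pair.
    # This formatting is a hypothetical example, not Swallow's recipe.
    prompt = record["instruction"]
    if record["context"]:
        prompt += "\n\n" + record["context"]
    return {"prompt": prompt, "response": record["response"]}

sft_pairs = dolly.map(to_sft_example)
print(len(hh_rlhf), len(sft_pairs), len(oasst))
```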
|
|
| Input / Output |
| Accepted Modalities: | Text |
| Output Format: | Text |
| Performance Tips: | The models employ a tokenizer with a vocabulary broadened using Japanese data, offering a more efficient text representation and faster inference; see the tokenizer comparison sketch after this table. |
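
A quick way to see the effect of the broadened vocabulary is to tokenize the same Japanese sentence with the base Llama 2 tokenizer and the Swallow tokenizer and compare token counts. The Hub IDs below are assumptions (tokyotech-llm is taken to be the publishing organization; the Meta repository is gated and requires license acceptance).

```python
from transformers import AutoTokenizer

# Example Japanese sentence; a vocabulary broadened with Japanese data should
# encode it in fewer tokens, which means faster inference per character.
text = "東京工業大学の研究チームが日本語に強い大規模言語モデルを公開した。"

llama2 = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")  # gated repo
swallow = AutoTokenizer.from_pretrained("tokyotech-llm/Swallow-7b-hf")  # assumed Hub ID

print("Llama 2 tokens:", len(llama2.tokenize(text)))
print("Swallow tokens:", len(swallow.tokenize(text)))
```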
|
|
| Release Notes |
| Version: | |
| Date: | |
| Notes: | Release of Swallow-7b-instruct-v0.1, Swallow-13b-instruct-v0.1, and Swallow-70b-instruct-v0.1. |
|
| Version: | |
| Date: | |
| Notes: | Release of Swallow-7b-plus-hf, trained on twice as many Japanese tokens as Swallow-7b-hf. |
|
| Version: | |
| Date: | |
| Notes: | Release of Swallow-13b-NVE-hf. |
|
| Version: | |
| Date: | |
| Notes: | Release of Swallow-7b-NVE-hf, Swallow-7b-NVE-instruct-hf, Swallow-70b-NVE-hf, and Swallow-70b-NVE-instruct-hf. |
|
| Version: | |
| Date: | |
| Notes: | Release of Swallow-7b-hf, Swallow-7b-instruct-hf, Swallow-13b-hf, Swallow-13b-instruct-hf, Swallow-70b-hf, and Swallow-70b-instruct-hf. |
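
For completeness, a minimal sketch of loading one of the released instruct checkpoints with `transformers`. The Hub ID and the bare prompt are assumptions; the instruct models expect a specific prompt template, so consult the model card of the release you use.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tokyotech-llm/Swallow-7b-instruct-hf"  # assumed Hub ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# "What is the capital of Japan?" -- a bare prompt for illustration only;
# the released instruct models define their own prompt template.
prompt = "日本の首都はどこですか？"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```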
|
|
|