| Attribute | Details |
| --- | --- |
| Model Type | |
| **Use Cases** | |
| Areas | |
| Applications | Research, Text Generation |
| Primary Use Cases | |
| Limitations | May produce hallucinations or unreliable outputs |
| Considerations | Manual checks required for safety |
| Additional Notes | Developed with grants from Andreessen Horowitz (a16z) |
| Supported Languages | en (general), zh (general) |
|
| Attribute | Details |
| --- | --- |
| **Training Details** | |
| Data Sources | JosephusCheung/GuanacoDataset, Open-Orca/OpenOrca, stingning/ultrachat, meta-math/MetaMathQA, liuhaotian/LLaVA-Instruct-150K, jondurbin/airoboros-3.1, WizardLM/WizardLM_evol_instruct_V2_196k, RyokoAI/ShareGPT52K, RyokoAI/Fandom23K, milashkaarshif/MoeGirlPedia_wikitext_raw_archive, wikipedia, wiki_lingua, fnlp/moss-003-sft-data, garage-bAInd/Open-Platypus, LDJnr/Puffin, openbmb/llava_zh, BAAI/COIG, TigerResearch/tigerbot-zhihu-zh-10k, liwu/MNBVC, teknium/openhermes (see the loading sketch after this table) |
| Data Volume | |
| Methodology | Identical structure to LLaMA2, using synthetic data |
| Model Architecture | LLaMA2 architecture without scaling of RoPE |
|
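The Data Sources row above lists the training corpora by their Hugging Face dataset IDs. As a minimal sketch (assuming the `datasets` library; streaming access and the choice of two sources are illustrative, not the authors' pipeline), they can be inspected like this:

```python
# Minimal inspection sketch: pull two of the listed data sources via the
# Hugging Face `datasets` library. The dataset IDs are taken verbatim from
# the Data Sources row; everything else here is illustrative.
from datasets import load_dataset

# Stream to avoid downloading the full corpora up front.
orca = load_dataset("Open-Orca/OpenOrca", split="train", streaming=True)
ultrachat = load_dataset("stingning/ultrachat", split="train", streaming=True)

# Schemas differ across the listed sources, so a real training pipeline
# would normalize records into one shared instruction format first.
print(next(iter(orca)))
print(next(iter(ultrachat)))
```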
|
| Attribute | Details |
| --- | --- |
| **Safety Evaluation** | |
| Risk Categories | Misinformation, bias, objectionable content, pornography, violence, offensive language |
| Ethical Considerations | Model trained on unfiltered internet data |
| **Responsible AI Considerations** | |
| Fairness | Synthetic data utilized for some language variants |
| Accountability | Developers have not vetted all content |
| Mitigation Strategies | Users advised to filter certain keywords (a minimal filtering sketch follows this table) |
|
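The Mitigation Strategies row above advises keyword filtering but does not name the keywords. A minimal sketch of such a filter follows; the blocklist terms are placeholders, and manual review (per the Considerations row) should back up any automated check:

```python
# Minimal keyword-filter sketch for model outputs. The card recommends
# filtering certain keywords but does not specify them, so BLOCKLIST is a
# hypothetical placeholder to be replaced with a real term list.
BLOCKLIST = {"placeholder_term_1", "placeholder_term_2"}

def contains_blocked_keyword(text: str) -> bool:
    """Return True if any blocklisted keyword appears in the text."""
    lowered = text.lower()
    return any(term in lowered for term in BLOCKLIST)

def filter_output(generated: str) -> str:
    """Withhold generations that trip the naive keyword check."""
    if contains_blocked_keyword(generated):
        return "[output withheld: blocked keyword detected]"
    return generated
```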
|
| Attribute | Details |
| --- | --- |
| **Input/Output** | |
| Input Format | |
| Accepted Modalities | |
| Output Format | |
|
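The Input/Output rows are unpopulated, but given the LLaMA2 architecture and the Text Generation application listed above, a plain text-in/text-out interface is the natural assumption. A minimal generation sketch under that assumption (the model ID is a placeholder, not from the card):

```python
# Minimal text-generation sketch assuming a standard LLaMA2-style causal LM
# served through `transformers`. "your-org/your-model" is a placeholder; the
# card does not provide a concrete checkpoint ID.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-org/your-model"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_id)
# No rope_scaling override is passed, matching "without scaling of RoPE".
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "Give a one-paragraph overview of synthetic training data."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```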