| Model Type | Causal decoder-only transformer language model |
|
| Use Cases |
| Areas: | Research, commercial applications |
|
| Applications: | |

| Primary Use Cases: | Chat assistant applications |

| Limitations: | Model outputs may be unpredictable, inaccurate, biased, or objectionable. |

| Considerations: | Perform application-specific safety testing before deployment. |
|
|
| Additional Notes | Embedding matrix is padded to a multiple of 128 for sharded-inference compatibility. |
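
The padding note above can be sketched as follows. This is a minimal illustration; the helper name `pad_to_multiple` and the example vocabulary size are assumptions for demonstration, not the model's actual configuration:

```python
def pad_to_multiple(vocab_size: int, multiple: int = 128) -> int:
    """Round a vocabulary/embedding size up to the next multiple of
    `multiple`, so embedding shards divide evenly across devices."""
    return ((vocab_size + multiple - 1) // multiple) * multiple


# Hypothetical example: a base vocabulary plus a few added special tokens.
padded = pad_to_multiple(32007)
print(padded)  # -> 32128 (the next multiple of 128 at or above 32007)
```

Rounding up rather than truncating preserves every real token; the extra rows are unused padding entries in the embedding matrix.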
|
| Supported Languages | en (full), de (limited), es (limited), fr (limited), it (limited), pt (limited), pl (limited), nl (limited), ro (limited), cs (limited), sv (limited) |
|
| Training Details |
| Data Sources: | rombodawg/LosslessMegaCodeTrainingV2_1m_Evol_Uncensored, OpenAssistant/oasst1, shahules786/orca-best, argilla/databricks-dolly-15k-curated-multilingual |
|
| Methodology: | Fine-tuned in two stages: first on synthetic instructions and coding tasks, then on the best human demonstrations. |
|
| Context Length: | |
| Hardware Used: | Compute provided by EPFL's Machine Learning and Optimization Laboratory and Natural Language Processing Lab |

| Model Architecture: | Causal decoder-only transformer architecture |
|
|
| Responsible AI Considerations |
| Fairness: | Testing has been performed mainly in English; outputs may be unpredictable in other languages and scenarios. |

| Transparency: | Training processes and datasets are documented. |

| Accountability: | The Open-Assistant development team is accountable for model outputs. |

| Mitigation Strategies: | Developers should perform safety testing and tuning specific to their applications. |
|
|
| Input Output |
| Input Format: | Dialogue prompts following OpenAI's ChatML template. |
|
| Accepted Modalities: | |
| Output Format: | |
| Performance Tips: | Use the official Llama2 system message for improved inference. |
|
|