| Model Type | | Text generation, bilingual LLM, causal LM |
|
| Use Cases |
| Areas: | | Research, Commercial applications |
|
| Applications: | | Natural language understanding, Generation tasks, Chat assistants, Sentiment analysis, Summarization |
|
| Primary Use Cases: | | General AI assistant for Arabic and English, Culturally aligned language processing |
|
| Limitations: | | Limited capability in languages other than Arabic and English, Requires responsible use and avoidance of prohibited applications |
|
| Considerations: | | The model should be used with an understanding of its capabilities in Arabic and English. It is not suitable for high-stakes decisions without human oversight. |
|
|
| Additional Notes | | The release focuses on Arabic NLP, model adaptation, and bilingual capabilities. |
|
| Supported Languages | | Arabic, English (high proficiency) |
|
| Training Details |
| Data Sources: | | Web pages, Wikipedia articles, News articles, Social network content, Code data, Books, Scientific papers, Synthetic translation |
|
| Data Volume: | |
| Methodology: | | Pre-training from scratch and adaptive pre-training from Llama-2 |
|
| Context Length: | |
| Hardware Used: | | Condor Galaxy Supercomputer, 64 Cerebras CS-2 Wafer-Scale Engines |
|
| Model Architecture: | | Auto-regressive, decoder-only transformer (GPT-3 style) with SwiGLU activations, ALiBi or RoPE positional encoding, and Grouped Query Attention |
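The architecture field mentions ALiBi positional biases. A minimal sketch of the standard ALiBi formulation (per-head geometric slopes, a linear distance penalty added to attention logits) is shown below; this is an illustration of the general technique, not code taken from this model's implementation:

```python
def alibi_slopes(n_heads: int) -> list[float]:
    # Standard ALiBi slopes for a power-of-two head count:
    # head i (1-indexed) gets slope 2^(-8*i/n_heads).
    return [2.0 ** (-8.0 * i / n_heads) for i in range(1, n_heads + 1)]


def alibi_bias(n_heads: int, seq_len: int) -> list[list[list[float]]]:
    # Bias added to attention logits before softmax: query position q
    # attends to key position k (k <= q) with penalty -slope * (q - k),
    # so attention decays with distance; future positions (k > q) are
    # handled by the usual causal mask and left at 0 here.
    slopes = alibi_slopes(n_heads)
    return [
        [[-s * (q - k) if k <= q else 0.0 for k in range(seq_len)]
         for q in range(seq_len)]
        for s in slopes
    ]
```

Because the bias depends only on relative distance, ALiBi needs no learned position embeddings, which is one reason it is often paired with long-context decoder-only models.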
|
|
| Safety Evaluation |
| Risk Categories: | |
| Ethical Considerations: | | Despite efforts to reduce bias, the model may still exhibit biases and generate incorrect or misleading content. |
|
|
| Responsible AI Considerations
| Fairness: | | Techniques employed to reduce bias |
|
| Accountability: | | Users are responsible for how the model is used and for its outcomes |
|
| Mitigation Strategies: | | Training data curated by Inception, with techniques applied to minimize bias. |
|
|
| Input/Output
| Input Format: | |
| Accepted Modalities: | |
| Output Format: | |
|