| Model Type | |
|---|---|

| Use Cases | |
|---|---|
| Areas | |
| Applications | Summarization, text classification, extraction, question answering |
| Primary Use Cases | Baseline for creating specialized models (see the fine-tuning sketch below) |
| Limitations | Has not undergone any safety alignment, so it may produce problematic outputs; potential increased susceptibility to hallucination due to model size |
| Considerations | The community is urged to use the model with ethical intentions |
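Because the primary use case is serving as a baseline for specialized models, the sketch below shows one common way to continue training the base checkpoint on a domain corpus with the Hugging Face `Trainer`. The model id, the `domain_corpus.txt` file, and all hyperparameters are placeholders, not values taken from this card.

```python
# Hedged fine-tuning sketch: adapt the base model to a domain corpus.
# MODEL_ID and the data file are placeholders; substitute your own.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

MODEL_ID = "org-name/base-model"  # placeholder, not the actual repository id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # causal LMs often ship without a pad token
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

# Placeholder dataset: a plain-text file with one example per line.
dataset = load_dataset("text", data_files={"train": "domain_corpus.txt"})["train"]

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=1024)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="specialized-model",
        per_device_train_batch_size=1,   # placeholder hyperparameters
        num_train_epochs=1,
        learning_rate=2e-5,
        bf16=True,                       # assumes bf16-capable hardware
    ),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
trainer.save_model("specialized-model")
```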
| Supported Languages | |
|---|---|
| Languages | English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, Chinese |
| Training Details | |
|---|---|
| Data Sources | Web, code, academic sources, books, math data, multilingual data, instruction data |
| Data Volume | 12 trillion tokens in Stage 1 and 2 trillion tokens in Stage 2 |
| Methodology | Two-stage training strategy |
| Context Length | |
| Hardware Used | IBM's supercomputing cluster, Blue Vela, equipped with NVIDIA H100 GPUs |
| Model Architecture | Decoder-only dense transformer with grouped-query attention (GQA), rotary position embeddings (RoPE), an MLP with SwiGLU activation, RMSNorm, and shared input/output embeddings (see the configuration sketch below) |
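To make the architecture row above concrete, the sketch below expresses the same design choices (GQA, RoPE, a SwiGLU-gated MLP, RMSNorm, shared input/output embeddings) through transformers' generic `LlamaConfig`, which exposes each of these knobs. This is not the model's own configuration class, and every numeric value is an illustrative placeholder rather than the model's real dimensions.

```python
# Illustrative decoder-only configuration with the design choices listed above.
# All numbers are placeholders and do NOT describe this model's actual size.
from transformers import LlamaConfig

config = LlamaConfig(
    vocab_size=49152,              # placeholder vocabulary size
    hidden_size=2048,              # placeholder model width
    intermediate_size=8192,        # placeholder width of the gated MLP
    hidden_act="silu",             # SwiGLU = SiLU-gated MLP
    num_hidden_layers=24,          # placeholder depth
    num_attention_heads=32,        # placeholder number of query heads
    num_key_value_heads=8,         # fewer KV heads than query heads -> grouped-query attention
    rms_norm_eps=1e-5,             # RMSNorm instead of LayerNorm
    rope_theta=10000.0,            # rotary position embedding (RoPE) base
    tie_word_embeddings=True,      # shared input/output embeddings
    max_position_embeddings=4096,  # placeholder context length
)
print(config)
```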
| Responsible AI Considerations | |
|---|---|
| Fairness | Users should be aware of potential bias and fairness issues |
| Mitigation Strategies | Ongoing research to address and mitigate these issues |
| Input / Output | |
|---|---|
| Input Format | Tokenized text produced with AutoTokenizer (see the inference sketch below) |
| Accepted Modalities | |
| Output Format | Generated token ids decoded back into text with AutoTokenizer |
| Performance Tips | Use the appropriate libraries and follow the examples provided |
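A minimal inference sketch following the input/output description above: the prompt is tokenized with AutoTokenizer, passed to a causal language model, and the generated token ids are decoded back into text. The model id is a placeholder to be replaced with the repository id from this card.

```python
# Hedged inference sketch using Hugging Face transformers.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "org-name/base-model"  # placeholder, not the actual repository id
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16).to(device)
model.eval()

prompt = "Summarize the following paragraph:\n..."
inputs = tokenizer(prompt, return_tensors="pt").to(device)  # tokenized input

with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=128)

# Decode only the newly generated tokens back into text.
generated = tokenizer.decode(
    output_ids[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True
)
print(generated)
```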