| Model Type | Decoder-only language model |
| --- | --- |

| Use Cases | |
| --- | --- |
| Areas | |
| Applications | Summarization, text classification, extraction, question answering |
| Primary Use Cases | Baseline for creating specialized models |
| Limitations | Has not undergone any safety alignment and may produce problematic outputs; potentially increased susceptibility to hallucination due to model size |
| Considerations | The community is urged to use the model with ethical intentions |

| Supported Languages | English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, Chinese |
| --- | --- |

| Training Details | |
| --- | --- |
| Data Sources | Web, code, academic sources, books, math data, multilingual data, and instruction data |
| Data Volume | 12 trillion tokens in Stage 1 and 2 trillion tokens in Stage 2 |
| Methodology | Two-stage training strategy |
| Context Length | |
| Hardware Used | IBM's supercomputing cluster, Blue Vela, equipped with NVIDIA H100 GPUs |
| Model Architecture | Decoder-only dense transformer with grouped-query attention (GQA), rotary position embeddings (RoPE), an MLP with SwiGLU activation, RMSNorm, and shared input/output embeddings (a brief sketch of these components follows the table) |
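The components listed under Model Architecture are standard transformer building blocks; in a typical decoder layer of this kind, RMSNorm and a SwiGLU-gated MLP sit alongside grouped-query attention with rotary position embeddings. As a rough illustration (not the model's actual implementation; the module names and dimensions below are placeholders), the normalization and feed-forward pieces can be sketched in PyTorch:

```python
# Illustrative sketch of two components named in the card: RMSNorm and a
# SwiGLU feed-forward (MLP) block. Dimensions are placeholders, not the
# model's actual sizes.
import torch
import torch.nn as nn
import torch.nn.functional as F


class RMSNorm(nn.Module):
    """Root-mean-square layer norm: rescales each hidden vector by its RMS."""

    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * (x * rms)


class SwiGLUMLP(nn.Module):
    """Feed-forward block with a SwiGLU gate: silu(W_gate x) * (W_up x) -> W_down."""

    def __init__(self, dim: int, hidden_dim: int):
        super().__init__()
        self.gate_proj = nn.Linear(dim, hidden_dim, bias=False)
        self.up_proj = nn.Linear(dim, hidden_dim, bias=False)
        self.down_proj = nn.Linear(hidden_dim, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down_proj(F.silu(self.gate_proj(x)) * self.up_proj(x))


# Toy forward pass with placeholder sizes: (batch, sequence, hidden).
x = torch.randn(2, 16, 1024)
block = nn.Sequential(RMSNorm(1024), SwiGLUMLP(1024, 4096))
print(block(x).shape)  # torch.Size([2, 16, 1024])
```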
| Responsible AI Considerations | |
| --- | --- |
| Fairness | Users should be aware of potential bias and fairness issues in model outputs |
| Mitigation Strategies | Ongoing research to address and mitigate these issues |

| Input / Output | |
| --- | --- |
| Input Format | Text tokenized with AutoTokenizer |
| Accepted Modalities | Text |
| Output Format | Generated token IDs decoded back into text with AutoTokenizer |
| Performance Tips | Use the appropriate libraries (e.g., Hugging Face Transformers) and follow the usage example below |
|
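The input/output flow above maps onto the standard Hugging Face Transformers workflow. A minimal sketch follows; the checkpoint identifier, prompt, and generation settings are placeholders rather than values taken from this card, so substitute the actual model repository name:

```python
# Minimal generation sketch: tokenize a prompt with AutoTokenizer, generate,
# and decode the output tokens back into text.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "your-org/your-base-model"  # placeholder checkpoint ID
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path).to(device)
model.eval()

# Input format: token IDs produced by AutoTokenizer.
prompt = "Summarize the following paragraph:\n..."
inputs = tokenizer(prompt, return_tensors="pt").to(device)

# Output format: generated token IDs decoded back into text.
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

Because the base model has not undergone safety alignment (see Limitations above), generated text should be reviewed before downstream use.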