Model Type | |
Use Cases |
Areas: | Research, Foundation for further specialization |
Applications: | Summarization, Text generation, Chatbot |
Primary Use Cases: | Text generation, Chat applications |
Limitations: | Limited generalization to languages not seen in training, Carries biases from web data |
Considerations: | Finetuning recommended for specific tasks |
Additional Notes | This is a raw pretrained model and requires further finetuning for most use cases. |
Supported Languages | English (High), German (High), Spanish (High), French (High), Italian (Low), Portuguese (Low), Polish (Low), Dutch (Low), Romanian (Low), Czech (Low), Swedish (Low) |
Training Details |
Data Sources: | RefinedWeb, Books, Conversations, Code, Technical |
Data Volume: | |
Methodology: | 3D parallelism strategy with ZeRO |
Context Length: | |
Training Time: | |
Hardware Used: | |
Model Architecture: | Rotary positional embeddings, multi-query attention, FlashAttention |
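The Model Architecture row names rotary positional embeddings and multi-query attention. The following is a minimal pure-Python sketch of how the two combine in one causal attention step, under stated assumptions: it uses the interleaved-pair rotary formulation with base 10000 (implementations differ in how they pair dimensions), one head dimension `d` shared by every query head and the single key/value head, and no batching. The function names `rotary`, `softmax`, and `multi_query_attention` are hypothetical. FlashAttention is an exact, memory-efficient GPU kernel for the same attention computation, so it changes how attention is evaluated, not its result, and it is not modeled here.

```python
import math


def rotary(x: list[float], pos: int, base: float = 10000.0) -> list[float]:
    """Rotate consecutive pairs (x[2i], x[2i+1]) by the angle pos * base**(-2i/d).

    Interleaved-pair form of rotary positional embeddings (an assumption);
    some implementations instead pair dimension i with dimension i + d/2.
    """
    d = len(x)
    if d % 2:
        raise ValueError("head dimension must be even")
    out = [0.0] * d
    for i in range(d // 2):
        theta = pos * base ** (-2 * i / d)
        c, s = math.cos(theta), math.sin(theta)
        a, b = x[2 * i], x[2 * i + 1]
        out[2 * i] = a * c - b * s
        out[2 * i + 1] = a * s + b * c
    return out


def softmax(scores: list[float]) -> list[float]:
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(v - m) for v in scores]
    total = sum(exps)
    return [e / total for e in exps]


def multi_query_attention(queries, keys, values):
    """Causal multi-query attention with rotary embeddings on queries and keys.

    queries: [seq][n_heads][d]  one query vector per head per position
    keys:    [seq][d]           a single key head shared by all query heads
    values:  [seq][d]           a single value head shared by all query heads
    returns: [seq][n_heads][d]
    """
    d = len(keys[0])
    scale = 1.0 / math.sqrt(d)
    # Keys are rotated once and reused by every query head; values are not rotated.
    rot_keys = [rotary(k, pos) for pos, k in enumerate(keys)]
    out = []
    for t, heads in enumerate(queries):
        head_outputs = []
        for q in heads:
            rq = rotary(q, t)
            # Causal mask: position t attends only to positions 0..t.
            scores = [scale * sum(a * b for a, b in zip(rq, rot_keys[j])) for j in range(t + 1)]
            weights = softmax(scores)
            head_outputs.append(
                [sum(w * values[j][i] for j, w in enumerate(weights)) for i in range(d)]
            )
        out.append(head_outputs)
    return out
```

Because every query head reads the same `keys` and `values`, a decoder in this design caches one key vector and one value vector per position rather than one per head. The rotation also makes the query-key score depend only on the distance between positions, not on their absolute values.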
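The Methodology row mentions a 3D parallelism strategy with ZeRO but gives neither the parallel degrees nor the ZeRO stage. As a hedged illustration only, the sketch below assumes ZeRO stage 1 with mixed-precision Adam and hypothetical tensor (`tp`), pipeline (`pp`), and data-parallel (`dp`) degrees. It shows how a global GPU rank maps to a 3D coordinate and estimates per-GPU memory for model states; it does not describe this model's actual training configuration, and `ParallelLayout` and `zero1_bytes_per_gpu` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParallelLayout:
    """Hypothetical 3D layout: tensor (tp) x pipeline (pp) x data (dp) parallel degrees."""
    tp: int
    pp: int
    dp: int

    @property
    def world_size(self) -> int:
        # Every GPU belongs to exactly one (dp, pp, tp) cell.
        return self.tp * self.pp * self.dp

    def coords(self, rank: int) -> tuple[int, int, int]:
        """Map a global rank to (dp_rank, pp_rank, tp_rank), with tp varying fastest.

        Assumption: tensor-parallel ranks are adjacent (typically within one node),
        a common but not universal convention.
        """
        if not 0 <= rank < self.world_size:
            raise ValueError(f"rank {rank} out of range for world size {self.world_size}")
        tp_rank = rank % self.tp
        pp_rank = (rank // self.tp) % self.pp
        dp_rank = rank // (self.tp * self.pp)
        return dp_rank, pp_rank, tp_rank


def zero1_bytes_per_gpu(num_params: int, layout: ParallelLayout) -> float:
    """Approximate model-state bytes per GPU under ZeRO stage 1 with mixed-precision Adam.

    Per parameter: 2 bytes of fp16 weights and 2 bytes of fp16 gradients are
    replicated across data-parallel ranks; 12 bytes of fp32 optimizer state
    (master weights, momentum, variance) are partitioned across them.
    Activations and temporary buffers are ignored.
    """
    # Tensor and pipeline parallelism split the parameters themselves.
    params_per_shard = num_params / (layout.tp * layout.pp)
    replicated = (2 + 2) * params_per_shard
    partitioned = 12 * params_per_shard / layout.dp
    return replicated + partitioned
```

For example, with `ParallelLayout(tp=4, pp=2, dp=8)` the layout spans 64 GPUs, and a hypothetical 8-billion-parameter model would need about 5.5 GB of model-state memory per GPU under these assumptions. Higher ZeRO stages also partition gradients (stage 2) and weights (stage 3), which would lower this figure further.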
Input Output |
Input Format: | Text input for text generation |
Accepted Modalities: | |
Output Format: | |
Performance Tips: | Finetuning required for specific use cases. |