| Model Type | |
| Use Cases | |
| Areas | Research, Commercial applications |
| Applications | Coding, Mathematics assistance, Text generation |
| Primary Use Cases | Instruction following, Generating long texts, Understanding structured data, Generating structured outputs, Multilingual text generation |
| Additional Notes | The model is instruction-tuned and GPTQ-quantized to 8-bit. |
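The quantization note above can be illustrated with a minimal sketch of symmetric 8-bit weight quantization. This is a simplification for intuition only: GPTQ itself quantizes weights group-wise while minimizing the resulting layer output error, not per-tensor as here.

```python
def quantize_int8(weights):
    # Symmetric per-tensor int8 quantization: map max|w| to 127.
    scale = max(abs(w) for w in weights) / 127.0
    return [round(w / scale) for w in weights], scale

def dequantize(q, scale):
    # Recover approximate float weights from the int8 codes.
    return [v * scale for v in q]
```

Round-tripping a weight vector through these two functions shows the small, bounded error that 8-bit storage introduces in exchange for roughly 4x memory savings over float32.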
| Supported Languages | English (primary); also Chinese, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, Arabic, and more |
| Training Details | |
| Data Sources | Qwen specialized expert models |
| Methodology | Pretraining and post-training, with expert models in domains such as coding and mathematics |
| Context Length | |
| Model Architecture | Transformers with RoPE, SwiGLU, RMSNorm, attention QKV bias, and tied word embeddings |
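Of the architecture components listed, RMSNorm is the simplest to sketch. A minimal pure-Python version is below; in the actual model it is applied over the hidden dimension with a learned per-dimension weight vector.

```python
import math

def rms_norm(x, weight, eps=1e-6):
    # RMSNorm divides by the root-mean-square of the vector
    # (no mean subtraction, unlike LayerNorm), then scales
    # each dimension by a learned weight.
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * v / rms for w, v in zip(weight, x)]
```

Dropping the mean subtraction and bias of LayerNorm makes the operation cheaper while preserving the rescaling that stabilizes deep transformer training.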
| Input Output | |
| Accepted Modalities | |
| Performance Tips | Use the latest version of `transformers` to avoid a `KeyError` when loading the model |
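The `KeyError` mentioned in the tip above typically comes from an older `transformers` release that does not recognize the model type in the model's config. A minimal version gate is sketched below; the `4.37.0` minimum is an assumption for illustration, so check the model card for the actual requirement.

```python
def version_tuple(v):
    # "4.37.2" -> (4, 37, 2); non-numeric parts such as
    # pre-release suffixes are ignored.
    return tuple(int(part) for part in v.split(".") if part.isdigit())

# Hypothetical minimum version; confirm against the model card.
MIN_TRANSFORMERS = "4.37.0"

def is_supported(installed, minimum=MIN_TRANSFORMERS):
    # Tuple comparison gives correct ordering, e.g. 4.9 < 4.37.
    return version_tuple(installed) >= version_tuple(minimum)
```

Comparing version strings as tuples of integers avoids the lexicographic trap where `"4.9.0" > "4.37.0"` as plain strings.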