Text generation, Multilingual translation, Coding assistance, Mathematical computations
Limitations:
Not recommended for conversational use without post-training such as SFT or RLHF.
Considerations:
Post-training is recommended for specialized conversational use cases.
Additional Notes
Qwen2.5 supports long contexts of up to 128K tokens and can generate up to 8K tokens. It significantly improves on Qwen2 in instruction following, coding, and mathematics.
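The context and generation limits interact: the number of tokens that can still be generated depends on how much of the window the prompt already occupies. The following is a minimal sketch of that arithmetic, not part of any Qwen tooling. The function name and constants are hypothetical; the window default uses the 32768 figure this page lists under Context Length, and "8K" is assumed to mean 8192.

```python
# Assumed limits, taken from the figures quoted on this page.
CONTEXT_WINDOW = 32768   # "Context Length" listed for this model
MAX_GENERATION = 8192    # "up to 8K tokens", assuming 8K means 8192


def max_new_tokens(prompt_tokens: int,
                   requested: int,
                   context_window: int = CONTEXT_WINDOW,
                   max_generation: int = MAX_GENERATION) -> int:
    """Return how many new tokens can be generated for a prompt.

    The result is the requested amount, capped by the generation limit
    and by the room left in the context window after the prompt.
    """
    if prompt_tokens < 0 or requested < 0:
        raise ValueError("token counts must be non-negative")
    if prompt_tokens >= context_window:
        return 0  # the prompt alone already fills the window
    room = context_window - prompt_tokens
    return min(requested, max_generation, room)


if __name__ == "__main__":
    # A 30000-token prompt leaves 2768 tokens of room in a 32768 window.
    print(max_new_tokens(30000, 8192))
```

Passing a different context_window lets the same check be reused for deployments configured with a longer window.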
Supported Languages
Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, Arabic, and more. Proficiency level: N/A
Training Details
Data Sources:
Multiple expert data sources in coding, mathematics, and multilingual domains
Methodology:
Pre-training with RoPE, SwiGLU, RMSNorm, Attention QKV bias and tied word embeddings
Context Length:
32768
Model Architecture:
Transformer with RoPE, SwiGLU, RMSNorm, attention QKV bias, and tied word embeddings
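Two of the components named above, RMSNorm and SwiGLU, are small enough to sketch in plain Python. This is an illustrative sketch of the general techniques, not Qwen's implementation: the function names, the eps default, and the list-based matrices are assumptions, and RoPE, QKV bias, and embedding tying are left out.

```python
import math


def rms_norm(x, weight, eps=1e-6):
    """RMSNorm: divide x by its root mean square, then apply a learned scale."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]


def silu(v):
    """SiLU (swish) activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))


def matvec(matrix, x):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, x)) for row in matrix]


def swiglu_ffn(x, w_gate, w_up, w_down):
    """SwiGLU feed-forward block: down(silu(gate(x)) * up(x))."""
    gate = [silu(g) for g in matvec(w_gate, x)]
    up = matvec(w_up, x)
    hidden = [g * u for g, u in zip(gate, up)]  # elementwise gating
    return matvec(w_down, hidden)


if __name__ == "__main__":
    normed = rms_norm([3.0, 4.0], [1.0, 1.0])
    print(normed)
    print(swiglu_ffn(normed, [[1.0, 0.0], [0.0, 1.0]],
                     [[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0]]))
```

Unlike LayerNorm, RMSNorm does not subtract the mean; it only rescales, which is why the sketch needs a single pass over the squared values.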
Note: green Score (e.g. "73.2") means that the model is better than unsloth/Qwen2.5-0.5B.