Hyperion 3.0 Mixtral 3x7B Parameters and Internals
Model Type
Mixture of Experts (MoE)
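For orientation, here is a minimal PyTorch sketch of the top-2 expert routing this model uses (see Additional Notes below: three experts, two consulted per token). This illustrates the general MoE pattern only; it is not the model's actual implementation, and all names in it are hypothetical.

```python
# Illustrative top-2 mixture-of-experts routing: a gating network scores
# every expert for each token, and the output is a softmax-weighted sum
# of the two highest-scoring experts. NOT the model's real code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopTwoMoE(nn.Module):
    def __init__(self, d_model: int, n_experts: int = 3):
        super().__init__()
        # Gating network: one score per expert for each token.
        self.gate = nn.Linear(d_model, n_experts, bias=False)
        # Each expert is a small feed-forward block.
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, 4 * d_model),
                nn.SiLU(),
                nn.Linear(4 * d_model, d_model),
            )
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model)
        scores = self.gate(x)                          # (tokens, n_experts)
        weights, idx = scores.topk(2, dim=-1)          # top-2 experts per token
        weights = F.softmax(weights, dim=-1)           # normalize over the top-2
        out = torch.zeros_like(x)
        for k in range(2):                             # each of the two slots
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e                  # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, k, None] * expert(x[mask])
        return out

moe = TopTwoMoE(d_model=64)
tokens = torch.randn(5, 64)
print(moe(tokens).shape)  # torch.Size([5, 64])
```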
Use Cases
Limitations:
Not suitable for production environments or critical applications.
Considerations:
Intended for research and experimentation purposes only.
Additional Notes
The model uses the `hyperion-3.0-beta` architecture as the base, with a `bfloat16` output dtype. The gating mechanism is set to `hidden`, and two of the three experts are consulted per token.
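A minimal loading sketch, assuming the checkpoint is published on the Hugging Face Hub under the `Locutusque/Hyperion-3.0-Mixtral-3x7B` id and loads through the standard `transformers` auto classes; the prompt is illustrative.

```python
# Sketch: load the model in bfloat16 (matching the output dtype noted above)
# and run a short generation. Assumes transformers, torch, and accelerate
# are installed and the checkpoint id below is correct.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Locutusque/Hyperion-3.0-Mixtral-3x7B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # bfloat16 dtype, as stated in Additional Notes
    device_map="auto",
)

prompt = "Explain what a mixture-of-experts gating network does."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```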
Hyperion 3.0 Mixtral 3x7B Capabilities
Instruction Following and Task Automation
Factuality and Completeness of Knowledge
Censorship and Alignment
Data Analysis and Insight Generation
Text Generation
Text Summarization and Feature Extraction
Code Generation
Multi-Language Support and Translation