Llama 3 8B Instruct Gradient 1048K Parameters and Internals
Model Type
text-generation
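As a rough illustration of the text-generation use, below is a minimal inference sketch using the Hugging Face transformers text-generation pipeline. The repository id `gradientai/Llama-3-8B-Instruct-Gradient-1048k`, the dtype, and the generation settings are assumptions for the example, not details taken from this card.

```python
# Minimal chat-style generation sketch (assumed repo id and settings).
import torch
from transformers import pipeline

# Assumed Hugging Face repository id for this model.
MODEL_ID = "gradientai/Llama-3-8B-Instruct-Gradient-1048k"

generator = pipeline(
    "text-generation",
    model=MODEL_ID,
    torch_dtype=torch.bfloat16,   # assumed dtype; fp16 also works on most GPUs
    device_map="auto",            # requires the accelerate package
)

# Llama 3 Instruct models expect chat-formatted input; recent transformers
# versions apply the model's chat template when given a list of messages.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize the key idea of RoPE in two sentences."},
]

output = generator(messages, max_new_tokens=128, do_sample=False)
# generated_text holds the full conversation; the last entry is the reply.
print(output[0]["generated_text"][-1]["content"])
```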
Use Cases
Areas:
commercial, research
Applications:
natural language generation
Primary Use Cases:
assistant-like chat
Limitations:
not suitable for use in languages other than English
Considerations:
Developers should perform safety testing and tuning tailored to their specific applications.
Additional Notes
The model is static and was trained on an offline dataset. Future versions will focus on safety improvements.
Supported Languages
en (primary)
Training Details
Data Sources:
publicly available online data, SlimPajama, UltraChat
Data Volume:
1.4B tokens total
Methodology:
NTK-aware interpolation to optimize RoPE theta, progressive training on increasing context lengths, supervised fine-tuning (SFT), and reinforcement learning from human feedback (RLHF); see the RoPE theta sketch after this section
Context Length:
1,048,576 tokens (1048K)
Hardware Used:
Crusoe Energy high-performance L40S GPU cluster
Model Architecture:
auto-regressive language model using an optimized transformer architecture
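The methodology above pairs NTK-aware interpolation (rescaling the RoPE base frequency, theta) with progressive training at longer context windows. Below is a minimal sketch of the theta-rescaling idea; the scaling rule and the starting values (base theta 500,000, head dimension 128, native 8,192-token context) reflect common Llama 3 8B defaults and are stated as assumptions, not figures from this card.

```python
# Sketch of NTK-aware RoPE theta rescaling (assumed formula and values).
import math

def ntk_scaled_theta(base_theta: float, head_dim: int,
                     orig_ctx: int, target_ctx: int) -> float:
    """Rescale the RoPE base so rotary frequencies cover a longer context.

    Uses the common NTK-aware rule: theta' = theta * s ** (d / (d - 2)),
    where s is the context-length scaling factor and d is the head dimension.
    """
    s = target_ctx / orig_ctx
    return base_theta * s ** (head_dim / (head_dim - 2))

# Assumed Llama 3 8B values: base theta 500_000, head dim 128, native 8K context.
new_theta = ntk_scaled_theta(500_000.0, 128, 8_192, 1_048_576)
print(f"rescaled rope_theta ~ {new_theta:,.0f}")

# Progressive training would repeat this at intermediate context lengths
# (e.g. 65K, 262K, 524K, 1M), increasing rope_theta at each stage.
```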
Safety Evaluation
Methodologies:
red teaming, adversarial evaluations
Findings:
mitigations implemented to limit false refusals; CBRNE (chemical, biological, radiological, nuclear, and explosive weapons) risk assessments conducted