Model Type | auto-regressive language model, transformer architecture |
|
Use Cases |
Areas: | research on large language models, question answering, natural language understanding, reading comprehension |
|
Primary Use Cases: | exploring potential applications, understanding capabilities and limitations of language models, developing techniques to improve models, evaluating and mitigating biases |
|
Limitations: | not trained with human feedback; can generate toxic or offensive content; can generate incorrect information; not intended for downstream applications without risk evaluation |
|
|
Additional Notes | Recent evaluations show that performance varies with the quantization and groupsize configuration. |
|
Supported Languages | English (high proficiency); other included languages (lower proficiency because of less training data) |
|
Training Details |
Data Sources: | CCNet, C4, GitHub, Wikipedia, Books, ArXiv, Stack Exchange |
|
Data Volume: | approximately 1T to 1.4T tokens, depending on model size |
|
Methodology: | not explicitly stated; general transformer-based neural network training |
|
Training Time: | between December 2022 and February 2023 |
|
Model Architecture: | transformer architecture, in sizes ranging from 7B to 65B parameters |
|
|
Responsible AI Considerations |
Fairness: | Evaluated on RAI datasets to measure biases |
|
Mitigation Strategies: | Data filtered by proximity to Wikipedia, using a Kneser-Ney language model and a fastText linear classifier |
|
|
Input / Output |
Input Format: | |
Accepted Modalities: | |
Output Format: | Response: processed input according to instruction tuning guidelines |
|
Performance Tips: | Use GPTQ and text-generation-webui for setup and follow guidelines for instruction and chat settings |
|
|
Release Notes |
Version: | |
Date: | |
Notes: | Updated because recent GPTQ commits introduced breaking changes. |
|
Version: | |
Date: | |
Notes: | The model quantized without groupsize trades off file size against evaluation results. |
|
Version: | |
Date: | |
Notes: | New weights added: the old .pt version is replaced by a safetensors file quantized with groupsize 128. |
|
|
|
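The notes above mention that results depend on the groupsize setting and that the non-groupsize model trades size against evaluation quality. The following is a minimal sketch of why that trade-off arises, not the GPTQ algorithm itself: it uses plain round-to-nearest 4-bit quantization on synthetic weights. The helper names, the synthetic weight row, and the assumed 32 bits of per-group metadata (for example a 16-bit scale plus a 16-bit zero point) are illustrative assumptions, not details taken from this model's files.

```python
import random


def quantize_groupwise(weights, bits=4, group_size=128):
    """Round-to-nearest quantization with one (scale, minimum) pair per group.

    group_size=None treats the whole row as a single group.
    Returns the dequantized weights and the number of groups used.
    """
    n = len(weights)
    size = n if group_size is None else group_size
    levels = 2 ** bits - 1
    restored = []
    groups = 0
    for start in range(0, n, size):
        group = weights[start:start + size]
        lo, hi = min(group), max(group)
        # Fall back to 1.0 for a constant group to avoid dividing by zero.
        scale = (hi - lo) / levels or 1.0
        for w in group:
            q = round((w - lo) / scale)  # integer code in [0, levels]
            restored.append(lo + q * scale)
        groups += 1
    return restored, groups


def mean_squared_error(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def bits_per_weight(n, groups, bits=4, meta_bits=32):
    # meta_bits is an assumed per-group overhead (scale + zero point).
    return (n * bits + groups * meta_bits) / n


# Synthetic weight row with a few outliers (hypothetical data).
random.seed(0)
row = [random.gauss(0.0, 0.02) for _ in range(4096)]
for i in range(0, len(row), 512):
    row[i] *= 20

results = {}
for gs in (None, 128):
    restored, groups = quantize_groupwise(row, group_size=gs)
    results[gs] = (
        groups,
        bits_per_weight(len(row), groups),
        mean_squared_error(row, restored),
    )
    print(f"group_size={gs}: groups={groups}, "
          f"bits/weight={results[gs][1]:.4f}, mse={results[gs][2]:.3e}")
```

Under these assumptions, groupsize 128 costs 4.25 bits per weight instead of roughly 4.01, and in exchange each group gets its own scale, so an outlier only coarsens the grid of its own 128 weights rather than the whole row. Actual GPTQ files and their evaluation results will differ from this toy setup.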