| Model Type | text-generation, content moderation |

| Use Cases | |
| Areas | Safety and content moderation |
| Applications | Online platforms requiring content moderation |
| Primary Use Cases | Classifying content for safety in both inputs (prompts) and responses |
| Limitations | Performance limited by training data; not designed for chat use cases; susceptible to adversarial attacks |
| Considerations | Recommended for use alongside additional solutions for unsupported categories (see the KNN sketch under Responsible AI Considerations below) |

| Additional Notes | Supports 11 of the 13 categories in the MLCommons AI Safety taxonomy; the Elections and Defamation categories are not addressed (sketch below). |

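As a quick reference, here is a minimal sketch of that coverage split as a lookup table, using the category names listed under Risk Categories further down; the constant names and helper function are illustrative glue, not an official API.

```python
# Minimal sketch: the 13 MLCommons AI Safety categories split into the 11
# this model covers (the Risk Categories listed below) and the 2 it does
# not. Constant names and helper are illustrative, not an official API.

SUPPORTED_CATEGORIES = {
    "Violent Crimes", "Non-Violent Crimes", "Sex-Related Crimes",
    "Child Sexual Exploitation", "Specialized Advice", "Privacy",
    "Intellectual Property", "Indiscriminate Weapons", "Hate",
    "Suicide & Self-Harm", "Sexual Content",
}
UNSUPPORTED_CATEGORIES = {"Elections", "Defamation"}

def is_supported(category: str) -> bool:
    """True if the model's taxonomy covers this category."""
    return category in SUPPORTED_CATEGORIES
```

Content falling under the two unsupported categories should be routed to an additional solution, as the Considerations row above recommends.
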
| Training Details | |
| Data Sources | Llama Guard training set, MLCommons taxonomy, hard samples from Llama 2 70B |
| Methodology | Fine-tuned for safety classification (an illustrative record layout follows) |

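The card states only that the model was fine-tuned for safety classification on the sources above; the record layout below is an assumed illustration of what one such training example might look like, not the actual dataset schema.

```python
# Assumed illustration of a safety-classification training record; the
# actual dataset schema is not documented in this card.
from dataclasses import dataclass, field
from typing import List

@dataclass
class SafetyExample:
    prompt: str                   # user input to classify
    response: str = ""            # assistant response, empty for prompt-only checks
    label: str = "safe"           # "safe" or "unsafe"
    categories: List[str] = field(default_factory=list)  # violated categories if unsafe

example = SafetyExample(
    prompt="How can I hack into my neighbor's wifi?",
    label="unsafe",
    categories=["Non-Violent Crimes"],
)
```
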
| Safety Evaluation | |
| Methodologies | Harm Taxonomy, MLCommons taxonomy alignment |
| Findings | Strong adaptability to other policies; superior tradeoff between F1 score and false positive rate (see the metrics sketch below) |
| Risk Categories | Violent Crimes, Non-Violent Crimes, Sex-Related Crimes, Child Sexual Exploitation, Specialized Advice, Privacy, Intellectual Property, Indiscriminate Weapons, Hate, Suicide & Self-Harm, Sexual Content |
| Ethical Considerations | |
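To make the reported tradeoff concrete, this sketch computes both metrics with scikit-learn on toy labels (1 = unsafe, 0 = safe); the numbers are illustrative, not the card's evaluation results.

```python
# F1 on the unsafe class vs. false positive rate (safe content wrongly
# flagged). Toy data; not the card's evaluation results.
from sklearn.metrics import confusion_matrix, f1_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]  # ground truth: 1 = unsafe, 0 = safe
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]  # classifier decisions

f1 = f1_score(y_true, y_pred)                       # precision/recall balance on "unsafe"
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
fpr = fp / (fp + tn)                                # share of safe items flagged unsafe

print(f"F1 = {f1:.3f}, FPR = {fpr:.3f}")
```

A lower false positive rate at comparable F1 means fewer benign inputs get blocked, which is the tradeoff the Findings row highlights.
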
| Responsible AI Considerations | |
| Mitigation Strategies | Using external components such as KNN classifiers (see the sketch below) |

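A hedged sketch of that mitigation, assuming an embedding-based k-nearest-neighbor vote over labeled examples for a category the model does not cover (for example, Elections); the `embed` function here is a toy stand-in for a real sentence-embedding model.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    # Toy stand-in: hash character bigrams into a fixed-size vector.
    # Swap in a real sentence-embedding model in practice.
    vec = np.zeros(256)
    for a, b in zip(text, text[1:]):
        vec[(ord(a) * 31 + ord(b)) % 256] += 1.0
    return vec

def knn_flag(text: str, examples: list[tuple[str, bool]], k: int = 5) -> bool:
    """Majority vote among the k most similar labeled examples."""
    q = embed(text)
    sims = []
    for ex_text, ex_unsafe in examples:
        v = embed(ex_text)
        cos = float(q @ v) / (np.linalg.norm(q) * np.linalg.norm(v) + 1e-9)
        sims.append((cos, ex_unsafe))
    top = sorted(sims, reverse=True)[:k]
    return sum(unsafe for _, unsafe in top) > k / 2
```
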
| Input / Output | |
| Input Format | |
| Accepted Modalities | |
| Output Format | Binary classification (safe/unsafe) |
| Performance Tips | Align the model with specific safety considerations for better moderation (see the inference sketch below) |
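
For the safe/unsafe output convention above, here is a sketch of a moderation call with Hugging Face transformers, assuming a Llama Guard-style checkpoint; the model ID is an assumption for illustration, and the parsing relies on the first output line being the verdict.

```python
# Sketch: run a Llama Guard-style moderation pass and parse the binary
# verdict. The model ID is an assumption; any chat-templated safety
# classifier with the same output convention would work the same way.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-Guard-2-8B"  # illustrative checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

chat = [{"role": "user", "content": "Tell me how to pick a lock."}]

input_ids = tokenizer.apply_chat_template(chat, return_tensors="pt")
output = model.generate(input_ids=input_ids, max_new_tokens=32, pad_token_id=0)
verdict = tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True).strip()

# First line is "safe" or "unsafe"; unsafe verdicts also list categories.
is_unsafe = verdict.splitlines()[0].strip() == "unsafe"
print(verdict, is_unsafe)
```

Regarding the Performance Tips row, with chat-templated guard models aligning the classifier to your own safety policy typically means supplying a customized category list in the prompt template; verify this against the specific model's documentation.
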