Classifiers for red teaming evaluation in HarmBench
HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal
Paper • 2402.04249
cais/HarmBench-Llama-2-13b-cls
Text Generation • 13B • 29.9k downloads • 24 likes
cais/HarmBench-Llama-2-13b-cls-multimodal-behaviors
Text Generation • 13B • 128 downloads
cais/HarmBench-Mistral-7b-val-cls
Text Generation • 7B • 80.7k downloads • 6 likes
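The classifiers listed above judge whether a model completion exhibits a given harmful behavior, and HarmBench aggregates those judgments into an attack success rate (ASR). The snippet below is a minimal sketch of that aggregation step only. It assumes each classifier call has already returned a "Yes"/"No" string. The helper names parse_verdict and attack_success_rate are hypothetical, and running the classifier models themselves is left out.

```python
from typing import Iterable


def parse_verdict(completion: str) -> int:
    """Map a classifier completion to 1 (behavior exhibited) or 0 (not).

    Assumption: the classifier answers "Yes" or "No". Any other text is
    treated as a non-success here (a conservative, hypothetical choice).
    """
    return int(completion.strip().lower() == "yes")


def attack_success_rate(completions: Iterable[str]) -> float:
    """Fraction of test cases the classifier labels as successful attacks."""
    verdicts = [parse_verdict(c) for c in completions]
    if not verdicts:
        raise ValueError("no classifier completions given")
    return sum(verdicts) / len(verdicts)


if __name__ == "__main__":
    # Hypothetical classifier outputs for four test cases.
    outputs = ["Yes", "No", " yes\n", "No"]
    print(attack_success_rate(outputs))  # 0.5
```

Treating anything other than an exact "yes" as a failure keeps malformed classifier outputs from inflating the ASR. A different policy (for example, flagging such cases for manual review) may suit other evaluation setups better.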