Glossary
AI Red Teaming
Also known as: LLM red teaming, AI adversarial testing
Definition
AI red teaming is the practice of probing AI systems — particularly LLMs — for jailbreaks, prompt injection, policy bypass, and unsafe outputs before they ship to production. It applies adversarial-mindset security testing to the model layer, with reproducer prompts and severity tagging on findings.
Why it matters
LLMs introduce attack surfaces traditional security tooling does not cover. A jailbreak that bypasses a content policy can expose an organization to brand and regulatory risk. A prompt injection that exfiltrates data through a tool call can become a breach. A model that leaks PII in adversarial prompts can violate GDPR and HIPAA simultaneously.
NIST AI RMF MANAGE-2.1 explicitly calls for adversarial testing of AI systems. The EU AI Act Article 15 requires accuracy, robustness, and cybersecurity testing for high-risk AI. SR 11-7's effective challenge pillar is increasingly interpreted to include red-teaming for LLM-based models. Pre-deployment red teaming is no longer optional.
In practice
Prism Red Teaming runs a curated catalog of jailbreaks, prompt injection variants, and policy-bypass tests against any model or agent before it reaches production. Findings are severity-tagged with reproducer prompts. Cookbooks let teams re-run the same tests after each fix lands, providing the audit-grade evidence regulators expect.
Related
More glossary terms
Start tracing in 5 minutes
One SDK. Five minutes. Full audit trails, PII redaction, and guardrail enforcement, from day one.