BENCHMARK RESULTS
Public benchmark results for 344 ATR rules
Every benchmark below includes the raw data source, reproducible methodology, and ATR version that ran it. No cherry-picking.
Garak (NVIDIA jailbreak corpus)
NVIDIA garak is the leading open-source LLM red-teaming framework. We ran ATR v2.1.2 against the full garak corpus to measure adversarial-prompt detection.
Recall
97.1%
Sample size
666 samples
Layer
Regex only (no LLM second opinion)
ATR version
v2.1.2
Reproduce
pnpm bench:garak (in agent-threat-rules repo)SKILL.md (PanGuard wild corpus)
Manually labeled corpus of 498 AI agent skills from ClawHub, OpenClaw, and Skills.sh. Half malicious, half benign. Used to validate that ATR catches threats without false-positive bloat.
Recall
100%
Precision
97%
False positive rate
0.2%
Sample size
498 samples
Reproduce
pnpm bench:skill (in agent-threat-rules repo)PINT (Invariant Labs adversarial corpus)
Invariant Labs published an adversarial prompt corpus for prompt-injection detection benchmarking. Lower recall than Garak/SKILL.md reflects the corpus being designed for SIEM-style detection patterns — Sigma migration via PanGuard Migrator closes the gap.
Recall
62.5%
Precision
99.6%
Sample size
850 samples
Layer
Regex only
Reproduce
pnpm bench:pint (in agent-threat-rules repo)Wild Scan (full ecosystem audit)
Live audit of every AI agent skill we could crawl across ClawHub, OpenClaw, Skills.sh. Not a curated benchmark — actual production skills shipped by real authors. Result: 1.6% of scanned skills are confirmed malicious.
Entries crawled
90,792
Skills scanned
67,799
Confirmed malicious
1,096
Triple-threat packages
249
Reproduce
scripts/wild-scan.ts (in panguard-ai monorepo)HackAPrompt cluster mining
Engineering write-up from 2026-05-11. Baseline ATR v2.1.2 vs HackAPrompt 600K-corpus 5K deterministic sample. Result: 29.5% recall, 0 new false positives. The number is below closed-source ML detectors. Methodology and rule additions documented in the public engineering blog.
HackAPrompt recall
29.5%
Baseline recall
16.0%
Sample size
5,000 deterministic
New FPs introduced
0
Reproduce
pnpm bench:hackapromptWant to run ATR on your corpus and publish the results? Open a PR at Agent-Threat-Rule/agent-threat-rules. We add your benchmark to this page with full attribution.
Reviewed by Adam Lin · Last reviewed 2026-05-12