BENCHMARK RESULTS
Public benchmark results for 652 ATR rules
Every benchmark below includes the raw data source, reproducible methodology, and ATR version that ran it. No cherry-picking.
Garak (NVIDIA jailbreak corpus)
NVIDIA garak is the leading open-source LLM red-teaming framework. We ran ATR against the full garak corpus to measure adversarial-prompt detection. The figures below come from v2.1.2, the last verified measurement.
Recall
~97.2%
Sample size
650 samples
Layer
Regex only (no LLM second opinion)
ATR version
v2.1.2 (last verified measurement)
Reproduce
pnpm bench:garak (in agent-threat-rules repo)
SKILL.md (PanGuard wild corpus)
Manually labeled corpus of 498 AI agent skills from ClawHub, OpenClaw, and Skills.sh. Half malicious, half benign. Used to validate that ATR catches threats without false-positive bloat.
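The recall, precision, and false positive rate reported for a labeled corpus like this one are usually derived from confusion-matrix counts. The sketch below shows those standard definitions; it is not taken from the agent-threat-rules repo. The names `ConfusionCounts`, `computeMetrics`, and `toPercent` are hypothetical, and the counts are made-up illustration values on a 249/249 split, not measured results.

```typescript
// Confusion-matrix counts for a labeled corpus (illustrative names).
interface ConfusionCounts {
  truePositives: number;  // malicious samples that a rule flagged
  falsePositives: number; // benign samples that a rule flagged
  trueNegatives: number;  // benign samples left unflagged
  falseNegatives: number; // malicious samples that no rule flagged
}

interface DetectionMetrics {
  recall: number;
  precision: number;
  falsePositiveRate: number;
}

// Returns each metric as a fraction in [0, 1]; a zero denominator yields NaN.
function computeMetrics(c: ConfusionCounts): DetectionMetrics {
  return {
    recall: c.truePositives / (c.truePositives + c.falseNegatives),
    precision: c.truePositives / (c.truePositives + c.falsePositives),
    falsePositiveRate: c.falsePositives / (c.falsePositives + c.trueNegatives),
  };
}

// A fraction is scaled by 100 before the percent sign is attached.
function toPercent(fraction: number, digits: number = 1): string {
  return `${(fraction * 100).toFixed(digits)}%`;
}

// Hypothetical counts for a 498-sample corpus split 249 / 249 (NOT measured results).
const exampleCounts: ConfusionCounts = {
  truePositives: 240,
  falsePositives: 5,
  trueNegatives: 244,
  falseNegatives: 9,
};

const exampleMetrics = computeMetrics(exampleCounts);
console.log("recall:", toPercent(exampleMetrics.recall));
console.log("precision:", toPercent(exampleMetrics.precision));
console.log("false positive rate:", toPercent(exampleMetrics.falsePositiveRate));
```

Keeping the metrics as fractions internally and converting only at display time avoids mixing the two scales in a published table.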
Recall
100%
Precision
97%
False positive rate
0.2%
Sample size
498 samples
Reproduce
pnpm bench:skill (in agent-threat-rules repo)
PINT (Invariant Labs adversarial corpus)
Invariant Labs published an adversarial prompt corpus for benchmarking prompt-injection detection. Recall is lower here than on Garak and SKILL.md because the corpus was designed for SIEM-style detection patterns; Sigma migration via PanGuard Migrator closes the gap.
Recall
63.6%
Precision
99.7%
Sample size
850 samples
Layer
Regex only
Reproduce
pnpm bench:pint (in agent-threat-rules repo)
Wild Scan (full ecosystem audit)
Live audit of every AI agent skill we could crawl across ClawHub, OpenClaw, and Skills.sh. This is not a curated benchmark: these are production skills shipped by real authors. Result: 1.6% of scanned skills (1,096 of 67,799) are confirmed malicious.
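The headline rate can be checked directly from the counts reported for this scan. The sketch below is plain arithmetic, not code from scripts/wild-scan.ts. The helper name `share` is hypothetical, and treating the scanned skills as a subset of the crawled entries is an assumption.

```typescript
// Counts as reported for the Wild Scan audit.
const entriesCrawled = 90792;
const skillsScanned = 67799;
const confirmedMalicious = 1096;

// Formats numerator / denominator as a percentage with one decimal place.
function share(numerator: number, denominator: number): string {
  return `${((numerator / denominator) * 100).toFixed(1)}%`;
}

// Malicious rate among scanned skills.
console.log(share(confirmedMalicious, skillsScanned)); // "1.6%"

// Fraction of crawled entries that were scanned
// (assumes every scanned skill is also a crawled entry).
console.log(share(skillsScanned, entriesCrawled)); // "74.7%"
```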
Entries crawled
90,792
Skills scanned
67,799
Confirmed malicious
1,096
Triple-threat packages
249
Reproduce
scripts/wild-scan.ts (in panguard-ai monorepo)
HackAPrompt cluster mining
On the HackAPrompt deterministic sample, ATR v3.5.0 reaches 69.6% recall at 100% precision, with recall up from the 29.5% v2.1.2 baseline documented in the 2026-05-11 cluster-mining write-up. The rule base keeps closing the gap on this corpus.
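To put the change between versions in concrete terms, the two reported figures can be compared as an absolute gain in percentage points and as a ratio. This sketch assumes the 29.5% baseline is a recall figure; the variable names are hypothetical.

```typescript
// Figures as reported, in percent.
const baselineRecallPct = 29.5; // v2.1.2 (assumed to be recall)
const currentRecallPct = 69.6;  // v3.5.0

// Absolute improvement, in percentage points.
const gainPoints = currentRecallPct - baselineRecallPct;

// Relative improvement, as a multiple of the baseline.
const gainRatio = currentRecallPct / baselineRecallPct;

console.log(`${gainPoints.toFixed(1)} percentage points`); // "40.1 percentage points"
console.log(`${gainRatio.toFixed(2)}x the baseline`);      // "2.36x the baseline"
```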
Recall (v3.5.0)
69.6%
Precision
100%
Sample size
4,780 deterministic
v2.1.2 baseline
29.5%
Reproduce
pnpm bench:hackaprompt
Want to run ATR on your corpus and publish the results? Open a PR at Agent-Threat-Rule/agent-threat-rules. We add your benchmark to this page with full attribution.
Reviewed by Adam Lin · Last reviewed 2026-05-12