BENCHMARK RESULTS

Public benchmark results for 652 ATR rules

Every benchmark below includes the raw data source, reproducible methodology, and ATR version that ran it. No cherry-picking.

Garak (NVIDIA jailbreak corpus)

2026-04-22

NVIDIA garak is the leading open-source LLM red-teaming framework. We ran ATR v3.5.0 against the full garak corpus to measure adversarial-prompt detection.

Recall

~97.2%

Sample size

650 samples

Layer

Regex only (no LLM second opinion)

ATR version

v2.1.2 (last verified measurement)

Source corpus: github.com/NVIDIA/garak Full methodology

Reproduce

pnpm bench:garak (in agent-threat-rules repo)

SKILL.md (PanGuard wild corpus)

2026-04-22

Manually labeled corpus of 498 AI agent skills from ClawHub, OpenClaw, and Skills.sh. Half malicious, half benign. Used to validate that ATR catches threats without false-positive bloat.

Recall

Precision

0.97%

False positive rate

0.002%

Sample size

498 samples

Source corpus: PanGuard Wild Scan dataset Full methodology

Reproduce

pnpm bench:skill (in agent-threat-rules repo)

PINT (Invariant Labs adversarial corpus)

2026-04-22

Invariant Labs published an adversarial prompt corpus for prompt-injection detection benchmarking. Lower recall than Garak/SKILL.md reflects the corpus being designed for SIEM-style detection patterns — Sigma migration via PanGuard Migrator closes the gap.

Recall

0.6363636363636364%

Precision

0.9965277777777778%

Sample size

850 samples

Layer

Regex only

Source corpus: github.com/invariantlabs-ai/invariant Full methodology

Reproduce

pnpm bench:pint (in agent-threat-rules repo)

Wild Scan (full ecosystem audit)

2026-04-14

DOI 10.5281/zenodo.19178002

Live audit of every AI agent skill we could crawl across ClawHub, OpenClaw, Skills.sh. Not a curated benchmark — actual production skills shipped by real authors. Result: 1.6% of scanned skills are confirmed malicious.

Entries crawled

90,792

Skills scanned

67,799

Confirmed malicious

1,096

Triple-threat packages

249

Source corpus: PanGuard Wild Scan Report

Reproduce

scripts/wild-scan.ts (in panguard-ai monorepo)

HackAPrompt cluster mining

2026-05-11

ATR v3.5.0 against the HackAPrompt deterministic sample: 69.6% recall, 100% precision — up from the 29.5% v2.1.2 baseline documented in the 2026-05-11 cluster-mining write-up. The rule base keeps closing the gap on this corpus.

Recall (v3.5.0)

69.6%

Precision

100%

Sample size

4,780 deterministic

v2.1.2 baseline

29.5%

Source corpus: HackAPrompt corpus Full methodology

Reproduce

pnpm bench:hackaprompt

Want to run ATR on your corpus and publish the results? Open a PR at Agent-Threat-Rule/agent-threat-rules. We add your benchmark to this page with full attribution.

Reviewed by Adam Lin · Last reviewed 2026-05-12

BENCHMARK RESULTS

Public benchmark results for 652 ATR rules

Every benchmark below includes the raw data source, reproducible methodology, and ATR version that ran it. No cherry-picking.

Garak (NVIDIA jailbreak corpus)

2026-04-22

NVIDIA garak is the leading open-source LLM red-teaming framework. We ran ATR v3.5.0 against the full garak corpus to measure adversarial-prompt detection.

Recall

~97.2%

Sample size

650 samples

Layer

Regex only (no LLM second opinion)

ATR version

v2.1.2 (last verified measurement)

Source corpus: github.com/NVIDIA/garak Full methodology

Reproduce

pnpm bench:garak (in agent-threat-rules repo)

SKILL.md (PanGuard wild corpus)

2026-04-22

Manually labeled corpus of 498 AI agent skills from ClawHub, OpenClaw, and Skills.sh. Half malicious, half benign. Used to validate that ATR catches threats without false-positive bloat.

Recall

Precision

0.97%

False positive rate

0.002%

Sample size

498 samples

Source corpus: PanGuard Wild Scan dataset Full methodology

Reproduce

pnpm bench:skill (in agent-threat-rules repo)

PINT (Invariant Labs adversarial corpus)

2026-04-22

Recall

0.6363636363636364%

Precision

0.9965277777777778%

Sample size

850 samples

Layer

Regex only

Source corpus: github.com/invariantlabs-ai/invariant Full methodology

Reproduce

pnpm bench:pint (in agent-threat-rules repo)

Wild Scan (full ecosystem audit)

2026-04-14

DOI 10.5281/zenodo.19178002

Entries crawled

90,792

Skills scanned

67,799

Confirmed malicious

1,096

Triple-threat packages

249

Source corpus: PanGuard Wild Scan Report

Reproduce

scripts/wild-scan.ts (in panguard-ai monorepo)

HackAPrompt cluster mining

2026-05-11

Recall (v3.5.0)

69.6%

Precision

100%

Sample size

4,780 deterministic

v2.1.2 baseline

29.5%

Source corpus: HackAPrompt corpus Full methodology

Reproduce

pnpm bench:hackaprompt

Want to run ATR on your corpus and publish the results? Open a PR at Agent-Threat-Rule/agent-threat-rules. We add your benchmark to this page with full attribution.

Reviewed by Adam Lin · Last reviewed 2026-05-12