Industry

Why Transparent 39.9% Recall Beats Opaque 98%

Adam LinMarch 20, 20268 min

Why we use "ATR Scanned" not "ATR Certified" -- and why we publish what we miss. The Snort analogy: lower detection but open and auditable beats proprietary black boxes every time. (PINT figures in this post are from the ATR v0.4.0 era; current benchmarks are on the benchmark hub.)

Our Numbers

On the PINT (Prompt Injection Test) benchmark, ATR v0.4.0 achieves 99.6% precision and 61.4% recall. That means: when ATR flags something as a threat, it is correct 99.6% of the time. But it only catches 61.4% of all threats in the benchmark. We started at 39.9% recall in v0.3.0 and improved to 61.4% through rule expansion -- 10 new rules targeting eval injection, shell escape, dynamic imports, OAuth abuse, and social engineering patterns.

What We Miss

ATR is primarily regex-based. Regex has a ceiling for natural language attack detection. Our analysis of missed detections shows: - 42% are paraphrase attacks: The same malicious intent expressed in different words. "Ignore previous instructions" gets caught. "Please disregard the guidelines established earlier" does not. - 31% are non-English attacks: ATR rules are primarily English-language patterns. Prompt injection in Mandarin, Spanish, or Arabic bypasses most rules. - 18% are encoding attacks: Base64, ROT13, Unicode homoglyphs, and other encoding schemes that transform the payload into a format regex cannot match. - 9% are multi-turn attacks: Instructions split across multiple messages or tool calls, where no single message contains a detectable pattern.

Why We Say "ATR Scanned" Not "ATR Certified"

A badge that says "Certified Secure" implies completeness. It suggests that everything dangerous has been found and eliminated. At 61.4% recall, we cannot make that claim. "ATR Scanned" means: this package has been analyzed against 113 known attack patterns and no threats were detected. It does not mean the package is safe. It means it passed the checks we have.

This distinction matters. If a developer sees "Certified" and installs a package without further review, and that package contains a paraphrase-based injection that ATR missed, we have made the developer less safe by giving them false confidence. "Scanned" maintains the right level of trust: the tool did its job, but human judgment is still required.

The Snort Precedent

In 1998, Martin Roesch released Snort -- an open-source intrusion detection system. Snort did not catch everything. Its detection rate was lower than commercial alternatives from ISS (now IBM) and Cisco (later acquired). But Snort rules were open. Anyone could read them, write them, share them, and audit them.

Snort became the most deployed IDS in history. Not because it had the best detection rate, but because the community trusted it. Administrators could verify what it detected and what it missed. Researchers could contribute new rules that were available to everyone within hours. The openness created a flywheel: more users meant more eyes on the rules, which meant better detection, which meant more users.

ATR is following the same playbook. Our recall will never match a well-funded proprietary solution with LLM-powered analysis (we use LLM analysis too, but as an optional second pass, not as the primary detection layer). What we offer instead is transparency, auditability, and community contribution. Every rule is public. Every benchmark result is published. Every missed detection is documented.

The Regex Ceiling and How We Break Through It

Pure regex detection tops out around 55% recall for natural language injection attacks. We have pushed to 61.4% through three techniques: 1. Context signals: The same pattern has different severity depending on where it appears. "Ignore previous instructions" in a tool description is HIGH severity. In a code comment explaining prompt injection defense, it is suppressed. Context awareness adds ~5% recall over raw regex. 2. Composite rules: Multi-pattern rules that combine weak signals. No single pattern is enough to flag, but three weak signals in the same file trigger a detection. This catches subtle attacks that use individually benign phrases. 3. Threat Crystallization: When the optional LLM review catches something regex missed, the system auto-generates a new regex rule from the finding and distributes it. This continuously expands the rule set based on real-world encounters.

The Honest Position

We would rather have 61.4% recall that you can verify than 98% recall that you cannot audit. We would rather tell you what we miss than pretend we catch everything. We would rather be the tool you trust because you can read the source code than the tool you use because a sales deck told you it was the best.

Security is not a product feature. It is a trust relationship. And trust requires transparency.