# Why AI Agents Need Their Own Sigma: The Case for ATR
Servers got Sigma. Networks got Suricata. Malware got YARA. AI agents face prompt injection, tool poisoning, and MCP exploitation -- and until now, there was no standardized way to detect any of them. ATR changes that.
## Every Era Gets the Detection Standard It Deserves
When servers became the backbone of business infrastructure, the security community built Sigma -- a universal rule format that lets any SIEM detect threats using the same shared language. When network traffic became the attack surface, Suricata gave us deep packet inspection rules that any IDS could run. When malware exploded, YARA gave researchers a pattern-matching language that turned indicators of compromise into executable detection.
These standards share three properties: they are open, they are community-driven, and they are framework-agnostic. A Sigma rule written by a researcher in Tokyo works in a SOC in Berlin. A YARA signature discovered by one analyst protects every scanner that loads it.
AI agents are now the fastest-growing attack surface in software. They execute code, call APIs, read files, manage credentials, and make decisions on behalf of users. Yet when a prompt injection bypasses an agent, when a poisoned MCP tool exfiltrates environment variables, when a supply-chain attack slips a malicious skill into a marketplace -- there is no standardized way to write a detection rule for it.
That is the gap ATR fills.
## What ATR Actually Is
ATR (Agent Threat Rules) is a YAML-based detection standard for AI agent threats. Think of it as Sigma, but designed from the ground up for the unique event types that AI agents produce: LLM inputs and outputs, tool calls and responses, MCP exchanges, multi-agent communication, skill lifecycle events, and agent behavioral patterns.
Each ATR rule is a self-contained YAML file that specifies what to detect, how to detect it, and what to do when a match is found. Rules include embedded test cases -- both true positives and true negatives -- so every rule ships with its own validation suite.
```yaml
id: ATR-2026-001
title: Direct Prompt Injection via User Input
severity: high
category: prompt-injection
detection:
  patterns:
    - "ignore previous instructions"
    - "you are now [A-Z]"
    - "system prompt override"
response:
  actions: [block_input, alert, snapshot]
```

The format is deliberately simple. If you can read a Sigma rule, you can read an ATR rule. If you can write regex, you can contribute detection patterns. The barrier to entry is low by design -- because the standard only works if the community writes rules faster than attackers evolve.
## Three Layers of Detection
ATR recognizes that regex alone cannot catch every attack. The standard defines three detection tiers that work together:
**Layer 1: Pattern Matching.** Fast, deterministic regex detection. Catches known attack signatures in under 1 millisecond. This handles approximately 90% of threats -- the obvious prompt injections, the known tool poisoning payloads, the credential patterns that should never appear in agent output. Zero external dependencies, zero cost.

**Layer 2: Behavioral Fingerprinting.** Builds a baseline of what each skill "normally does" -- which files it reads, which network endpoints it contacts, which environment variables it accesses. After a learning period, any deviation from the baseline triggers an alert. This catches supply-chain attacks where a trusted skill turns malicious after an update.

**Layer 3: LLM-as-Judge.** For ambiguous cases that escape pattern matching and behavioral analysis, an LLM evaluates the suspicious content with full context. This is the slowest and most expensive tier (1-5 seconds, approximately $0.008 per evaluation), reserved for high-stakes decisions where false negatives are unacceptable.
The tiers are additive, not replacements. Layer 1 is the fast path. Layer 3 is the slow path. Most threats never reach Layer 3.
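The additive, fast-path-first design can be sketched as a short dispatch loop. Everything here is illustrative: the function names (`matchPatterns`, `deviatesFromBaseline`, `judgeWithLLM`) and the `Verdict` shape are assumptions for the sketch, not the ATR engine's API.

```typescript
// Illustrative three-tier dispatch: cheap checks first, escalate only when needed.
type Verdict = { tier: 1 | 2 | 3; threat: boolean };

const KNOWN_PATTERNS = [/ignore previous instructions/i, /system prompt override/i];

// Layer 1: deterministic regex matching (fast path, sub-millisecond).
function matchPatterns(content: string): boolean {
  return KNOWN_PATTERNS.some((p) => p.test(content));
}

// Layer 2: compare observed resource accesses against a learned baseline.
function deviatesFromBaseline(observed: string[], baseline: Set<string>): boolean {
  return observed.some((resource) => !baseline.has(resource));
}

// Layer 3 stand-in: a real engine would call out to an LLM here (slow, costly).
async function judgeWithLLM(content: string): Promise<boolean> {
  return content.length > 10_000; // placeholder heuristic, not a real judgment
}

async function evaluate(
  content: string,
  observed: string[],
  baseline: Set<string>,
): Promise<Verdict> {
  if (matchPatterns(content)) return { tier: 1, threat: true };
  if (deviatesFromBaseline(observed, baseline)) return { tier: 2, threat: true };
  return { tier: 3, threat: await judgeWithLLM(content) };
}
```

The ordering is the point: each tier only runs if the cheaper tier before it found nothing, so the expensive LLM call stays off the hot path.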
## What Existing Standards Do Not Cover
OWASP names the risks but provides no executable detection rules. The OWASP Top 10 for Agentic Applications (2026) identifies 10 critical threats -- prompt injection, tool misuse, excessive agency -- but stops at risk description. There is no YAML file you can load into your agent framework to detect them.
MITRE ATLAS catalogs attack techniques but offers no detection format. It maps AI attacks to a taxonomy, but does not provide rules that a security engine can execute in real-time.
ATR bridges this gap. Every ATR rule maps to specific OWASP and MITRE ATLAS entries, creating a direct link from risk taxonomy to executable detection. ATR currently covers 7 of 10 OWASP LLM risks and 6 of 10 OWASP Agentic AI risks, with coverage expanding every release.
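In a rule file, that taxonomy link might be expressed as metadata fields like these. The key names (`references`, `owasp`, `mitre_atlas`) are illustrative assumptions; the identifiers themselves are real (OWASP LLM01 is Prompt Injection, ATLAS AML.T0051 is LLM Prompt Injection):

```yaml
# Illustrative mapping fields; the exact key names come from the ATR spec.
references:
  owasp: LLM01            # OWASP Top 10 for LLM Applications: Prompt Injection
  mitre_atlas: AML.T0051  # MITRE ATLAS: LLM Prompt Injection
```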
## 52 Rules Covering 9 Attack Categories
ATR ships with 52 detection rules across 9 categories:
- **Prompt Injection** (15 rules) -- Direct injection, indirect via external content, jailbreak attempts, system prompt override, multi-turn attacks, encoding evasion, CJK-specific patterns
- **Tool Poisoning** (6 rules) -- Malicious MCP tool responses, instruction injection via tool output, unauthorized tool calls, SSRF via agent tools
- **Context Exfiltration** (3 rules) -- System prompt leakage, credential exposure in agent output
- **Agent Manipulation** (5 rules) -- Cross-agent attacks, goal hijacking, memory poisoning, trust exploitation
- **Privilege Escalation** (2 rules) -- Admin function access, agent scope creep
- **Excessive Autonomy** (5 rules) -- Runaway loops, resource exhaustion, cascading failures, unauthorized financial actions
- **Skill Compromise** (7 rules) -- Supply chain poisoning, description-behavior mismatch, hidden capabilities, multi-skill chain attacks
- **Data Poisoning** (3 rules) -- RAG contamination, knowledge base injection
- **Model Security** (2 rules) -- Model extraction, malicious fine-tuning data
## Framework-Agnostic by Design
ATR works with any agent framework: LangChain, CrewAI, AutoGen, Claude, OpenAI, Ollama, and custom implementations. The TypeScript engine evaluates agent events against loaded rules and returns matches with severity, confidence, and recommended response actions.
```typescript
import { ATREngine } from 'agent-threat-rules';

const engine = new ATREngine();
await engine.loadRules();

const matches = engine.evaluate({
  type: 'llm_input',
  timestamp: new Date().toISOString(),
  content: userMessage,
});
```

The engine also ships as an MCP server with 6 tools -- scan, list rules, validate rules, submit proposals, analyze coverage gaps, and generate threat summaries -- so Claude Code, Cursor, and Windsurf users can query ATR directly from their IDE.
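The matches returned by `evaluate` can then drive an application-side response policy. The match shape below follows the prose (severity, confidence, recommended actions), but the exact field names are assumptions for this sketch, not the library's declared types:

```typescript
// Hypothetical match shape, inferred from the prose; check the library's types.
interface AtrMatch {
  ruleId: string;
  severity: 'low' | 'medium' | 'high' | 'critical';
  confidence: number; // 0..1
  actions: string[];  // e.g. ['block_input', 'alert', 'snapshot']
}

// Example policy: block on any high-severity, high-confidence match.
function shouldBlock(matches: AtrMatch[]): boolean {
  return matches.some(
    (m) => (m.severity === 'high' || m.severity === 'critical') && m.confidence >= 0.8,
  );
}
```

Keeping the policy in application code, separate from detection, is deliberate: the same rule set can gate input in one deployment and merely alert in another.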
## Open Standard, Not a Product
ATR is MIT licensed. Rules contributed belong to the community. There is no CLA, no proprietary tooling, no telemetry. The standard is early -- RFC status, v0.2.1 -- and intentionally transparent about its limitations. The LIMITATIONS.md file documents exactly what regex-based detection cannot catch: paraphrase attacks, multi-modal injection, GCG adversarial suffixes, and novel zero-day techniques.
This honesty is deliberate. A detection standard that overpromises is worse than no standard at all. ATR is one layer in a defense-in-depth strategy, not a silver bullet.
## How to Get Started
Install ATR and scan your first agent event in under 60 seconds:
```shell
npx agent-threat-rules scan events.json
```

Or integrate the MCP server into Claude Code:
```json
{
  "mcpServers": {
    "atr": {
      "command": "npx",
      "args": ["agent-threat-rules", "mcp"]
    }
  }
}
```

ATR is early, imperfect, and open. If AI agents are going to be safe, the detection standard cannot belong to any single company. It has to be built together.