Engineering

Why We Created ATR -- The Missing Detection Standard for AI Agents

Panguard TeamMarch 8, 202610 min

Sigma protects servers. YARA catches malware. But nothing detects prompt injection, tool poisoning, or agent manipulation. That is why we built ATR.

The Gap Nobody Was Filling

Security has mature detection standards for nearly every attack surface. Sigma rules detect threats in log data. YARA signatures identify malware in files. Snort and Suricata rules catch network-based attacks. These standards have been battle-tested across millions of deployments and refined by thousands of contributors over decades.

But AI agents? Nothing.

As of early 2026, there is no standard format for describing threats that target AI agents. No shared rule language for detecting prompt injection. No community-maintained signature database for tool poisoning attacks. No interoperable way to say "this pattern in an MCP tool response is malicious."

That gap is why we created Agent Threat Rules (ATR).

The Threat Landscape Is Already Here

This is not a theoretical concern. AI agents are handling production workloads today. They execute code, manage infrastructure, access databases, and interact with external APIs through MCP servers and tool frameworks. Each of these interactions is an attack surface.

Here are three real attack patterns we have observed:

Prompt Injection via MCP Tools

A malicious MCP tool embeds hidden instructions in its description or output. When an agent processes the tool response, the injected prompt overrides the user's intent. The agent might exfiltrate data, modify files, or execute arbitrary commands -- all while appearing to follow legitimate instructions.

Tool Output Injection

A compromised tool returns output that looks like normal data but contains embedded shell commands or code snippets. If the agent passes this output to a code execution environment without sanitization, the payload executes with the agent's full permissions.

Credential Exfiltration via Agent Chains

An attacker crafts a multi-step interaction where each step appears benign. Step one asks the agent to read a configuration file. Step two asks it to summarize the contents. Step three asks it to send the summary to an external endpoint. No single step triggers obvious alarms, but the chain results in credential theft.

Why Existing Standards Cannot Solve This

Sigma rules operate on structured log data with well-defined field names like EventID, CommandLine, and ParentImage. YARA rules match byte patterns in files. Snort rules inspect network packet payloads. None of these formats have fields for agent context, tool descriptions, model responses, or prompt content.

You cannot write a Sigma rule that says "match when a tool_response field contains a shell command pattern." You cannot write a YARA rule that detects prompt injection in a conversational context. The fundamental data model is wrong.

AI agent threats need their own detection format -- one that understands the unique structure of agent interactions.

ATR Design Philosophy

We designed ATR with four guiding principles:

1. YAML-Based, Like Sigma

Security teams already know YAML-based rule formats. ATR follows the same structural conventions as Sigma: a metadata header with id, title, description, severity, and status fields, followed by a detection block with field-value matching logic. If you can read a Sigma rule, you can read an ATR rule.

2. Regex-Driven for Performance

Every ATR rule compiles to a set of regular expressions that can be evaluated in microseconds. There is no heavyweight inference step, no LLM-in-the-loop for detection. Rules run at the speed of regex, which means they can be applied in real time to every agent interaction without introducing latency.

3. Open RFC for Community Input

ATR is published as an open RFC. The specification, the schema, and the initial rule set are all public. We want security researchers, AI developers, and tool builders to shape this standard. A detection format only works if the community trusts it and contributes to it.

4. Agent-Native Field Model

ATR defines fields that map directly to agent interaction data: tool_name, tool_description, tool_response, user_prompt, model_response, system_prompt, agent_context, and more. These are the fields where AI agent attacks actually manifest.

The 9 Attack Categories

ATR organizes threats into 9 categories, each targeting a different aspect of the agent interaction pipeline:

Category	What It Covers

|----------|---------------|

Prompt Injection	Direct and indirect prompt injection in user inputs, tool outputs, and retrieved context

Tool Poisoning	Malicious tool descriptions, hidden instructions in tool metadata, weaponized tool responses

Credential Theft	Agent-mediated exfiltration of API keys, tokens, SSH keys, environment variables

Privilege Escalation	Agents being manipulated into executing commands with elevated permissions

Data Exfiltration	Unauthorized data transfer through agent-accessible channels

Code Execution	Injection of executable code through tool responses, prompt manipulation, or context poisoning

Evasion	Techniques to bypass agent safety filters, including encoding tricks and multi-step chains

Denial of Service	Resource exhaustion, infinite loops, and context window flooding attacks

Each category has a dedicated set of rules, and each rule specifies exactly which fields to inspect and what patterns to match.

What an ATR Rule Looks Like

Here is a simplified example:

id: ATR-2026-001
title: Prompt Injection via System Prompt Override
description: Detects attempts to override system prompt through user input
severity: critical
status: stable
category: prompt-injection
detection:
  field: user_prompt
  patterns:
    - "ignore (all |any )?previous instructions"
    - "you are now [A-Za-z]+"
    - "disregard (your |the )?system prompt"
  condition: any
metadata:
  author: Panguard Team
  created: 2026-02-15
  references:
    - https://panguard.ai/atr/ATR-2026-001

The rule is human-readable, machine-parseable, and evaluates in microseconds. Security teams can review it, modify it, and extend it without specialized tooling.

How ATR Fits Into Panguard

ATR is the detection engine behind Panguard Guard's agent protection layer. When Guard monitors an AI agent, every tool interaction is evaluated against the full ATR rule set in real time. Matches trigger alerts, block actions, or quarantine responses depending on the rule severity and user configuration.

ATR rules also power the Skill Auditor. When you run panguard audit skill, the tool descriptions and sample interactions are evaluated against ATR rules to identify potential threats before the skill is ever installed.

Get Involved

ATR is an open standard because AI agent security is too important to be proprietary. Here is how you can participate:

●Read the spec: The full ATR specification is available at https://github.com/panguard-ai/atr-spec
●Review the rules: The initial rule set covers 69 rules across all 9 categories
●Submit rules: If you discover a new attack pattern, write an ATR rule and submit a pull request
●Join the RFC: Comment on the specification, propose new fields, suggest improvements
●Integrate ATR: Build ATR evaluation into your own agent framework or security tool

The goal is simple: build the Sigma equivalent for AI agents. A shared language that the entire security community can use to describe, detect, and defend against threats targeting the agent layer.