SECURITY GLOSSARY

What is Indirect Prompt Injection?

Indirect prompt injection is an attack where malicious instructions are embedded in content the AI agent reads as part of doing its job — tool outputs, web pages, retrieved documents, email bodies, screenshots, even image text. The user never directly sends the malicious prompt; the agent encounters it while doing work.

Indirect prompt injection is the most dangerous variant of prompt injection because the user is unaware the attack is happening. The user asks the agent to "summarize this PDF" or "check my inbox" or "browse this URL." The agent retrieves content, processes it, and obeys instructions hidden inside. Result: the agent acts against the user, not for them.

Real-world examples are not theoretical. A 2024 attack on Microsoft Copilot used markdown image references to exfiltrate chat history through DNS lookups. A 2025 demonstration showed a poisoned npm README causing an agent to install backdoor packages. In 2026, microsoft/agent-governance-toolkit issue #1981 (Semantic Kernel CVE-2026-26030) documented how an indirect injection in a SK plugin description could chain to RCE.

The taxonomy spans modalities. Text: README files, markdown comments, JSON tool responses. Web: HTML attributes, hidden CSS pseudo-elements, JS-rendered content. Multimodal: text rendered in screenshots, OCR'd image content, alt-text descriptions. Cross-channel: an email tells the agent to read a Notion page; the Notion page tells the agent to run a tool. Each link in the chain is a fresh injection opportunity.

Defense requires content-source tagging at every retrieval boundary. PanGuard Guard tags every byte of content with its origin (user input vs tool result vs retrieved document) and runs ATR rules against retrieved content before it joins the model's context. 33 ATR rules in the context-exfiltration category specifically target indirect-injection patterns in tool outputs and retrieved documents.

OWASP

LLM01:2025 (indirect variant) + ASI01:2026

References

Related terms

Prompt Injection

Prompt injection is an attack where untrusted input embedded in a prompt causes a large language model to follow instructions from the input instead of the system prompt. OWASP classifies it as the top risk in both the LLM Top 10 (LLM01:2025) and the Agentic Top 10 (ASI01:2026).

Learn more

Tool Poisoning

Tool poisoning is an attack where a malicious tool description or tool response injects instructions into the agent. The agent reads the tool definition or output as plain text and treats embedded instructions as authoritative — a special case of indirect prompt injection focused on the MCP and skill ecosystem.

Learn more

MCP Poisoning

MCP poisoning is a class of attack where malicious instructions are embedded in an MCP (Model Context Protocol) server's tool descriptions, tool responses, or resource content. The agent reads them as part of its operating context and follows them as if they were system instructions.

Learn more

Agent Threat Rule (ATR)

An Agent Threat Rule (ATR) is a YAML-formatted detection rule for AI agent security threats. ATR is to AI agents what Sigma is to SIEM logs and YARA is to malware files: an open, machine-readable detection standard with multi-vendor adoption.

Learn more

Reviewed byAdam Lin·Last reviewed2026-05-12