When Microsoft says prompts are shells, ATR ships detection rules within 2 hours
On 2026-05-11, between 06:07 and 08:24 UTC, a closed loop ran end-to-end in 2 hours and 16 minutes:
1. Microsoft's Copilot SWE Agent opened an issue against microsoft/agent-governance-toolkit containing regression-test fixtures that presumed ATR rules.
2. ATR v2.1.2 shipped on npm with the two rules those fixtures were testing.
3. The agent-governance-toolkit issue closed nine hours later, on the same day.
The rules sat in an external open-source repository the entire time. Microsoft Copilot, operating as a software engineering agent inside AGT, was writing regression tests against a detection contract that lived outside Microsoft.
That is a small data point. It is also the first time we have observed a major-vendor Copilot SWE Agent write tests that presume an external open-source detection standard.
The upstream framing: prompts as shells
Four days earlier, on 2026-05-07, the Microsoft Security Response Center disclosed two Semantic Kernel CVEs:
- CVE-2026-26030: In-Memory Vector Store. A lambda/eval-based filter parser allowed arbitrary code execution when an attacker controlled the filter string.
- CVE-2026-25592: SessionsPythonPlugin. Attacker-controlled file paths could write to autostart locations, achieving persistence.
The same week, the Microsoft Security Blog published *When prompts become shells: RCE vulnerabilities in AI agent frameworks*. The framing is the load-bearing part of that post. The Microsoft Security team argues that once an AI model is wired to tools, prompt injection sits on a thin line between being a content-security problem and becoming a code-execution primitive — and that vulnerabilities in the AI layer are no longer just a content issue, they are an execution risk.
That is a category move. The industry has been treating prompt injection as a content problem, where the model refuses to say bad things. The two Semantic Kernel CVEs say it is also a code-execution primitive, where the agent runs attacker code.
If that framing is right, detection has to move down the stack. You cannot guard against lambda arbitrary-expression evaluation with an output filter. You need rules that look at the call site.
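The difference is easy to see in a minimal sketch. The following is a hypothetical filter parser, not Semantic Kernel's actual code: an output filter never sees the string that reaches `eval`, so the guard has to run at the call site, before evaluation.

```python
import ast

RECORDS = [{"score": 3}, {"score": 9}]

def unsafe_filter(records, filter_expr):
    # Vulnerable shape (schematic): an attacker-controlled string
    # reaches eval unchecked.
    return [r for r in records if eval(filter_expr, {}, {"r": r})]

# The only AST nodes a comparison-only filter legitimately needs.
ALLOWED_NODES = (
    ast.Expression, ast.Compare, ast.BoolOp, ast.And, ast.Or,
    ast.Name, ast.Subscript, ast.Constant, ast.Load,
    ast.Eq, ast.NotEq, ast.Lt, ast.LtE, ast.Gt, ast.GtE,
)

def guarded_filter(records, filter_expr):
    # Call-site guard: parse the filter and reject any syntax beyond
    # comparisons on record fields, before eval ever runs.
    for node in ast.walk(ast.parse(filter_expr, mode="eval")):
        if not isinstance(node, ALLOWED_NODES):
            raise ValueError(f"disallowed syntax: {type(node).__name__}")
    return [r for r in records
            if eval(filter_expr, {"__builtins__": {}}, {"r": r})]
```

With this guard, `guarded_filter(RECORDS, 'r["score"] > 5')` behaves like the legitimate filter, while a payload such as `().__class__.__mro__[1].__subclasses__()` is rejected at parse time because tuples, attribute access, and calls are not in the allow-list.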
The 2-hour-16-minute timeline
Times are UTC.
| Time | Event |
| --- | --- |
| 2026-05-07 | MSRC publishes CVE-2026-26030 and CVE-2026-25592. Microsoft Security Blog publishes "When prompts become shells." |
| 2026-05-11 06:07 | Microsoft Copilot SWE Agent opens microsoft/agent-governance-toolkit#1981. The issue carries regression-test fixtures presuming ATR rule IDs for both CVEs. |
| 2026-05-11 ~08:24 | ATR v2.1.2 published to npm. Two new rules: ATR-2026-00440 and ATR-2026-00441. GitHub release cut against Agent-Threat-Rule/agent-threat-rules. PR #50 in that repo links AGT#1981. |
| 2026-05-11 15:30 | Imran Siddique merges AGT#1981. |
End to end: 2 hours 16 minutes from Copilot's issue to ATR npm release. 9 hours 23 minutes from issue open to AGT merge.
Cite this cleanly: Microsoft Copilot was the SWE Agent that opened the issue. The MSRC advisory itself, the Microsoft Security Blog post, and Imran Siddique's merge of AGT#1981 are separate facts. The signal here is the existence of the regression-test contract, not any explicit Microsoft endorsement of ATR.
What rules 00440 and 00441 actually do
ATR-2026-00440: agent-manipulation
Targets CVE-2026-26030 lambda+eval RCE in the In-Memory Vector Store filter parser.
The rule looks at the call-site shape, not the input string. Three primitive patterns:
1. AST-traversal via `__mro__`: `().__class__.__mro__[1].__subclasses__()` and its variants. This is the classic sandbox-escape primitive that climbs the type hierarchy to reach `subprocess.Popen` or `os.system`.
2. `BuiltinImporter` reflective access: walking through `__builtins__` to reach `__import__` without naming it.
3. `Function` constructor variants: dynamically constructing a function from a string in environments where `eval` is partially restricted.
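The three primitives can be approximated with regexes like the following. These are illustrative stand-ins we wrote for this post, not the canonical expressions shipped in the rule file:

```python
import re

# Illustrative approximations of ATR-2026-00440's three primitives;
# the canonical patterns live in the rule's YAML file, not here.
PATTERNS = {
    "mro-traversal": re.compile(
        r"__class__\s*\.\s*__mro__|__subclasses__\s*\("),
    "builtins-reflective-import": re.compile(
        r"__builtins__.{0,60}__import__"),
    "function-constructor": re.compile(
        r"\bFunctionType\s*\(|\bcompile\s*\(\s*['\"]"),
}

def match_rule(payload: str) -> list:
    """Return the names of the primitives that fire on the payload."""
    return [name for name, rx in PATTERNS.items() if rx.search(payload)]
```

The point of the call-site shape is visible here: `match_rule("().__class__.__mro__[1].__subclasses__()")` fires regardless of what prompt produced the string, while a benign filter expression fires nothing.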
Benign corpus check on 466 samples: 8 true positives, 5 true negatives, 0 false positives.
ATR-2026-00441: privilege-escalation
Targets CVE-2026-25592 autostart-write persistence in SessionsPythonPlugin.
Persistence paths covered:
- Windows: `%APPDATA%\Microsoft\Windows\Start Menu\Programs\Startup`, `HKCU\Software\Microsoft\Windows\CurrentVersion\Run`
- Linux: `~/.config/autostart/*.desktop` (XDG), `~/.config/systemd/user/`, crontab
- macOS: `~/Library/LaunchAgents/*.plist`
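A hedged sketch of how a loader might enforce that path list at the agent boundary: normalise the write path and glob-match it against the autostart locations above. `is_autostart_write` is a hypothetical helper of ours; the shipped rule encodes these locations in YAML, and the `HKCU` registry key would need a separate registry-write check.

```python
import fnmatch

# Autostart locations from the list above, normalised to forward-slash
# glob patterns. Hypothetical helper, not the rule's actual matcher.
AUTOSTART_GLOBS = [
    "*/Microsoft/Windows/Start Menu/Programs/Startup/*",  # Windows
    "*/.config/autostart/*.desktop",                      # Linux XDG
    "*/.config/systemd/user/*",                           # Linux user units
    "*/Library/LaunchAgents/*.plist",                     # macOS
]

def is_autostart_write(path: str) -> bool:
    """True if a write to `path` lands in a known persistence location."""
    norm = path.replace("\\", "/")
    return any(fnmatch.fnmatchcase(norm, g) for g in AUTOSTART_GLOBS)
```

A runtime guard would call this on every attacker-influenced file write and block or alert on a match.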
Benign corpus check: 7 true positives, 5 true negatives, 0 false positives.
Both rules ship with `status: production`, `maturity: stable`, and full `test_cases` blocks. The rule files are in `data/rules/agent-manipulation/` and `data/rules/privilege-escalation/` of Agent-Threat-Rule/agent-threat-rules.
Why this loop matters
Most agent runtime security today follows the SIEM-of-2008 pattern: every vendor writes its own rules, the rules do not compose, and customers pay for the rule-writing labour as part of the product.
Snort and later Sigma broke that pattern for network and host detection. The rule was the contract. Vendors that loaded the rules competed on execution, telemetry, and response.
ATR is positioning to play that role for AI agent runtime. The detection-standard-as-contract model works like this:
- One canonical rule file: e.g. `ATR-2026-00440.yml`.
- Multiple loaders: PanGuard Guard, Cisco Skill Scanner (cisco-ai-defense/skill-scanner#79, merged), MISP galaxy entries (misp/misp-galaxy#1207, merged) and MISP taxonomies (misp/misp-taxonomies#323, merged), OWASP Agent Security Resource Hub (OWASP A-S-R-H #74, merged), Sage (gendigitalinc/sage, open).
- Regression tests written against the rule ID, not the loader.
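Concretely, "tests written against the rule ID" means a fixture shaped like the one below, which any conforming loader should pass unchanged. `SimpleRegexEngine` is a hypothetical stand-in for a real loader such as PanGuard Guard, and the pattern it loads is a placeholder, not the canonical rule:

```python
import re

class SimpleRegexEngine:
    """Hypothetical minimal loader: maps rule ID -> compiled pattern."""
    def __init__(self):
        self.rules = {}

    def load(self, rule_id: str, pattern: str) -> None:
        self.rules[rule_id] = re.compile(pattern)

    def detect(self, rule_id: str, payload: str) -> bool:
        return bool(self.rules[rule_id].search(payload))

# Fixtures are keyed to the ATR rule ID, never to the engine.
FIXTURES = [
    ("ATR-2026-00440", "().__class__.__mro__[1].__subclasses__()", True),
    ("ATR-2026-00440", "score > 5", False),
]

def run_fixtures(engine) -> list:
    """Return the fixtures the engine gets wrong."""
    return [(rid, payload) for rid, payload, want in FIXTURES
            if engine.detect(rid, payload) != want]

engine = SimpleRegexEngine()
engine.load("ATR-2026-00440", r"__mro__|__subclasses__")  # placeholder
```

Swapping in a different loader requires no fixture changes; that loader-independence is what makes the rule ID a contract rather than an implementation detail.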
The AGT#1981 loop is one data point that the contract is starting to hold. Microsoft Copilot wrote tests against rule IDs in an external repo. Those tests passed once the rule existed.
It is worth being precise about what is verifiable and what is not. Verifiable: five external standards-body or framework merges (MISP galaxy, MISP taxonomies, OWASP A-S-R-H, microsoft/agent-governance-toolkit, cisco-ai-defense/skill-scanner), and 13 ecosystem PRs merged across 6 external organisations. Not verifiable: any statement of MSRC endorsement, or any sales-led F500 adoption claim. We have not made and will not make those claims.
Partial coverage caveat
ATR v2.1.2 covered 2 of the 4 regression-test fixtures Microsoft Copilot wrote. The other 2 used patterns that did not match v2.1.2's canonical regex shape. Subsequent versions closed more of the gap:
- v2.1.4 added Spring AI MCP server CVEs and tightened the lambda-eval pattern.
- v2.2.2 (current at time of writing) shipped Check Point management CVEs the same day this post went live, bringing the total to 421 rules. HackAPrompt corpus recall is at 66.2 percent.
The direction is right. Absolute coverage is still imperfect. Anyone running ATR rules in production should know which fixtures match and which do not, and should subscribe to releases.
What to do if you operate an AI agent runtime
If you are running Semantic Kernel, AutoGen, LangChain, LlamaIndex, or any framework that lets an agent evaluate attacker-controlled strings or write to attacker-controlled paths:
1. Patch Semantic Kernel to the post-CVE versions if you have not. MSRC advisory has the version table.
2. Subscribe to release notifications on Agent-Threat-Rule/agent-threat-rules.
3. Install rules: `npm install agent-threat-rules` pulls the latest version (currently 2.2.2).
4. Or run PanGuard Guard, which loads ATR rules at runtime and enforces them at the agent boundary. Free and open-source community tier. Enterprise tier adds compliance evidence and SOC 2 audit trails.
If you are an OSS maintainer in the AI safety space, the rule files are MIT-licensed. Load them. We will fix bugs upstream.
Links
- ATR repo: https://github.com/Agent-Threat-Rule/agent-threat-rules
- AGT issue: https://github.com/microsoft/agent-governance-toolkit/issues/1981
- Microsoft Security Blog "When prompts become shells": https://www.microsoft.com/en-us/security/blog/2026/05/07/prompts-become-shells-rce-vulnerabilities-ai-agent-frameworks/
- MSRC CVE-2026-26030: https://msrc.microsoft.com/update-guide/vulnerability/CVE-2026-26030
- MSRC CVE-2026-25592: https://msrc.microsoft.com/update-guide/vulnerability/CVE-2026-25592