Engineering

When Microsoft says prompts are shells, ATR ships detection rules in just over 2 hours

May 13, 2026 · 7 min

On 2026-05-11, between 06:07 and 08:24 UTC, a closed loop ran end-to-end in 2 hours and 16 minutes. Microsoft Copilot SWE Agent opened an issue against microsoft/agent-governance-toolkit containing regression-test fixtures that presumed ATR rule IDs for two Semantic Kernel CVEs disclosed four days earlier (CVE-2026-26030 lambda+eval RCE and CVE-2026-25592 autostart-write persistence). ATR v2.1.2 shipped on npm with rules ATR-2026-00440 and ATR-2026-00441 covering both CVEs. The agent-governance-toolkit issue closed the same day. The rules sat in an external open-source repository the entire time — Microsoft Copilot, operating as a software engineering agent inside AGT, was writing regression tests against a detection contract that lived outside Microsoft. This is a small data point and it is also the first time we have observed a major-vendor Copilot SWE Agent write tests that presume an external open-source detection standard. The Microsoft Security Blog framed the upstream story as a category move: prompt injection is no longer just a content-policy problem, it is a code-execution primitive. If that framing is right, detection has to move down the stack — and the rule has to be the contract.

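Between 06:07 and 08:24 UTC on 2026-05-11, a closed loop ran to completion in 2 hours and 16 minutes:

1. Microsoft's Copilot SWE Agent opened an issue in microsoft/agent-governance-toolkit whose regression-test fixtures already presumed ATR detection rules.

2. ATR v2.1.2 was published on npm, containing the two rules those fixtures test.

3. Later the same day, the AGT issue was merged and closed.

Those two rules lived in an open-source repo outside Microsoft the whole time. While operating inside AGT as a software engineering agent, Microsoft Copilot wrote tests against a detection contract that sits outside Microsoft.

This is a small data point. It is also the first time we have observed a major vendor's Copilot SWE Agent write regression tests against the contract of an external open-source detection standard.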

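Upstream context: prompts as shells

Four days earlier, on 2026-05-07, the Microsoft Security Response Center disclosed two Semantic Kernel CVEs:

●CVE-2026-26030: In-Memory Vector Store. The filter parser uses lambda/eval, so an attacker who controls the filter string can reach RCE.
●CVE-2026-25592: SessionsPythonPlugin. An attacker-controlled file path can write to autostart locations, achieving persistence.

The same week, the Microsoft Security Blog published *When prompts become shells: RCE vulnerabilities in AI agent frameworks*. The most important part of that post is not the vulnerability details but the framing. The Microsoft Security team argues that once an AI model is wired to tools, prompt injection sits on a thin line: on one side a content-safety problem, on the other a code-execution primitive. They state explicitly that vulnerabilities in the AI layer are no longer only a content problem; they are an execution risk.

This is a recategorization. The industry has long treated prompt injection as a content-policy problem, solved as long as the model refuses to say bad things. What these two Semantic Kernel CVEs show is that it is also a code-execution primitive: the agent directly executes the attacker's code.

If this framing is right, detection has to move deeper into the stack. An output filter cannot stop arbitrary expression evaluation inside a lambda. You need rules that can see the call site.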
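To make the call-site point concrete, here is a minimal sketch that inspects a filter expression before it would ever reach `eval`. The function name `suspicious_nodes` and the deny-list are assumptions made for illustration; this is not the Semantic Kernel patch and not the body of any ATR rule.

```python
import ast

# Hypothetical deny-list for illustration only; not taken from ATR or Semantic Kernel.
DANGEROUS_NAMES = {"eval", "exec", "compile", "__import__", "getattr", "open"}


def suspicious_nodes(filter_expr: str) -> list[str]:
    """Return reasons why a filter expression should not be evaluated."""
    try:
        tree = ast.parse(filter_expr, mode="eval")
    except SyntaxError:
        return ["not a single valid expression"]

    reasons = []
    for node in ast.walk(tree):
        # Dunder attribute access (x.__class__, x.__mro__) is how class-hierarchy walks begin.
        if isinstance(node, ast.Attribute) and node.attr.startswith("__"):
            reasons.append(f"dunder attribute: {node.attr}")
        # Direct references to evaluation or import builtins.
        elif isinstance(node, ast.Name) and node.id in DANGEROUS_NAMES:
            reasons.append(f"dangerous name: {node.id}")
    return reasons


benign = "lambda x: x.score > 0.8"
hostile = "lambda x: ().__class__.__mro__[1].__subclasses__()"

print(suspicious_nodes(benign))   # no findings for a plain comparison
print(suspicious_nodes(hostile))  # one finding per dunder attribute in the chain
```

The check runs on the parsed expression, so it sees the structure of what would execute. An output filter only sees what the model says afterwards, which is too late for this class of bug.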

The 2-hour-16-minute timeline

All times are UTC.

| Time | Event |
| --- | --- |
| 2026-05-07 | MSRC discloses CVE-2026-26030 and CVE-2026-25592. The Microsoft Security Blog publishes "When prompts become shells". |
| 2026-05-11 06:07 | Microsoft Copilot SWE Agent opens `microsoft/agent-governance-toolkit#1981`. The issue contains regression-test fixtures that presume ATR rule IDs, one set per CVE. |
| 2026-05-11 ~08:24 | ATR v2.1.2 is published on npm, adding two rules: ATR-2026-00440 and ATR-2026-00441. `Agent-Threat-Rule/agent-threat-rules` cuts a GitHub release. PR #50 in the same repo links back to AGT#1981. |
| 2026-05-11 15:30 | Imran Siddique merges AGT#1981. |

End to end: 2 hours 16 minutes from Copilot opening the issue to the ATR npm release, and 9 hours 23 minutes from the issue opening to the AGT merge.
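The intervals can be rechecked from the timeline with a few lines of Python. This sketch uses the minute-level timestamps listed above; the release time is given only approximately, so the first interval can differ from the stated 2 hours 16 minutes by about a minute.

```python
from datetime import datetime, timezone


def utc(stamp: str) -> datetime:
    """Parse a 'YYYY-MM-DD HH:MM' timestamp as UTC."""
    return datetime.strptime(stamp, "%Y-%m-%d %H:%M").replace(tzinfo=timezone.utc)


issue_opened = utc("2026-05-11 06:07")
npm_release = utc("2026-05-11 08:24")  # approximate, per the timeline
agt_merge = utc("2026-05-11 15:30")

issue_to_release = npm_release - issue_opened
issue_to_merge = agt_merge - issue_opened

print(issue_to_release)  # 2:17:00 at minute resolution
print(issue_to_merge)    # 9:23:00
```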

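When citing this, state it cleanly: the issue was opened by Microsoft Copilot SWE Agent. The MSRC advisory, the Microsoft Security Blog post, and Imran Siddique's merge of AGT#1981 are three independent facts. The signal here is that the regression-test contract exists, not any explicit Microsoft endorsement of ATR.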

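What rules 00440 and 00441 actually do

ATR-2026-00440: agent-manipulation

Maps to CVE-2026-26030, the lambda+eval RCE in the In-Memory Vector Store filter parser.

The rule looks at the shape of the call site, not at the input string itself. It has three primitive patterns:

1. `__mro__` AST traversal: `().__class__.__mro__[1].__subclasses__()` and its variants. This is the classic sandbox-escape technique that climbs the type hierarchy until it reaches `subprocess.Popen` or `os.system`.

2. `BuiltinImporter` reflective access: reaching `__import__` through `__builtins__` without naming it directly.

3. `Function` constructor variants: building a function dynamically from strings in environments where `eval` is partially restricted.

Against the 466-sample benign corpus: 8 true positives, 5 true negatives, 0 false positives.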
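As a rough illustration of what pattern matching on these three primitives can look like, here is a small sketch. The regular expressions and the names `PATTERNS` and `match_primitives` are assumptions made up for this example; the canonical patterns live in the rule file and are not reproduced here.

```python
import re

# Illustrative regexes only; assumptions, not the canonical ATR-2026-00440 patterns.
PATTERNS = {
    "mro_traversal": re.compile(
        r"__class__\s*\.\s*__(?:mro|bases?)__|__subclasses__\s*\("
    ),
    "builtin_importer": re.compile(r"BuiltinImporter|__builtins__\s*(?:\[|\.)"),
    "function_constructor": re.compile(r"\bFunction\s*\(|\bFunctionType\s*\("),
}


def match_primitives(text: str) -> list[str]:
    """Return the names of the primitive patterns found in text."""
    return [name for name, pattern in PATTERNS.items() if pattern.search(text)]


samples = [
    "lambda x: x.score > 0.8",
    "lambda x: ().__class__.__mro__[1].__subclasses__()",
    "__builtins__['__import__']('os')",
    "types.FunctionType(code_obj, {})",
]

for sample in samples:
    print(match_primitives(sample), "<-", sample)
```

Regexes like these are easy to evade with string concatenation or encoding tricks, which is one reason to pair them with structural checks at the call site.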

ATR-2026-00441: privilege-escalation

Maps to CVE-2026-25592, the autostart-write persistence in SessionsPythonPlugin.

Persistence paths covered:

●Windows: `%APPDATA%\Microsoft\Windows\Start Menu\Programs\Startup`, `HKCU\Software\Microsoft\Windows\CurrentVersion\Run`
●Linux: `~/.config/autostart/*.desktop` (XDG), `~/.config/systemd/user/`, crontab
●macOS: `~/Library/LaunchAgents/*.plist`

Benign corpus results: 7 true positives, 5 true negatives, 0 false positives.
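A write-path check against the locations listed above might be sketched as follows. The patterns and the function name `is_autostart_target` are hypothetical and are not the rule's actual content. The sketch also simplifies in two ways: it lowercases every path, which is too permissive on case-sensitive filesystems, and it leaves out crontab, which is usually modified through a command rather than a fixed path.

```python
import re

# Illustrative patterns, assumed for this sketch; not the canonical ATR-2026-00441 rule body.
AUTOSTART_PATTERNS = [
    re.compile(r"/microsoft/windows/start menu/programs/startup(/|$)"),
    re.compile(r"^hkcu/software/microsoft/windows/currentversion/run(/|$)"),
    re.compile(r"(^|/)\.config/autostart/[^/]+\.desktop$"),
    re.compile(r"(^|/)\.config/systemd/user/"),
    re.compile(r"(^|/)library/launchagents/[^/]+\.plist$"),
]


def is_autostart_target(path: str) -> bool:
    """Return True if a write target looks like an autostart persistence location."""
    # Normalize Windows separators and case so one pattern set covers all platforms.
    normalized = path.replace("\\", "/").lower()
    return any(pattern.search(normalized) for pattern in AUTOSTART_PATTERNS)


print(is_autostart_target(r"%APPDATA%\Microsoft\Windows\Start Menu\Programs\Startup\run.bat"))
print(is_autostart_target("~/Library/LaunchAgents/com.example.agent.plist"))
print(is_autostart_target("/tmp/report.csv"))
```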

Both rules are marked status: production and maturity: stable, and each ships with a complete test_cases block. The rule files live under data/rules/agent-manipulation/ and data/rules/privilege-escalation/ in Agent-Threat-Rule/agent-threat-rules.

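Why this loop is worth watching

Most AI agent runtime security today is still stuck in the 2008 SIEM model: every vendor writes its own rules, the rules do not interoperate, and what the customer actually buys is the labor of "the vendor hiring people to write rules".

Snort, and later Sigma, broke that model for network and host detection. The rule itself became the contract. Loaders compete on execution, telemetry, and response, not on who has written the most rules.

ATR is trying to play the same role in the AI agent runtime vertical. The detection-standard-as-contract model works like this:

●One canonical rule file, for example ATR-2026-00440.yml.
●Multiple loaders: PanGuard Guard, Cisco Skill Scanner (cisco-ai-defense/skill-scanner#79, merged), MISP galaxy (misp/misp-galaxy#1207, merged) and MISP taxonomies (misp/misp-taxonomies#323, merged), OWASP Agent Security Resource Hub (OWASP A-S-R-H #74, merged), and Sage (gendigitalinc/sage, in progress).
●Regression tests are written against rule IDs, not against loaders.

The AGT#1981 loop is the first data point that the contract is starting to hold. Microsoft Copilot wrote tests against rule IDs in an external repo. Once the rules existed, those tests passed.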
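The idea of writing tests against rule IDs instead of loaders can be sketched in a few lines. Everything here is hypothetical: the `Loader` callable interface, the `Fixture` shape, and `toy_loader` are stand-ins for illustration, not the API of ATR, AGT, or any loader named above.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# A "loader" here is any callable that takes text and returns the rule IDs that fired.
# This interface is an assumption for the sketch, not the API of any real loader.
Loader = Callable[[str], List[str]]


@dataclass(frozen=True)
class Fixture:
    rule_id: str       # the contract: the test names the rule, not the loader
    payload: str
    should_fire: bool


def run_fixtures(loader: Loader, fixtures: List[Fixture]) -> Dict[str, bool]:
    """Return pass/fail per fixture, keyed by 'rule_id#index'."""
    results = {}
    for index, fixture in enumerate(fixtures):
        fired = fixture.rule_id in loader(fixture.payload)
        results[f"{fixture.rule_id}#{index}"] = fired == fixture.should_fire
    return results


# Toy loader standing in for a real engine; its matching logic is a placeholder.
def toy_loader(text: str) -> List[str]:
    return ["ATR-2026-00440"] if "__mro__" in text else []


fixtures = [
    Fixture("ATR-2026-00440", "().__class__.__mro__[1]", should_fire=True),
    Fixture("ATR-2026-00440", "x.score > 0.8", should_fire=False),
    Fixture("ATR-2026-00441", "~/.config/autostart/a.desktop", should_fire=True),
]

print(run_fixtures(toy_loader, fixtures))
```

Because the fixtures name only rule IDs, the same list can be run against any loader. In this toy run the third fixture fails, since the placeholder loader implements no 00441 rule, which is the kind of gap a per-fixture report makes visible.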

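The verifiable and unverifiable parts should also be stated separately. Verifiable: merges into 4 external standards organizations or frameworks (MISP galaxy and MISP taxonomies, OWASP A-S-R-H, microsoft/agent-governance-toolkit, cisco-ai-defense/skill-scanner), and 13 ecosystem PRs merged across 6 external organizations. Unverifiable: any claim of MSRC endorsement, and any claim of sales-led F500 adoption. We do not make such claims, and we will not.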

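An honest note on partial coverage

ATR v2.1.2 covered 2 of the 4 regression-test fixtures that Microsoft Copilot wrote. The other 2 use patterns that do not fit the canonical regex shapes in v2.1.2. Later releases have been closing the gap:

●v2.1.4 added the Spring AI MCP server CVE and tightened the lambda-eval pattern.
●v2.2.2 (the current version at the time of writing) shipped the Check Point management CVE on the day this post was published, bringing the total to 421 rules. Recall on the HackAPrompt corpus is 66.2%.

The direction is right. Absolute coverage is not yet perfect. Anyone running ATR rules in production should know which fixtures match and which do not, and should subscribe to releases.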

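If you run an AI agent runtime

If you run Semantic Kernel, AutoGen, LangChain, LlamaIndex, or any framework that lets an agent evaluate attacker-controllable strings or write to attacker-controllable paths:

1. If you have not upgraded yet, patch Semantic Kernel to a post-CVE version. The MSRC advisory has the version mapping table.

2. Subscribe to releases on Agent-Threat-Rule/agent-threat-rules.

3. Install the rules: `npm install agent-threat-rules` pulls the latest version, currently 2.2.2.

4. Or run PanGuard Guard, which loads ATR rules at runtime and enforces them at the agent boundary. The community edition is completely free and open source. The Enterprise edition adds a compliance evidence chain and audit trails for the EU AI Act, NIST AI RMF, and ISO/IEC 42001.

If you are an OSS maintainer in AI security, the rule files are MIT-licensed. Load them. We fix bugs upstream.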

Links

●ATR repo: https://github.com/Agent-Threat-Rule/agent-threat-rules
●AGT issue: https://github.com/microsoft/agent-governance-toolkit/issues/1981
●Microsoft Security Blog "When prompts become shells": https://www.microsoft.com/en-us/security/blog/2026/05/07/prompts-become-shells-rce-vulnerabilities-ai-agent-frameworks/
●MSRC CVE-2026-26030: https://msrc.microsoft.com/update-guide/vulnerability/CVE-2026-26030
●MSRC CVE-2026-25592: https://msrc.microsoft.com/update-guide/vulnerability/CVE-2026-25592