Six CVEs, Two Vendors, One Detection Layer: ATR Ships the OX MCP Disclosure Pack
When the protocol vendor declines to patch, signature-based detection is the only realistic mitigation. Here are six MIT-licensed YAML rules covering the entire OX Security MCP-by-design batch (CVE-2026-40933 Flowise, CVE-2026-30623 LiteLLM, CVE-2026-22252 LibreChat, CVE-2026-22688 WeKnora, CVE-2025-54136 Cursor zero-click) plus Microsoft Copilot Studio CVE-2026-21520. All ship in agent-threat-rules v2.0.18.
On April 15 2026, OX Security published research showing that 200,000 MCP servers — and the 150 million+ SDK downloads behind them — are exposed to arbitrary command execution by design. Anthropic declined to modify the protocol, classifying the behavior as expected and assigning sanitization responsibility to downstream developers.
Five days later the CVEs started landing. Then the Microsoft Copilot Studio follow-on (CVE-2026-21520) reminded everyone that even patched indirect prompt injections keep exfiltrating data when source-origin trust is miscalibrated.
This post does one thing: ships the detection pack so security teams running ATR-compatible scanners have signatures for the entire batch as of today.
The six rules
| ATR ID | CVE | What it catches |
|---|
|---|---|---|
| ATR-2026-00415 | CVE-2026-40933 (CVSS 9.9) | Flowise Custom MCP node — npx -c, node -e, python -c, bash -c flag-bypass class |
|---|
| ATR-2026-00416 | CVE-2026-30623 | LiteLLM-class unauthenticated MCP server registration |
|---|
| ATR-2026-00417 | CVE-2026-22252 | LibreChat MCP STDIO argument injection (argv-level shell metachars) |
|---|
| ATR-2026-00418 | CVE-2026-22688 | WeKnora MCP plugin config-driven RCE (config-file as exec target) |
|---|
| ATR-2026-00419 | CVE-2025-54136 + OX batch | Cursor / Windsurf / Claude Code / Gemini CLI / Copilot zero-click MCP config |
|---|
| ATR-2026-00420 | CVE-2026-21520 | Microsoft Copilot Studio SharePoint indirect prompt injection |
|---|
All six ship in agent-threat-rules v2.0.18 on npm. MIT-licensed.
The pattern across all six
If you read the rule YAML, the same root-cause class shows up repeatedly: allow-listed binary + interpreter inline-execution flag = arbitrary code. The pattern is so consistent that detection is straightforward — the combination is what distinguishes safe invocation from RCE, not the binary alone.
# Excerpt from ATR-2026-00415
- field: tool_response
operator: regex
value: '(?i)"command"\s*:\s*"(?:npx|node|deno|bun)"\s*,\s*"args"\s*:\s*\[[^\]]*"-(?:c|e|-eval|-command|-exec)"\s*,\s*"[^"]{4,400}"'
description: "MCP server config invoking Node-family interpreter with inline-execution flag — direct CVE-2026-40933 RCE signature"The same anchor pattern, parameterized for shell binaries and PowerShell encoded commands, covers the cross-platform variants. Five rules, one detection idea.
The Copilot Studio rule (ATR-2026-00420) is structurally different — it detects internal-source channel + injection-prologue + external-domain forwarding intent. The shape is: (SharePoint|Teams|Outlook) + (ignore previous|disregard above|new instructions:) + email-to-non-Microsoft-domain. Source-origin trust failure is the architecture-level problem; this rule catches the surface signature.
Why a signature layer is the only option here
Anthropic's response to OX was that STDIO command execution is intended behavior. That is a defensible position from a protocol-design standpoint, but it leaves every downstream MCP client to implement their own input sanitization — and Cisco's State of AI Security 2026 reported that only 34.7% of organizations have prompt injection defenses deployed at all.
When the protocol vendor will not patch, a community-maintained signature layer is what stops the bleeding. Not because signatures replace architectural fixes, but because they buy time while the architectural debate plays out. That is what ATR is for.
False positive budget
All six rules ship with explicit false_positives blocks documenting what they will trip on (educational documentation, security tooling that scans for these payloads, internal team templates with reviewed configs). Aggregate FP rate across the six: 0% on the 432-sample benign skill corpus that ATR's auto-merge gate runs against every PR.
Evasion budget
Each rule also carries evasion_tests documenting the specific bypasses the regex tier will not catch. The recurring evasions across this pack:
- ●
/usr/bin/envwrapper — attacker uses env as the literal command field, putting bash/python in args[0]. Five rules in this pack are defeated by this technique. v2 needs an env-wrapper anchor. - ●Dropped binary indirection — attacker drops a payload binary first via a separate vector, then registers an absolute path. Command field is benign-looking. Behavioral / file-integrity detection is the answer here, not regex.
- ●Malicious package publication — attacker publishes a trojanized npm package and references it by name only. Falls into supply-chain detection (covered by ATR's package-hallucination + skill-malware rule families, not the new CVE pack).
These are documented because dishonest evasion budgets erode trust faster than missed catches. We tell users what we cannot do.
How to deploy
Cisco AI Defense (skill-scanner) and Microsoft Agent Governance Toolkit both auto-sync the upstream npm package; users on those integrations already received v2.0.18.
For self-hosted ATR consumers:
npm update agent-threat-rules
# or
pnpm up agent-threat-rulesFor GitHub Action users:
- uses: agent-threat-rules/atr-action@v2
with:
rules-version: '>=2.0.18'What is next
ATR's roadmap for May:
1. Publish env-wrapper detection v2 across the affected rules
2. Map the six CVE rules to the AARM control points once CSAI Foundation's spec lands
3. Add LLM-as-judge tier-2 confirmation for Copilot Studio rule (regex tier is the floor; semantic tier is needed for evasion-resistance)
If you run an MCP-using stack and want to evaluate ATR coverage against your specific deployment, the repo is at github.com/Agent-Threat-Rule. File an issue. Open a PR. The contribution gate is automated and turnaround is hours, not weeks.
When the protocol will not be patched, the only thing slower than detection is silence.