Engineering

Protect Your AI Agents: How to Audit Skills Before Installing

Panguard Team2026年3月7日8 min

AI agent skills can carry prompt injection, reverse shells, and credential theft. Learn how Panguard Skill Auditor scans SKILL.md files in under 1 second and returns a 0-100 risk score.

The Problem: AI Agent Skills Are Untrusted Code

OpenClaw, AgentSkills, MCP tools -- the AI agent ecosystem is exploding. Developers install dozens of skills to make their agents more capable. But every skill is essentially an instruction set that can manipulate your agent into doing anything: exfiltrating credentials, opening reverse shells, or quietly modifying files.

The current approach? A human reads the SKILL.md and "looks for red flags." That scales to about 5 skills before fatigue sets in.

What Is Panguard Skill Auditor

Panguard Skill Auditor is an automated security scanner purpose-built for AI agent skills. It runs 7 checks in under 1 second and produces a quantitative risk score (0-100) instead of a subjective "looks fine."

Install it:

bash
curl -fsSL https://panguard.ai/api/install | bash

Audit any skill:

bash
panguard audit skill ./path/to/skill

The 7 Security Checks

### 1. Manifest Validation

Verifies the SKILL.md frontmatter has required fields (name, description), valid YAML structure, and proper metadata formatting. A malformed manifest is often the first sign of a hastily constructed malicious skill.

### 2. Prompt Injection Detection

We maintain 11 regex patterns that catch the most common prompt injection techniques:

--**Identity override**: "you are now", "act as", "pretend to be"

--**Instruction hijacking**: "ignore previous instructions", "disregard system prompt"

--**Jailbreak patterns**: "DAN", "do anything now", "bypass safety"

--**Hidden directives**: HTML comments containing "ignore", "override", "inject"

--**System prompt manipulation**: attempts to inject `<|system|>` or `<<SYS>>` tokens

### 3. Hidden Unicode Detection

This is the attack most humans miss entirely. Zero-width characters (U+200B, U+200C, U+200D), right-to-left overrides (U+202E), and other invisible Unicode can hide malicious instructions that are literally invisible when reading the file.

// This looks innocent:
Hello world
// But actually contains:
Hello\u200B\u200D[hidden payload]\u200Bworld

Panguard detects all 15 categories of hidden Unicode characters and reports their exact position.

### 4. Encoded Payload Detection

Skills sometimes hide malicious code in Base64 blocks. Panguard extracts all Base64 strings longer than 40 characters, decodes them, and checks for suspicious keywords: `eval`, `exec`, `subprocess`, `child_process`, `curl`, `wget`.

### 5. Tool Poisoning Detection

Scans for dangerous command patterns:

--**Privilege escalation**: `sudo`, `chmod 777`, `chmod u+s`

--**Reverse shells**: `nc -e`, `bash -i >& /dev/tcp/`, `mkfifo`, `socat exec`

--**Remote code execution**: `curl ... | bash`, `wget ... | sh`

--**Credential theft**: `printenv | curl`, accessing `~/.ssh`, `.env`, `.aws/`

--**Destructive operations**: `rm -rf /`, `rm -rf ~`

### 6. Code Security (SAST + Secrets)

Beyond the SKILL.md itself, the auditor scans all files in the skill directory using Panguard Scan's SAST engine. This catches hardcoded API keys, AWS credentials, private keys, and common code vulnerabilities.

### 7. Permission & Dependency Analysis

Evaluates declared permissions against the skill's stated purpose. A weather skill requesting filesystem write access? That is a red flag. Dependencies are cross-referenced for known security issues.

Risk Scoring

Each finding carries a weight based on severity:

| Severity | Weight | Example |

|----------|--------|---------|

| Critical | 25 | Reverse shell, prompt injection with system prompt |

| High | 15 | Privilege escalation, credential theft |

| Medium | 5 | Suspicious but ambiguous patterns |

| Low | 1 | Minor style issues |

Weights are summed and capped at 100. The final score maps to a risk level:

--**0-14 LOW**: Safe to install after quick review

--**15-39 MEDIUM**: Review findings before installing

--**40-69 HIGH**: Requires thorough manual review

--**70-100 CRITICAL**: Do NOT install

How to Integrate with Your Agent

You can use Panguard Skill Auditor as a pre-install gate in your agent pipeline:

bash
# Bash: block if HIGH or CRITICAL
RISK=$(panguard audit skill "$SKILL_PATH" --json | jq -r '.riskLevel')
if [ "$RISK" = "HIGH" ] || [ "$RISK" = "CRITICAL" ]; then
  echo "Blocked: $RISK risk skill"
  exit 1
fi

Or use it programmatically in TypeScript:

typescript
import { auditSkill } from '@panguard-ai/panguard-skill-auditor';

const report = await auditSkill('./skills/untrusted-skill');
if (report.riskLevel === 'CRITICAL' || report.riskLevel === 'HIGH') {
  console.error(`Blocked: ${report.riskScore}/100 risk`);
  process.exit(1);
}
console.log(`Safe: ${report.riskScore}/100`);

Real Example: Catching a Malicious Skill

Here is what a scan looks like when it catches something:

PANGUARD SKILL AUDIT REPORT
============================
Skill:      suspicious-helper
Risk Score: 72/100
Risk Level: CRITICAL

FINDINGS:
  [CRITICAL] Prompt injection: ignore previous instructions
             SKILL.md:42
  [CRITICAL] Reverse shell pattern detected
             SKILL.md:87 - "bash -i >& /dev/tcp/..."
  [HIGH]     Environment variable exfiltration
             SKILL.md:23 - "printenv | curl..."

VERDICT: DO NOT INSTALL

Available on OpenClaw Marketplace

Panguard Skill Auditor is available as an OpenClaw skill. Install it directly into your agent:

bash
# From OpenClaw marketplace
claw install panguard-ai/panguard-skill-auditor

Or use the standalone CLI for CI/CD pipelines.

What Is Next

We are working on AI-powered analysis (using LLM reasoning to catch novel attack patterns), community threat feeds (crowdsourced malicious skill signatures), and a hosted API so you can audit skills without installing anything locally.

Get Started

bash
curl -fsSL https://panguard.ai/api/install | bash
panguard audit skill ./my-skill

Free forever for the Community plan. Full API access starts at $9/month.