
Protect Your AI Agents: How to Audit Skills Before Installing

Panguard Team · March 7, 2026 · 8 min read

AI agent skills can carry prompt injection, reverse shells, and credential theft. Learn how Panguard Skill Auditor scans SKILL.md files in under 1 second and returns a 0-100 risk score.

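The Problem: AI Agent Skills Are Untrusted Code

OpenClaw, AgentSkills, MCP tools: the AI agent ecosystem is exploding. Developers install dozens of skills to make their agents more capable. But every skill is essentially a set of instructions that can steer your agent into doing anything: exfiltrating credentials, opening a reverse shell, silently modifying files.

The current approach? A human reads SKILL.md and looks for red flags. That gets exhausting after about five skills.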

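What Is Panguard Skill Auditor

Panguard Skill Auditor is an automated security scanner built specifically for AI agent skills. It runs 8 checks in under a second and produces a quantified risk score (0-100) instead of a subjective "looks fine."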

Install:

curl -fsSL https://get.panguard.ai | bash

Audit any skill:

panguard audit skill ./path/to/skill

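The 8 Security Checks

1. Manifest Validation

Verifies that the SKILL.md frontmatter contains the required fields (name, description), has a valid YAML structure, and uses properly formatted metadata. A malformed manifest is often the first sign of a hastily built malicious skill.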
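To make the manifest check concrete, here is a minimal sketch of a frontmatter validator. It assumes flat "key: value" frontmatter and uses no YAML library, so indented (nested) lines are skipped rather than validated; validateManifest and ManifestResult are hypothetical names for illustration, not Panguard's API.

```typescript
// Minimal SKILL.md frontmatter check (illustrative sketch, not Panguard's code).
// Assumption: frontmatter is flat "key: value" YAML; nested lines are skipped.

interface ManifestResult {
  ok: boolean;
  errors: string[];
  fields: Record<string, string>;
}

const REQUIRED_FIELDS = ["name", "description"];

function validateManifest(skillMd: string): ManifestResult {
  const errors: string[] = [];
  const fields: Record<string, string> = {};
  const lines = skillMd.split(/\r?\n/);

  // Frontmatter must open with "---" on the first line and close with another "---".
  if (lines[0].trim() !== "---") {
    return { ok: false, errors: ["missing opening '---'"], fields };
  }
  const end = lines.findIndex((l, i) => i > 0 && l.trim() === "---");
  if (end === -1) {
    return { ok: false, errors: ["missing closing '---'"], fields };
  }

  for (let i = 1; i < end; i++) {
    const line = lines[i];
    // Skip blank lines, YAML comments, and indented (nested) lines.
    if (line.trim() === "" || line.trimStart().startsWith("#") || /^\s/.test(line)) continue;
    const match = /^([A-Za-z0-9_-]+):\s*(.*)$/.exec(line);
    if (match === null) {
      errors.push(`line ${i + 1}: expected "key: value"`);
      continue;
    }
    // Store the value with surrounding quotes removed.
    fields[match[1]] = match[2].trim().replace(/^["']|["']$/g, "");
  }

  for (const key of REQUIRED_FIELDS) {
    if (!fields[key]) errors.push(`missing required field: ${key}`);
  }
  return { ok: errors.length === 0, errors, fields };
}
```

A production validator would use a real YAML parser; the point here is only the shape of the check: structure first, then required fields.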

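2. Prompt Injection Detection

We maintain 13 regex patterns that catch the most common prompt injection techniques:

●Identity override: "you are now", "act as", "pretend to be"
●Instruction hijacking: "ignore previous instructions", "disregard system prompt"
●Jailbreak patterns: "DAN", "do anything now", "bypass safety"
●Hidden instructions: HTML comments containing "ignore", "override", or "inject"
●System prompt manipulation: attempts to inject <|system|> or <<SYS>> tokens
●Hidden <IMPORTANT> blocks: concealed instructions wrapped in <IMPORTANT> tags that contain concealment or exfiltration language
●Silent data exfiltration: "silently send", "without asking... upload" (instructions that steal data without the user's consent)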
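A regex-based detector for the categories above can be sketched as a list of category-tagged patterns. The regexes below are our own illustrative examples, not Panguard's 13 patterns, and detectInjection is a hypothetical name. The HTML-comment and <IMPORTANT> rules use a tempered pattern so a match cannot run across one block's closing delimiter into the next.

```typescript
// Illustrative prompt-injection rules, one or two per category listed above.
// Assumption: these regexes are our own examples, NOT Panguard's 13 patterns.

interface InjectionRule {
  category: string;
  pattern: RegExp;
}

const INJECTION_RULES: InjectionRule[] = [
  { category: "identity-override", pattern: /\b(you are now|act as|pretend to be)\b/i },
  { category: "instruction-hijack", pattern: /\b(ignore (all )?previous instructions|disregard (the )?system prompt)\b/i },
  // "DAN" is matched case-sensitively so the name "Dan" does not trigger it.
  { category: "jailbreak", pattern: /\bDAN\b/ },
  { category: "jailbreak", pattern: /\b(do anything now|bypass safety)\b/i },
  // Keyword must sit inside a single <!-- ... --> comment.
  { category: "hidden-html-instruction", pattern: /<!--(?:(?!-->)[\s\S])*?\b(ignore|override|inject)\b[\s\S]*?-->/i },
  { category: "system-token", pattern: /<\|system\|>|<<SYS>>/ },
  // Concealment or exfiltration wording inside a single <IMPORTANT> block.
  { category: "important-block", pattern: /<IMPORTANT>(?:(?!<\/IMPORTANT>)[\s\S])*?\b(secretly|silently|do not (tell|mention)|send|upload)\b[\s\S]*?<\/IMPORTANT>/i },
  { category: "silent-exfiltration", pattern: /\bsilently send\b|\bwithout asking\b[\s\S]{0,80}?\bupload\b/i },
];

// Returns the distinct categories whose pattern matches anywhere in the text.
function detectInjection(text: string): string[] {
  const hits: string[] = [];
  for (const rule of INJECTION_RULES) {
    if (rule.pattern.test(text) && hits.indexOf(rule.category) === -1) {
      hits.push(rule.category);
    }
  }
  return hits;
}
```

Phrase lists like "act as" will also fire on benign text, which is one reason a score-based verdict is more useful than a single hit.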

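3. Hidden Unicode Detection

This is the attack humans miss almost every time. Zero-width characters (U+200B, U+200C, U+200D), the right-to-left override (U+202E), and other invisible Unicode can hide malicious instructions that are literally invisible when you read the file.

// Looks innocent:
Hello world
// Actually contains:
Hello\u200B\u200D[hidden payload]\u200Bworld

Panguard detects all 15 classes of hidden Unicode characters and reports their exact positions.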
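Position reporting for invisible characters comes down to walking the text and tracking line and column. The sketch below checks a small, illustrative set of zero-width and bidirectional-control code points; it is not Panguard's list of 15 classes, and findHiddenUnicode is a hypothetical name.

```typescript
// Flags invisible and bidirectional-control characters with line:column positions.
// Assumption: this table is a small illustrative subset, not Panguard's 15 classes.

const HIDDEN_CHARS: Record<number, string> = {
  0x200b: "ZERO WIDTH SPACE",
  0x200c: "ZERO WIDTH NON-JOINER",
  0x200d: "ZERO WIDTH JOINER",
  0x2060: "WORD JOINER",
  0xfeff: "ZERO WIDTH NO-BREAK SPACE",
  0x202a: "LEFT-TO-RIGHT EMBEDDING",
  0x202b: "RIGHT-TO-LEFT EMBEDDING",
  0x202c: "POP DIRECTIONAL FORMATTING",
  0x202d: "LEFT-TO-RIGHT OVERRIDE",
  0x202e: "RIGHT-TO-LEFT OVERRIDE",
  0x2066: "LEFT-TO-RIGHT ISOLATE",
  0x2067: "RIGHT-TO-LEFT ISOLATE",
  0x2068: "FIRST STRONG ISOLATE",
  0x2069: "POP DIRECTIONAL ISOLATE",
};

interface HiddenChar {
  line: number;      // 1-based line number
  column: number;    // 1-based column, counted in UTF-16 code units
  codePoint: string; // e.g. "U+200B"
  name: string;
}

function findHiddenUnicode(text: string): HiddenChar[] {
  const hits: HiddenChar[] = [];
  let line = 1;
  let column = 0;
  for (let i = 0; i < text.length; i++) {
    const code = text.charCodeAt(i);
    if (code === 0x0a) {
      // "\n" starts a new line; reset the column counter.
      line++;
      column = 0;
      continue;
    }
    column++;
    const name = HIDDEN_CHARS[code];
    if (name !== undefined) {
      const hex = ("0000" + code.toString(16).toUpperCase()).slice(-4);
      hits.push({ line, column, codePoint: "U+" + hex, name });
    }
  }
  return hits;
}
```

All of the listed code points are in the Basic Multilingual Plane, so scanning UTF-16 code units is enough for this subset. For "Hello\u200B\u200Dworld" it reports U+200B at line 1, column 6 and U+200D at column 7.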

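4. Encoded Payload Detection

Skills sometimes hide malicious code in Base64 blobs. Panguard extracts every Base64 string longer than 40 characters, decodes it, and checks the result for suspicious keywords: eval, exec, subprocess, child_process, curl, wget.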
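The extract, decode, and match steps can be sketched in a few lines. This assumes the global atob() available in browsers and Node 16+, matches only the standard Base64 alphabet (not base64url), and uses our own regex; findEncodedPayloads is a hypothetical name, not Panguard's implementation.

```typescript
// Sketch of the encoded-payload check: find long Base64-looking runs, decode
// them, and look for suspicious keywords.
// Assumptions: global atob() (browsers, Node 16+); standard alphabet only.

const MIN_LENGTH = 40;
const SUSPICIOUS_KEYWORDS = ["eval", "exec", "subprocess", "child_process", "curl", "wget"];
const BASE64_RUN = new RegExp(`[A-Za-z0-9+/]{${MIN_LENGTH},}={0,2}`, "g");

interface EncodedFinding {
  encoded: string;
  decoded: string;
  keywords: string[];
}

function findEncodedPayloads(text: string): EncodedFinding[] {
  const findings: EncodedFinding[] = [];
  for (const run of text.match(BASE64_RUN) ?? []) {
    if (run.length <= MIN_LENGTH) continue; // keep only runs longer than 40 characters
    let decoded: string;
    try {
      decoded = atob(run);
    } catch {
      continue; // looked like Base64 but did not decode
    }
    const keywords = SUSPICIOUS_KEYWORDS.filter((k) => decoded.includes(k));
    if (keywords.length > 0) findings.push({ encoded: run, decoded, keywords });
  }
  return findings;
}
```

Decoding before matching is the key design choice: a keyword scan over the raw file would never see "subprocess" inside an encoded blob.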

5. Tool Poisoning Detection

Scans for dangerous command patterns:

●Privilege escalation: sudo, chmod 777, chmod u+s
●Reverse shells: nc -e, bash -i >& /dev/tcp/, mkfifo, socat exec
●Remote code execution: curl ... | bash, wget ... | sh
●Credential theft: printenv | curl, access to ~/.ssh, .env, .aws/
●Destructive operations: rm -rf /, rm -rf ~
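Scanning line by line makes it easy to report a file:line location for each hit, in the same spirit as the locations in the sample report later in this post. The regexes are illustrative, not Panguard's rule set; scanCommands is a hypothetical name, and the severities for remote code execution and destructive commands are our own choices (the post only assigns reverse shells to Critical and privilege escalation and credential theft to High).

```typescript
// Line-by-line dangerous-command scan that reports file:line locations.
// Assumptions: illustrative regexes, not Panguard's rules; RCE and destructive
// severities are our own picks.

interface CommandRule {
  label: string;
  severity: "CRITICAL" | "HIGH";
  pattern: RegExp;
}

const COMMAND_RULES: CommandRule[] = [
  { label: "Privilege escalation", severity: "HIGH", pattern: /\bsudo\b|\bchmod\s+(777|u\+s)\b/ },
  { label: "Reverse shell", severity: "CRITICAL", pattern: /\bnc\b.*\s-e\b|bash\s+-i\s*>&\s*\/dev\/tcp\/|\bmkfifo\b|\bsocat\b.*\bexec\b/ },
  { label: "Remote code execution", severity: "CRITICAL", pattern: /\b(curl|wget)\b.*\|\s*(ba)?sh\b/ },
  { label: "Credential theft", severity: "HIGH", pattern: /\bprintenv\b.*\|\s*curl\b|~\/\.ssh\b|(^|[\s"'\/])\.env\b|\.aws\// },
  { label: "Destructive operation", severity: "CRITICAL", pattern: /\brm\s+-rf\s+(\/|~\/?)(\s|$)/ },
];

function scanCommands(fileName: string, text: string): string[] {
  const findings: string[] = [];
  text.split(/\r?\n/).forEach((line, i) => {
    for (const rule of COMMAND_RULES) {
      if (rule.pattern.test(line)) {
        findings.push(`[${rule.severity}] ${rule.label} ${fileName}:${i + 1}`);
      }
    }
  });
  return findings;
}
```

Note that "rm -rf /tmp/build" is deliberately not flagged; only a bare / or ~ target is treated as destructive here.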

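6. Code Security (SAST + Secrets)

Beyond SKILL.md itself, the auditor uses the Panguard Scan SAST engine to scan every file in the skill directory. It catches hardcoded API keys, AWS credentials, private keys, and common code vulnerabilities.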
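The secrets half of this check can be approximated with widely used public heuristics. In this sketch the file contents are passed in as a path-to-text map (reading the directory is left out), the patterns are common heuristics rather than Panguard Scan's rules, and SAST for code vulnerabilities is out of scope; scanSecrets is a hypothetical name.

```typescript
// Regex-based secret detection across the files of a skill directory.
// Assumptions: contents are passed as a path -> text map; patterns are common
// public heuristics, not Panguard Scan's rules; no vulnerability SAST here.

interface SecretFinding {
  file: string;
  line: number;
  kind: string;
}

const SECRET_PATTERNS: Array<[string, RegExp]> = [
  // AWS access key IDs: "AKIA" (long-term) or "ASIA" (temporary) + 16 chars.
  ["AWS access key ID", /\b(AKIA|ASIA)[0-9A-Z]{16}\b/],
  // PEM / OpenSSH private key headers.
  ["Private key", /-----BEGIN (RSA |EC |DSA |OPENSSH )?PRIVATE KEY-----/],
  // Quoted long values assigned to key-like names, e.g. API_KEY = "...".
  ["Hardcoded API key", /(api[_-]?key|secret|token)\s*[:=]\s*["'][A-Za-z0-9_\-]{16,}["']/i],
];

function scanSecrets(files: Record<string, string>): SecretFinding[] {
  const findings: SecretFinding[] = [];
  for (const [file, text] of Object.entries(files)) {
    text.split(/\r?\n/).forEach((lineText, i) => {
      for (const [kind, pattern] of SECRET_PATTERNS) {
        if (pattern.test(lineText)) findings.push({ file, line: i + 1, kind });
      }
    });
  }
  return findings;
}
```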

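7. Permission and Dependency Analysis

Evaluates the declared permissions against the skill's stated purpose. A weather skill requesting filesystem write access? That's a red flag. Dependencies are cross-checked against known security issues.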
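The purpose-versus-permission comparison can be sketched as an allowlist lookup. Everything here is hypothetical: the post does not say how skills declare permissions or purposes, so the Permission names, purpose categories, EXPECTED_PERMISSIONS table, and excessPermissions function are illustrative assumptions. The dependency cross-check is omitted.

```typescript
// Hypothetical purpose-vs-permission check. Permission names, purpose
// categories, and EXPECTED_PERMISSIONS are illustrative assumptions only.

type Permission = "network" | "filesystem:read" | "filesystem:write" | "shell" | "env";

// What each (hypothetical) purpose category would plausibly need.
const EXPECTED_PERMISSIONS: Record<string, Permission[]> = {
  weather: ["network"],
  "file-organizer": ["filesystem:read", "filesystem:write"],
  "code-runner": ["filesystem:read", "shell"],
};

// Returns declared permissions that the stated purpose does not explain.
function excessPermissions(purpose: string, declared: Permission[]): Permission[] {
  const expected = EXPECTED_PERMISSIONS[purpose];
  if (expected === undefined) return declared; // unknown purpose: flag everything for review
  return declared.filter((p) => expected.indexOf(p) === -1);
}
```

With this table, the weather example from the paragraph above, excessPermissions("weather", ["network", "filesystem:write"]), returns ["filesystem:write"].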

Risk Scoring

Each finding is weighted by severity:

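| Severity | Weight | Examples |
|----------|--------|----------|
| Critical | 25 | Reverse shell, prompt injection targeting the system prompt |
| High | 15 | Privilege escalation, credential theft |
| Medium | 5 | Suspicious but ambiguous patterns |
| Low | 1 | Minor style issues |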

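The weights are summed and capped at 100. The final score maps to a risk level:

●0-14 LOW: safe to install after a quick review
●15-39 MEDIUM: review the findings before installing
●40-69 HIGH: requires a thorough manual review
●70-100 CRITICAL: do not install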
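The scoring rules above are simple enough to sketch directly: sum the severity weights, cap at 100, then map the total onto the four bands. riskScore and riskLevel are our own names for illustration, not functions exported by the Panguard package.

```typescript
// Weighted risk score with a cap at 100, mapped onto the four risk bands.
// Function and type names are illustrative, not Panguard's API.

type Severity = "CRITICAL" | "HIGH" | "MEDIUM" | "LOW";
type RiskLevel = "LOW" | "MEDIUM" | "HIGH" | "CRITICAL";

const WEIGHTS: Record<Severity, number> = { CRITICAL: 25, HIGH: 15, MEDIUM: 5, LOW: 1 };

// Sum the weight of every finding, capped at 100.
function riskScore(severities: Severity[]): number {
  const total = severities.reduce((sum, s) => sum + WEIGHTS[s], 0);
  return Math.min(total, 100);
}

// Bands: 0-14 LOW, 15-39 MEDIUM, 40-69 HIGH, 70-100 CRITICAL.
function riskLevel(score: number): RiskLevel {
  if (score >= 70) return "CRITICAL";
  if (score >= 40) return "HIGH";
  if (score >= 15) return "MEDIUM";
  return "LOW";
}
```

For instance, one CRITICAL plus one HIGH finding sums to exactly 40, the bottom of the HIGH band, while five CRITICAL findings (125) are capped at 100.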

Integrating with Your Agent

You can use Panguard Skill Auditor as a pre-install gate in your agent pipeline:

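# Bash: block on HIGH or CRITICAL
RISK=$(panguard audit skill "$SKILL_PATH" --json | jq -r '.riskLevel')
# Fail closed: treat a missing or unparseable result as blocked
if [ -z "$RISK" ] || [ "$RISK" = "null" ]; then
  echo "Blocked: audit did not return a risk level"
  exit 1
fi
if [ "$RISK" = "HIGH" ] || [ "$RISK" = "CRITICAL" ]; then
  echo "Blocked: $RISK risk skill"
  exit 1
fi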

Or use it from a TypeScript program:

import { auditSkill } from '@panguard-ai/panguard-skill-auditor';

const report = await auditSkill('./skills/untrusted-skill');
if (report.riskLevel === 'CRITICAL' || report.riskLevel === 'HIGH') {
  console.error(`Blocked: ${report.riskScore}/100 risk`);
  process.exit(1);
}
console.log(`Safe: ${report.riskScore}/100`);

Real-World Example: Catching a Malicious Skill

Here's what the output looks like when the auditor finds something:

PANGUARD SKILL AUDIT REPORT
============================
Skill:      suspicious-helper
Risk Score: 72/100
Risk Level: CRITICAL

FINDINGS:
  [CRITICAL] Prompt injection: ignore previous instructions
             SKILL.md:42
  [CRITICAL] Reverse shell pattern detected
             SKILL.md:87 - "bash -i >& /dev/tcp/..."
  [HIGH]     Environment variable exfiltration
             SKILL.md:23 - "printenv | curl..."

VERDICT: DO NOT INSTALL

Available on the OpenClaw Marketplace

Panguard Skill Auditor is available as an OpenClaw skill. Install it directly into your agent:

# From the OpenClaw marketplace
claw install panguard-ai/panguard-skill-auditor

Or use the standalone CLI in your CI/CD pipeline.

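What's Next

We're working on AI-driven analysis (using LLM reasoning to catch novel attack patterns), a community threat feed (crowdsourced malicious-skill signatures), and a hosted API (so you can audit skills without installing anything locally).

Get Started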

curl -fsSL https://get.panguard.ai | bash
panguard audit skill ./my-skill

Open source under the MIT license. The full source code is on GitHub.