This is the “operator’s version” of the security discussion: not fear, not hype — practical defenses.
If you are new, read these first for context:
If you run email/browser tools, treat this guide as required reading.
What this guide helps you finish
By the end of this guide, you should be able to draw a line between “content the agent may read” and “actions the agent may actually take” without relying on vibes.
That means you should know:
- how to treat skills as supply-chain software rather than convenience downloads
- how to keep prompt injection from jumping straight into tools or side effects
- what least privilege means in OpenClaw terms: accounts, files, channels, and execution surfaces
- how to prove your hardening still works after a restart, new skill install, or bad run
Who this is for (and not for)
Use this guide if your OpenClaw instance can read external content, install third-party skills, call tools, or send messages outside a toy lab.
If your instance is strictly local, read-only, and disconnected from third-party identities, this guide is still useful, but the urgency is lower.
Before you expose tools or install skills: collect these five facts
- Which actions can this instance actually take today? File edits, shell, browser control, messaging, or email.
- Which identities are attached to those actions? Primary personal accounts and broad API keys are still the biggest blast-radius multiplier.
- Where does untrusted content enter? Email, PDFs, URLs, tickets, chat logs, or third-party skills.
- What approval gate exists before side effects happen? If there is none, you do not yet have a hardened workflow.
- What is your recovery drill after a bad run? Rotate, disable, restore, and re-enable from a known-good baseline.
0) Threat model in one paragraph
OpenClaw is powerful because it can:
- read/write files
- call external APIs with your keys
- control browser sessions
- send messages/emails
That means compromise can happen through:
- Supply chain (malicious skills / compromised dependencies).
- Prompt injection (malicious instructions embedded in content your agent processes).
- Credential leakage (logs, config, chat transcripts, or tool output).
Your goal isn’t “perfect security” — it’s bounded damage and fast recovery.
1) Skill supply chain safety (ClawHub and beyond)
1.1 Default posture: distrust
Treat every third-party skill as untrusted until proven otherwise.
High-value rules:
- Prefer skills from known maintainers or orgs you can verify.
- Pin versions/commits (avoid “latest” in production).
- Keep an allowlist of approved skills (not an open marketplace on prod).
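The pinning and allowlist rules above can be sketched as one check that runs before any install. This is a minimal illustration, not an OpenClaw or ClawHub API: the skill names, commit hashes, and the `is_install_allowed` helper are all hypothetical.

```python
# Hypothetical allowlist: skill name -> the exact commit that was reviewed.
# The names and hashes below are invented for illustration only.
APPROVED_SKILLS = {
    "calendar-sync": "3f2a9c1e5b7d4a6f8091a2b3c4d5e6f708192a3b",
    "pdf-summarizer": "a1b2c3d4e5f60718293a4b5c6d7e8f9012345678",
}


def is_install_allowed(name: str, ref: str) -> bool:
    """Allow an install only if the skill is approved AND the requested
    ref is exactly the commit that was reviewed (never a moving tag)."""
    pinned = APPROVED_SKILLS.get(name)
    return pinned is not None and ref == pinned


if __name__ == "__main__":
    print(is_install_allowed("calendar-sync", "latest"))   # moving tag
    print(is_install_allowed("unknown-skill", "abc123"))   # not on the list
```

Comparing against an exact commit means a skill that changes upstream after review fails the check until someone reviews it again.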
1.2 Review before install (minimum viable)
Before installing a skill:
- Read its README and entrypoints (what tools does it call?)
- Search for network exfil paths (HTTP POST, webhook, socket, DNS)
- Search for filesystem grabs (~/.ssh, browser cookies, ~/.openclaw, env dumps)
If your OpenClaw instance is used by non-technical users, prefer curated skill sets over free-for-all installs.
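The review searches in this section can be partly automated as a first pass. The sketch below is a rough heuristic, assuming a skill is a directory of text files. The patterns and the `scan_text` / `scan_skill` helpers are illustrative, and a clean scan does not prove a skill is safe.

```python
import re
from pathlib import Path

# Heuristic patterns only. They flag lines for a human to read; they do not
# decide whether a skill is malicious.
SUSPICIOUS = {
    "network": re.compile(
        r"requests\.post|urllib\.request|webhook|socket\.|fetch\(", re.I
    ),
    "filesystem": re.compile(
        r"\.ssh|\.openclaw|cookies|os\.environ|process\.env", re.I
    ),
}


def scan_text(text: str) -> list[tuple[int, str]]:
    """Return (line number, category) for every suspicious line in text."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for category, pattern in SUSPICIOUS.items():
            if pattern.search(line):
                findings.append((lineno, category))
    return findings


def scan_skill(root: str) -> list[tuple[str, int, str]]:
    """Walk a skill directory and collect (relative path, line, category)."""
    base = Path(root)
    results = []
    for path in sorted(base.rglob("*")):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            continue  # unreadable file: skip it here, review it by hand
        for lineno, category in scan_text(text):
            results.append((str(path.relative_to(base)), lineno, category))
    return results
```

Treat the output as a reading list for the manual review, not as a verdict.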
1.3 Separate “lab” from “prod”
Operate two environments:
- lab (where you experiment with new skills)
- prod (where only reviewed skills are installed)
Config precedence + multiple state dirs:
2) Prompt injection: assume content can contain commands
Prompt injection is not only a “chat” problem. Any content source can be weaponized:
- emails (hidden text, quoted replies)
- web pages (invisible instructions, CSS tricks)
- documents/PDFs (embedded text)
2.1 The safe rule: separate “summarize” from “act”
Design your workflows as two phases:
- Read-only extraction: summarize, classify, extract structured fields.
- Human approval (or strict policy gate) before any side-effect actions:
  - sending emails
  - transferring files
  - running shell commands
  - installing skills
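The two-phase split above can be sketched as a small gate between what the agent proposes and what actually runs. This is an illustration under assumptions: the action names, `ProposedAction`, and `execute` are hypothetical and not part of OpenClaw.

```python
from dataclasses import dataclass, field

# Hypothetical names for the side-effect actions listed above.
SIDE_EFFECT_ACTIONS = {"send_email", "transfer_file", "run_shell", "install_skill"}


@dataclass
class ProposedAction:
    """Something the agent wants to do after reading untrusted content."""
    name: str
    args: dict = field(default_factory=dict)


def execute(action: ProposedAction, approved: bool = False) -> str:
    """Run read-only actions freely; block side effects without approval."""
    if action.name in SIDE_EFFECT_ACTIONS and not approved:
        return "blocked: needs approval"
    # A real system would dispatch to the tool here.
    return f"ran: {action.name}"


if __name__ == "__main__":
    print(execute(ProposedAction("summarize")))
    print(execute(ProposedAction("send_email", {"to": "someone@example.com"})))
```

The `approved` flag is meant to be set by a human or a policy check that sits outside the model, so text inside an email or web page cannot flip it.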
2.2 Constrain tool access by default
If a prompt injection succeeds, the attacker’s power equals your tool permissions.
Safer defaults:
- disable dangerous tools unless required
- restrict outbound messaging recipients (allowlists)
- restrict file read/write scope to a dedicated workspace
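Two of the defaults above, recipient allowlists and a dedicated workspace, reduce to small checks a tool wrapper could run before acting. A minimal sketch: the workspace path, the addresses, and both helper names are assumptions for illustration.

```python
from pathlib import Path

# Hypothetical values; substitute your own workspace and recipients.
WORKSPACE = Path("/srv/openclaw-workspace")
ALLOWED_RECIPIENTS = {"ops@example.com", "me@example.com"}


def is_path_in_workspace(candidate: str, workspace: Path = WORKSPACE) -> bool:
    """True only if candidate resolves to a location inside the workspace.

    Resolving first collapses ".." segments and follows symlinks, so a path
    like "../../etc/passwd" is rejected. Requires Python 3.9+ for
    Path.is_relative_to.
    """
    root = workspace.resolve()
    resolved = (root / candidate).resolve()
    return resolved.is_relative_to(root)


def is_recipient_allowed(address: str) -> bool:
    """Case-insensitive exact match against the recipient allowlist."""
    return address.strip().lower() in ALLOWED_RECIPIENTS
```

Checks like these belong in the code that wraps the tool, not in the prompt, because a successful injection can rewrite what the model intends but not what the wrapper permits.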
If you are integrating channels, keep inbound policies explicit:
- Telegram allowlists/pairing: /guides/telegram-setup
- WhatsApp separate number + allowlist: /guides/whatsapp-setup
3) Blast radius design (what “least privilege” means here)
The most effective “security features” for OpenClaw are operational:
- Separate accounts
  - dedicated email inbox
  - dedicated messaging accounts
  - dedicated API keys (not your main keys)
- Separate machines/users
  - run OpenClaw under a non-admin OS user
  - isolate the state directory permissions
- Strict networking posture
  - avoid exposing the gateway publicly
  - prefer SSH tunnels/Tailscale over open ports
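One item above, isolating the state directory permissions, is easy to verify on POSIX systems. The sketch below assumes "isolated" means no group or other permission bits are set. The helper names are illustrative, and the state directory path is left for you to supply.

```python
import os
import stat


def is_private_mode(mode: int) -> bool:
    """True if neither group nor others have any permission bits set."""
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0


def check_state_dir(path: str) -> bool:
    """Check a directory on disk; pass your own state directory path."""
    return is_private_mode(os.stat(path).st_mode)
```

A mode of 0o700 passes and 0o755 fails. A check like this could run at startup, so a permissive restore or copy is noticed early.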
Control UI remote access hardening:
Persistence + backups (so recovery is fast):
4) Auditing: make actions provable
If you can’t answer “what happened?”, you can’t contain incidents.
Minimum viable auditing:
- keep logs (and know where they are)
- force runs to write artifacts (timestamped reports)
- keep a short incident checklist (rotate keys, disable channels, uninstall skills, restore known-good state)
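The "timestamped reports" item can be sketched as a small helper that every automated run calls at the end. The file naming scheme, the report fields, and the function names are assumptions for illustration, not an OpenClaw format.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def report_filename(run_name: str, finished_at: datetime) -> str:
    """Sortable file name such as nightly-20240102T030405Z.json (UTC assumed)."""
    return f"{run_name}-{finished_at.strftime('%Y%m%dT%H%M%SZ')}.json"


def build_report(run_name: str, actions: list[dict], finished_at: datetime) -> dict:
    """Collect what the run did into one JSON-serializable record."""
    return {
        "run": run_name,
        "finished_at": finished_at.isoformat(),
        "actions": actions,
    }


def write_run_report(report_dir: str, run_name: str, actions: list[dict]) -> Path:
    """Write the report as a new timestamped file and return its path."""
    now = datetime.now(timezone.utc)
    out_dir = Path(report_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / report_filename(run_name, now)
    path.write_text(
        json.dumps(build_report(run_name, actions, now), indent=2),
        encoding="utf-8",
    )
    return path
```

Writing one file per run, rather than appending to a shared log, makes it easier to hand a single artifact to whoever is investigating.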
If you’re building 24/7 automations, bake “evidence output” into every cron run:
5) Recovery drill (do this once, before you need it)
- Back up the full state dir.
- Practice reinstalling/upgrading without deleting state.
- Rotate a provider API key and confirm OpenClaw picks up the new value via env var substitution.
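For the key rotation step, one way to confirm a new value is in place without printing the secret is to compare short fingerprints before and after. This sketch only inspects the environment of the process that runs it. Whether the running OpenClaw process sees the same value depends on how it was started. `PROVIDER_API_KEY` is a placeholder variable name.

```python
import hashlib
import os
from typing import Optional


def key_fingerprint(value: str) -> str:
    """Short SHA-256 prefix: safe to log, useless for recovering the key."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:12]


def env_key_fingerprint(var_name: str) -> Optional[str]:
    """Fingerprint of an environment variable, or None if it is unset or empty."""
    value = os.environ.get(var_name)
    return key_fingerprint(value) if value else None


if __name__ == "__main__":
    # Placeholder name; use the variable your provider config actually reads.
    print(env_key_fingerprint("PROVIDER_API_KEY"))
```

Record the fingerprint before rotation, rotate, restart, and check that the value has changed.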
Install/runtime recovery:
Verification checklist after hardening
Before you call the system hardened, confirm all of these are true:
- new skills go through a review step before they reach the production instance
- untrusted content can be summarized without automatically gaining action rights
- side effects still require approvals, allowlists, or another explicit trust gate
- logs, artifacts, and backups exist well enough to support incident response
- you have practiced the disable / rotate / restore drill at least once
What to rotate first after a bad run
If you suspect a malicious skill, prompt injection, or accidental overreach, work in the order that reduces real damage fastest: rotate outbound credentials and API keys first, then revoke channel and session access, then deal with state and installed skills. Recovery is much calmer when you already know which identity matters most.