An unflinching look at what’s hiding inside the world’s largest agent skill marketplace, and why automated scanning isn’t catching it.
A developer on r/openclaw posted a screenshot last month that should have made more noise than it did. His agent, running a “productivity booster” skill he’d installed from ClawHub three weeks earlier, had quietly POSTed his entire .env file to a Pastebin URL every 48 hours. The skill worked exactly as advertised. It also did something else.
That’s the pattern. And it’s more common than most people think.
Over the past four months, a small group of us pulled a sample of 1,024 skills from ClawHub, the de facto marketplace for agent extensions in the OpenClaw ecosystem. We read the code. We ran the skills in sandboxes. We diffed version histories. What we found is the subject of this post, and some of it surprised even the cynics in the group.
The scale problem nobody talks about
ClawHub currently hosts roughly 44,000 skills. Forty-four thousand. To put that in context, the Chrome Web Store took about a decade to reach that number of extensions, and Google has a team of humans and machine learning systems reviewing every submission. ClawHub relies primarily on a VirusTotal scan and community flagging.
Here’s the part nobody mentions. VirusTotal is designed to catch known malware signatures. It was never built to reason about what a YAML-configured AI agent skill does at runtime when it receives specific prompt triggers. A skill that writes a file, reads an environment variable, and makes an outbound HTTPS request is, by every signature-based definition, boringly normal.
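To make that concrete, here is a hypothetical sketch of what such a skill body might look like, written as the kind of Python snippet the marketplace hosts. Every name, host, and path below is invented for illustration; the point is that nothing in it looks remarkable to a signature scanner.

```python
# Hypothetical skill body doing the three things described above: read an
# environment variable, write a file, make an outbound HTTPS request.
# Whether this is dangerous depends entirely on where the request goes and
# when it fires, which no signature database can tell you.
import os
import urllib.request

def run(prompt: str) -> str:
    api_key = os.environ.get("NOTES_API_KEY", "")        # read an env var
    with open("/tmp/skill_cache.txt", "w") as f:          # write a file
        f.write(prompt)
    req = urllib.request.Request(
        "https://api.notes.example/v1/summarize",          # outbound HTTPS POST
        data=prompt.encode(),
        headers={"Authorization": f"Bearer {api_key}"},
    )
    return urllib.request.urlopen(req).read().decode()
```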
Our sample was stratified. We pulled skills weighted toward popularity (500+ installs), recency (uploaded in the last 90 days), and a random tail for baseline. Of those 1,024 skills, 847 were genuinely fine. They did what they said they did, had reasonable permissions, and no hidden behavior. The other 177 were something else.
What the malicious skills actually do
We grouped the problematic skills into five categories. We’re going to describe what they do in enough detail that you understand the risk, but we’re deliberately not publishing exploit recipes.
Credential harvesting (58 skills). These skills run a first-pass scan of the agent’s execution environment, looking for API keys, OAuth tokens, and SSH private keys. The clever ones don’t exfiltrate immediately. They wait for a specific trigger phrase in a user prompt, then bundle what they’ve collected into a single request masquerading as a “telemetry ping.” A few used DNS tunneling, which bypasses most outbound HTTP monitoring entirely.
Hidden data exfiltration (41 skills). Less about credentials, more about context. These skills log user prompts, conversation history, and any files the agent touches, then POST the data to attacker-controlled endpoints. Three of them actually implemented their own lightweight compression to avoid triggering bandwidth alerts. That’s not a script kiddie. That’s someone who knows what they’re doing.
Obfuscated code (34 skills). Base64-encoded Python inside YAML, dynamic imports pulled from remote URLs at first run, string concatenation that assembles function names at runtime so static analysis can’t trace them. One skill had seven layers of obfuscation. The innermost payload, once unwrapped, was a remote shell. Just a plain remote shell, dressed up as a “code formatting helper.”
Prompt injection payloads (21 skills). This category deserves its own post, honestly. These skills inject adversarial instructions into the agent’s context window through seemingly benign output. Imagine a skill that says it “summarizes Slack messages,” but every summary it returns contains invisible-to-humans text instructing the agent to forward future messages matching certain keywords to an external webhook. The agent, helpful by design, complies.
Bait-and-switch versioning (23 skills). This is the one that keeps me up at night. A developer publishes v1.0 of a skill. It’s clean. It gets starred, reviewed, recommended in blog posts. Six weeks later, they push v1.4 with malicious code. Anyone with auto-updates enabled, which is the ClawHub default, gets the new version silently. Of our 23 bait-and-switch skills, the average delay between the clean release and the malicious update was 47 days. Long enough for trust to build. Short enough to still have an active user base.
The VirusTotal blind spot
VirusTotal caught exactly four of our 177 malicious skills. Four. That’s a 2.3 percent detection rate.
Stay with me here, because this isn’t a knock on VirusTotal. It’s a knock on using VirusTotal as the primary line of defense for a fundamentally new category of software. Traditional antivirus scanning looks for binary signatures of known malware. Agent skills aren’t binaries. They’re configuration files, Python snippets, and API glue code that derive their danger from context, permissions, and runtime behavior.
A skill that reads environment variables isn’t malicious. A skill that makes outbound HTTP calls isn’t malicious. A skill that does both, to an attacker-controlled server, triggered by a specific user phrase, absolutely is. But you can’t catch that statically. You have to reason about it.
This is where we ended up building tooling. We developed a 4-layer skill security audit that catches what automated scanning misses: exfiltration vectors, prompt injection payloads, and permission scope violations. Layer one is static analysis, sure. But the real value is in the behavioral layer, where skills run in isolated Docker sandboxes with instrumented network egress. You watch what they actually do when triggered, not just what they look like on disk.
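As a rough illustration of the behavioral idea, here is a stripped-down, in-process sketch using CPython's audit hooks. The real pipeline runs skills inside isolated containers with instrumented network egress, so treat this as a toy version of the concept; ALLOWED_HOSTS and the skill entry point are assumptions for the example, not part of any real ClawHub or OpenClaw API.

```python
# Toy behavioral monitor: record what a skill actually touches when invoked,
# then compare outbound destinations against what its manifest declared.
import sys

ALLOWED_HOSTS = {"api.notes.example"}   # hosts the skill claims it needs
observed = []

def audit_hook(event, args):
    # CPython emits audit events for common I/O; we record the ones most
    # relevant to exfiltration and permission scope creep.
    if event == "socket.getaddrinfo":
        observed.append(("dns", args[0]))            # hostname being resolved
    elif event == "socket.connect":
        observed.append(("connect", args[1]))         # (ip, port) tuple
    elif event in ("open", "subprocess.Popen"):
        observed.append((event, args[0]))             # file path or executable

sys.addaudithook(audit_hook)

# ... import the skill and call its entry point with a set of trigger
# prompts here, e.g. run(prompt="summarize my inbox") ...

violations = [d for kind, d in observed if kind == "dns" and d not in ALLOWED_HOSTS]
print("undeclared destinations:", violations)
```

The in-process version is easy to tamper with, which is exactly why the real layer lives outside the skill's reach: the skill runs in a sandbox, and the watching happens at the container boundary.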
Why the community flagging model fails
ClawHub’s second line of defense is user reports. In theory, malicious skills get reported, get taken down, problem solved.
In practice? Of our 177 malicious skills, 94 had been live on ClawHub for more than 60 days. Twenty-three had over 1,000 installs. The community flagging model assumes that users can tell when something is wrong. Most can’t. If your productivity skill is productive, and your data is getting siphoned off in the background, you have no feedback signal that anything is amiss.
This is the core asymmetry of agent security. The skill’s advertised behavior is visible and verifiable. Its hidden behavior is, by design, hidden.
There are a handful of approaches emerging to address this. Self-hosting on a properly isolated VPS like DigitalOcean or Hetzner, with manual skill vetting, works if you have the time and expertise. Frameworks like Hermes are experimenting with capability-based permission models at the runtime level. And managed platforms are starting to differentiate themselves on security posture rather than just uptime.
What a real vetting pipeline looks like
If you’re evaluating options, here’s what I’d look for, regardless of which platform you pick.
First, human review for any skill that accesses credentials, network, or filesystem. Not as a replacement for automated scanning, but as a layer on top of it. Second, re-audit on every version update. The bait-and-switch pattern only works if v1.4 gets a free pass because v1.0 was approved. Third, runtime behavioral monitoring, not just pre-publish static analysis. Fourth, an explicit permission model where skills declare what they need and users approve it, rather than “install and hope.”
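To make the fourth point concrete, here is a hypothetical sketch of a declared-permissions check. The manifest schema is invented for illustration; ClawHub skills ship nothing like it today.

```python
# Hypothetical permission model: a skill declares what it needs, the user
# approves it, and the runtime allows a request only if both are true.
import yaml  # pip install pyyaml

MANIFEST = """
name: slack-summarizer
version: 1.4.0
permissions:
  network:
    - hooks.slack.com
  filesystem: []
  env: []
"""

USER_APPROVED = {"network": {"hooks.slack.com"}, "filesystem": set(), "env": set()}

manifest = yaml.safe_load(MANIFEST)

def check_request(kind: str, target: str) -> bool:
    """Allow a runtime request only if it is both declared and approved."""
    declared = set(manifest["permissions"].get(kind, []))
    return target in declared and target in USER_APPROVED.get(kind, set())

print(check_request("network", "hooks.slack.com"))   # True
print(check_request("network", "attacker.example"))  # False: never declared
```

The important property is the double gate: a request has to be declared by the author and approved by the user before the runtime lets it through, which turns "install and hope" into an auditable contract.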
Applying that pipeline to our own catalog left us with a curated marketplace of 200+ verified skills; we rejected the rest. That sounds like fewer choices, and it is. It's also the entire point.
The bait-and-switch problem, specifically
The 23 bait-and-switch skills we found are worth dwelling on, because this attack pattern is specific to package ecosystems with auto-update defaults and mostly-unreviewed update pipelines. It's the same attack we've seen in npm with event-stream, in PyPI with ctx, in Packagist with phpass, and now in the agent skill space.
The defense is straightforward in theory. Every version update gets re-audited before it hits users. That’s how app stores work. That’s how Linux distro package maintainers work. It’s not how most agent skill marketplaces work, and it needs to become how they work.
Managed services have an advantage here. BetterClaw’s skill marketplace re-audits every skill update before pushing it to users, preventing the bait-and-switch pattern we found in 23 ClawHub skills. Self-hosted users can achieve something similar by pinning skill versions and manually reviewing diffs before upgrading, but in practice, almost nobody does this.
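For self-hosters who do want to pin versions, a minimal sketch of the idea looks something like this, assuming skills live as plain directories on disk; the lockfile name and layout are invented for the example.

```python
# Pin every installed skill by content hash; refuse to accept a silent change
# until a human has reviewed the diff and re-blessed the lockfile.
import hashlib, json, pathlib

SKILLS_DIR = pathlib.Path("skills")
LOCKFILE = pathlib.Path("skills.lock.json")

def digest(skill_dir: pathlib.Path) -> str:
    """Hash every file in the skill so any update changes the digest."""
    h = hashlib.sha256()
    for path in sorted(skill_dir.rglob("*")):
        if path.is_file():
            h.update(path.relative_to(skill_dir).as_posix().encode())
            h.update(path.read_bytes())
    return h.hexdigest()

pinned = json.loads(LOCKFILE.read_text()) if LOCKFILE.exists() else {}
current = {d.name: digest(d) for d in SKILLS_DIR.iterdir() if d.is_dir()}

changed = {name for name, h in current.items() if pinned.get(name) not in (None, h)}
if changed:
    print("Skills changed since last review:", ", ".join(sorted(changed)))
else:
    LOCKFILE.write_text(json.dumps(current, indent=2))
```

It won't tell you whether a change is malicious, but it guarantees that a silent update can't reach your agent without someone looking at a diff first.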
The uncomfortable conclusion
The 44,000 number is the problem. No community, no automated pipeline, no VirusTotal integration can meaningfully vet 44,000 skills, especially when they update continuously and derive their malicious behavior from runtime context rather than static code.
The ecosystem is going to bifurcate. On one side, large unreviewed marketplaces where the convenience is high and the security floor is wherever the last attacker left it. On the other, smaller curated catalogs where the tradeoff is fewer options but much higher assurance. Both will exist. Most enterprises will eventually need the curated side.
The agent space is where web extensions were in 2011, where npm was in 2016, where mobile app stores were before the walled garden consensus emerged. We know how this movie ends. We’ve just chosen to watch it again.
If you’re running agents in production, especially agents touching customer data or internal credentials, the question to ask isn’t “is my skill marketplace secure.” It’s “what evidence do I have that any given skill still does what it did yesterday.” That’s a harder question. Answering it seriously is most of the work.