On April 4, 2026, Anthropic cut off Claude Pro/Max subscription access for third-party tools. 135,000+ OpenClaw instances were pushed onto metered API billing overnight. Here's how I went from $400 to $60, and why switching to OpenRouter alone won't save you.
Shortly after noon on April 4, my OpenClaw threw me a billing prompt for the first time. I had been running it on a Claude Max $200/month subscription. Comfortable. That day Anthropic ended third-party subscription passthrough, so I switched to API-key billing. One week later the meter read $93, on pace for roughly $400/month. Annualized, that's double what a full year of Max had cost me.
I wasn't alone. Per dplooy's analysis, the cutoff affected 135,000+ active OpenClaw instances; c't 3003 testing showed a heavy user burning $109.55 in one day on Opus. Tech blogger Federico Viticci publicly reported 180M tokens and $3,600 for one month. My $400 sat somewhere in the middle of that range.
Post-cutoff there are three paths: direct API, OpenRouter, or another provider layer. This is how I went from $400 to $60, not by switching providers, but by figuring out where the money was actually going and fixing four things.
OpenClaw Token Usage: Six Reasons Your Bill Is Higher Than Expected
OpenClaw is open-source and works well, but its defaults are tuned for maximum capability, not minimum cost. That means six things quietly inflate your bill:
- Every task routes to the primary model. Heartbeat checks, calendar scans, sub-agent parallel work — all hit the primary. If that’s Opus (the default), you’re using a $15/M-token model for tasks a $0.50/M model could handle.
- Heartbeat carries full context. The docs state it directly: each heartbeat defaults to ~100K tokens of session context. Every 30 minutes. 48 times a day. This is the quietest bleed on your bill; see the arithmetic after this list.
- The system prompt is re-sent every turn. OpenClaw assembles SOUL.md, AGENTS.md, TOOLS.md and other bootstrap files into the system prompt on every call, and re-sends the whole thing with each turn. The per-file and total character caps are configurable (bootstrapMaxChars and bootstrapTotalMaxChars), but the default limits are generous enough that the combined payload lands in the tens of thousands of tokens on a typical setup.
- Cache defaults to 5-minute TTL. For Anthropic models, OpenClaw seeds cacheRetention: “short” = 5 minutes. Heartbeat at 30 minutes > 5 minute TTL = every heartbeat is a cache miss, re-writing the full 100K context.
- Thinking tokens can blow usage up 10-50x. With reasoning/thinking mode enabled, the model’s internal reasoning counts as output tokens. OpenClaw’s own help docs note thinking mode can amplify token usage 10-50x; Gemini 2.5 Pro specifically can consume “1.9M+ input tokens in just a few dozen API calls.”
- Sessions grow indefinitely; tool output pollutes history. “I just asked it to check my project structure. It traversed the whole directory and output tens of thousands of lines, all of which went into session history. Now every message resends that useless junk to the model.” OpenClaw does ship /clear (manual) and compaction (auto), but compaction’s trigger threshold is high — it only kicks in near the 200K context window, so sessions accumulate tens of thousands of tokens before anything cleans up.
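Back-of-envelope on that heartbeat bleed, assuming Opus input at $15/M and a cold cache on every run (both per the defaults above):

100K tokens × 48 heartbeats/day × 30 days = 144M input tokens/month
144M tokens × $15/M ≈ $2,160/month, before you've asked the agent to do anything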
Best Models for OpenClaw: How to Tier Without Breaking Heartbeat
The biggest single cut is model tiering. The principle is simple: heartbeats, simple lookups, and classification get the cheapest model that works; only real reasoning, code, and content generation touches Opus.
Reference pricing (input + output per 1M tokens, rough combined): Claude Opus ~$30, Sonnet 4.5 ~$15, Gemini 2.5 Flash-Lite ~$0.50, DeepSeek V3.2 ~$0.53. That’s a 60x gap between Opus and the cheapest competent models.
OpenClaw’s config schema offers several tiering fields: agents.defaults.model.primary, agents.defaults.heartbeat.model, agents.defaults.subagents.model, channels.modelByChannel.
But I don’t trust heartbeat.model. It has a history of breaking. There have been repeated reports over the past several months of users configuring a cheap model here and still seeing the primary used in their logs. The issue has been closed and reopened at least once; as of this writing, recent release notes don’t clearly confirm it’s fixed.
My strategy avoids that field. Instead I use two heartbeat knobs that are confirmed working:
{
  "agents": {
    "defaults": {
      "heartbeat": {
        "every": "30m",
        "lightContext": true,    // only HEARTBEAT.md in bootstrap
        "isolatedSession": true  // no session history
      }
    }
  }
}
The docs give the number directly: these two flags together take heartbeat tokens from ~100K down to ~2-5K per run — 20-50x. ClawCloud’s 2026.3.24 release notes independently confirm: “two heartbeat session modes that cut per-run token cost by over 95%.”
Primary model and sub-agent tiering work through agents.list[].model and agents.defaults.subagents.model. Manual switching via /model remains the most reliable path, and it's what all the guides use.
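For illustration, a minimal sketch of those fields (model IDs are examples from the pricing list above; check the exact schema against your OpenClaw version):

{
  "agents": {
    "defaults": {
      "model": { "primary": "anthropic/claude-sonnet-4.5" },
      // sub-agent fan-out work goes to the cheap tier
      "subagents": { "model": "google/gemini-2.5-flash-lite" }
    }
    // per-agent overrides live in agents.list[].model
  }
}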
OpenClaw Prompt Cache: Why You’re Paying Full Price on Every Turn
Anthropic’s prompt cache is theoretically the biggest savings lever: cache read is 0.1x the base input rate — a 90% discount. The same 50,000-token system prompt hit 1,000 times a month costs nearly 10x more with cache broken:
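Back-of-envelope, assuming Sonnet 4.5 input at $3/M, cache reads at 0.1x ($0.30/M), and 5-minute cache writes at 1.25x ($3.75/M):

No cache: 50K tokens × 1,000 calls = 50M tokens × $3/M ≈ $150/month
Cache hitting: one write (50K × $3.75/M ≈ $0.19) plus 999 reads (≈50M × $0.30/M) ≈ $15/month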

But OpenClaw’s defaults don’t let you capture this. Three reasons — all documented:
TTL mismatch. OpenClaw defaults cacheRetention: “short” = 5 min for Anthropic models. Heartbeat default is 30 min. 5 < 30 = every heartbeat misses cache, re-writing all bootstrap files. Fix: set cacheRetention: “long” (1h TTL) and move heartbeat to 55 minutes. Note: 1h cache writes cost 2x base input (vs 1.25x at 5min), so this math only works if reads are frequent enough.
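In config terms that's two knobs, the same ones that appear in the full openclaw.json at the end (model ID is a placeholder):

"models": {
  "<your-anthropic-model>": { "params": { "cacheRetention": "long" } }
},
"heartbeat": { "every": "55m" }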
Volatile system prompt. If your SOUL.md, AGENTS.md, or any bootstrap file contains timestamps, dates, or dynamic variables — the cache prefix changes every turn, never hits. OpenClaw’s prompt-caching Quick troubleshooting says it plainly: “High cacheWrite on most turns: check for volatile system-prompt inputs.”
Multi-session routing fragmentation. If you run many concurrent sessions through a routing layer like OpenRouter, identical prompts can get routed to different GPU clusters. Each cluster has its own cold cache, so the overall hit rate drops. Solving this needs prompt_cache_key metadata support at the routing layer, and not every gateway has it.
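Where your gateway does support it, the fix is pinning a stable cache key per agent so identical prefixes keep landing on the same cluster. A sketch of what that can look like in an OpenAI-style request body (whether the field is honored is gateway-specific; the key value here is illustrative):

{
  "model": "anthropic/claude-sonnet-4.5",
  "prompt_cache_key": "openclaw-main-agent",
  "messages": [
    { "role": "system", "content": "<stable bootstrap prompt>" },
    { "role": "user", "content": "heartbeat check" }
  ]
}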
Diagnosing your own cache is straightforward: run openclaw /usage full and check cacheRead vs cacheWrite. If cacheWrite is high on most turns and cacheRead sits at 0, your cache isn't working. For deeper inspection, enable diagnostics.cacheTrace.
OpenClaw Fallback Is Broken — Here’s What to Do Instead
I used to think configuring a fallbacks array was enough. It’s not.
OpenClaw's native fallback has two known bugs that together make it fragile right now. First: when a fallback triggers, the primary model's tool definitions get forwarded to the fallback in the wrong format, the fallback rejects the request, and your agent stops with a cryptic "provider rejected the request schema" error. Second, and worse: when a fallback succeeds, the session quietly stays on the fallback model. Every subsequent turn starts from there, not from your configured primary. Someone running a multi-provider chain in production watched their session drift through three different models in a single day without issuing any switch commands, eventually landing on a free model that started sending English "thinking out loud" reasoning text to their Chinese users.
In short: you configured fallbacks, but the request schema may be rejected the moment they trigger; and if a fallback does succeed, your session drifts to it permanently and silently.
The combined effect: OpenClaw’s native fallback is in a shaky state right now. If your strategy relies on it always catching you when a provider fails, you may already be silently losing requests.
Two paths forward: use a single provider without fallbacks (sacrificing reliability), or push fallback decisions out of OpenClaw. Let OpenClaw see just one model endpoint, and let an external provider layer handle routing, schema normalization, and session-drift prevention.
Direct API vs OpenRouter vs Provider Layer: Real Cost Comparison for OpenClaw
Post-April 4, the community is discussing three main paths:
- Direct Anthropic API: full list price, no gateway fees; but single-provider, so when Anthropic is down, you're down.
- OpenRouter: the most mature option with the most community discussion; 5.5% fee plus a BYOK markup, and sticky routing limits bite at multi-session scale.
- Provider layer (e.g. Infron): passthrough model pricing with a $0.35 + 5% fee on credit purchase; pools the same model across 100+ providers; younger, with less community discussion.
AI Inference Optimization: Why I Moved from OpenRouter to Infron
Honestly: I started on OpenRouter. More mature, more discussion. After two weeks I moved to Infron. Three specific reasons.
First: passthrough pricing + low fee. Infron's FAQ is explicit: "We pass through the pricing of the underlying model providers without any markup, you pay the same rate as you would directly with the provider." The fee is $0.35 + 5% on credit purchase only. For OpenClaw-scale individual users that's roughly comparable to OpenRouter's 5.5% (on a $100 top-up: $0.35 + $5.00 = $5.35 vs. $5.50), but Infron doesn't add a BYOK markup on top.
Second: multi-provider pooling solves the schema fallback problem. Infron aggregates the same model across 100+ providers. When direct Anthropic goes down (the ~10-hour outage on April 6, 2026), Infron pools across AWS Bedrock and GCP Vertex Claude instances. From OpenClaw’s perspective, the model endpoint never died. Schema normalization happens inside Infron, not in OpenClaw’s fallback module (the one with the known bugs I mentioned above).
Third: smart routing does the optimization you’d otherwise do by hand. Infron’s default routing isn’t fixed, it load-balances each request in real time across providers hosting the same model, in its own words “to maximize uptime and best price.” Every request is evaluated on four dimensions simultaneously: latency, throughput, reliability, and price, over a rolling 5-minute statistics window. You get the cheapest healthy provider at the moment of the call, not whichever one you happened to guess right about when you wrote your config.
More interestingly, you can tell it “give me the cheapest provider that meets my performance floor”:
"provider": {
  "sort": "price",
  "preferred_max_latency": { "p90": 3 }  // 90% of calls under 3 seconds
}
Reads as: from all providers whose p90 latency is under 3 seconds, pick the cheapest. For OpenClaw’s heartbeats and sub-agent calls this is exactly right.
For my own setup: $400 to ~$60. The Infron-attributable share came mostly from smart routing keeping every call on the cheapest healthy provider in the moment; model tiering and cache did the rest. None of it required me to profile providers weekly or manually rebalance.
The Final openclaw.json
Below is a simplified version of what I actually run. The value is in the comments. Each line addresses one of the problems above:
{
  "models": {
    "providers": {
      "infron": {
        // One OpenAI-compatible endpoint; routing, failover, and schema
        // normalization happen behind it, not in OpenClaw's fallback module
        "baseUrl": "https://llm.onerouter.pro/v1",
        "apiKey": "<your-infron-key>",
        "api": "openai-completions",
        "models": [
          { "id": "anthropic/claude-sonnet-4.5", "contextWindow": 200000 },
          { "id": "google/gemini-2.5-flash-lite", "contextWindow": 1000000 },
          { "id": "deepseek/deepseek-v3.2", "contextWindow": 128000 }
        ]
      }
    }
  },
  "agents": {
    "defaults": {
      "model": {
        // Primary is Sonnet, not Opus: roughly half the cost for most real work
        "primary": "infron/anthropic/claude-sonnet-4.5",
        // Fallback also routes through Infron, so the schema stays normalized
        "fallbacks": ["infron/google/gemini-2.5-flash-lite"]
      },
      "models": {
        "infron/anthropic/claude-sonnet-4.5": {
          // 1h cache TTL; writes cost 2x base input, so keep reads frequent
          "params": { "cacheRetention": "long" }
        }
      },
      "heartbeat": {
        "every": "55m",          // just under the 1h cache TTL, so heartbeats hit cache
        "lightContext": true,    // only HEARTBEAT.md in bootstrap
        "isolatedSession": true  // no session history per run
      }
    }
  }
}
Four moves: primary is not Opus, cacheRetention set to long, heartbeat at 55m + lightContext + isolatedSession, baseUrl points at a passthrough provider layer.
The last one matters because it stabilizes the first three. Go direct to Anthropic and you stop when they do. Use OpenRouter and the 5.5% fee, BYOK markup, and sticky-routing limits bite at multi-session scale. Or use Infron: its passthrough pricing and 100+ provider pooling are why I landed there.
Four OpenClaw API Cost Fixes You Can Apply Right Now
Even if you don’t switch provider layers, these four cost you nothing and save meaningful money:
- Run openclaw /usage full — check cacheRead and cacheWrite. If cacheWrite is chronically high and cacheRead near zero, your cache isn’t working.
- Open ~/.openclaw/openclaw.json — check if your primary is Opus. If yes, change to Sonnet or cheaper immediately.
- Add lightContext: true and isolatedSession: true to heartbeat — officially confirmed 95%+ cost reduction.
- Check SOUL.md, AGENTS.md, HEARTBEAT.md for timestamps or dynamic variables — remove them. They’re making your cache miss every turn.
April 4 was a watershed for the OpenClaw community: no more "subscription covers everything" comfort. Every token now costs real money. But that's also what forced us to figure out how the system actually spends. Every claim above is documented in the official docs or public GitHub issues. The money you save is just as real.