NERDBOT

    OpenClaw API Cost: How I Cut Token Usage 85% After Anthropic Blocked OpenRouter

    By Nerd Voices | April 27, 2026 | 12 Mins Read

    On April 4, 2026, Anthropic cut off Claude Pro/Max subscription access for third-party tools. 135,000+ OpenClaw instances had to face API billing overnight. Here’s how I went from $400 to $60 and why switching to OpenRouter alone won’t save you.


    Shortly after noon on April 4, my OpenClaw threw me a billing prompt for the first time. I had been running it on a Claude Max $200/month subscription. Comfortable. That day Anthropic ended third-party subscription passthrough. I switched to API-key billing. One week later the meter read $93, a projected $400/month. That’s double what I had been paying for Max.

    I wasn’t alone. Per dplooy’s analysis, the cutoff affected 135,000+ active OpenClaw instances; c’t 3003 testing showed a heavy user burning $109.55 in one day on Opus. Tech blogger Federico Viticci’s public number was 180M tokens, $3,600 for one month. Against those numbers, my $400 was mid-range.

    Post-cutoff there are three paths: direct API, OpenRouter, or another provider layer. This is how I went from $400 to $60, not by switching providers, but by figuring out where the money was actually going and fixing four things.


    OpenClaw Token Usage: Six Reasons Your Bill Is Higher Than Expected

    OpenClaw is open-source and works well, but its defaults are tuned for maximum capability, not minimum cost. That means six things quietly inflate your bill:

    1. Every task routes to the primary model. Heartbeat checks, calendar scans, sub-agent parallel work — all hit the primary. If that’s Opus (the default), you’re using a $15/M-token model for tasks a $0.50/M model could handle.
    2. Heartbeat carries full context. The docs state it directly: each heartbeat defaults to ~100K tokens of session context. Every 30 minutes. 48 times a day. This is the quietest bleed on your bill.
    3. The system prompt is re-sent every turn. OpenClaw assembles SOUL.md, AGENTS.md, TOOLS.md and other bootstrap files into the system prompt on every call, and re-sends the whole thing with each turn. The per-file and total character caps are configurable (bootstrapMaxChars and bootstrapTotalMaxChars), but the default limits are generous enough that the combined payload lands in the tens of thousands of tokens on a typical setup.
    4. Cache defaults to 5-minute TTL. For Anthropic models, OpenClaw seeds cacheRetention: “short” = 5 minutes. Heartbeat at 30 minutes > 5 minute TTL = every heartbeat is a cache miss, re-writing the full 100K context.
    5. Thinking tokens can blow usage up 10-50x. With reasoning/thinking mode enabled, the model’s internal reasoning counts as output tokens. OpenClaw’s own help docs note thinking mode can amplify token usage 10-50x; Gemini 2.5 Pro specifically can consume “1.9M+ input tokens in just a few dozen API calls.”
    6. Sessions grow indefinitely; tool output pollutes history. “I just asked it to check my project structure. It traversed the whole directory and output tens of thousands of lines, all of which went into session history. Now every message resends that useless junk to the model.” OpenClaw does ship /clear (manual) and compaction (auto), but compaction’s trigger threshold is high — it only kicks in near the 200K context window, so sessions accumulate tens of thousands of tokens before anything cleans up.
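
To put rough numbers on items 2 and 6, here is a minimal Python sketch. It assumes the figures quoted above (about 100K tokens per heartbeat, one heartbeat every 30 minutes); the 50-turn session and 2,000 tokens added per turn are made-up illustration values, and both function names are hypothetical.

```python
# Rough token model for two of the leaks above. All inputs are illustrative.

def heartbeat_tokens_per_day(context_tokens: int, interval_minutes: int) -> int:
    """Input tokens spent on heartbeats over 24 hours."""
    runs_per_day = (24 * 60) // interval_minutes
    return runs_per_day * context_tokens


def resent_history_tokens(turns: int, tokens_added_per_turn: int) -> int:
    """Total input tokens when every turn resends the whole history so far.

    Turn 1 sends one chunk, turn 2 sends two chunks, and so on, so the total
    is the triangular number turns * (turns + 1) / 2 times the chunk size.
    """
    return tokens_added_per_turn * turns * (turns + 1) // 2


print(heartbeat_tokens_per_day(100_000, 30))  # 4800000
print(resent_history_tokens(50, 2_000))       # 2550000
```

The second number is the one that surprises people: history cost grows with the square of the turn count, which is why one oversized tool output early in a session keeps costing money on every later turn.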

    Best Models for OpenClaw: How to Tier Without Breaking Heartbeat

    The biggest single cut is model tiering. The principle is simple: for heartbeats, simple lookups, and classification, use the cheapest model that works. Only real reasoning, code, and content generation should touch Opus.

    Reference pricing (input + output per 1M tokens, rough combined): Claude Opus ~$30, Sonnet 4.5 ~$15, Gemini 2.5 Flash-Lite ~$0.50, DeepSeek V3.2 ~$0.53. That’s a 60x gap between Opus and the cheapest competent models.
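
A small sketch of what tiering does to a bill, using only the rough combined prices listed above. The 10M-token monthly volume and the 80/20 split between routine and heavy work are assumptions picked for illustration, not measurements from my setup.

```python
# Combined (input + output) prices per 1M tokens, as quoted in the paragraph above.
PRICE_PER_M = {
    "opus": 30.00,
    "sonnet-4.5": 15.00,
    "gemini-2.5-flash-lite": 0.50,
    "deepseek-v3.2": 0.53,
}


def monthly_cost(millions_of_tokens_by_model: dict[str, float]) -> float:
    """Sum the cost of a month's traffic, given millions of tokens per model."""
    return sum(PRICE_PER_M[model] * volume
               for model, volume in millions_of_tokens_by_model.items())


# Hypothetical month: 10M tokens total.
all_opus = monthly_cost({"opus": 10})
# Same volume, with 80% of it (routine work) moved to the cheapest tier.
tiered = monthly_cost({"opus": 2, "gemini-2.5-flash-lite": 8})

print(f"all Opus: ${all_opus:.2f}")   # all Opus: $300.00
print(f"tiered:   ${tiered:.2f}")     # tiered:   $64.00
print(f"price gap: {PRICE_PER_M['opus'] / PRICE_PER_M['gemini-2.5-flash-lite']:.0f}x")  # price gap: 60x
```

Almost all of the remaining cost is the 20% that stays on Opus, so the split between routine and heavy work matters more than which cheap model you pick.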

    OpenClaw’s config schema offers several tiering fields: agents.defaults.model.primary, agents.defaults.heartbeat.model, agents.defaults.subagents.model, channels.modelByChannel.

    But I don’t trust heartbeat.model. It has a history of breaking. There have been repeated reports over the past several months of users configuring a cheap model here and still seeing the primary used in their logs. The issue has been closed and reopened at least once; as of this writing, recent release notes don’t clearly confirm it’s fixed.

    My strategy avoids that field. Instead I use two heartbeat knobs that are confirmed working:

    {
      "agents": {
        "defaults": {
          "heartbeat": {
            "every": "30m",
            "lightContext": true,    // only HEARTBEAT.md in bootstrap
            "isolatedSession": true  // no session history
          }
        }
      }
    }

    The docs give the number directly: these two flags together take heartbeat tokens from ~100K down to ~2-5K per run — 20-50x. ClawCloud’s 2026.3.24 release notes independently confirm: “two heartbeat session modes that cut per-run token cost by over 95%.”
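
A quick check that the two quoted figures agree with each other, assuming the 30-minute interval from the config above. The helper name is hypothetical.

```python
RUNS_PER_DAY = (24 * 60) // 30  # one heartbeat every 30 minutes


def savings(before_tokens: int, after_tokens: int) -> tuple[float, float]:
    """Return (shrink factor, percent saved) for one heartbeat run."""
    factor = before_tokens / after_tokens
    percent_saved = 100 * (before_tokens - after_tokens) / before_tokens
    return factor, percent_saved


for after in (5_000, 2_000):
    factor, pct = savings(100_000, after)
    print(f"{after} tokens/run: {factor:.0f}x smaller, {pct:.0f}% saved, "
          f"{RUNS_PER_DAY * after:,} tokens/day")
```

A 20x cut is 95% saved and a 50x cut is 98% saved, so the "over 95%" wording in the release notes and the 20-50x range in the docs describe the same result.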

    Primary model and sub-agent tiering work through agents.list[].model and agents.defaults.subagents.model. Manual switching via /model — the most reliable path, what all the guides use.


    OpenClaw Prompt Cache: Why You’re Paying Full Price on Every Turn

    Anthropic’s prompt cache is theoretically the biggest savings lever: cache read is 0.1x the base input rate, a 90% discount. The same 50,000-token system prompt hit 1,000 times a month costs nearly 10x more with cache broken.
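
The arithmetic behind that 10x figure, as a minimal sketch. The $3 per 1M base input price is an assumption chosen for illustration (swap in your model's rate), and the cached case is the ideal one where a single 5-minute cache write is followed by 999 hits.

```python
PROMPT_TOKENS = 50_000
CALLS_PER_MONTH = 1_000
BASE_INPUT_PER_M = 3.00   # assumed base input price in USD per 1M tokens
CACHE_WRITE_5M = 1.25     # 5-minute cache write multiplier
CACHE_READ = 0.10         # cache read multiplier


def cost(tokens: int, multiplier: float) -> float:
    """USD cost of sending `tokens` input tokens at a price multiplier."""
    return tokens * multiplier * BASE_INPUT_PER_M / 1_000_000


# No caching at all: every call pays the full input rate.
uncached = cost(PROMPT_TOKENS * CALLS_PER_MONTH, 1.0)
# Ideal caching: one write, then every later call is a cache read.
cached = (cost(PROMPT_TOKENS, CACHE_WRITE_5M)
          + cost(PROMPT_TOKENS * (CALLS_PER_MONTH - 1), CACHE_READ))

print(f"no cache:   ${uncached:.2f}")
print(f"with cache: ${cached:.2f}")
print(f"ratio:      {uncached / cached:.1f}x")
```

The ratio does not depend on the assumed price. It comes entirely from the 0.1x read multiplier, with the one-time write pulling it slightly below 10x.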

    But OpenClaw’s defaults don’t let you capture this. Three reasons — all documented:

    TTL mismatch. OpenClaw defaults cacheRetention: “short” = 5 min for Anthropic models. Heartbeat default is 30 min. 5 < 30 = every heartbeat misses cache, re-writing all bootstrap files. Fix: set cacheRetention: “long” (1h TTL) and move heartbeat to 55 minutes. Note: 1h cache writes cost 2x base input (vs 1.25x at 5min), so this math only works if reads are frequent enough.
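
Here is a rough model of why the TTL and the heartbeat interval have to be tuned together. It counts the cost of the cached prefix in units of one uncached send. The assumptions: heartbeats are the only traffic, the prefix never changes, and a cache hit refreshes the TTL. The function name is hypothetical.

```python
def heartbeat_prefix_cost(interval_min: int, ttl_min: int, write_mult: float,
                          read_mult: float = 0.1, hours: int = 24) -> float:
    """Cost of the cached prefix across all heartbeats in the period.

    Result is in units of "one uncached send of the prefix".
    """
    runs = (hours * 60) // interval_min
    if runs == 0:
        return 0.0
    if interval_min < ttl_min:
        # First run writes the cache; each later run hits it and refreshes it.
        return write_mult + (runs - 1) * read_mult
    # Cache has expired before every run, so every run pays a fresh write.
    return runs * write_mult


short_ttl = heartbeat_prefix_cost(interval_min=30, ttl_min=5, write_mult=1.25)
long_ttl = heartbeat_prefix_cost(interval_min=55, ttl_min=60, write_mult=2.0)

print(short_ttl)  # 60.0
print(long_ttl)   # 4.5
```

Under these assumptions the 2x write premium is paid once and then amortized. If something else invalidates the prefix between heartbeats, every run pays 2x instead of 1.25x, which is the case the note above warns about.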

    Volatile system prompt. If your SOUL.md, AGENTS.md, or any bootstrap file contains timestamps, dates, or dynamic variables — the cache prefix changes every turn, never hits. OpenClaw’s prompt-caching Quick troubleshooting says it plainly: “High cacheWrite on most turns: check for volatile system-prompt inputs.”

    Multi-session routing fragmentation. If you run many concurrent sessions through a routing layer like OpenRouter, identical prompts can get routed to different GPU clusters — each cluster has its own cold cache, overall hit rate drops. Solving this needs prompt_cache_key metadata support at the routing layer, and not every gateway has it.

    Diagnosing your own cache is concrete: run openclaw /usage full and check cacheRead vs cacheWrite. If cacheWrite is high most turns and cacheRead sits at 0, your cache isn’t working. For deeper inspection, enable diagnostics.cacheTrace.
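
If you copy the cacheRead and cacheWrite totals out of that report, a few lines of Python turn them into a verdict. This is a hypothetical helper, not part of OpenClaw, and the 0.5 threshold is an arbitrary assumption.

```python
def cache_hit_ratio(cache_read: int, cache_write: int) -> float:
    """Share of cached-prefix tokens that were read rather than rewritten."""
    total = cache_read + cache_write
    return cache_read / total if total else 0.0


def diagnose(cache_read: int, cache_write: int, healthy_ratio: float = 0.5) -> str:
    """Classify a usage sample. The threshold is a guess, tune it to taste."""
    if cache_read == 0 and cache_write > 0:
        return "broken: every turn is a cache write"
    if cache_hit_ratio(cache_read, cache_write) < healthy_ratio:
        return "weak: check TTL and volatile prompt inputs"
    return "ok"


print(diagnose(cache_read=0, cache_write=480_000))        # broken: every turn is a cache write
print(diagnose(cache_read=900_000, cache_write=100_000))  # ok
```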


    OpenClaw Fallback Is Broken — Here’s What to Do Instead

    I used to think configuring a fallbacks array was enough. It’s not.

    OpenClaw’s native fallback has two known bugs that together make it fragile right now. First: when a fallback triggers, the primary model’s tool definitions get forwarded to the fallback in the wrong format, the fallback rejects the request, and your agent stops with a cryptic “provider rejected the request schema”. Second, and worse: when a fallback succeeds, the session quietly stays on the fallback model. Every subsequent turn starts from there, not your configured primary. Someone running a multi-provider chain in production watched their session drift through three different models in a single day without issuing any switch commands, eventually landing on a free model that started sending English “thinking out loud” reasoning text to their Chinese users.

    You configured fallbacks — the schema may be rejected when they trigger; if fallback succeeds, your session drifts permanently and silently.

    The combined effect: OpenClaw’s native fallback is in a shaky state right now. If your strategy relies on it always catching you when a provider fails, you may already be silently losing requests.

    Two paths forward: use a single provider without fallbacks (sacrificing reliability), or push fallback decisions out of OpenClaw. Let OpenClaw see just one model endpoint, and let an external provider layer handle routing, schema normalization, and session-drift prevention.
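
To make the second path concrete, here is a minimal sketch of failover that cannot drift, because the decision is made fresh on every request and always starts from the primary. It is an illustration of the idea only, not how OpenClaw or any particular provider layer implements it. Every name in it is hypothetical.

```python
from typing import Callable, Sequence

Sender = Callable[[str], str]


class AllProvidersFailed(RuntimeError):
    """Raised when every provider in the chain rejected the request."""


def call_with_failover(chain: Sequence[tuple[str, Sender]], prompt: str) -> tuple[str, str]:
    """Try providers in order, starting from the primary on every call.

    No state survives between calls, so a successful fallback never
    becomes the new default for the session.
    """
    errors = []
    for name, send in chain:
        try:
            return name, send(prompt)
        except Exception as exc:  # a real layer would catch narrower errors
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


class FlakyPrimary:
    """Stand-in provider that fails on its first call and then recovers."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, prompt: str) -> str:
        self.calls += 1
        if self.calls == 1:
            raise TimeoutError("primary timed out")
        return f"primary:{prompt}"


def backup(prompt: str) -> str:
    return f"backup:{prompt}"


chain = [("primary", FlakyPrimary()), ("backup", backup)]
first = call_with_failover(chain, "ping")   # served by the backup
second = call_with_failover(chain, "ping")  # back on the primary
print(first, second)
```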


    OpenClaw vs OpenRouter vs Direct API: Real Cost Comparison

    Post-April 4, the community is discussing three main paths: direct API, OpenRouter, or another provider layer.


    AI Inference Optimization: Why I Moved from OpenRouter to Infron

    Honestly: I started on OpenRouter. More mature, more discussion. After two weeks I moved to Infron. Three specific reasons.

    First: passthrough pricing + low fee. Infron’s FAQ is explicit: “We pass through the pricing of the underlying model providers without any markup, you pay the same rate as you would directly with the provider.” The fee is $0.35 + 5% on credit purchase only. For OpenClaw-scale individual users, roughly comparable to OpenRouter’s 5.5%, but Infron doesn’t charge a BYOK markup.
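
A quick comparison of the two fee structures as quoted above. This sketch uses only those two numbers. It assumes both fees apply to the credit purchase amount, and it ignores any minimums, BYOK charges, or other terms either service may have.

```python
from decimal import Decimal


def infron_fee(purchase: Decimal) -> Decimal:
    """$0.35 flat plus 5% of the credit purchase, per the quoted FAQ."""
    return Decimal("0.35") + Decimal("0.05") * purchase


def openrouter_fee(purchase: Decimal) -> Decimal:
    """5.5% of the credit purchase, the figure quoted above."""
    return Decimal("0.055") * purchase


# Purchase size at which the two fees are equal.
BREAK_EVEN = Decimal("0.35") / (Decimal("0.055") - Decimal("0.05"))

for amount in (Decimal("20"), Decimal("70"), Decimal("200")):
    print(f"${amount}: infron ${infron_fee(amount):.2f}, "
          f"openrouter ${openrouter_fee(amount):.2f}")
print(f"break-even purchase: ${BREAK_EVEN}")
```

On these numbers the flat $0.35 makes small top-ups slightly more expensive and larger ones slightly cheaper, which is why "roughly comparable" is the fair summary for individual users.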

    Second: multi-provider pooling solves the schema fallback problem. Infron aggregates the same model across 100+ providers. When direct Anthropic goes down (the ~10-hour outage on April 6, 2026), Infron pools across AWS Bedrock and GCP Vertex Claude instances. From OpenClaw’s perspective, the model endpoint never died. Schema normalization happens inside Infron, not in OpenClaw’s fallback module (the one with the known bugs I mentioned above).

    Third: smart routing does the optimization you’d otherwise do by hand. Infron’s default routing isn’t fixed, it load-balances each request in real time across providers hosting the same model, in its own words “to maximize uptime and best price.” Every request is evaluated on four dimensions simultaneously: latency, throughput, reliability, and price, over a rolling 5-minute statistics window. You get the cheapest healthy provider at the moment of the call, not whichever one you happened to guess right about when you wrote your config.

    More interestingly, you can tell it “give me the cheapest provider that meets my performance floor”:

    "provider": {
      "sort": "price",
      "preferred_max_latency": { "p90": 3 }  // 90% of calls < 3s
    }

    Reads as: from all providers whose p90 latency is under 3 seconds, pick the cheapest. For OpenClaw’s heartbeats and sub-agent calls this is exactly right.
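
The selection rule itself is simple enough to sketch. The snippet below is my reading of "cheapest provider under a latency ceiling", not Infron's actual routing code. The provider names and statistics are invented, and returning None when nothing qualifies is a choice made for this sketch.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ProviderStats:
    name: str
    price_per_m: float     # USD per 1M tokens
    p90_latency_s: float   # 90th percentile latency in seconds


def pick_provider(stats: Sequence[ProviderStats],
                  max_p90_s: float) -> Optional[ProviderStats]:
    """Cheapest provider whose p90 latency is under the ceiling, else None."""
    eligible = [s for s in stats if s.p90_latency_s < max_p90_s]
    return min(eligible, key=lambda s: s.price_per_m, default=None)


# Invented sample data for three hosts of the same model.
sample = [
    ProviderStats("host-a", price_per_m=0.40, p90_latency_s=4.2),  # cheapest, too slow
    ProviderStats("host-b", price_per_m=0.55, p90_latency_s=2.1),
    ProviderStats("host-c", price_per_m=0.70, p90_latency_s=1.3),
]

choice = pick_provider(sample, max_p90_s=3)
print(choice.name if choice else "no provider meets the floor")  # host-b
```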

    For my own setup: $400 to ~$60. The Infron-attributable share came mostly from smart routing keeping every call on the cheapest healthy provider in the moment; model tiering and cache did the rest. None of it required me to profile providers weekly or manually rebalance.


    The Final openclaw.json

    Below is a simplified version of what I actually run. The value is in the comments. Each line addresses one of the problems above:

    {
      "models": {
        "providers": {
          "infron": {
            "baseUrl": "https://llm.onerouter.pro/v1",  // passthrough provider layer; OpenClaw sees one endpoint
            "apiKey": "<your-infron-key>",
            "api": "openai-completions",
            "models": [
              { "id": "anthropic/claude-sonnet-4.5", "contextWindow": 200000 },
              { "id": "google/gemini-2.5-flash-lite", "contextWindow": 1000000 },
              { "id": "deepseek/deepseek-v3.2", "contextWindow": 128000 }
            ]
          }
        }
      },
      "agents": {
        "defaults": {
          "model": {
            "primary": "infron/anthropic/claude-sonnet-4.5",  // primary is not Opus
            "fallbacks": ["infron/google/gemini-2.5-flash-lite"]
          },
          "models": {
            "infron/anthropic/claude-sonnet-4.5": {
              "params": { "cacheRetention": "long" }  // 1h cache TTL instead of the 5-minute default
            }
          },
          "heartbeat": {
            "every": "55m",           // stays inside the 1h cache TTL
            "lightContext": true,     // only HEARTBEAT.md in bootstrap
            "isolatedSession": true   // no session history
          }
        }
      }
    }

    Four moves: primary is not Opus, cacheRetention set to long, heartbeat at 55m + lightContext + isolatedSession, baseUrl points at a passthrough provider layer.

    The last one matters because it stabilizes the first three. You could go direct to Anthropic, but then you stop when they do. You could use OpenRouter, but the 5.5% fee, BYOK markup, and sticky routing limits bite at multi-session scale. Or you can use Infron. Its passthrough pricing and 100+ provider pooling are why I landed there.


    Four OpenClaw API Cost Fixes You Can Apply Right Now

    Even if you don’t switch provider layers, these four cost you nothing and save meaningful money:

    1. Run openclaw /usage full — check cacheRead and cacheWrite. If cacheWrite is chronically high and cacheRead near zero, your cache isn’t working.
    2. Open ~/.openclaw/openclaw.json — check if your primary is Opus. If yes, change to Sonnet or cheaper immediately.
    3. Add lightContext: true and isolatedSession: true to heartbeat. The docs and release notes put the cut in per-run heartbeat token cost at 95% or more.
    4. Check SOUL.md, AGENTS.md, HEARTBEAT.md for timestamps or dynamic variables — remove them. They’re making your cache miss every turn.

    April 4 was a watershed for the OpenClaw community, no more “subscription covers everything” comfort. Every token now costs real money. But that’s what forced us to figure out how the system actually spends. Every claim above is documented in the official docs or public GitHub issues. The money you save is also real.
