NERDBOT

    OpenClaw API Cost: How I Cut Token Usage 85% After Anthropic Blocked OpenRouter

    By Nerd Voices | April 27, 2026 | 12 Mins Read

    On April 4, 2026, Anthropic cut off Claude Pro/Max subscription access for third-party tools. 135,000+ OpenClaw instances had to face API billing overnight. Here’s how I went from $400 to $60 and why switching to OpenRouter alone won’t save you.


    Shortly after noon on April 4, my OpenClaw threw me a billing prompt for the first time. I had been running it on a Claude Max $200/month subscription. Comfortable. That day Anthropic ended third-party subscription passthrough. I switched to API-key billing. One week later the meter read $93, a projected $400/month. That’s double what Max had been costing me.

    I wasn’t alone. Per dplooy’s analysis, the cutoff affected 135,000+ active OpenClaw instances; c’t 3003 testing showed a heavy user burning $109.55 in one day on Opus. Tech blogger Federico Viticci’s public number was 180M tokens, $3,600 for one month. My $400 was median.

    Post-cutoff there are three paths: direct API, OpenRouter, or another provider layer. This is how I went from $400 to $60, not by switching providers, but by figuring out where the money was actually going and fixing four things.


    OpenClaw Token Usage: Six Reasons Your Bill Is Higher Than Expected

    OpenClaw is open-source and works well, but its defaults are tuned for maximum capability, not minimum cost. That means six things quietly inflate your bill:

    1. Every task routes to the primary model. Heartbeat checks, calendar scans, sub-agent parallel work — all hit the primary. If that’s Opus (the default), you’re using a $15/M-token model for tasks a $0.50/M model could handle.
    2. Heartbeat carries full context. The docs state it directly: each heartbeat defaults to ~100K tokens of session context. Every 30 minutes. 48 times a day. This is the quietest bleed on your bill.
    3. The system prompt is re-sent every turn. OpenClaw assembles SOUL.md, AGENTS.md, TOOLS.md and other bootstrap files into the system prompt on every call, and re-sends the whole thing with each turn. The per-file and total character caps are configurable (bootstrapMaxChars and bootstrapTotalMaxChars), but the default limits are generous enough that the combined payload lands in the tens of thousands of tokens on a typical setup.
    4. Cache defaults to 5-minute TTL. For Anthropic models, OpenClaw seeds cacheRetention: “short”, which means a 5-minute TTL. The default 30-minute heartbeat interval is longer than that TTL, so every heartbeat is a cache miss that re-writes the full 100K context.
    5. Thinking tokens can blow usage up 10-50x. With reasoning/thinking mode enabled, the model’s internal reasoning counts as output tokens. OpenClaw’s own help docs note thinking mode can amplify token usage 10-50x; Gemini 2.5 Pro specifically can consume “1.9M+ input tokens in just a few dozen API calls.”
    6. Sessions grow indefinitely; tool output pollutes history. “I just asked it to check my project structure. It traversed the whole directory and output tens of thousands of lines, all of which went into session history. Now every message resends that useless junk to the model.” OpenClaw does ship /clear (manual) and compaction (auto), but compaction’s trigger threshold is high — it only kicks in near the 200K context window, so sessions accumulate tens of thousands of tokens before anything cleans up.
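To put reason 2 in numbers, here is a rough sketch of what heartbeats alone can send in a month. The 100K-token context, the 30-minute interval, and the $15/M and $0.50/M prices are the figures quoted in the list above; the function names are hypothetical, and the result is an upper bound that assumes every heartbeat carries the full context and nothing is served from cache.

```python
def heartbeat_tokens_per_month(context_tokens: int, interval_minutes: int, days: int = 30) -> int:
    """Input tokens sent by heartbeats alone over `days` days."""
    runs_per_day = (24 * 60) // interval_minutes  # 48 runs/day at 30 minutes
    return context_tokens * runs_per_day * days


def monthly_cost_usd(tokens: int, price_per_million_usd: float) -> float:
    """Cost of `tokens` at a flat per-million-token price."""
    return tokens / 1_000_000 * price_per_million_usd


tokens = heartbeat_tokens_per_month(100_000, 30)
print(tokens)                                   # 144000000
print(f"{monthly_cost_usd(tokens, 15.0):.2f}")  # 2160.00 at $15/M
print(f"{monthly_cost_usd(tokens, 0.50):.2f}")  # 72.00 at $0.50/M
```

Real bills will land below the uncached figure whenever some heartbeats hit the cache, but the gap between the two prices is the point.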

    Best Models for OpenClaw: How to Tier Without Breaking Heartbeat

    The biggest single cut is model tiering. The principle is simple: for heartbeats, simple lookups, and classification, use the cheapest model that works. Only real reasoning, code, and content generation should touch Opus.

    Reference pricing (input + output per 1M tokens, rough combined): Claude Opus ~$30, Sonnet 4.5 ~$15, Gemini 2.5 Flash-Lite ~$0.50, DeepSeek V3.2 ~$0.53. That’s a 60x gap between Opus and the cheapest competent models.
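A quick way to see what tiering buys you is to compute a blended price per million tokens. The sketch below uses the rough combined prices from the paragraph above; the traffic shares (30% Opus, 70% Flash-Lite) are purely hypothetical and should be replaced with your own usage split.

```python
# Rough combined $/1M tokens, taken from the reference pricing above.
PRICES = {
    "opus": 30.0,
    "sonnet-4.5": 15.0,
    "gemini-2.5-flash-lite": 0.50,
    "deepseek-v3.2": 0.53,
}


def blended_price(shares: dict[str, float], prices: dict[str, float]) -> float:
    """Average $/1M tokens when each model handles the given share of tokens."""
    if abs(sum(shares.values()) - 1.0) > 1e-9:
        raise ValueError("shares must sum to 1")
    return sum(share * prices[model] for model, share in shares.items())


all_opus = blended_price({"opus": 1.0}, PRICES)
# Hypothetical split: only 30% of tokens really need Opus.
tiered = blended_price({"opus": 0.3, "gemini-2.5-flash-lite": 0.7}, PRICES)
print(f"{all_opus:.2f} -> {tiered:.2f}")  # 30.00 -> 9.35
```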

    OpenClaw’s config schema offers several tiering fields: agents.defaults.model.primary, agents.defaults.heartbeat.model, agents.defaults.subagents.model, channels.modelByChannel.

    But I don’t trust heartbeat.model. It has a history of breaking. There have been repeated reports over the past several months of users configuring a cheap model here and still seeing the primary used in their logs. The issue has been closed and reopened at least once; as of this writing, recent release notes don’t clearly confirm it’s fixed.

    My strategy avoids that field. Instead I use two heartbeat knobs that are confirmed working:

    {
      "agents": {
        "defaults": {
          "heartbeat": {
            "every": "30m",
            "lightContext": true,    // only HEARTBEAT.md in bootstrap
            "isolatedSession": true  // no session history
          }
        }
      }
    }

    The docs give the number directly: these two flags together take heartbeat tokens from ~100K down to ~2-5K per run — 20-50x. ClawCloud’s 2026.3.24 release notes independently confirm: “two heartbeat session modes that cut per-run token cost by over 95%.”

    Primary model and sub-agent tiering work through agents.list[].model and agents.defaults.subagents.model. Manual switching via /model remains the most reliable path, and it is what all the guides use.


    OpenClaw Prompt Cache: Why You’re Paying Full Price on Every Turn

    Anthropic’s prompt cache is theoretically the biggest savings lever: cache read is 0.1x the base input rate, a 90% discount. The same 50,000-token system prompt hit 1,000 times a month costs nearly 10x more with cache broken.
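The arithmetic behind that "nearly 10x" can be sketched as follows. The 50,000-token prompt, the 1,000 calls, and the 0.1x read multiplier come from the text; the 1.25x write multiplier is the 5-minute rate mentioned later in this section; the $3 per million base input price is an assumed placeholder, not a quoted rate. The ratio does not depend on it.

```python
def uncached_cost(prompt_tokens: int, calls: int, base_usd_per_million: float) -> float:
    """Every call pays the full base input rate for the system prompt."""
    return prompt_tokens * calls / 1_000_000 * base_usd_per_million


def cached_cost(prompt_tokens: int, calls: int, base_usd_per_million: float,
                writes: int = 1, write_mult: float = 1.25, read_mult: float = 0.1) -> float:
    """`writes` calls pay the cache-write rate; the remaining calls are cache reads."""
    per_call = prompt_tokens / 1_000_000 * base_usd_per_million
    reads = calls - writes
    return per_call * (writes * write_mult + reads * read_mult)


BASE = 3.0  # assumed $/1M input tokens, for illustration only
broken = uncached_cost(50_000, 1_000, BASE)
working = cached_cost(50_000, 1_000, BASE)  # best case: one write, 999 reads
print(f"{broken:.2f} vs {working:.2f}")     # 150.00 vs 15.17
print(f"{broken / working:.1f}x")           # 9.9x
```

Every extra cache write pushes the cached figure up, which is why the real-world ratio sits below this best case.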

    But OpenClaw’s defaults don’t let you capture this. Three reasons — all documented:

    TTL mismatch. OpenClaw defaults to cacheRetention: “short” (a 5-minute TTL) for Anthropic models, while the heartbeat default is 30 minutes. Because the interval is longer than the TTL, every heartbeat misses the cache and re-writes all bootstrap files. Fix: set cacheRetention: “long” (1h TTL) and move heartbeat to 55 minutes. Note: 1h cache writes cost 2x base input (vs 1.25x at 5min), so this math only works if reads are frequent enough.
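How frequent is "frequent enough"? A minimal sketch, using only the multipliers quoted above (1.25x and 2x for writes, 0.1x for reads) and two simplifying assumptions of mine: with the short TTL every call arrives after the cache has expired, and with the long TTL every call after the first arrives while the cache is still warm.

```python
def short_ttl_cost(calls: int) -> float:
    """Relative cost (in units of one uncached prompt) when every call is a 5-minute cache write."""
    return calls * 1.25


def long_ttl_cost(calls: int) -> float:
    """Relative cost with one 1h cache write followed by cache reads."""
    return 2.0 + (calls - 1) * 0.1


for calls in (1, 2, 10):
    print(calls, short_ttl_cost(calls), long_ttl_cost(calls))
```

Under these assumptions the long TTL loses on a single isolated call (2.0 vs 1.25) and wins from the second call onward (2.1 vs 2.5), so one cache read per write is already enough to break even.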

    Volatile system prompt. If your SOUL.md, AGENTS.md, or any bootstrap file contains timestamps, dates, or dynamic variables — the cache prefix changes every turn, never hits. OpenClaw’s prompt-caching Quick troubleshooting says it plainly: “High cacheWrite on most turns: check for volatile system-prompt inputs.”
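One way to hunt for volatile inputs is to scan the bootstrap files for anything that looks like a date, a clock time, or a template placeholder. This is a rough sketch, not an OpenClaw feature: the patterns and the function name are my own assumptions and will produce both false positives and misses.

```python
import re

# Hypothetical patterns for content that tends to change between turns.
VOLATILE_PATTERNS = [
    re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),       # ISO dates such as 2026-04-04
    re.compile(r"\b\d{1,2}:\d{2}(:\d{2})?\b"),  # clock times such as 12:05 or 12:05:33
    re.compile(r"\{\{.*?\}\}|\$\{.*?\}"),       # template placeholders
]


def find_volatile_lines(text: str) -> list[tuple[int, str]]:
    """Return (line number, line) pairs that match any volatile pattern."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if any(pattern.search(line) for pattern in VOLATILE_PATTERNS):
            hits.append((number, line.strip()))
    return hits


sample = "You are a helpful agent.\nToday is 2026-04-04.\nUser: {{user_name}}\n"
print(find_volatile_lines(sample))
# [(2, 'Today is 2026-04-04.'), (3, 'User: {{user_name}}')]
```

To use it on real files, read each bootstrap file's contents and pass the text in; anything it flags is worth moving out of the cached prefix.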

    Multi-session routing fragmentation. If you run many concurrent sessions through a routing layer like OpenRouter, identical prompts can get routed to different GPU clusters — each cluster has its own cold cache, overall hit rate drops. Solving this needs prompt_cache_key metadata support at the routing layer, and not every gateway has it.

    Diagnosing your own cache is concrete: run openclaw /usage full and check cacheRead vs cacheWrite. If cacheWrite is high most turns and cacheRead sits at 0, your cache isn’t working. For deeper inspection, enable diagnostics.cacheTrace.
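If you copy the per-turn numbers out of that report, the health check reduces to one ratio. The sketch below assumes you have transcribed them into a list of dicts by hand; cacheRead and cacheWrite are the field names from the text, while the data shape and the function are hypothetical and not an OpenClaw export format.

```python
def cache_read_share(turns: list[dict]) -> float:
    """Fraction of cached-prefix tokens that were read rather than re-written."""
    read = sum(turn["cacheRead"] for turn in turns)
    write = sum(turn["cacheWrite"] for turn in turns)
    total = read + write
    return read / total if total else 0.0


# Hypothetical sessions: one write followed by reads vs. a write on every turn.
healthy = [{"cacheRead": 0, "cacheWrite": 50_000}] + [{"cacheRead": 50_000, "cacheWrite": 0}] * 3
broken = [{"cacheRead": 0, "cacheWrite": 50_000}] * 4

print(cache_read_share(healthy))  # 0.75
print(cache_read_share(broken))   # 0.0
```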


    OpenClaw Fallback Is Broken — Here’s What to Do Instead

    I used to think configuring a fallbacks array was enough. It’s not.

    OpenClaw’s native fallback has two known bugs that together make it fragile right now. First: when a fallback triggers, the primary model’s tool definitions get forwarded to the fallback in the wrong format, the fallback rejects the request, and your agent stops with a cryptic “provider rejected the request schema”. Second, and worse: when a fallback succeeds, the session quietly stays on the fallback model. Every subsequent turn starts from there, not your configured primary. Someone running a multi-provider chain in production watched their session drift through three different models in a single day without issuing any switch commands, eventually landing on a free model that started sending English “thinking out loud” reasoning text to their Chinese users.

    In short: if you configured fallbacks, the schema may be rejected when they trigger, and if a fallback succeeds, your session drifts permanently and silently.

    The combined effect: OpenClaw’s native fallback is in a shaky state right now. If your strategy relies on it always catching you when a provider fails, you may already be silently losing requests.

    Two paths forward: use a single provider without fallbacks (sacrificing reliability), or push fallback decisions out of OpenClaw. Let OpenClaw see just one model endpoint, and let an external provider layer handle routing, schema normalization, and session-drift prevention.


    OpenClaw vs OpenRouter vs Direct API: Real Cost Comparison

    Post-April 4, the community is discussing three main paths: direct API, OpenRouter, or another provider layer.


    AI Inference Optimization: Why I Moved from OpenRouter to Infron

    Honestly: I started on OpenRouter. More mature, more discussion. After two weeks I moved to Infron. Three specific reasons.

    First: passthrough pricing + low fee. Infron’s FAQ is explicit: “We pass through the pricing of the underlying model providers without any markup, you pay the same rate as you would directly with the provider.” The fee is $0.35 + 5% on credit purchase only. For OpenClaw-scale individual users, that is roughly comparable to OpenRouter’s 5.5%, but Infron doesn’t charge a BYOK markup.
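"Roughly comparable" can be made concrete with the two fee formulas. This sketch assumes both fees are charged on the credit purchase amount, which the text states for Infron but which is my assumption for OpenRouter's 5.5%; BYOK markups are left out. Fractions are used so the break-even comes out exact.

```python
from fractions import Fraction

INFRON_FIXED = Fraction("0.35")     # flat fee per credit purchase, in USD
INFRON_RATE = Fraction("0.05")      # 5% of the purchase
OPENROUTER_RATE = Fraction("0.055") # 5.5% of the purchase (assumed basis)


def infron_fee(purchase_usd) -> Fraction:
    return INFRON_FIXED + INFRON_RATE * Fraction(purchase_usd)


def openrouter_fee(purchase_usd) -> Fraction:
    return OPENROUTER_RATE * Fraction(purchase_usd)


# Purchase size at which both fees are equal: 0.35 + 0.05x = 0.055x
break_even = INFRON_FIXED / (OPENROUTER_RATE - INFRON_RATE)
print(break_even)  # 70
```

Under these assumptions the flat $0.35 makes small top-ups slightly more expensive on Infron, and purchases above $70 slightly cheaper.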

    Second: multi-provider pooling solves the schema fallback problem. Infron aggregates the same model across 100+ providers. When direct Anthropic goes down (the ~10-hour outage on April 6, 2026), Infron pools across AWS Bedrock and GCP Vertex Claude instances. From OpenClaw’s perspective, the model endpoint never died. Schema normalization happens inside Infron, not in OpenClaw’s fallback module (the one with the known bugs I mentioned above).

    Third: smart routing does the optimization you’d otherwise do by hand. Infron’s default routing isn’t fixed; it load-balances each request in real time across providers hosting the same model, in its own words “to maximize uptime and best price.” Every request is evaluated on four dimensions simultaneously: latency, throughput, reliability, and price, over a rolling 5-minute statistics window. You get the cheapest healthy provider at the moment of the call, not whichever one you happened to guess right about when you wrote your config.

    More interestingly, you can tell it “give me the cheapest provider that meets my performance floor”:

    "provider": {
      "sort": "price",
      "preferred_max_latency": { "p90": 3 }  // 90% of calls < 3s
    }

    Reads as: from all providers whose p90 latency is under 3 seconds, pick the cheapest. For OpenClaw’s heartbeats and sub-agent calls this is exactly right.
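The selection rule described above can be sketched in a few lines. This is an illustration of the "filter by latency, then sort by price" idea, not Infron's actual implementation; the provider names, the statistics, and the behavior when nothing qualifies (returning None here) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProviderStats:
    name: str
    p90_latency_s: float          # 90th-percentile latency in seconds
    price_per_million_usd: float  # price for the same model at this provider


def cheapest_within_latency(providers: list[ProviderStats],
                            max_p90_s: float) -> Optional[ProviderStats]:
    """Pick the cheapest provider whose p90 latency is under the floor, if any."""
    eligible = [p for p in providers if p.p90_latency_s < max_p90_s]
    if not eligible:
        return None  # a real gateway may fall back differently
    return min(eligible, key=lambda p: p.price_per_million_usd)


# Made-up numbers for three providers hosting the same model.
pool = [
    ProviderStats("provider-a", p90_latency_s=2.0, price_per_million_usd=3.2),
    ProviderStats("provider-b", p90_latency_s=4.5, price_per_million_usd=2.5),
    ProviderStats("provider-c", p90_latency_s=1.2, price_per_million_usd=3.0),
]
print(cheapest_within_latency(pool, max_p90_s=3).name)  # provider-c
```

Note that provider-b is the cheapest overall but is excluded because it misses the 3-second floor.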

    For my own setup: $400 to ~$60. The Infron-attributable share came mostly from smart routing keeping every call on the cheapest healthy provider in the moment; model tiering and cache did the rest. None of it required me to profile providers weekly or manually rebalance.


    The Final openclaw.json

    Below is a simplified version of what I actually run. The value is in the comments. Each line addresses one of the problems above:

    {
      "models": {
        "providers": {
          "infron": {
            "baseUrl": "https://llm.onerouter.pro/v1",  // passthrough provider layer
            "apiKey": "<your-infron-key>",
            "api": "openai-completions",
            "models": [
              { "id": "anthropic/claude-sonnet-4.5", "contextWindow": 200000 },
              { "id": "google/gemini-2.5-flash-lite", "contextWindow": 1000000 },
              { "id": "deepseek/deepseek-v3.2", "contextWindow": 128000 }
            ]
          }
        }
      },
      "agents": {
        "defaults": {
          "model": {
            "primary": "infron/anthropic/claude-sonnet-4.5",  // primary is Sonnet, not Opus
            "fallbacks": ["infron/google/gemini-2.5-flash-lite"]
          },
          "models": {
            "infron/anthropic/claude-sonnet-4.5": {
              "params": { "cacheRetention": "long" }  // 1h cache TTL instead of the 5-minute default
            }
          },
          "heartbeat": {
            "every": "55m",           // stays inside the 1h cache TTL
            "lightContext": true,     // only HEARTBEAT.md in bootstrap
            "isolatedSession": true   // no session history
          }
        }
      }
    }

    Four moves: primary is not Opus, cacheRetention set to long, heartbeat at 55m + lightContext + isolatedSession, baseUrl points at a passthrough provider layer.

    The last one matters because it stabilizes the first three. You could go direct to Anthropic, but then you stop when they do. You could use OpenRouter, but the 5.5% fee, the BYOK markup, and sticky routing limits bite at multi-session scale. Or you can use Infron. Its passthrough pricing and 100+ provider pooling are why I landed there.


    Four OpenClaw API Cost Fixes You Can Apply Right Now

    Even if you don’t switch provider layers, these four cost you nothing and save meaningful money:

    1. Run openclaw /usage full — check cacheRead and cacheWrite. If cacheWrite is chronically high and cacheRead near zero, your cache isn’t working.
    2. Open ~/.openclaw/openclaw.json — check if your primary is Opus. If yes, change to Sonnet or cheaper immediately.
    3. Add lightContext: true and isolatedSession: true to heartbeat — officially confirmed 95%+ cost reduction.
    4. Check SOUL.md, AGENTS.md, HEARTBEAT.md for timestamps or dynamic variables — remove them. They’re making your cache miss every turn.
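Checks 2 and 3 can be automated once the config is loaded into a dict. The sketch below is not an OpenClaw tool: it only uses the key paths that appear in the config earlier in this article, and it expects an already-parsed dict because a config containing // comments will not load with a strict JSON parser.

```python
def audit_config(config: dict) -> list[str]:
    """Return human-readable warnings for the cost settings discussed above."""
    defaults = config.get("agents", {}).get("defaults", {})
    warnings = []

    primary = defaults.get("model", {}).get("primary", "")
    if not primary:
        # Per the article, an unset primary means the default, which is Opus.
        warnings.append("no primary model set; the default may be Opus")
    elif "opus" in primary.lower():
        warnings.append(f"primary model is {primary}; consider Sonnet or cheaper")

    heartbeat = defaults.get("heartbeat", {})
    for flag in ("lightContext", "isolatedSession"):
        if heartbeat.get(flag) is not True:
            warnings.append(f"heartbeat.{flag} is not enabled")
    return warnings


# Hypothetical config fragment with an expensive primary and default heartbeat.
risky = {"agents": {"defaults": {"model": {"primary": "anthropic/claude-opus"}}}}
for warning in audit_config(risky):
    print(warning)
```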

    April 4 was a watershed for the OpenClaw community: no more “subscription covers everything” comfort. Every token now costs real money. But that’s what forced us to figure out how the system actually spends. Every claim above is documented in the official docs or public GitHub issues. The money you save is also real.
