A free tool that catches when your AI is charging you for answers it already gave.
Here’s something that should bother you: every time your AI chatbot answers “How do I reset my password?”, you get charged. And when the next customer asks “What’s the process for password recovery?”, you get charged again. Different words, same question, two bills.
Ausaf Qazi, a senior software engineer with a background in NLP and text classification, noticed something wasteful: businesses were paying for identical AI answers over and over, sometimes dozens of times a day. The same questions, the same responses, fresh charges every time.
“It was economically absurd,” Qazi wrote. “You wouldn’t charge a customer every time they accessed a frequently-read database record. Why do it with AI responses?”
So he built Mimir, a free tool that catches duplicate questions before they cost you money. The name comes from Norse mythology, where Mímir was a being renowned for his wisdom and memory. Fitting for a tool that remembers what’s already been answered.
The Problem With How AI Billing Works
Most AI services charge per request. Ask ChatGPT or Claude a question through their API, you pay. Ask it again, you pay again. The system doesn’t care if it just answered the exact same thing five minutes ago.
For a business handling customer inquiries, this adds up fast. Think about how many ways customers ask the same things:
Order tracking:
- “Where’s my order?”
- “Can you track my package?”
- “When will my stuff arrive?”
- “I need an update on my delivery”
Return policies:
- “How do I return something?”
- “What’s your return policy?”
- “Can I send this back?”
- “I want a refund”
Pricing and payments:
- “How much does shipping cost?”
- “Do you offer free shipping?”
- “What are the delivery fees?”
Account issues:
- “I forgot my password”
- “How do I reset my login?”
- “I can’t get into my account”
Each of these variations triggers a separate API call. Each one costs money. A busy e-commerce site might field hundreds of these per week, paying full price every single time for what amounts to maybe a dozen unique answers.
Traditional caching doesn’t help because it only catches exact matches. If one customer types “What are your hours?” and another types “When are you open?”, that’s two different strings. Cache miss. Pay twice.
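You can see the problem in a couple of lines of Python. The questions and answer here are made up; the point is the lookup:

```python
# A traditional cache keys on the exact request string.
cache = {}

cache["What are your hours?"] = "We're open 9am-6pm, Monday through Saturday."

# A paraphrase is a different string, so the lookup misses
# and the question goes to the paid API anyway.
print(cache.get("When are you open?"))  # -> None: cache miss, pay again
```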
Mimir does something smarter. It looks at meaning, not just text.
How Semantic Caching Actually Works
The word “semantic” just means “relating to meaning.” So semantic caching is caching based on what a question means, not how it’s worded.
Here’s what happens under the hood:
When a question comes in, Mimir converts it into something called a vector embedding. Think of this as translating the question into a set of coordinates. Not coordinates on a map, but coordinates in “meaning space.” Questions that mean similar things end up with similar coordinates.
So “What are your hours?” might translate to something like [0.23, 0.87, 0.12, …] (but with hundreds of numbers). And “When are you open?” translates to something very close, maybe [0.24, 0.86, 0.13, …]. The numbers are almost identical because the meaning is almost identical.
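Here’s a rough sketch of that idea using the open-source sentence-transformers library. This isn’t necessarily what Mimir uses internally; the model and questions are just for illustration:

```python
# pip install sentence-transformers
from sentence_transformers import SentenceTransformer, util

# Any sentence-embedding model works for the demo; this one is small and free.
model = SentenceTransformer("all-MiniLM-L6-v2")

a = model.encode("What are your hours?")
b = model.encode("When are you open?")
c = model.encode("How do I return something?")

# Cosine similarity: 1.0 means identical meaning-coordinates, 0 means unrelated.
print(util.cos_sim(a, b))  # high: same intent, different words
print(util.cos_sim(a, c))  # much lower: different intent
```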
When a new question arrives, Mimir does a quick distance check: how close is this new question to anything we’ve seen before? If it’s close enough (you set the threshold, typically 95% similarity), Mimir returns the cached answer instead of calling the AI.
If it’s a genuinely new question, Mimir forwards it to the AI provider, gets the response, caches it, and now that answer is available for all future similar questions.
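Put together, the whole loop is short. This is a back-of-the-napkin sketch, not Mimir’s actual code; the function names and the 0.95 threshold are illustrative:

```python
import numpy as np

THRESHOLD = 0.95   # similarity cutoff; tune it for your application
cache = []         # list of (embedding, answer) pairs

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def answer(question, embed, call_llm):
    """embed: text -> vector. call_llm: text -> answer (the paid API call)."""
    q_vec = embed(question)

    # Distance check: is this close to anything we've answered before?
    for vec, cached_answer in cache:
        if cosine(q_vec, vec) >= THRESHOLD:
            return cached_answer   # cache hit: no API charge

    # Genuinely new question: pay once, then remember the answer.
    response = call_llm(question)
    cache.append((q_vec, response))
    return response
```

A real deployment would swap the linear scan for a vector index so lookups stay fast at scale, but the logic is the same.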
The beauty is that this happens in milliseconds. The user doesn’t notice any delay. They just get their answer, and you don’t get charged for the same response you already paid for yesterday.
What It Saves
According to academic research, semantic caching can cut API calls by up to 68%. Real-world implementations report savings between 40% and 70%.
Let’s make that concrete. Say you’re a small business running an AI customer service bot that handles 25,000 queries a month. At typical GPT-4 pricing, you might be looking at $900 a month, or around $10,800 a year.
If 65% of those queries are variations of questions you’ve already answered (which is pretty normal for customer service), semantic caching drops your bill to somewhere around $3,700 a year.
That’s a $7,000 difference. For a small business, that’s not nothing.
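If you want to run the numbers for your own volume, the arithmetic is one multiplication. The bill and hit rate below are the assumptions from above:

```python
monthly_bill = 900   # dollars/month at the assumed GPT-4 pricing
hit_rate = 0.65      # fraction of queries served from cache

annual = monthly_bill * 12              # yearly bill without caching
with_cache = annual * (1 - hit_rate)    # only cache misses reach the paid API
savings = annual - with_cache

print(f"${annual:,} -> ${with_cache:,.0f}, saving ${savings:,.0f}")
# $10,800 -> $3,780, saving $7,020: roughly the figures above
```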
Who This Is For
Mimir isn’t for everyone. If you’re just chatting with ChatGPT personally, this doesn’t apply to you. It’s for businesses and developers running AI through the API, where you pay per request.
Customer service bots are the obvious use case. Any business that handles repetitive inquiries (retail, hospitality, utilities, healthcare admin) is probably answering the same twenty questions over and over. Semantic caching catches most of those.
FAQ chatbots are even better suited. If you’ve built an AI assistant to answer questions about your product or service, the questions are going to cluster around common topics. Pricing, features, compatibility, troubleshooting. These are exactly the kind of repetitive queries that caching handles well.
Internal helpdesks work too. IT departments fielding “how do I connect to VPN” and “my email isn’t syncing” a hundred times a month? Same principle. Cache the common answers, stop paying for them repeatedly.
Educational platforms running AI tutors see similar patterns. Students ask about the same concepts in different ways. “What’s the Pythagorean theorem?” and “How do I calculate the hypotenuse?” don’t need two separate AI calls.
The common thread: anywhere questions cluster around predictable topics, semantic caching saves money.
The Impact
Right now, about 14% of small businesses use AI compared to 34% of larger companies. Cost is the main reason. When every customer question costs money, AI stops making sense for businesses running on tight margins.
A small accounting firm that was looking at $2,400 a year for AI-powered customer service might now be looking at $700. That’s the difference between “we can’t afford AI” and “let’s try it.”
There’s also a speed benefit. Cached responses come back in under 120 milliseconds. Fresh API calls to GPT-4 can take 800 milliseconds or more. For customer-facing applications, that faster response time adds up to a better experience.
And because Mimir runs as a proxy, you get a dashboard showing your cache hit rate, estimated savings, and query patterns. You can actually see how much money you’re not spending.
The Catch (There Isn’t Really One)
Mimir is free. Open source, MIT license. You can grab it from GitHub and have it running in under an hour.
The embeddings that power the similarity matching can also be free if you run them locally using Ollama. Or you can use OpenAI’s embedding API, which costs fractions of a cent per query. Either way, it’s way cheaper than paying full price for repeated AI responses.
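Both options are a few lines in Python. Whether this matches how Mimir wires them up is worth checking in its docs; the model names here are just common choices:

```python
# Option 1: free local embeddings via Ollama (model name is an example).
import ollama
local = ollama.embeddings(model="nomic-embed-text", prompt="What are your hours?")

# Option 2: OpenAI's embedding API, at fractions of a cent per query.
from openai import OpenAI
client = OpenAI()  # reads OPENAI_API_KEY from the environment
hosted = client.embeddings.create(
    model="text-embedding-3-small",
    input="What are your hours?",
)
```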
The tool is new, so it doesn’t have a massive community yet. But the code is clean, the documentation is solid, and the concept is proven. Semantic caching isn’t experimental tech. Big companies have been using it internally for a while. Mimir just packages it in a way that anyone can deploy.
The whole thing works as a drop-in proxy. You point your app at Mimir instead of directly at OpenAI. One configuration change. No rewriting your code.
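With the official OpenAI Python client, that change is a single base_url argument. The proxy address below is hypothetical; use wherever you deployed Mimir:

```python
from openai import OpenAI

# Before: the app talks to OpenAI directly (and pays for every duplicate).
# client = OpenAI()

# After: point the same client at the Mimir proxy instead.
client = OpenAI(base_url="http://localhost:8000/v1")

# Nothing else in the application changes.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What are your hours?"}],
)
```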
Worth A Look
Qazi isn’t pretending this one tool will transform the economics of AI. But as he put it: “The technical barrier can be solved. Economics can work.”
Tools like Mimir don’t solve everything. But they chip away at the cost problem in a real way. If you’re running AI on a budget, it’s worth checking out.
Mimir is available on GitHub. Qazi’s projects can be found here.