AI Prompt Caching Cost Calculator
Find out how much prompt caching cuts your LLM API bill. Enter your reused prefix size, per-request tokens, and request volume for Claude, OpenAI, or Gemini, and see the cost with and without caching, the dollar savings, and the break-even point — using each provider's official cache-write and cache-read multipliers.
How it works
Prompt caching stores the unchanging front of your prompt — the system prompt, tool definitions, and few-shot examples — so repeated requests pay a fraction of the normal input price instead of re-billing the whole prefix every time. The savings depend on three things: how large the cached prefix is, how often it is reused inside one cache window, and the cache multipliers your provider charges.
Every token cost here is tokens × pricePerMillion ÷ 1,000,000. Let P be the cached prefix size, F the fresh input per request, and O the output per request (all in tokens), N the number of requests in the cache window, and inP and outP the input and output prices (in $ per 1M tokens). The two scenarios are:
- Without caching, every request pays full input price on the whole prefix: N·(P+F)·inP/1e6 + N·O·outP/1e6.
- With caching, the first request writes the prefix once and the rest read it: a write of P·(inP·writeMult)/1e6, reads of (N−1)·P·(inP·readMult)/1e6, fresh input of N·F·inP/1e6, and output of N·O·outP/1e6.
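The two scenarios above can be sketched in a few lines of Python. This is a minimal illustration rather than the calculator's actual source: the function names are hypothetical, and the prices in the usage lines are placeholders chosen for easy arithmetic, not any provider's list rates.

```python
def cost_without_cache(P, F, O, N, in_price, out_price):
    """Every request pays the full input price on prefix plus fresh input."""
    return N * (P + F) * in_price / 1e6 + N * O * out_price / 1e6


def cost_with_cache(P, F, O, N, in_price, out_price, write_mult, read_mult):
    """The first request writes the prefix once; the other N-1 read it."""
    if N <= 0:
        return 0.0  # no requests, nothing to bill
    write = P * (in_price * write_mult) / 1e6
    reads = (N - 1) * P * (in_price * read_mult) / 1e6
    fresh = N * F * in_price / 1e6
    output = N * O * out_price / 1e6
    return write + reads + fresh + output


# Placeholder inputs: 10,000-token prefix, 500 fresh tokens and 300 output
# tokens per request, 100 requests, $3 / $15 per 1M input / output tokens,
# and the 1.25x write / 0.1x read multipliers quoted for a five-minute cache.
without = cost_without_cache(10_000, 500, 300, 100, 3.0, 15.0)
cached = cost_with_cache(10_000, 500, 300, 100, 3.0, 15.0, 1.25, 0.1)
print(f"without caching: ${without:.4f}")  # without caching: $3.6000
print(f"with caching:    ${cached:.4f}")   # with caching:    $0.9345
```

Both functions assume all N requests land inside a single cache window, so the prefix is written exactly once.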
The multipliers come from each provider's docs. Anthropic charges a cache read at 0.1× the input price and a cache write at 1.25× for the five-minute cache or 2× for the one-hour cache. OpenAI caches automatically with cached input at roughly 0.5× the input price and no write premium. Gemini bills cached tokens at a reduced rate plus an hourly storage charge per million cached tokens, which this tool adds to the cached scenario.
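The multipliers quoted above could be held in a small lookup table, with the storage charge as a separate term. The sketch below is an assumption about how such a table might look, not the tool's real data structure: the key names and the `storage_cost` helper are hypothetical, OpenAI's "no write premium" is modelled as a write multiplier of 1.0, and Gemini's reduced rate is left out because the text does not state a figure.

```python
# Multipliers as stated in the text; confirm against current provider docs.
CACHE_MULTIPLIERS = {
    "anthropic_5m": {"write": 1.25, "read": 0.1},
    "anthropic_1h": {"write": 2.0, "read": 0.1},
    "openai": {"write": 1.0, "read": 0.5},  # automatic caching, no write premium
}


def storage_cost(prefix_tokens, rate_per_million_per_hour, hours):
    """Hourly storage charge: cached tokens (in millions) x rate x hours held."""
    return prefix_tokens / 1e6 * rate_per_million_per_hour * hours


# Placeholder rate of $1.00 per 1M tokens per hour, chosen only for illustration.
print(storage_cost(100_000, 1.0, 2))
```

For a provider that bills storage, this term would be added to the cached scenario only, since the uncached scenario stores nothing.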
Savings are the uncached cost minus the cached cost, and the savings percentage is that figure divided by the uncached cost. Fresh input and output cost the same in both scenarios, so the break-even point depends only on the prefix: it is the smallest number of requests n at which caching the prefix beats paying full price for it, that is, the smallest n where writeMult + (n−1)·readMult < n. For Anthropic this resolves to 2 requests (one cache read) on the five-minute cache and 3 requests (two cache reads) on the one-hour cache, matching the worked examples in Anthropic's prompt-caching documentation. A warning fires when your prefix falls below the model's minimum cacheable size, because caching silently fails to engage on a prefix that is too short.
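The break-even rule and the savings figures can be checked with a short script. This is a sketch under the definitions above, with hypothetical function names; the `max_n` search limit is an added safeguard for multiplier pairs where caching never pays off.

```python
def break_even_requests(write_mult, read_mult, max_n=10_000):
    """Smallest n with write_mult + (n - 1) * read_mult < n, else None."""
    for n in range(1, max_n + 1):
        if write_mult + (n - 1) * read_mult < n:
            return n
    return None  # caching never beats full price within max_n requests


def savings(cost_without, cost_with):
    """Return (dollar savings, savings as a percentage of the uncached cost)."""
    saved = cost_without - cost_with
    pct = saved / cost_without * 100 if cost_without else 0.0
    return saved, pct


print(break_even_requests(1.25, 0.1))  # 2 (five-minute cache multipliers)
print(break_even_requests(2.0, 0.1))   # 3 (one-hour cache multipliers)

# Example costs of $3.60 uncached and $0.9345 cached (illustrative figures).
saved, pct = savings(3.60, 0.9345)
print(f"saved ${saved:.4f} ({pct:.1f}%)")  # saved $2.6655 (74.0%)
```

A linear search is used instead of a closed-form solution so the code mirrors the inequality exactly as written.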
Sources & references
- Anthropic — Prompt Caching (cache-read / cache-write multipliers)
- Anthropic — Pricing (base input/output rates)
- OpenAI — Prompt Caching guide
- Google — Gemini API pricing & context caching
The Anthropic cache multipliers and base rates were last cross-checked against Anthropic's prompt-caching and pricing pages on 2026-06-05. OpenAI and Gemini figures are published list prices that change without notice — confirm them against each provider's current pricing page before relying on a number for budgeting.