How much does prompt caching save?

It depends on how large your reused prefix is and how often you reuse it. A 10,000-token system prompt on Claude Sonnet 4.6 served 1,000 times in a five-minute window cuts the input bill by roughly 86% and the total bill by about 75%, because cache reads cost a tenth of the normal input price.

What is the difference between cache write and cache read pricing?

The first request writes your prefix into the cache and pays a premium — 1.25× the input price for Anthropic's five-minute cache, 2× for the one-hour cache. Every later request reads that prefix at 0.1× the input price. OpenAI caches automatically with no write premium; Gemini adds an hourly storage charge.

Is the one-hour cache TTL worth the extra cost?

Only if your traffic has gaps longer than five minutes. The one-hour cache doubles the write cost (2× input) but keeps the prefix alive across quiet periods. For steady traffic that hits the cache every few seconds, the cheaper five-minute TTL wins. The calculator shows the break-even read count for each.

How many requests does prompt caching need to break even?

For Anthropic's five-minute cache it breaks even at 2 requests (write 1.25× plus one read 0.1× = 1.35×, versus 2× uncached). The one-hour cache breaks even at 3 requests (2× plus 0.2× = 2.2×, versus 3×). Below that, the write premium costs more than caching saves.

Does prompt caching reduce OpenAI and Claude API bills?

Yes, when a large chunk of your prompt repeats across requests — a system prompt, tool definitions, or few-shot examples. Anthropic and Gemini need an explicit cache_control marker; OpenAI caches prompts over 1,024 tokens automatically. If every request is unique with no shared prefix, caching saves nothing.

Why isn't my prompt caching engaging?

The most common cause is a prefix below the model's minimum cacheable size — 2,048 tokens for Claude Sonnet 4.6, 4,096 for Opus 4.8 and Haiku 4.5. The other is a silent invalidator: a timestamp, request ID, or reordered JSON near the start of the prompt changes the bytes and breaks the prefix match.

How accurate are the prices in this calculator?

The Anthropic cache multipliers and base rates are cross-checked against Anthropic's published prompt-caching and pricing docs (last verified 2026-06-05). OpenAI and Gemini figures are list prices that change without notice — treat them as estimates and confirm against each provider's current pricing page before budgeting.

Are output tokens cached too?

No. Caching applies only to the input prefix you mark — your system prompt, tools, and examples. Output tokens are always billed at the normal output rate, and the fresh part of each request (the user's new message) is billed at the normal input rate. That is why caching helps most with a large, stable prefix.

AI · Developer

AI Prompt Caching Cost Calculator

Find out how much prompt caching cuts your LLM API bill. Enter your reused prefix size, per-request tokens, and request volume for Claude, OpenAI, or Gemini, and see the cost with and without caching, the dollar savings, and the break-even point — using each provider's official cache-write and cache-read multipliers.

By Induwara Ashinsana— Executive Director, Ryzera TechnologiesUpdated Jun 5, 2026

Prompt caching savings

Anthropic-verified rates

Provider & model

Cache lifetime (TTL)

Cached prefix tokens

System prompt + tools + few-shot — the part reused every request.

Fresh input / request

New tokens that change each request (the user's message).

Output tokens / request

Tokens Claude or the model generates per reply.

Requests in window

How many requests reuse the cache before it expires.

Example workloads

Cost without caching

$36.00

Cost with caching

$9.0345

You save

$26.97

over 1,000 requests

Savings

74.9%

lower bill

Caching pays off after 2 requests — you're well past it at 1,000.

Cost breakdown (with caching)

Component	Rate	Cost
Cache write (first request)	$3.75/1M	$0.0375
Cache reads (subsequent)	$0.3/1M	$2.997
Fresh input	$3/1M	$1.50
Output	$15/1M	$4.50
Total with caching		$9.0345

Costs are per cache window. Multiply by the number of windows per month for a monthly bill. Excludes batch discounts, image tokens, and per-tenant contract pricing.

Anthropic cache multipliers (read 0.1×, write 1.25× / 2.0×) are cross-checked against Anthropic's prompt-caching docs. OpenAI and Gemini rows are published list prices that change without notice. Rates last verified 2026-06-05. Full sources are listed below the calculator.

How it works

Prompt caching stores the unchanging front of your prompt — the system prompt, tool definitions, and few-shot examples — so repeated requests pay a fraction of the normal input price instead of re-billing the whole prefix every time. The savings depend on three things: how large the cached prefix is, how often it is reused inside one cache window, and the cache multipliers your provider charges.

Every token cost here is tokens × pricePerMillion ÷ 1,000,000. Let P be the cached prefix, F the fresh input per request, O the output per request, N the requests in the window, inP the input price and outP the output price (in $ per 1M tokens). The two scenarios are:

Without caching, every request pays full input price on the whole prefix: N·(P+F)·inP/1e6 + N·O·outP/1e6.
With caching, the first request writes the prefix once and the rest read it: a write of P·(inP·writeMult)/1e6, reads of (N−1)·P·(inP·readMult)/1e6, fresh input of N·F·inP/1e6, and output of N·O·outP/1e6.

The multipliers come from each provider's docs. Anthropic charges a cache read at 0.1× the input price and a cache write at 1.25× for the five-minute cache or 2× for the one-hour cache. OpenAI caches automatically with cached input at roughly 0.5× the input price and no write premium. Gemini bills cached tokens at a reduced rate plus an hourly storage charge per million cached tokens, which this tool adds to the cached scenario.

Savings are simply without − with, and the savings percentage is that figure divided by the un-cached cost. The break-even read count is the smallest number of requests at which caching the prefix beats paying full price for it: the smallest n where writeMult + (n−1)·readMult < n. For Anthropic this resolves to 2 requests on the five-minute cache and 3 on the one-hour cache, matching the worked examples in Anthropic's prompt-caching documentation. A warning fires when your prefix falls below the model's minimum cacheable size, because caching silently will not engage on a prefix that is too short.

Worked examples

Claude Sonnet 4.6 support bot — 5-minute cache

P=10,000 · F=500 · O=300 · N=1,000 · input $3/1M · output $15/1M

Without caching, input: 1,000 × 10,500 × 3 ÷ 1e6 = $31.50
Without caching, output: 1,000 × 300 × 15 ÷ 1e6 = $4.50 → total $36.00
With caching, write (1.25× → $3.75/1M): 10,000 × 3.75 ÷ 1e6 = $0.0375
With caching, reads (0.1× → $0.30/1M): 999 × 10,000 × 0.30 ÷ 1e6 = $2.997
Fresh input $1.50 + output $4.50 → total $9.03
Savings: $36.00 − $9.03 = $26.97 (about 75% lower)

Opus 4.8 break-even burst — 5-minute vs 1-hour

P=8,000 · input $5/1M · prefix-only comparison

Without caching, 2 calls: 2 × 8,000 × 5 ÷ 1e6 = $0.080
5-minute cache, write (1.25× → $6.25/1M): 8,000 × 6.25 ÷ 1e6 = $0.050
5-minute cache, read (0.1× → $0.50/1M): 8,000 × 0.50 ÷ 1e6 = $0.004 → $0.054, saves at 2 calls
1-hour cache, write (2× → $10/1M): $0.080 + $0.004 = $0.084 > $0.080 → not worth it at 2
1-hour cache at 3 calls: $0.080 + 2 × $0.004 = $0.088 vs $0.120 uncached → worth it at 3

Edge case — single request, no reuse

P=10,000 · F=0 · O=0 · N=1 · Opus 4.8 · 5-minute cache

Without caching: 1 × 10,000 × 5 ÷ 1e6 = $0.050
With caching, write only (1.25× → $6.25/1M): 10,000 × 6.25 ÷ 1e6 = $0.0625
Savings: $0.050 − $0.0625 = −$0.0125 — caching costs more
One request never benefits: you pay the write premium with no reads to amortise it

Frequently asked questions

Sources & references

The Anthropic cache multipliers and base rates were last cross-checked against Anthropic's prompt-caching and pricing pages on 2026-06-05. OpenAI and Gemini figures are published list prices that change without notice — confirm them against each provider's current pricing page before relying on a number for budgeting.

Related tools

LiveAI

Batch API Cost Calc

Estimate the 50% savings from running large OpenAI, Claude, and Gemini jobs through the asynchronous Batch API instead of standard synchronous calls. Compare standard vs batch cost in USD and LKR, with per-token math cited to each provider's docs.

Open tool

LiveAI

AI API Cost Calculator

Estimate the monthly and per-request USD and LKR bill for any major LLM API. Pick a model, enter input/output tokens and requests per month, and compare every model cheapest-first — with optional 50% batch and cached-input discounts.

Open tool

LiveAI

Reasoning Token Cost Calc

Estimate the true cost of reasoning-model API calls by accounting for the hidden reasoning/thinking tokens that o-series, Claude, and Gemini bill at output rates. See per-call and monthly cost in USD and LKR, plus how much more it is than a naive estimate.

Open tool

Rate this tool

Be the first to rate

Comments & feedback

Spotted a bug or want an improvement? Tell us — our team reviews every comment, and good ideas get built. Comments are public and anonymous.

Found a bug, edge case, or want to suggest an improvement?

Email me at [email protected] — most fixes ship within 24 hours.