induwara.lkinduwara.lk
induwara.lkAI · Developer

AI Prompt Caching Cost Calculator

Find out how much prompt caching cuts your LLM API bill. Enter your reused prefix size, per-request tokens, and request volume for Claude, OpenAI, or Gemini, and see the cost with and without caching, the dollar savings, and the break-even point — using each provider's official cache-write and cache-read multipliers.

By Induwara AshinsanaUpdated Jun 5, 2026
Prompt caching savings
Anthropic-verified rates

System prompt + tools + few-shot — the part reused every request.

New tokens that change each request (the user's message).

Tokens Claude or the model generates per reply.

How many requests reuse the cache before it expires.

Example workloads
Cost without caching
$36.00
Cost with caching
$9.0345
You save
$26.97
over 1,000 requests
Savings
74.9%
lower bill
Caching pays off after 2 requests — you're well past it at 1,000.

Cost breakdown (with caching)

ComponentRateCost
Cache write (first request)$3.75/1M$0.0375
Cache reads (subsequent)$0.3/1M$2.997
Fresh input$3/1M$1.50
Output$15/1M$4.50
Total with caching$9.0345

Costs are per cache window. Multiply by the number of windows per month for a monthly bill. Excludes batch discounts, image tokens, and per-tenant contract pricing.

Anthropic cache multipliers (read 0.1×, write 1.25× / 2.0×) are cross-checked against Anthropic's prompt-caching docs. OpenAI and Gemini rows are published list prices that change without notice. Rates last verified 2026-06-05. Full sources are listed below the calculator.

How it works

Prompt caching stores the unchanging front of your prompt — the system prompt, tool definitions, and few-shot examples — so repeated requests pay a fraction of the normal input price instead of re-billing the whole prefix every time. The savings depend on three things: how large the cached prefix is, how often it is reused inside one cache window, and the cache multipliers your provider charges.

Every token cost here is tokens × pricePerMillion ÷ 1,000,000. Let P be the cached prefix, F the fresh input per request, O the output per request, N the requests in the window, inP the input price and outP the output price (in $ per 1M tokens). The two scenarios are:

  • Without caching, every request pays full input price on the whole prefix: N·(P+F)·inP/1e6 + N·O·outP/1e6.
  • With caching, the first request writes the prefix once and the rest read it: a write of P·(inP·writeMult)/1e6, reads of (N−1)·P·(inP·readMult)/1e6, fresh input of N·F·inP/1e6, and output of N·O·outP/1e6.

The multipliers come from each provider's docs. Anthropic charges a cache read at 0.1× the input price and a cache write at 1.25× for the five-minute cache or 2× for the one-hour cache. OpenAI caches automatically with cached input at roughly 0.5× the input price and no write premium. Gemini bills cached tokens at a reduced rate plus an hourly storage charge per million cached tokens, which this tool adds to the cached scenario.

Savings are simply without − with, and the savings percentage is that figure divided by the un-cached cost. The break-even read count is the smallest number of requests at which caching the prefix beats paying full price for it: the smallest n where writeMult + (n−1)·readMult < n. For Anthropic this resolves to 2 requests on the five-minute cache and 3 on the one-hour cache, matching the worked examples in Anthropic's prompt-caching documentation. A warning fires when your prefix falls below the model's minimum cacheable size, because caching silently will not engage on a prefix that is too short.

Worked examples

Claude Sonnet 4.6 support bot — 5-minute cache

P=10,000 · F=500 · O=300 · N=1,000 · input $3/1M · output $15/1M

  1. Without caching, input: 1,000 × 10,500 × 3 ÷ 1e6 = $31.50
  2. Without caching, output: 1,000 × 300 × 15 ÷ 1e6 = $4.50 → total $36.00
  3. With caching, write (1.25× → $3.75/1M): 10,000 × 3.75 ÷ 1e6 = $0.0375
  4. With caching, reads (0.1× → $0.30/1M): 999 × 10,000 × 0.30 ÷ 1e6 = $2.997
  5. Fresh input $1.50 + output $4.50 → total $9.03
  6. Savings: $36.00 − $9.03 = $26.97 (about 75% lower)

Opus 4.8 break-even burst — 5-minute vs 1-hour

P=8,000 · input $5/1M · prefix-only comparison

  1. Without caching, 2 calls: 2 × 8,000 × 5 ÷ 1e6 = $0.080
  2. 5-minute cache, write (1.25× → $6.25/1M): 8,000 × 6.25 ÷ 1e6 = $0.050
  3. 5-minute cache, read (0.1× → $0.50/1M): 8,000 × 0.50 ÷ 1e6 = $0.004 → $0.054, saves at 2 calls
  4. 1-hour cache, write (2× → $10/1M): $0.080 + $0.004 = $0.084 > $0.080 → not worth it at 2
  5. 1-hour cache at 3 calls: $0.080 + 2 × $0.004 = $0.088 vs $0.120 uncached → worth it at 3

Edge case — single request, no reuse

P=10,000 · F=0 · O=0 · N=1 · Opus 4.8 · 5-minute cache

  1. Without caching: 1 × 10,000 × 5 ÷ 1e6 = $0.050
  2. With caching, write only (1.25× → $6.25/1M): 10,000 × 6.25 ÷ 1e6 = $0.0625
  3. Savings: $0.050 − $0.0625 = −$0.0125 — caching costs more
  4. One request never benefits: you pay the write premium with no reads to amortise it

Frequently asked questions

Sources & references

The Anthropic cache multipliers and base rates were last cross-checked against Anthropic's prompt-caching and pricing pages on 2026-06-05. OpenAI and Gemini figures are published list prices that change without notice — confirm them against each provider's current pricing page before relying on a number for budgeting.

Related tools

Rate this tool
Be the first to rate

Comments & feedback

Spotted a bug or want an improvement? Tell us — our team reviews every comment, and good ideas get built. Comments are public and anonymous.

Found a bug, edge case, or want to suggest an improvement?

Email me at [email protected] — most fixes ship within 24 hours.