AI Agent Cost Calculator
Estimate what a multi-step LLM agent really costs to run. Because every tool-calling step re-sends the whole accumulated context, cost grows faster than the step count. This tool models that growth and prices the same workload across Claude, GPT, and Gemini, in US dollars and Sri Lankan rupees (LKR).
How it works
A single API call is easy to price: tokens in × input rate, plus tokens out × output rate. An agent is the trap. An agent solves a task in several steps, and each step calls a tool, reads the result, and decides what to do next. Because the model is stateless, every step re-sends the system prompt, the tool definitions, the original request, and the entire transcript of prior outputs and tool results. The cost of one agent run is therefore far higher than the cost of one call, and most online “API cost” calculators ignore this entirely.
Let S be the system-prompt and tool-definition tokens (re-sent every step), U the initial user-input tokens, O the average output tokens per step, T the average tool-result tokens fed back per step, and N the number of steps. At step i the request carries the fixed block, the opening request, and everything accumulated so far:
inputTokens(i) = S + U + (i − 1)·(O + T)
Summing every step of one run gives the closed form the calculator uses:
- total input = N·(S + U) + (O + T)·N·(N − 1)/2
- total output = N·O
The N·(N − 1)/2 term is the accumulation tax — the reason a longer agent loop costs disproportionately more. The calculator cross-checks this closed form against an explicit step-by-step summation so the two methods always agree to the token, and the per-step table shows the input count climbing on every row.
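The cross-check described above can be sketched in a few lines of Python. This is an illustrative sketch, not the calculator's own source: the function names and the sample token counts (S=2000, U=500, O=300, T=700, N=10) are assumptions chosen only to make the arithmetic easy to follow.

```python
def input_tokens_at_step(i, S, U, O, T):
    """Input tokens sent at step i (1-indexed): fixed block, opening
    request, and everything accumulated over the previous i - 1 steps."""
    return S + U + (i - 1) * (O + T)


def totals_closed_form(S, U, O, T, N):
    """Total input and output tokens for one run, using the closed form."""
    # N * (N - 1) is always even, so integer division is exact here.
    total_input = N * (S + U) + (O + T) * N * (N - 1) // 2
    total_output = N * O
    return total_input, total_output


def totals_by_summation(S, U, O, T, N):
    """Same totals, computed by adding up every step explicitly."""
    total_input = sum(input_tokens_at_step(i, S, U, O, T) for i in range(1, N + 1))
    return total_input, N * O


if __name__ == "__main__":
    # Hypothetical workload, for illustration only.
    S, U, O, T, N = 2000, 500, 300, 700, 10
    for i in range(1, N + 1):
        print(f"step {i:2d}: input = {input_tokens_at_step(i, S, U, O, T)}")
    assert totals_closed_form(S, U, O, T, N) == totals_by_summation(S, U, O, T, N)
    print(totals_closed_form(S, U, O, T, N))
```

With these sample numbers the input count climbs from 2,500 tokens at step 1 to 11,500 at step 10, and the run totals 70,000 input tokens, of which 45,000 come from the accumulation term alone.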
Cost per run is then (totalInput/1e6)·inputPrice + (totalOutput/1e6)·outputPrice, with prices quoted per million tokens; daily cost multiplies by runs per day, and monthly cost multiplies daily by 30. Rupee figures multiply the dollar cost by an editable Central Bank of Sri Lanka (CBSL) indicative USD→LKR exchange rate. The “naive vs real” panel prices the same run as if every step sent only the opening S + U tokens, so you can see exactly how badly a single-call estimate under-counts.
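As a rough illustration of the pricing step, the sketch below turns token totals into per-run, daily, monthly, and rupee figures, and compares the real run with the naive estimate. Everything numeric here is a placeholder: the prices (3.00 and 15.00 USD per million tokens), the 100 runs per day, and the exchange rate of 300 LKR per USD are hypothetical values, not quoted rates, and the function names are assumptions rather than the calculator's own.

```python
def agent_totals(S, U, O, T, N):
    """Real token totals for one run (closed form from the text)."""
    return N * (S + U) + (O + T) * N * (N - 1) // 2, N * O


def naive_totals(S, U, O, T, N):
    """Naive totals: every step is priced as if it sent only S + U."""
    return N * (S + U), N * O


def run_cost_usd(total_input, total_output, input_price, output_price):
    """Cost of one run; prices are USD per million tokens."""
    return (total_input / 1e6) * input_price + (total_output / 1e6) * output_price


def projections(cost_per_run, runs_per_day, usd_to_lkr):
    """Daily and monthly cost (30-day month), plus the monthly rupee figure."""
    daily = cost_per_run * runs_per_day
    monthly = daily * 30
    return {"daily_usd": daily, "monthly_usd": monthly, "monthly_lkr": monthly * usd_to_lkr}


if __name__ == "__main__":
    # All values below are hypothetical placeholders, not real rates.
    S, U, O, T, N = 2000, 500, 300, 700, 10
    input_price, output_price = 3.00, 15.00
    real = run_cost_usd(*agent_totals(S, U, O, T, N), input_price, output_price)
    naive = run_cost_usd(*naive_totals(S, U, O, T, N), input_price, output_price)
    print(f"real  : ${real:.4f} per run")
    print(f"naive : ${naive:.4f} per run ({real / naive:.2f}x under-count)")
    print(projections(real, runs_per_day=100, usd_to_lkr=300.0))
```

Under these placeholder inputs the real run costs about $0.255 against a naive estimate of about $0.12, so the single-call view under-counts by a little more than a factor of two.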
Prompt caching (the toggle) re-prices the fixed block. The system prompt and tool definitions are written to cache once on step 1 (at 1.25× input on Anthropic) and read back on steps 2…N at roughly 10% of the input price; the opening request and the growing transcript stay at full input price. It is a conservative estimate — caching the transcript too would save more, but those savings depend on the 5-minute cache window and how fast steps follow one another. Claude rates are authoritative; GPT and Gemini rates are transcribed from the official pricing pages and dated below.
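The caching re-price can be sketched the same way. The 1.25× write and 0.1× read multipliers come from the description above; the function name, the sample workload, and the prices (3.00 and 15.00 USD per million tokens) are hypothetical, and the sketch assumes every step after the first hits the cache.

```python
def cached_run_cost_usd(S, U, O, T, N, input_price, output_price,
                        write_mult=1.25, read_mult=0.10):
    """Cost of one run with only the fixed block S cached.

    Step 1 writes S to cache at write_mult x the input price; steps 2..N
    read it back at read_mult x the input price. The opening request U and
    the growing transcript stay at the full input price.
    Assumes every step after the first is a cache hit.
    """
    fixed_block = S * write_mult + S * read_mult * (N - 1)
    uncached = N * U + (O + T) * N * (N - 1) // 2
    input_cost = (fixed_block + uncached) / 1e6 * input_price
    output_cost = (N * O) / 1e6 * output_price
    return input_cost + output_cost


if __name__ == "__main__":
    # Hypothetical workload and prices, for illustration only.
    cost = cached_run_cost_usd(2000, 500, 300, 700, 10, 3.00, 15.00)
    print(f"cached run: ${cost:.4f}")
```

For a single-step run (N = 1) this model comes out slightly more expensive than the uncached price, because the fixed block is written at 1.25× and never read back.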
Sources & references
- Anthropic — Claude API pricing (authoritative for Claude rates)
- Anthropic — Prompt caching (cache read 0.1× / write 1.25× of input)
- OpenAI — API pricing (GPT models)
- Google — Gemini API pricing
- Central Bank of Sri Lanka — indicative exchange rates (USD→LKR)
Claude rates are authoritative. GPT and Gemini rates were transcribed from the official pricing pages above and last verified on 2026-06-06; they are reviewed each quarter and whenever a provider announces a price change. The tool runs entirely in your browser — no inputs leave your device.