induwara.lkinduwara.lk
induwara.lkAI · Developer tools

AI Agent Cost Calculator

Estimate what a multi-step LLM agent really costs to run. Because every tool-calling step re-sends the whole accumulated context, cost grows faster than the step count — this tool models that correctly and prices the same workload across Claude, GPT, and Gemini, in dollars and rupees.

By Induwara AshinsanaUpdated Jun 6, 2026
AI agent run cost

Grouped by provider. Price shown is input / output per 1M tokens.

Currency & caching

Tokens, re-sent every step

Tokens, the opening request

Tool-calling rounds (1–100)

Assistant tokens per step

Tokens fed back per step

Agent invocations / day

Thinking in words? Tokens ≈ words × 1.33. So a 1,500-word system prompt ≈ 1,995 tokens. Enter token counts above.

Quick presets
Cost per run
$0.09300
23,500 in · 1,500 out tokens
Daily cost
$93.00
1,000 runs/day
Monthly cost
$2,790.00
× 30 days
Caching could save
28.51%
On input cost — toggle caching on

Naive estimate vs. real cost

Naive estimate
$0.06000
Real cost / run
$0.09300
Under-count
+55%

The filled segment is the accumulation tax a single-call estimate misses. Closed-form total verified against step-by-step summation.

Per-step breakdown

StepInput tokensOutput tokensStep costCumulative
12,500300$0.01200$0.01200
23,600300$0.01530$0.02730
34,700300$0.01860$0.04590
45,800300$0.02190$0.06780
56,900300$0.02520$0.09300
Total23,5001,500$0.09300

The final (most expensive) step alone sends 6,900 input tokens — many times step 1's 2,500.

Same workload, every model

ModelPer runMonthly
Gemini 2.0 Flash
One of the cheapest hosted models. Good for simple, high-throughput agents.
$0.00295$88.50
GPT-4o mini
Among the cheapest hosted models. Tight budgets, simple tool loops.
$0.00443$132.75
GPT-5 mini
Cheaper GPT-5 tier for high-volume, lighter agent work.
$0.00888$266.25
Gemini 2.5 Flash
Fast, low-cost Gemini tier built for high-volume agents.
$0.01080$324.00
Claude Haiku 4.5
Fastest, cheapest Claude. Strong fit for high-volume, tool-light agents.
$0.03100$930.00
GPT-5
OpenAI flagship. Automatic prompt caching on repeated prefixes, billed at the cached-input rate.
$0.04438$1,331.25
Gemini 2.5 Pro
Google flagship. Listed price is the ≤200K-token-prompt tier; longer prompts cost more.
$0.04438$1,331.25
GPT-4o
Previous-generation flagship. Still widely deployed for agentic apps.
$0.07375$2,212.50
Claude Sonnet 4.6selected
Balanced speed and intelligence. 1M-token context. A common production agent default.
$0.09300$2,790.00
Claude Opus 4.8
Most capable Claude model. 1M-token context. Best for hard multi-step reasoning agents.
$0.15500$4,650.00

Sources cited: Anthropic pricing & prompt-caching docs (authoritative for Claude), OpenAI and Google Gemini pricing pages (transcribed, dated 2026-06-06), and the CBSL indicative USD→LKR rate. Full links are in the Sources & references section below. Caching figures are an estimate — v1 caches only the fixed system + tool-definition block; real savings also depend on the 5-minute cache window and your traffic pattern.

How it works

A single API call is easy to price: tokens in × input rate, plus tokens out × output rate. An agentis the trap. An agent solves a task in several steps, and each step calls a tool, reads the result, and decides what to do next. Because the model is stateless, every step re-sends the system prompt, the tool definitions, the original request, and the entire transcript of prior outputs and tool results. The cost of one agent run is therefore far higher than the cost of one call — and most online “API cost” calculators ignore this entirely.

Let S be the system-prompt and tool-definition tokens (re-sent every step), U the initial user-input tokens, O the average output tokens per step, T the average tool-result tokens fed back per step, and N the number of steps. At step i the request carries the fixed block, the opening request, and everything accumulated so far:

inputTokens(i) = S + U + (i − 1)·(O + T)

Summing every step of one run gives the closed form the calculator uses:

  • total input = N·(S + U) + (O + T)·N·(N − 1)/2
  • total output = N·O

The N·(N − 1)/2 term is the accumulation tax — the reason a longer agent loop costs disproportionately more. The calculator cross-checks this closed form against an explicit step-by-step summation so the two methods always agree to the token, and the per-step table shows the input count climbing on every row.

Cost per run is then (totalInput/1e6)·inputPrice + (totalOutput/1e6)·outputPrice; daily cost multiplies by runs per day, and monthly cost multiplies daily by 30. Rupee figures multiply the dollar cost by an editable CBSL indicative exchange rate. The “naive vs real” panel prices the same run as if every step sent only the opening S + U tokens, so you can see exactly how badly a single-call estimate under-counts.

Prompt caching (the toggle) re-prices the fixed block. The system prompt and tool definitions are written to cache once on step 1 (at 1.25× input on Anthropic) and read back on steps 2…N at roughly 10% of the input price; the opening request and the growing transcript stay at full input price. It is a conservative estimate — caching the transcript too would save more, but those savings depend on the 5-minute cache window and how fast steps follow one another. Claude rates are authoritative; GPT and Gemini rates are transcribed from the official pricing pages and dated below.

Worked examples

5-step research agent — Claude Sonnet 4.6, no caching

  1. Workload: S=2,000, U=500, O=300, T=800, N=5
  2. Total input = 5·(2,000+500) + (300+800)·5·4/2 = 12,500 + 1,100·10 = 23,500 tokens
  3. Total output = 5·300 = 1,500 tokens
  4. Per run = 23,500/1e6·$3 + 1,500/1e6·$15 = $0.0705 + $0.0225 = $0.0930
  5. Naive estimate (5 steps × first-step 2,500 in) = $0.0600 → real is +55%
  6. At 1,000 runs/day → $93.00/day → $2,790/month (≈ Rs 850,950 at Rs 305)
  7. Per-step input climbs: 2,500 / 3,600 / 4,700 / 5,800 / 6,900 (sum 23,500 ✓)

10-step agent — GPT-4o mini, no caching

  1. Workload: S=3,000, U=1,000, O=400, T=1,500, N=10
  2. Total input = 10·(3,000+1,000) + (400+1,500)·10·9/2 = 40,000 + 1,900·45 = 125,500 tokens
  3. Total output = 10·400 = 4,000 tokens
  4. Per run = 125,500/1e6·$0.15 + 4,000/1e6·$0.60 = $0.018825 + $0.0024 = $0.021225
  5. At 500 runs/day → $10.61/day → $318.38/month

Edge case — caching the fixed block on the research agent

  1. Same example-A workload on Sonnet 4.6, prompt caching ON
  2. Cache write (step 1): 2,000 tokens × $3·1.25 = $0.0075
  3. Cache reads (steps 2–5): 2,000·4 = 8,000 tokens × $0.30/1M = $0.0024
  4. Full-price input: 23,500 − 5·2,000 = 13,500 tokens × $3/1M = $0.0405
  5. Cached input cost = $0.0504 vs $0.0705 uncached → 28% less on input

Frequently asked questions

Sources & references

Claude rates are authoritative. GPT and Gemini rates were transcribed from the official pricing pages above and last verified on 2026-06-06; they are reviewed each quarter and whenever a provider announces a price change. The tool runs entirely in your browser — no inputs leave your device.

Related tools

Rate this tool
Be the first to rate

Comments & feedback

Spotted a bug or want an improvement? Tell us — our team reviews every comment, and good ideas get built. Comments are public and anonymous.

Found a bug, a pricing change, or want another provider added?

Email me at [email protected] — most fixes ship within 24 hours.