Why does an AI agent cost more than the number of API calls suggests?

Because every step re-sends the whole accumulated context. Each tool call's result is fed back into the next prompt, so step 5 carries the system prompt, the original request, and all four prior outputs and tool results. Counting only the number of steps, or pricing each step as one fresh call, ignores this build-up. Total input tokens grow roughly with the square of the step count, and the gap between that total and a per-step estimate is the accumulation tax this tool measures.

How do I estimate the monthly cost of running an AI agent at scale?

Estimate five numbers: system-prompt and tool-definition size, the opening request size, average output and tool-result tokens per step, and how many steps a run takes. The calculator multiplies out the context accumulation, prices it on your chosen model, then scales by runs per day × 30. As a worked example, a 5-step research agent on Claude Sonnet 4.6 at 1,000 runs a day comes to about $2,790 a month.

Does prompt caching reduce AI agent costs?

Yes. The fixed system prompt and tool definitions are re-sent on every step, so caching that block serves it at roughly 10% of the input price after the first write. For the default research-agent workload, caching the fixed block cuts input cost by about 28%. This tool caches only that stable block in v1; caching the growing transcript as well can save more, but real savings depend on the 5-minute cache window and how fast steps follow one another.

How much does a GPT-4o or Claude agent cost per task?

For a 5-step run with a 2,000-token system prompt, a 500-token opening request, 800-token tool results, and 300-token outputs, the real cost is about $0.0930 per run on Claude Sonnet 4.6, around $0.074 on GPT-4o, about $0.031 on Claude Haiku 4.5, and under a cent on GPT-4o mini or Gemini 2.0 Flash. The comparison table prices your exact workload across every model and sorts cheapest first, so you pick on numbers rather than reputation.

How do tool-call results affect LLM token usage in an agent loop?

Each tool result is appended to the running transcript and re-sent on every later step, so a single 800-token tool result is paid for many times over the life of a run, not once. Larger tool outputs (full file reads, long search results, verbose API JSON) are the most common reason an agent's bill runs higher than expected. Trimming or summarising tool results before feeding them back is the highest-leverage cost saving.
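To make the "paid for many times over" point concrete, here is a minimal Python sketch. It assumes the single linear loop this page models, where a result produced at step k first appears in the input of step k+1. The function name `billed_tool_result_tokens` is illustrative and not part of the tool.

```python
def billed_tool_result_tokens(result_tokens: int, produced_at_step: int, total_steps: int) -> int:
    """Input tokens billed for one tool result over the rest of a run.

    Assumes a single linear loop: a result produced at step k is included
    in the input of every later step (k+1 .. N), so it is sent N - k times.
    """
    if not 1 <= produced_at_step <= total_steps:
        raise ValueError("produced_at_step must be between 1 and total_steps")
    resend_count = total_steps - produced_at_step
    return result_tokens * resend_count


# An 800-token result from step 1 of a 5-step run is sent on steps 2-5.
first_result = billed_tool_result_tokens(800, produced_at_step=1, total_steps=5)
# The result produced on the final step is never fed back in this model.
last_result = billed_tool_result_tokens(800, produced_at_step=5, total_steps=5)
print(first_result, last_result)
```

Under these assumptions the earliest tool results are the most expensive ones, which is why trimming them pays off most.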

Is a single-call API cost calculator wrong for agents?

It under-counts. A single-call calculator prices one prompt and one response. An agent makes many calls where each re-sends everything before it, so the true cost is higher by the accumulation term — 55% more in the default example, and the gap widens as steps grow. Use a single-call calculator for one-shot completions; use this one for any tool-calling loop.
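A rough sketch of how the gap widens with step count, assuming the default research-agent workload (S=2,000, U=500, O=300, T=800) and the $3 / $15 per 1M token rates used in the Sonnet 4.6 worked example. The helper names are hypothetical and chosen only for this illustration.

```python
def run_cost(s, u, o, t, n, in_price, out_price):
    """Real cost of one run in USD; prices are USD per 1M tokens."""
    total_in = n * (s + u) + (o + t) * n * (n - 1) // 2
    total_out = n * o
    return total_in / 1e6 * in_price + total_out / 1e6 * out_price


def naive_cost(s, u, o, n, in_price, out_price):
    """Naive estimate: every step priced as if it sent only S + U."""
    return n * (s + u) / 1e6 * in_price + n * o / 1e6 * out_price


def undercount_pct(s, u, o, t, n, in_price, out_price):
    """How far the real cost sits above the naive estimate, in percent."""
    real = run_cost(s, u, o, t, n, in_price, out_price)
    naive = naive_cost(s, u, o, n, in_price, out_price)
    return (real / naive - 1) * 100


for steps in (1, 5, 10, 20):
    pct = undercount_pct(2000, 500, 300, 800, steps, 3.0, 15.0)
    print(f"{steps:>2} steps: real cost is {pct:.0f}% above the naive estimate")
```

With one step the two estimates coincide, and the 5-step case reproduces the 55% figure quoted above.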

Are these prices accurate and current?

Claude rates are authoritative and taken from Anthropic's pricing documentation. GPT and Gemini rates are transcribed from each vendor's official pricing page and dated 2026-06-06. Provider prices change often, so every rate links to its source below and the page shows when it was last verified. Confirm against the live pricing page before committing a budget.

What is not included in this estimate?

This tool models a single linear tool-calling loop. It excludes parallel or branching tool calls and sub-agents, embeddings and vector-store retrieval, fine-tuning, image and audio tokens, batch discounts, and self-hosting or GPU costs — those have their own calculators. It also assumes fixed average output and tool-result sizes per step rather than a per-step schedule.

AI Agent Cost Calculator

Estimate what a multi-step LLM agent really costs to run. Because every tool-calling step re-sends the whole accumulated context, cost grows faster than the step count — this tool models that correctly and prices the same workload across Claude, GPT, and Gemini, in dollars and rupees.

By Induwara Ashinsana, Executive Director, Ryzera Technologies. Updated Jun 6, 2026

AI agent run cost

Model

Grouped by provider. Price shown is input / output per 1M tokens.

Currency & caching

System + tool definitions

Tokens, re-sent every step

Initial user input

Tokens, the opening request

Steps per run

Tool-calling rounds (1–100)

Avg output / step

Assistant tokens per step

Avg tool result / step

Tokens fed back per step

Runs per day

Agent invocations / day

Thinking in words? Tokens ≈ words × 1.33. So a 1,500-word system prompt ≈ 1,995 tokens. Enter token counts above.

Cost per run

$0.09300

23,500 in · 1,500 out tokens

Daily cost

$93.00

1,000 runs/day

Monthly cost

$2,790.00

× 30 days

Caching could save

28.51%

On input cost — toggle caching on

Naive estimate vs. real cost

Naive estimate

$0.06000

Real cost / run

$0.09300

Under-count

+55%

The filled segment is the accumulation tax a single-call estimate misses. Closed-form total verified against step-by-step summation.

Per-step breakdown

Step	Input tokens	Output tokens	Step cost	Cumulative
1	2,500	300	$0.01200	$0.01200
2	3,600	300	$0.01530	$0.02730
3	4,700	300	$0.01860	$0.04590
4	5,800	300	$0.02190	$0.06780
5	6,900	300	$0.02520	$0.09300
Total	23,500	1,500	—	$0.09300

The final (most expensive) step alone sends 6,900 input tokens, about 2.8 times step 1's 2,500.
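The breakdown above can be reproduced with a short loop. This is a sketch only, assuming the default workload and the $3 / $15 per 1M token Sonnet 4.6 rates from the worked example. `per_step_rows` is a made-up name, not the calculator's own code.

```python
def per_step_rows(s, u, o, t, n, in_price, out_price):
    """Return (step, input tokens, output tokens, step cost, cumulative cost)."""
    rows = []
    cumulative = 0.0
    for i in range(1, n + 1):
        # Fixed block + opening request + everything accumulated so far.
        step_in = s + u + (i - 1) * (o + t)
        step_cost = step_in / 1e6 * in_price + o / 1e6 * out_price
        cumulative += step_cost
        rows.append((i, step_in, o, step_cost, cumulative))
    return rows


rows = per_step_rows(2000, 500, 300, 800, 5, 3.0, 15.0)
for step, step_in, step_out, cost, cum in rows:
    print(f"{step}\t{step_in:,}\t{step_out}\t${cost:.5f}\t${cum:.5f}")
```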

Same workload, every model

Model	Notes	Per run	Monthly
Gemini 2.0 Flash	One of the cheapest hosted models. Good for simple, high-throughput agents.	$0.00295	$88.50
GPT-4o mini	Among the cheapest hosted models. Tight budgets, simple tool loops.	$0.00443	$132.75
GPT-5 mini	Cheaper GPT-5 tier for high-volume, lighter agent work.	$0.00888	$266.25
Gemini 2.5 Flash	Fast, low-cost Gemini tier built for high-volume agents.	$0.01080	$324.00
Claude Haiku 4.5	Fastest, cheapest Claude. Strong fit for high-volume, tool-light agents.	$0.03100	$930.00
GPT-5	OpenAI flagship. Automatic prompt caching on repeated prefixes, billed at the cached-input rate.	$0.04438	$1,331.25
Gemini 2.5 Pro	Google flagship. Listed price is the ≤200K-token-prompt tier; longer prompts cost more.	$0.04438	$1,331.25
GPT-4o	Previous-generation flagship. Still widely deployed for agentic apps.	$0.07375	$2,212.50
Claude Sonnet 4.6 (selected)	Balanced speed and intelligence. 1M-token context. A common production agent default.	$0.09300	$2,790.00
Claude Opus 4.8	Most capable Claude model. 1M-token context. Best for hard multi-step reasoning agents.	$0.15500	$4,650.00
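A cheapest-first comparison like this one can be sketched in a few lines. The example lists only the two rates quoted in the worked examples on this page (Sonnet 4.6 at $3 / $15 and GPT-4o mini at $0.15 / $0.60 per 1M tokens); the `PRICES` dict and `compare_models` function are illustrative names, and any other rates would need to be copied in from the vendors' pricing pages.

```python
# USD per 1M tokens as (input, output). Only the two rates quoted in the
# worked examples on this page are listed; extend the dict with your own.
PRICES = {
    "Claude Sonnet 4.6": (3.00, 15.00),
    "GPT-4o mini": (0.15, 0.60),
}


def compare_models(total_in, total_out, runs_per_day, prices=PRICES):
    """Price one workload on every model and sort cheapest first."""
    rows = []
    for model, (in_price, out_price) in prices.items():
        per_run = total_in / 1e6 * in_price + total_out / 1e6 * out_price
        rows.append((model, per_run, per_run * runs_per_day * 30))
    return sorted(rows, key=lambda row: row[1])


table = compare_models(23_500, 1_500, runs_per_day=1_000)
for model, per_run, monthly in table:
    print(f"{model}\t${per_run:.5f}\t${monthly:,.2f}")
```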

Sources cited: Anthropic pricing & prompt-caching docs (authoritative for Claude), OpenAI and Google Gemini pricing pages (transcribed, dated 2026-06-06), and the CBSL indicative USD→LKR rate. Full links are in the Sources & references section below. Caching figures are an estimate — v1 caches only the fixed system + tool-definition block; real savings also depend on the 5-minute cache window and your traffic pattern.

How it works

A single API call is easy to price: tokens in × input rate, plus tokens out × output rate. An agent is the trap. An agent solves a task in several steps, and each step calls a tool, reads the result, and decides what to do next. Because the model is stateless, every step re-sends the system prompt, the tool definitions, the original request, and the entire transcript of prior outputs and tool results. The cost of one agent run is therefore far higher than the cost of one call, and most online “API cost” calculators ignore this entirely.

Let S be the system-prompt and tool-definition tokens (re-sent every step), U the initial user-input tokens, O the average output tokens per step, T the average tool-result tokens fed back per step, and N the number of steps. At step i the request carries the fixed block, the opening request, and everything accumulated so far:

inputTokens(i) = S + U + (i − 1)·(O + T)

Summing every step of one run gives the closed form the calculator uses:

total input = N·(S + U) + (O + T)·N·(N − 1)/2
total output = N·O
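For reference, the step from the per-step formula to the closed form is an ordinary arithmetic-series sum. Written out with the same symbols (S, U, O, T, N) as above:

```latex
\begin{aligned}
\text{total input}
  &= \sum_{i=1}^{N} \bigl[\, S + U + (i-1)(O+T) \,\bigr] \\
  &= N(S+U) + (O+T)\sum_{i=1}^{N}(i-1) \\
  &= N(S+U) + (O+T)\,\frac{N(N-1)}{2}
\end{aligned}
```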

The N·(N − 1)/2 term is the accumulation tax — the reason a longer agent loop costs disproportionately more. The calculator cross-checks this closed form against an explicit step-by-step summation so the two methods always agree to the token, and the per-step table shows the input count climbing on every row.
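The kind of cross-check described here might look like the following sketch. It is not the calculator's actual source; the function names are invented for illustration, and the two workloads are the ones from the worked examples.

```python
def total_input_closed_form(s, u, o, t, n):
    """Closed form: N*(S+U) + (O+T)*N*(N-1)/2 (N*(N-1) is always even)."""
    return n * (s + u) + (o + t) * n * (n - 1) // 2


def total_input_by_summation(s, u, o, t, n):
    """Explicit sum of inputTokens(i) = S + U + (i-1)*(O+T) for i = 1..N."""
    return sum(s + u + (i - 1) * (o + t) for i in range(1, n + 1))


# Compare both methods over the tool's 1-100 step range.
for workload in [(2000, 500, 300, 800), (3000, 1000, 400, 1500)]:
    for n in range(1, 101):
        closed = total_input_closed_form(*workload, n)
        summed = total_input_by_summation(*workload, n)
        assert closed == summed, (workload, n, closed, summed)

print("closed form and summation agree for N = 1..100")
```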

Cost per run is then (totalInput/1e6)·inputPrice + (totalOutput/1e6)·outputPrice; daily cost multiplies by runs per day, and monthly cost multiplies daily by 30. Rupee figures multiply the dollar cost by an editable CBSL indicative exchange rate. The “naive vs real” panel prices the same run as if every step sent only the opening S + U tokens, so you can see exactly how badly a single-call estimate under-counts.
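As a minimal sketch of that scaling chain, assuming token totals have already been computed and prices are quoted in USD per 1M tokens: the `RunEstimate` dataclass and `estimate` function are hypothetical names, and the Rs 305 rate is the one used in the worked example rather than a live figure.

```python
from dataclasses import dataclass


@dataclass
class RunEstimate:
    per_run_usd: float
    daily_usd: float
    monthly_usd: float
    monthly_lkr: float


def estimate(total_in, total_out, in_price, out_price, runs_per_day, usd_to_lkr):
    """Per-run, daily, and 30-day monthly cost, plus the monthly rupee figure."""
    per_run = total_in / 1e6 * in_price + total_out / 1e6 * out_price
    daily = per_run * runs_per_day
    monthly = daily * 30
    return RunEstimate(per_run, daily, monthly, monthly * usd_to_lkr)


est = estimate(23_500, 1_500, 3.0, 15.0, runs_per_day=1_000, usd_to_lkr=305)
print(f"${est.per_run_usd:.4f} per run, ${est.monthly_usd:,.2f}/month, Rs {est.monthly_lkr:,.0f}")
```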

Prompt caching (the toggle) re-prices the fixed block. The system prompt and tool definitions are written to cache once on step 1 (at 1.25× input on Anthropic) and read back on steps 2…N at roughly 10% of the input price; the opening request and the growing transcript stay at full input price. It is a conservative estimate — caching the transcript too would save more, but those savings depend on the 5-minute cache window and how fast steps follow one another. Claude rates are authoritative; GPT and Gemini rates are transcribed from the official pricing pages and dated below.
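The re-pricing of the fixed block can be sketched as follows. This assumes the multipliers stated above (1.25× input for the cache write, roughly 10% of input for reads) and that only the S block is cached; `cached_input_cost` is an illustrative name, and other providers may price cache writes and reads differently.

```python
def cached_input_cost(s, total_in, n, in_price, write_mult=1.25, read_mult=0.10):
    """Input cost (USD) for one run when only the fixed block S is cached."""
    write = s / 1e6 * in_price * write_mult            # step 1: cache write
    reads = s * (n - 1) / 1e6 * in_price * read_mult   # steps 2..N: cache reads
    uncached = (total_in - n * s) / 1e6 * in_price     # request + transcript
    return write + reads + uncached


uncached_cost = 23_500 / 1e6 * 3.0
cached_cost = cached_input_cost(2000, 23_500, 5, 3.0)
saving_pct = (1 - cached_cost / uncached_cost) * 100
print(f"${cached_cost:.4f} vs ${uncached_cost:.4f}: {saving_pct:.2f}% less on input")
```

Output tokens are unaffected by caching, so the saving on the total bill is smaller than the saving on input alone.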

Worked examples

5-step research agent — Claude Sonnet 4.6, no caching

Workload: S=2,000, U=500, O=300, T=800, N=5
Total input = 5·(2,000+500) + (300+800)·5·4/2 = 12,500 + 1,100·10 = 23,500 tokens
Total output = 5·300 = 1,500 tokens
Per run = 23,500/1e6·$3 + 1,500/1e6·$15 = $0.0705 + $0.0225 = $0.0930
Naive estimate (5 steps × first-step 2,500 in) = 12,500/1e6·$3 + $0.0225 = $0.0375 + $0.0225 = $0.0600 → real is +55%
At 1,000 runs/day → $93.00/day → $2,790/month (≈ Rs 850,950 at Rs 305)
Per-step input climbs: 2,500 / 3,600 / 4,700 / 5,800 / 6,900 (sum 23,500 ✓)

10-step agent — GPT-4o mini, no caching

Workload: S=3,000, U=1,000, O=400, T=1,500, N=10
Total input = 10·(3,000+1,000) + (400+1,500)·10·9/2 = 40,000 + 1,900·45 = 125,500 tokens
Total output = 10·400 = 4,000 tokens
Per run = 125,500/1e6·$0.15 + 4,000/1e6·$0.60 = $0.018825 + $0.0024 = $0.021225
At 500 runs/day → $10.61/day → $318.38/month

Edge case — caching the fixed block on the research agent

Same workload as the 5-step research agent above, on Sonnet 4.6 with prompt caching ON
Cache write (step 1): 2,000 tokens × ($3·1.25)/1M = $0.0075
Cache reads (steps 2–5): 2,000·4 = 8,000 tokens × $0.30/1M = $0.0024
Full-price input: 23,500 − 5·2,000 = 13,500 tokens × $3/1M = $0.0405
Cached input cost = $0.0075 + $0.0024 + $0.0405 = $0.0504 vs $0.0705 uncached → 28.5% less on input

Frequently asked questions

Sources & references

Claude rates are authoritative. GPT and Gemini rates were transcribed from the official pricing pages above and last verified on 2026-06-06; they are reviewed each quarter and whenever a provider announces a price change. The tool runs entirely in your browser — no inputs leave your device.

Related tools

AI Chatbot Cost Calculator

Estimate the monthly API cost of a multi-turn AI chatbot across Claude, GPT, and Gemini. Models the quadratic context re-sending that single-call calculators miss, with and without prompt caching, in USD and LKR.

Reasoning Token Cost Calc

Estimate the true cost of reasoning-model API calls by accounting for the hidden reasoning/thinking tokens that o-series, Claude, and Gemini bill at output rates. See per-call and monthly cost in USD and LKR, plus how much more it is than a naive estimate.

AI API Cost Calculator

Estimate the monthly and per-request USD and LKR bill for any major LLM API. Pick a model, enter input/output tokens and requests per month, and compare every model cheapest-first — with optional 50% batch and cached-input discounts.
