How much does it cost to run an AI chatbot per month?

It depends on the model, prompt size, conversation length, and volume. As a worked example: a 500-token system prompt, 8-turn conversations averaging 80-token questions and 200-token replies, at 3,000 conversations a month, costs about $61 on Claude Haiku 4.5 and about $307 on Claude Opus 4.8. Enter your own numbers above for a figure tailored to your bot.

Why does my LLM API bill grow faster than my message count?

Because every turn re-sends the whole conversation so far. Turn 1 sends the system prompt plus one message; turn 8 sends the system prompt plus all seven previous exchanges plus the new message. Total input tokens therefore grow roughly with the square of the turn count, not linearly. For long conversations, doubling the length approaches quadrupling the per-conversation input cost; in the example workload, going from 8 to 16 turns raises input tokens about 3.4× (12,480 to 42,880). This is the single biggest surprise in chatbot budgeting.

Does prompt caching reduce chatbot costs?

Yes, often substantially. Caching serves the stable prefix (system prompt plus prior transcript) at roughly 10% of the input price, so only each turn's new content is billed at full rate. In the support-bot example, caching cuts input cost by about 63%. Real savings depend on the cache's time-to-live (5 minutes on Anthropic) and how quickly users reply, so the toggle here is labelled an estimate.

Is Claude Haiku or GPT cheaper for a support bot?

For a high-volume support bot the cheapest seeded models are Gemini 2.0 Flash and GPT-4o mini, followed by GPT-5 mini, Gemini 2.5 Flash, and then Claude Haiku 4.5, all in the $1/$5-per-1M-token range or below. The comparison table prices your exact workload across every model and sorts cheapest first, so you can see the trade-off rather than guess.

How do I estimate OpenAI or Anthropic API costs before building?

Estimate four numbers: your system-prompt size in tokens, the average user message and AI reply length, and how many turns a typical conversation runs. Multiply out the context re-sending (this tool does it for you), then multiply by your monthly conversation count. Tokens are roughly words × 1.33, so a 60-word message is about 80 tokens.
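As a rough illustration of the words-to-tokens rule of thumb, here is a minimal Python sketch. The function name `words_to_tokens` and the constant name are hypothetical, and the 1.33 factor is only the approximation quoted on this page, not a real tokenizer count.

```python
# Rule of thumb quoted on this page; real tokenizers vary by model and language.
TOKENS_PER_WORD = 1.33


def words_to_tokens(words: int) -> int:
    """Rough token estimate from a word count, rounded to the nearest token."""
    return round(words * TOKENS_PER_WORD)


# 60 words * 1.33 = 79.8, which rounds to 80 tokens.
print(words_to_tokens(60))
```

For a real budget, counting tokens with the provider's own tokenizer on a sample of actual messages would be more reliable than this estimate.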

Are these prices accurate and current?

Claude rates are authoritative and taken from Anthropic's pricing documentation. GPT and Gemini rates are transcribed from each vendor's official pricing page and dated 2026-06-05. Provider prices change often, so every rate links to its source below and the page shows when it was last verified. Always confirm against the live pricing page before committing a budget.

What is not included in this estimate?

This tool models conversational text chat only. It excludes fine-tuning, embeddings, image and audio tokens, batch discounts, tool-call overhead, and self-hosting or GPU costs. Those are covered by separate calculators. It also assumes a fixed average message size rather than modelling each real message.

Why convert to Sri Lankan rupees?

Most founders here budget and raise in rupees even though the API is billed in US dollars. The rupee figures use an editable CBSL indicative exchange rate so you can match your card's actual conversion, and the dollar figure stays visible alongside.

AI Chatbot Conversation Cost Calculator

Estimate what a multi-turn AI chatbot really costs to run. Because every turn re-sends the whole conversation, cost grows with the square of conversation length — this tool models that correctly and prices the same workload across Claude, GPT, and Gemini, in dollars and rupees.

By Induwara Ashinsana, Executive Director, Ryzera Technologies · Updated Jun 5, 2026

Chatbot monthly cost

Model: grouped by provider. Price shown is input / output per 1M tokens.
Currency & caching
System prompt: tokens, sent every turn
Avg user message: tokens per user turn
Avg AI reply: tokens per reply
Turns / conversation: back-and-forth rounds
Conversations / month: total monthly chats

Thinking in words? Tokens ≈ words × 1.33. So a 60-word message ≈ 80 tokens. Enter token counts above.

Cost per conversation: $0.02048
Estimated monthly cost: $61.44
Input tokens / conversation: 12,480 (output: 1,600 tokens)
Caching could save: 62.95% on input cost (toggle caching on)

Why input dominates

Input: 12,480 tokens (88.64%) · Output: 1,600 tokens

The final (most expensive) turn alone sends 2,540 input tokens. Closed-form total verified against turn-by-turn summation.

Cost per conversation vs. turns

[Chart: cost per conversation plotted against turns per conversation.] Same prompt and message sizes, more turns. The curve bends upward: cost grows faster than the turn count because context re-sending compounds.

Same workload, every model

Model	Notes	Per chat	Monthly
Gemini 2.0 Flash	One of the cheapest hosted models. Good for simple, high-throughput bots.	$0.00189	$5.66
GPT-4o mini	Among the cheapest hosted chat models. Tight budgets, simple intents.	$0.00283	$8.50
GPT-5 mini	Cheaper GPT-5 tier for high-volume, lighter conversational work.	$0.00632	$18.96
Gemini 2.5 Flash	Fast, low-cost Gemini tier built for high-volume chat.	$0.00774	$23.23
Claude Haiku 4.5 (selected)	Fastest, cheapest Claude. Strong fit for high-volume support and FAQ bots.	$0.02048	$61.44
GPT-5	OpenAI flagship. Automatic prompt caching on repeated prefixes, billed at the cached-input rate.	$0.03160	$94.80
Gemini 2.5 Pro	Google flagship. Listed price is the ≤200K-token-prompt tier; longer prompts cost more.	$0.03160	$94.80
GPT-4o	Previous-generation flagship. Still widely deployed for chat.	$0.04720	$141.60
Claude Sonnet 4.6	Balanced speed and intelligence. 1M-token context. A common production default.	$0.06144	$184.32
Claude Opus 4.8	Most capable Claude model. 1M-token context. Best for hard reasoning; overkill for a simple support bot.	$0.10240	$307.20

Sources cited: Anthropic pricing & prompt-caching docs (authoritative for Claude), OpenAI and Google Gemini pricing pages (transcribed, dated 2026-06-05), and the CBSL indicative USD→LKR rate. Full links are in the Sources & references section below. Caching figures are an estimate — real savings depend on the 5-minute cache window and your traffic pattern.

How it works

A single API call is easy to price: tokens in × input rate, plus tokens out × output rate. A conversation is the trap. Chat models are stateless, so to remember what was said your app re-sends the system prompt and the entire transcript on every single turn. The cost of one conversation is therefore far higher than the cost of one message, and most online “API cost” calculators ignore this entirely.

Let s be the system-prompt tokens, u the average user-message tokens, a the average AI-reply tokens, and N the number of turns. At turn t the request carries the system prompt, the whole transcript so far, and the new user message:

inputTokens(t) = s + (t − 1)·(u + a) + u
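The per-turn formula can be sketched in Python as follows. This is an illustrative sketch, not the calculator's source; the function name `input_tokens` is hypothetical, and the sample values are the example workload used elsewhere on this page (s=500, u=80, a=200, 8 turns).

```python
def input_tokens(t: int, s: int, u: int, a: int) -> int:
    """Input tokens sent on turn t (1-indexed).

    s: system-prompt tokens, u: average user-message tokens,
    a: average AI-reply tokens.
    """
    if t < 1:
        raise ValueError("turns are numbered from 1")
    # System prompt + every earlier exchange + the new user message.
    return s + (t - 1) * (u + a) + u


# Example workload: each turn carries 280 more tokens than the one before.
per_turn = [input_tokens(t, s=500, u=80, a=200) for t in range(1, 9)]
print(per_turn)
print("final turn:", per_turn[-1])  # 500 + 7*280 + 80 = 2540
```

Each turn adds one full exchange (u + a) to the request, which is why the last turn is the most expensive one.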

Summing every turn of one conversation gives the closed form the calculator uses:

total input = N·s + N·u + (u + a)·N·(N − 1)/2
total output = N·a

The N·(N − 1)/2 term is the quadratic blow-up — the reason a longer conversation costs disproportionately more. The calculator cross-checks this closed form against an explicit turn-by-turn summation so the two methods always agree to the token.
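A cross-check of this kind might look like the sketch below. It is written from the formulas on this page rather than from the calculator's code, and the function names are hypothetical.

```python
def total_input_closed_form(n: int, s: int, u: int, a: int) -> int:
    """Closed form: N*s + N*u + (u + a) * N * (N - 1) / 2."""
    # n * (n - 1) is always even, so integer division is exact here.
    return n * s + n * u + (u + a) * n * (n - 1) // 2


def total_input_by_summation(n: int, s: int, u: int, a: int) -> int:
    """Explicit turn-by-turn sum of the per-turn input tokens."""
    return sum(s + (t - 1) * (u + a) + u for t in range(1, n + 1))


def total_output(n: int, a: int) -> int:
    """Output tokens: one reply of a tokens per turn."""
    return n * a


# Check that both methods agree for a range of conversation lengths.
for n in range(1, 51):
    assert total_input_closed_form(n, 500, 80, 200) == total_input_by_summation(n, 500, 80, 200)

print(total_input_closed_form(8, 500, 80, 200), total_output(8, 200))  # 12480 1600
```

Keeping everything in integer tokens avoids rounding drift, so the two methods can be compared with exact equality.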

Cost per conversation is then (totalInput/1e6)·inputPrice + (totalOutput/1e6)·outputPrice, and the monthly bill is that figure times your conversation volume. Rupee figures multiply the dollar cost by an editable CBSL indicative exchange rate.
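The pricing step can be sketched in a few lines of Python. The function names are hypothetical; the rates ($1 input, $5 output per 1M tokens) and the Rs 305 exchange rate are the example values used on this page, and should be replaced with current figures.

```python
def conversation_cost_usd(total_input: int, total_output: int,
                          input_price: float, output_price: float) -> float:
    """Cost of one conversation; prices are USD per 1M tokens."""
    return total_input / 1e6 * input_price + total_output / 1e6 * output_price


def monthly_cost_usd(per_conversation: float, conversations: int) -> float:
    """Monthly bill: per-conversation cost times conversation volume."""
    return per_conversation * conversations


# Example workload from this page: 12,480 input and 1,600 output tokens.
per_chat = conversation_cost_usd(12_480, 1_600, input_price=1.0, output_price=5.0)
monthly = monthly_cost_usd(per_chat, 3_000)
lkr_per_usd = 305.0  # editable exchange rate; example value only

print(f"per conversation: ${per_chat:.5f}")
print(f"monthly: ${monthly:.2f} (about Rs {monthly * lkr_per_usd:,.0f})")
```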

Prompt caching (the toggle) re-prices the stable prefix. The system prompt and prior transcript are served as cache reads at roughly 10% of the input price; only each turn's new user message is billed at full price, plus a one-time cache write at 1.25× input for newly added content. This is Anthropic's published model; OpenAI and Gemini auto-cache reads at their own published cached-input rate with no separate write fee. It is an estimate: real savings depend on the 5-minute cache window and how quickly users reply. Claude rates are authoritative; GPT and Gemini rates are transcribed from the official pricing pages and dated below.
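One possible reading of this caching model is sketched below. The tool's actual accounting is not shown on this page, so the details are assumptions: the whole prefix (system prompt plus prior transcript) is billed as a cache read on every turn, each new user message is billed at the full rate, and each turn writes one new exchange (u + a tokens) at 1.25×. The function names are hypothetical. Under these assumptions the example workload comes out at about 63% saved on input cost, in line with the figure shown above.

```python
def uncached_input_tokens(n: int, s: int, u: int, a: int) -> int:
    """Total input tokens with no caching (closed form from this page)."""
    return n * s + n * u + (u + a) * n * (n - 1) // 2


def cached_input_token_equivalents(n: int, s: int, u: int, a: int,
                                   read_mult: float = 0.10,
                                   write_mult: float = 1.25) -> float:
    """Input cost with caching, in full-price token equivalents.

    ASSUMPTION: this accounting is one interpretation of the text,
    not the calculator's verified implementation.
    """
    prefix_tokens = n * s + (u + a) * n * (n - 1) // 2  # served as cache reads
    new_user_tokens = n * u                             # billed at full price
    written_tokens = n * (u + a)                        # cache writes, one exchange per turn
    return read_mult * prefix_tokens + new_user_tokens + write_mult * written_tokens


plain = uncached_input_tokens(8, 500, 80, 200)
cached = cached_input_token_equivalents(8, 500, 80, 200)
saving = 1 - cached / plain
print(f"estimated input-cost saving: {saving:.2%}")
```

The sketch ignores cache expiry entirely; if users often take longer than the cache window to reply, some reads would turn back into full-price input and the saving would shrink.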

Worked examples

Shared workload: a 500-token system prompt, 80-token user messages, 200-token replies, 8 turns per conversation, 3,000 conversations a month, at Rs 305 per US dollar.

Claude Haiku 4.5 — the budget support bot

Total input = 8·500 + 8·80 + (280)·8·7/2 = 4,000 + 640 + 7,840 = 12,480 tokens
Total output = 8·200 = 1,600 tokens
Per conversation = 12,480/1e6·$1 + 1,600/1e6·$5 = $0.01248 + $0.00800 = $0.02048
Monthly = $0.02048 × 3,000 = $61.44 (≈ Rs 18,739)

Claude Opus 4.8 — the same bot on the flagship model

Tokens are identical: 12,480 input, 1,600 output
Per conversation = 12,480/1e6·$5 + 1,600/1e6·$25 = $0.06240 + $0.04000 = $0.10240
Monthly = $0.10240 × 3,000 = $307.20 (≈ Rs 93,696)
Same workload, 5× the rates → 5× the bill. That is the model-choice decision in one line.
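The two worked examples above can be reproduced with a short comparison loop. This is a sketch only: the `RATES` table is hypothetical and holds just the two Claude rates quoted in these examples (USD per 1M input and output tokens), which should be checked against the live pricing page.

```python
# (input price, output price) in USD per 1M tokens, as used in the examples above.
RATES = {
    "Claude Haiku 4.5": (1.00, 5.00),
    "Claude Opus 4.8": (5.00, 25.00),
}

TOTAL_INPUT = 12_480   # tokens per conversation (example workload)
TOTAL_OUTPUT = 1_600   # tokens per conversation
CONVERSATIONS = 3_000  # conversations per month


def monthly_cost(input_price: float, output_price: float) -> float:
    """Monthly USD cost of the example workload at the given rates."""
    per_chat = TOTAL_INPUT / 1e6 * input_price + TOTAL_OUTPUT / 1e6 * output_price
    return per_chat * CONVERSATIONS


# Price the same workload on every model, cheapest first.
results = sorted((monthly_cost(*rates), model) for model, rates in RATES.items())
for cost, model in results:
    print(f"{model}: ${cost:.2f} per month")
```

Because the token counts are identical across models, the ratio between two bills is simply the ratio of their rates when input and output prices scale together.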

Edge case — a 20-turn conversation shows the quadratic blow-up

Keep s=500, u=80, a=200 but raise N to 20
Total input = 20·500 + 20·80 + 280·20·19/2 = 10,000 + 1,600 + 53,200 = 64,800 tokens
On Haiku: per conversation = 64,800/1e6·$1 + 4,000/1e6·$5 = $0.0648 + $0.0200 = $0.0848
2.5× the turns (8 → 20) but ~4× the cost ($0.0205 → $0.0848) — context re-sending compounds.
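To see how the cost scales at other lengths, a small sweep over the turn count can help. The sketch below reuses the closed form with the example workload and the Haiku rates quoted above as defaults; the function name is hypothetical.

```python
def conversation_cost(n: int, s: int = 500, u: int = 80, a: int = 200,
                      input_price: float = 1.0, output_price: float = 5.0) -> float:
    """USD cost of one n-turn conversation; prices are per 1M tokens."""
    total_input = n * s + n * u + (u + a) * n * (n - 1) // 2
    total_output = n * a
    return total_input / 1e6 * input_price + total_output / 1e6 * output_price


baseline = conversation_cost(8)
for n in (8, 16, 20):
    cost = conversation_cost(n)
    print(f"{n:>2} turns: ${cost:.4f} per conversation "
          f"({cost / baseline:.2f}x the 8-turn cost)")
```

The multiplier stays below the pure square of the turn ratio because the system prompt, user messages, and output tokens all grow only linearly with the number of turns.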

Frequently asked questions

Sources & references

Claude rates are authoritative. GPT and Gemini rates were transcribed from the official pricing pages above and last verified on 2026-06-05; they are reviewed each quarter and whenever a provider announces a price change. The tool runs entirely in your browser — no inputs leave your device.

Related tools

AI Agent Cost Calculator

Estimate the real per-run, daily, and monthly cost of a multi-step LLM agent across Claude, GPT, and Gemini. Models the context accumulation single-call calculators miss — each tool result is re-sent every step — with caching, in USD and LKR.

Reasoning Token Cost Calc

Estimate the true cost of reasoning-model API calls by accounting for the hidden reasoning/thinking tokens that o-series, Claude, and Gemini bill at output rates. See per-call and monthly cost in USD and LKR, plus how much more it is than a naive estimate.

AI Web Search Cost

Estimate the true monthly cost of LLM web search — the per-search/grounding tool fee plus the retrieved-content input tokens — across Claude, GPT, Gemini and Sonar, with a cheapest-provider comparison.
