How much cheaper is the OpenAI Batch API?

Exactly 50%. OpenAI's Batch API charges half the standard synchronous per-token price for both input and output, in return for a completion window of up to 24 hours. A job that costs $200 at the standard endpoint costs $100 through Batch — the saving is always half the standard bill.

Does Anthropic's Message Batches API give a discount?

Yes — Anthropic's Message Batches API applies a 50% discount to both input and output tokens versus standard pricing, with batches processed within 24 hours. The discount stacks with the model you pick, so a Claude Sonnet 4.5 batch job is half the price of the same job run synchronously.

Is Gemini batch mode 50% off?

Yes. Google's Gemini API Batch Mode is priced at 50% of the standard interactive price for the same model, with results returned within 24 hours. The discount applies to both input and output tokens, matching OpenAI and Anthropic, which is why this calculator uses one flat multiplier for all three.

When should I use the Batch API instead of real-time calls?

Use batch for any large job that does not need an immediate answer: bulk classification, dataset labelling, embeddings prep, content generation, and evals. Use the standard endpoint for anything user-facing, interactive, or that needs streaming. If a job can tolerate a wait of up to 24 hours, batch halves the bill.

How long does a batch job take to finish?

All three providers quote a completion window of up to 24 hours. Many jobs finish far sooner — often minutes for small batches — but the contract is "within 24 hours," and there is no streaming. Plan your pipeline around the 24-hour ceiling rather than the best case.

Does the batch discount change my output quality?

No. The Batch API runs the same model weights as the synchronous endpoint — the only differences are turnaround time, separate rate limits, and the 50% price. Output quality is identical; you trade latency for cost.

Why is the saving always exactly half the standard cost?

Because the discount is a flat 50% on every token, the batch total is the standard total × 0.5, so the amount you save (standard − batch) also equals the standard total × 0.5. Input/output mix and request count change the totals but never the 50% ratio.

Are these prices accurate for Sri Lankan developers billing in USD?

LLM APIs bill in USD regardless of where you are, so the dollar figures apply directly. The LKR column converts at an editable rate (default a CBSL indicative figure) for local budgeting only. Standard per-token prices were last verified 2026-06-06; confirm against each provider's pricing page before committing a large spend.

AI · Developer

AI Batch API Cost Calculator

Estimate how much you save by sending a large LLM job through the asynchronous Batch API instead of the standard endpoint. Pick OpenAI, Claude, or Gemini, enter tokens per request and request count, and see standard vs batch cost — the flat 50% discount, in dollars and rupees.

By Induwara Ashinsana— Executive Director, Ryzera TechnologiesUpdated Jun 6, 2026

Batch vs standard cost

50% off · provider-verified

Provider & model

USD → LKR rate

Editable. Used only for the LKR figures. Default is a CBSL indicative rate.

Input tokens / request

Average prompt tokens sent per request.

Output tokens / request

Average tokens the model generates per reply.

Number of requests

How many requests are in the whole batch job.

Example jobs

Standard (sync) cost

$75.00

Rs 22,875

Batch API cost

$37.50

Rs 11,437

You save

$37.50

Rs 11,437

Discount

50%

flat batch rate

Batch mode halves this bill, but the job runs asynchronously — results come back within 24 hours, not in real time. Use it for work that can wait: bulk classification, dataset labelling, content generation, and evals. Skip it for anything user-facing or that needs streaming.

Standard vs batch breakdown

Component	Rate (std → batch)	Standard	Batch
Input tokens	$0.15 → $0.08 /1M	$60.00	$30.00
Output tokens	$0.6 → $0.3 /1M	$15.00	$7.50
Total (USD)	—	$75.00	$37.50
Total (LKR)	@ Rs 305/$	Rs 22,875	Rs 11,437

Per request: $0.0015 standard vs $0.0007 batch. Excludes prompt caching, image tokens, and any per-tenant contract pricing.

The 50% batch discount is a published term for OpenAI, Anthropic, and Google Gemini. Standard per-token prices are each provider's list prices and change without notice — last verified 2026-06-06. Full sources are listed below the calculator.

How it works

The Batch API is an asynchronous endpoint for jobs that do not need an immediate reply. You upload many requests at once; the provider processes them within a completion window of up to 24 hours and returns the results in one file. In exchange for giving up real-time latency and streaming, OpenAI, Anthropic, and Google Gemini each charge a flat 50% discount on their standard synchronous per-token prices. That discount is a published, vendor-guaranteed term, so this tool applies one multiplier of 0.5 across all three providers.

Token costs use the standard LLM formula. Let I be the input tokens per request, O the output tokens per request, N the number of requests, Pin the standard input price and Pout the standard output price (in USD per 1,000,000 tokens):

inputCostStd = I ÷ 1,000,000 × Pin × N
outputCostStd = O ÷ 1,000,000 × Pout × N
standardTotal = inputCostStd + outputCostStd
batchTotal = standardTotal × 0.5 — the 50% batch discount
saving = standardTotal − batchTotal, which equals standardTotal × 0.5 — the saving is always exactly half the standard bill

The LKR figures are the USD results multiplied by an editable USD→LKR rate; they are a secondary convenience for local budgeting, not a live exchange feed. The calculator also cross-checks itself: the batch total is computed twice — once as half the standard total, and once by applying the halved per-token rates (Pin × 0.5, Pout × 0.5) directly to the token counts. Both methods agree to the cent, the same way the income-tax calculator reconciles two IRD formulas.

Input and output are priced separately because the model spends compute generating each output token, while input tokens are read once — output is almost always dearer. Because the discount is flat, the model you pick and the input/output mix change the totals but never the 50% ratio. The figures exclude prompt caching, image or audio tokens, and per-tenant contract pricing, which are handled by separate tools linked below.

Worked examples

Bulk ticket labelling — GPT-4o mini

I=8,000 · O=500 · N=50,000 · $0.15 in / $0.60 out per 1M

inputCostStd = 8,000 ÷ 1e6 × 0.15 × 50,000 = 0.0012 × 50,000 = $60.00
outputCostStd = 500 ÷ 1e6 × 0.60 × 50,000 = 0.0003 × 50,000 = $15.00
standardTotal = $75.00
batchTotal = 75.00 × 0.5 = $37.50
saving = $37.50 (50%) → at Rs 305/$: standard Rs 22,875, batch Rs 11,437.50

Content generation — Claude Sonnet 4.5

I=2,000 · O=1,000 · N=10,000 · $3 in / $15 out per 1M

inputCostStd = 2,000 ÷ 1e6 × 3.00 × 10,000 = 0.006 × 10,000 = $60.00
outputCostStd = 1,000 ÷ 1e6 × 15.00 × 10,000 = 0.015 × 10,000 = $150.00
standardTotal = $210.00
batchTotal = 210.00 × 0.5 = $105.00
saving = $105.00 (50%) → at Rs 305/$: standard Rs 64,050, batch Rs 32,025

Edge case — large embeddings-style run, GPT-4o

I=1,000 · O=50 · N=200,000 · $2.50 in / $10 out per 1M

inputCostStd = 1,000 ÷ 1e6 × 2.50 × 200,000 = 0.0025 × 200,000 = $500.00
outputCostStd = 50 ÷ 1e6 × 10.00 × 200,000 = 0.0005 × 200,000 = $100.00
standardTotal = $600.00
batchTotal = $300.00 ; saving = $300.00 (50%)
Confirms the invariant holds at 200,000 requests with no precision drift.

Frequently asked questions

Sources & references

The 50% batch term is documented by all three providers and does not drift. Standard per-token prices are list prices that change without notice; they were last cross-checked on 2026-06-06. Confirm against each provider's current pricing page before relying on a number for a large spend.

Related tools

LiveAI

Prompt Caching Calculator

Calculate how much prompt caching saves on your LLM API bill. Compare cost with vs without caching, the dollar savings, and the break-even point for Claude, OpenAI, and Gemini using each provider's official cache-write and cache-read multipliers.

Open tool

LiveAI

AI API Cost Calculator

Estimate the monthly and per-request USD and LKR bill for any major LLM API. Pick a model, enter input/output tokens and requests per month, and compare every model cheapest-first — with optional 50% batch and cached-input discounts.

Open tool

LiveAI

Reasoning Token Cost Calc

Estimate the true cost of reasoning-model API calls by accounting for the hidden reasoning/thinking tokens that o-series, Claude, and Gemini bill at output rates. See per-call and monthly cost in USD and LKR, plus how much more it is than a naive estimate.

Open tool

Rate this tool

Be the first to rate

Comments & feedback

Spotted a bug or want an improvement? Tell us — our team reviews every comment, and good ideas get built. Comments are public and anonymous.

Found a bug, edge case, or want to suggest an improvement?

Email me at [email protected] — most fixes ship within 24 hours.