induwara.lk

AI Batch API Cost Calculator

Estimate how much you save by sending a large LLM job through the asynchronous Batch API instead of the standard endpoint. Pick OpenAI, Claude, or Gemini, enter tokens per request and request count, and see standard vs batch cost — the flat 50% discount, in dollars and rupees.

By Induwara Ashinsana · Updated Jun 6, 2026
Batch vs standard cost
50% off · provider-verified
USD→LKR rate (Rs): editable. Used only for the LKR figures. Default is a CBSL indicative rate.

Input tokens per request: average prompt tokens sent per request.

Output tokens per request: average tokens the model generates per reply.

Number of requests: how many requests are in the whole batch job.

Standard (sync) cost: $75.00 (Rs 22,875)
Batch API cost: $37.50 (Rs 11,437.50)
You save: $37.50 (Rs 11,437.50)
Discount: 50% (flat batch rate)
Batch mode halves this bill, but the job runs asynchronously — results come back within 24 hours, not in real time. Use it for work that can wait: bulk classification, dataset labelling, content generation, and evals. Skip it for anything user-facing or that needs streaming.

Standard vs batch breakdown

Component       Rate per 1M tokens (std → batch)   Standard     Batch
Input tokens    $0.15 → $0.075                     $60.00       $30.00
Output tokens   $0.60 → $0.30                      $15.00       $7.50
Total (USD)                                        $75.00       $37.50
Total (LKR)     @ Rs 305/$                         Rs 22,875    Rs 11,437.50

Per request: $0.0015 standard vs $0.00075 batch. Excludes prompt caching, image tokens, and any per-tenant contract pricing.
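The per-request figures are just the job totals divided by the request count. A minimal sketch, using the totals from the breakdown above; the helper name `per_request_cost` is hypothetical and not part of the tool:

```python
def per_request_cost(total_usd: float, n_requests: int) -> float:
    """Average cost of one request, given the job total in USD."""
    if n_requests <= 0:
        raise ValueError("n_requests must be positive")
    return total_usd / n_requests


# Totals taken from the breakdown: $75.00 standard, $37.50 batch, 50,000 requests.
standard_each = per_request_cost(75.00, 50_000)
batch_each = per_request_cost(37.50, 50_000)

# Five decimal places are needed to show the batch figure without rounding.
print(f"standard ${standard_each:.5f} vs batch ${batch_each:.5f}")
```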

The 50% batch discount is a published term for OpenAI, Anthropic, and Google Gemini. Standard per-token prices are each provider's list prices and change without notice — last verified 2026-06-06. Full sources are listed below the calculator.

How it works

The Batch API is an asynchronous endpoint for jobs that do not need an immediate reply. You upload many requests at once; the provider processes them within a completion window of up to 24 hours and returns the results in one file. In exchange for giving up real-time latency and streaming, OpenAI, Anthropic, and Google Gemini each take a flat 50% off their standard synchronous per-token prices. That discount is a published, vendor-guaranteed term, so this tool applies one multiplier of 0.5 across all three providers.

Token costs use the standard LLM formula. Let I be the input tokens per request, O the output tokens per request, N the number of requests, Pin the standard input price and Pout the standard output price (in USD per 1,000,000 tokens):

  1. inputCostStd = I ÷ 1,000,000 × Pin × N
  2. outputCostStd = O ÷ 1,000,000 × Pout × N
  3. standardTotal = inputCostStd + outputCostStd
  4. batchTotal = standardTotal × 0.5 — the 50% batch discount
  5. saving = standardTotal − batchTotal, which equals standardTotal × 0.5 — the saving is always exactly half the standard bill
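The five steps above can be sketched in a few lines of Python. This is an illustration of the formula, not the tool's actual source; the names `BatchEstimate` and `estimate_batch_cost` are assumptions made for the example.

```python
from dataclasses import dataclass

BATCH_MULTIPLIER = 0.5  # the flat 50% batch discount


@dataclass(frozen=True)
class BatchEstimate:
    input_cost_std: float
    output_cost_std: float
    standard_total: float
    batch_total: float
    saving: float


def estimate_batch_cost(i_tokens: int, o_tokens: int, n_requests: int,
                        p_in: float, p_out: float) -> BatchEstimate:
    """Apply steps 1 to 5. Prices are USD per 1,000,000 tokens."""
    input_cost_std = i_tokens / 1_000_000 * p_in * n_requests      # step 1
    output_cost_std = o_tokens / 1_000_000 * p_out * n_requests    # step 2
    standard_total = input_cost_std + output_cost_std              # step 3
    batch_total = standard_total * BATCH_MULTIPLIER                # step 4
    saving = standard_total - batch_total                          # step 5
    return BatchEstimate(input_cost_std, output_cost_std,
                         standard_total, batch_total, saving)


# I=8,000, O=500, N=50,000 at $0.15 in / $0.60 out per 1M tokens.
est = estimate_batch_cost(8_000, 500, 50_000, 0.15, 0.60)
print(f"standard ${est.standard_total:.2f}, batch ${est.batch_total:.2f}, "
      f"saving ${est.saving:.2f}")
```

Plain floats are fine for an estimate printed to two decimals; a version that needs exact cents could swap in `decimal.Decimal`.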

The LKR figures are the USD results multiplied by an editable USD→LKR rate; they are a secondary convenience for local budgeting, not a live exchange feed. The calculator also cross-checks itself: the batch total is computed twice — once as half the standard total, and once by applying the halved per-token rates (Pin × 0.5, Pout × 0.5) directly to the token counts. Both methods agree to the cent, the same way the income-tax calculator reconciles two IRD formulas.
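A rough sketch of that cross-check and the LKR conversion, using Python's `decimal` module so the comparison is exact. The function names and the half-up rounding to cents are assumptions for illustration; the tool's own rounding rule is not stated here.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")
MILLION = Decimal(1_000_000)
HALF = Decimal("0.5")


def batch_total_two_ways(i_tokens, o_tokens, n_requests, p_in, p_out):
    """Return the batch total computed by both methods, rounded to cents.

    Prices are passed as strings (e.g. "0.15") so Decimal sees exact values.
    """
    p_in, p_out = Decimal(p_in), Decimal(p_out)
    millions_in = Decimal(i_tokens) * n_requests / MILLION
    millions_out = Decimal(o_tokens) * n_requests / MILLION

    # Method 1: half of the standard total.
    standard_total = millions_in * p_in + millions_out * p_out
    via_total = standard_total * HALF

    # Method 2: halved per-token rates applied directly to the token counts.
    via_rates = millions_in * (p_in * HALF) + millions_out * (p_out * HALF)

    return (via_total.quantize(CENT, rounding=ROUND_HALF_UP),
            via_rates.quantize(CENT, rounding=ROUND_HALF_UP))


def to_lkr(usd: Decimal, usd_to_lkr: str) -> Decimal:
    """Convert a USD amount with a user-supplied rate (not a live feed)."""
    return (usd * Decimal(usd_to_lkr)).quantize(CENT, rounding=ROUND_HALF_UP)


via_total, via_rates = batch_total_two_ways(8_000, 500, 50_000, "0.15", "0.60")
assert via_total == via_rates, "the two methods should agree to the cent"
print(f"batch ${via_total} = Rs {to_lkr(via_total, '305')}")
```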

Input and output are priced separately because the model spends compute generating each output token, while input tokens are read once — output is almost always dearer. Because the discount is flat, the model you pick and the input/output mix change the totals but never the 50% ratio. The figures exclude prompt caching, image or audio tokens, and per-tenant contract pricing, which are handled by separate tools linked below.
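To illustrate that point, the sketch below runs three different input/output mixes (the same figures as the worked examples further down) and prints how much of each bill comes from output tokens next to the batch-to-standard ratio. The helper `totals` and the scenario labels are hypothetical names for this example only.

```python
def totals(i_tokens, o_tokens, n_requests, p_in, p_out, multiplier=0.5):
    """Return (standard_total, batch_total, output_share_of_standard)."""
    input_cost = i_tokens * p_in * n_requests / 1_000_000
    output_cost = o_tokens * p_out * n_requests / 1_000_000
    standard = input_cost + output_cost
    return standard, standard * multiplier, output_cost / standard


# (I, O, N, P_in, P_out); prices in USD per 1M tokens.
scenarios = {
    "input-heavy, cheap model": (8_000, 500, 50_000, 0.15, 0.60),
    "output-heavy, pricier model": (2_000, 1_000, 10_000, 3.00, 15.00),
    "short replies, many requests": (1_000, 50, 200_000, 2.50, 10.00),
}

ratios = []
for name, args in scenarios.items():
    standard, batch, output_share = totals(*args)
    ratios.append(batch / standard)
    print(f"{name}: output is {output_share:.0%} of the bill, "
          f"batch/standard = {batch / standard}")
```

The output share moves a lot between scenarios, while the ratio stays at 0.5 because the same multiplier is applied to both price components.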

Worked examples

Bulk ticket labelling — GPT-4o mini

I=8,000 · O=500 · N=50,000 · $0.15 in / $0.60 out per 1M

  1. inputCostStd = 8,000 ÷ 1e6 × 0.15 × 50,000 = 0.0012 × 50,000 = $60.00
  2. outputCostStd = 500 ÷ 1e6 × 0.60 × 50,000 = 0.0003 × 50,000 = $15.00
  3. standardTotal = $75.00
  4. batchTotal = 75.00 × 0.5 = $37.50
  5. saving = $37.50 (50%) → at Rs 305/$: standard Rs 22,875, batch Rs 11,437.50

Content generation — Claude Sonnet 4.5

I=2,000 · O=1,000 · N=10,000 · $3 in / $15 out per 1M

  1. inputCostStd = 2,000 ÷ 1e6 × 3.00 × 10,000 = 0.006 × 10,000 = $60.00
  2. outputCostStd = 1,000 ÷ 1e6 × 15.00 × 10,000 = 0.015 × 10,000 = $150.00
  3. standardTotal = $210.00
  4. batchTotal = 210.00 × 0.5 = $105.00
  5. saving = $105.00 (50%) → at Rs 305/$: standard Rs 64,050, batch Rs 32,025

Edge case — large embeddings-style run, GPT-4o

I=1,000 · O=50 · N=200,000 · $2.50 in / $10 out per 1M

  1. inputCostStd = 1,000 ÷ 1e6 × 2.50 × 200,000 = 0.0025 × 200,000 = $500.00
  2. outputCostStd = 50 ÷ 1e6 × 10.00 × 200,000 = 0.0005 × 200,000 = $100.00
  3. standardTotal = $600.00
  4. batchTotal = $300.00 ; saving = $300.00 (50%)
  5. Confirms the invariant holds at 200,000 requests with no precision drift.
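The three worked examples can be re-checked with exact decimal arithmetic. This is a small verification sketch, not the tool's own test suite; `standard_and_batch` and `WORKED_EXAMPLES` are names made up for the example, and the expected values are copied from the steps above.

```python
from decimal import Decimal

MILLION = Decimal(1_000_000)

# (label, I, O, N, P_in, P_out, expected standard, expected batch)
WORKED_EXAMPLES = [
    ("Bulk ticket labelling", 8_000, 500, 50_000, "0.15", "0.60", "75.00", "37.50"),
    ("Content generation", 2_000, 1_000, 10_000, "3.00", "15.00", "210.00", "105.00"),
    ("Large GPT-4o run", 1_000, 50, 200_000, "2.50", "10.00", "600.00", "300.00"),
]


def standard_and_batch(i_tokens, o_tokens, n_requests, p_in, p_out):
    """Exact standard and batch totals; prices are strings in USD per 1M tokens."""
    input_cost = Decimal(i_tokens) * n_requests / MILLION * Decimal(p_in)
    output_cost = Decimal(o_tokens) * n_requests / MILLION * Decimal(p_out)
    standard = input_cost + output_cost
    return standard, standard * Decimal("0.5")


for label, i, o, n, p_in, p_out, want_std, want_batch in WORKED_EXAMPLES:
    standard, batch = standard_and_batch(i, o, n, p_in, p_out)
    # Decimal comparison is exact, so any drift would fail here.
    assert standard == Decimal(want_std), label
    assert batch == Decimal(want_batch), label
    print(f"{label}: standard ${standard:.2f}, batch ${batch:.2f}")
```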

Sources & references

The 50% batch term is documented by all three providers and does not drift. Standard per-token prices are list prices that change without notice; they were last cross-checked on 2026-06-06. Confirm against each provider's current pricing page before relying on a number for a large spend.
