AI Batch API Cost Calculator
Estimate how much you save by sending a large LLM job through the asynchronous Batch API instead of the standard endpoint. Pick OpenAI, Claude, or Gemini, enter the tokens per request and the request count, and compare the standard cost with the batch cost after the flat 50% discount, in dollars and rupees.
How it works
The Batch API is an asynchronous endpoint for jobs that do not need an immediate reply. You upload many requests at once; the provider processes them within a completion window of up to 24 hours and returns the results in one file. In exchange for giving up real-time latency and streaming, OpenAI, Anthropic, and Google Gemini each offer a flat 50% discount off their standard synchronous per-token prices. Because that discount is a published, vendor-guaranteed term, this tool applies a single multiplier of 0.5 across all three providers.
Token costs use the standard LLM formula. Let I be the input tokens per request, O the output tokens per request, N the number of requests, Pin the standard input price and Pout the standard output price (in USD per 1,000,000 tokens):
- inputCostStd = I ÷ 1,000,000 × Pin × N
- outputCostStd = O ÷ 1,000,000 × Pout × N
- standardTotal = inputCostStd + outputCostStd
- batchTotal = standardTotal × 0.5 — the 50% batch discount
- saving = standardTotal − batchTotal, which equals standardTotal × 0.5 — the saving is always exactly half the standard bill
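The formulas above can be sketched in a few lines of Python. This is a minimal illustration, not the calculator's actual source: the function name `estimate_batch_cost`, the `BatchEstimate` type, and the prices in the usage example are all hypothetical placeholders, not any provider's real rates.

```python
from dataclasses import dataclass

TOKENS_PER_MILLION = 1_000_000
BATCH_MULTIPLIER = 0.5  # the flat 50% batch discount


@dataclass(frozen=True)
class BatchEstimate:
    standard_total: float  # USD
    batch_total: float     # USD
    saving: float          # USD


def estimate_batch_cost(input_tokens: int, output_tokens: int, requests: int,
                        price_in: float, price_out: float) -> BatchEstimate:
    """Apply the formulas above; prices are USD per 1,000,000 tokens."""
    input_cost_std = input_tokens / TOKENS_PER_MILLION * price_in * requests
    output_cost_std = output_tokens / TOKENS_PER_MILLION * price_out * requests
    standard_total = input_cost_std + output_cost_std
    batch_total = standard_total * BATCH_MULTIPLIER
    return BatchEstimate(standard_total, batch_total, standard_total - batch_total)


# Placeholder prices for illustration only (assumed, not real list prices).
estimate = estimate_batch_cost(
    input_tokens=2_000, output_tokens=500, requests=10_000,
    price_in=1.00, price_out=4.00,
)
print(f"standard ${estimate.standard_total:.2f}, "
      f"batch ${estimate.batch_total:.2f}, "
      f"saving ${estimate.saving:.2f}")
# prints: standard $40.00, batch $20.00, saving $20.00
```

With these assumed inputs, input and output each contribute $20 to the standard bill, so the batch total and the saving are both $20.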
The LKR figures are the USD results multiplied by an editable USD→LKR rate; they are a secondary convenience for local budgeting, not a live exchange feed. The calculator also cross-checks itself: the batch total is computed twice — once as half the standard total, and once by applying the halved per-token rates (Pin × 0.5, Pout × 0.5) directly to the token counts. Both methods agree to the cent, the same way the income-tax calculator reconciles two IRD formulas.
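The cross-check and the rupee conversion described above might look like the sketch below. The function names, the half-cent tolerance used for "agree to the cent", the prices, and the exchange rate of 300 are assumptions chosen for illustration; the calculator's own implementation may differ, and the rate is not a real quote.

```python
TOKENS_PER_MILLION = 1_000_000


def batch_via_half_total(i: int, o: int, n: int, p_in: float, p_out: float) -> float:
    """Method 1: compute the standard total, then halve it."""
    standard_total = (i / TOKENS_PER_MILLION * p_in * n
                      + o / TOKENS_PER_MILLION * p_out * n)
    return standard_total * 0.5


def batch_via_half_rates(i: int, o: int, n: int, p_in: float, p_out: float) -> float:
    """Method 2: halve each per-token rate, then apply it to the token counts."""
    return (i / TOKENS_PER_MILLION * (p_in * 0.5) * n
            + o / TOKENS_PER_MILLION * (p_out * 0.5) * n)


def agree_to_the_cent(a: float, b: float) -> bool:
    """True when the two USD amounts differ by less than half a cent."""
    return abs(a - b) < 0.005


def usd_to_lkr(usd: float, rate: float) -> float:
    """Convert USD to LKR with a user-supplied (editable) rate."""
    return usd * rate


# Placeholder inputs: 1,500 input and 700 output tokens, 25,000 requests,
# assumed prices of $3 and $15 per million tokens.
args = (1_500, 700, 25_000, 3.00, 15.00)
half_total = batch_via_half_total(*args)
half_rates = batch_via_half_rates(*args)
assert agree_to_the_cent(half_total, half_rates)

ILLUSTRATIVE_RATE = 300.0  # assumed USD->LKR rate, not a live or official figure
print(f"batch ${half_total:.2f} = LKR {usd_to_lkr(half_total, ILLUSTRATIVE_RATE):,.2f}")
```

The two methods are algebraically identical, so any disagreement would point to a coding slip rather than a pricing difference.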
Input and output are priced separately because the model spends compute generating each output token, while input tokens are read once, so output is almost always the more expensive of the two. Because the discount is flat, the model you pick and the input/output mix change the totals but never the 50% ratio. The figures exclude prompt caching, image or audio tokens, and per-tenant contract pricing, which are handled by separate tools linked below.
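To see that the ratio holds regardless of model or token mix, the sketch below runs three hypothetical scenarios through the same formula. The scenario names, token counts, and prices are invented for illustration and do not correspond to any real model.

```python
TOKENS_PER_MILLION = 1_000_000


def totals(i: int, o: int, n: int, p_in: float, p_out: float) -> tuple[float, float]:
    """Return (standard_total, batch_total) in USD."""
    standard = (i * p_in + o * p_out) / TOKENS_PER_MILLION * n
    return standard, standard * 0.5


# (label, input tokens, output tokens, requests, Pin, Pout): all assumed values.
scenarios = [
    ("input-heavy",   10_000,   200, 1_000, 0.50,  1.50),
    ("output-heavy",     200, 4_000, 1_000, 0.50,  1.50),
    ("pricier model",  2_000, 2_000, 5_000, 5.00, 20.00),
]

ratios = []
for label, i, o, n, p_in, p_out in scenarios:
    standard, batch = totals(i, o, n, p_in, p_out)
    ratios.append(batch / standard)
    print(f"{label}: standard ${standard:.2f}, batch ${batch:.2f}, "
          f"ratio {batch / standard:.2f}")
```

The totals differ widely between scenarios, but every line reports a ratio of 0.50.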
Worked examples
Frequently asked questions
Sources & references
- OpenAI — Batch API guide (50% discount, 24-hour window)
- OpenAI — API pricing (standard per-token rates)
- Anthropic — Message Batches API (50% discount on input & output)
- Anthropic — Pricing (standard per-token rates)
- Google — Gemini API Batch Mode (50% of interactive price)
- Google — Gemini API pricing (standard per-token rates)
- Central Bank of Sri Lanka — indicative USD→LKR rate
The 50% batch term is documented by all three providers and does not drift. Standard per-token prices are list prices that change without notice; they were last cross-checked on 2026-06-06. Confirm against each provider's current pricing page before relying on a number for a large spend.
Related tools