induwara.lk

AI API Rate Limit Calculator

Will your workload hit a 429? Pick a provider, tier, and model, enter your tokens per request, and the calculator reports your real maximum requests per minute (the lower of the RPM and token caps), which limit binds, and how long a batch will take. Covers OpenAI, Claude, and Gemini.

By Induwara Ashinsana · Updated Jun 5, 2026

Published tier limits: RPM 500 · TPM 200,000 · RPD 10,000

Effective max requests / min: 133 (token limit, TPM, binds first)

Time to clear the batch: 38m at the effective ceiling of 133 req/min

Ceiling verified: it is feasible at this rate and infeasible one request higher.

Same workload, every provider

  • OpenAI (selected), Tier 1 · GPT-4o mini: max 133 req/min, TPM binds
  • Anthropic (Claude), Tier 1 · Claude 3.5 Sonnet: max 16 req/min, OTPM binds
  • Google Gemini, Tier 1 · Gemini 2.0 Flash: max 2,000 req/min, RPM binds

How to avoid 429s

  • Add a client-side rate limiter that paces requests to the effective ceiling above, not to the headline RPM.
  • Retry 429s with exponential backoff and jitter, honouring the Retry-After header when present.
  • For large offline jobs, use each provider's Batch API — typically a 50% discount and a separate queue that does not eat your real-time RPM/TPM.
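
The first two bullets can be sketched in Python. This is a minimal illustration, not any provider SDK's behaviour: `pacing_interval` and `backoff_delay` are hypothetical helper names, the 1-second base and 60-second cap are assumed defaults, and `retry_after` is assumed to be the Retry-After header already parsed into seconds.

```python
import random
from typing import Optional


def pacing_interval(rpm_eff: int) -> float:
    """Seconds to wait between sends so the rate stays at or under rpm_eff."""
    if rpm_eff <= 0:
        raise ValueError("effective ceiling is 0: no request fits under the caps")
    return 60.0 / rpm_eff


def backoff_delay(attempt: int, retry_after: Optional[float] = None,
                  base: float = 1.0, cap: float = 60.0) -> float:
    """Delay in seconds before retry number `attempt` (0-based) after a 429.

    Honours the server's Retry-After value when present; otherwise uses
    exponential backoff with full jitter: a random delay drawn from
    [0, min(cap, base * 2**attempt)].
    """
    if retry_after is not None:
        return max(0.0, retry_after)
    return random.uniform(0.0, min(cap, base * 2 ** attempt))
```

Pacing to `pacing_interval(rpm_eff)` rather than to the headline RPM keeps steady-state traffic under whichever cap binds; the jitter spreads retries out so that several workers that were throttled together do not all retry at the same instant.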

Sources cited: OpenAI, Anthropic, and Google Gemini rate-limit documentation, transcribed and dated 2026-06-05. Full links are in the Sources & references section below. Token counts are your input — get them from the AI Token Counter. The tool runs entirely in your browser; nothing leaves your device.

How it works

Every LLM provider enforces more than one rate limit at once, and a request is rejected with HTTP 429 the moment it would cross any of them. Your real ceiling is therefore not the headline requests-per-minute (RPM) number — it is the lower of the request cap and the token cap, once you account for how many tokens each request actually carries.

Let i be the average input tokens per request, o the average output tokens, and t = i + o the total. The calculator reads the published caps for your provider, tier, and model and applies the documented logic:

  • OpenAI / Gemini: tokenRpm = floor(TPM / t)
  • Anthropic: inputRpm = floor(ITPM / i)
  • Anthropic: outputRpm = floor(OTPM / o)

OpenAI and Gemini meter a single combined tokens-per-minute (TPM) pool, so the token-bound rate is the TPM divided by total tokens. Anthropic is different: it meters input and output tokens in separate buckets, input-tokens-per-minute (ITPM) and output-tokens-per-minute (OTPM), and the output bucket is much smaller. That is why a high-output Claude job throttles long before a naive “TPM ÷ total tokens” guess predicts.
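
The two metering styles can be written out as a short Python sketch. The function names are hypothetical, and rejecting zero token counts is an assumption made to keep the example simple; the arithmetic is the floor division given above.

```python
def combined_token_rpm(tpm: int, input_tokens: int, output_tokens: int) -> int:
    """Single combined pool (OpenAI / Gemini style): floor(TPM / t), t = i + o."""
    total = input_tokens + output_tokens
    if total <= 0:
        raise ValueError("a request must carry at least one token")
    return tpm // total


def split_token_rpm(itpm: int, otpm: int,
                    input_tokens: int, output_tokens: int) -> tuple:
    """Separate buckets (Anthropic style): (floor(ITPM / i), floor(OTPM / o))."""
    if input_tokens <= 0 or output_tokens <= 0:
        raise ValueError("this sketch assumes positive input and output counts")
    return itpm // input_tokens, otpm // output_tokens


print(combined_token_rpm(200_000, 1_200, 400))         # 125
print(split_token_rpm(80_000, 16_000, 2_000, 1_000))   # (40, 16)
```

In the second call the output bucket allows 16 requests per minute while the input bucket allows 40, so output is the tighter constraint even though each request sends twice as many input tokens as it generates.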

The effective ceiling is then the smallest applicable cap:

  • OpenAI / Gemini: rpmEff = min(RPM, tokenRpm)
  • Anthropic: rpmEff = min(RPM, inputRpm, outputRpm)
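
Taking the minimum and remembering which cap produced it might look like the following Python sketch. `effective_ceiling` is a hypothetical name, and resolving ties in favour of the first limit listed is an assumption, since the page does not say how ties are reported.

```python
def effective_ceiling(rates: dict) -> tuple:
    """Return (rpmEff, binding_limit): the smallest per-limit request rate.

    `rates` maps a limit name to the requests/min that limit allows,
    e.g. {"RPM": 500, "TPM": 125}. On a tie, the first key wins (assumption).
    """
    if not rates:
        raise ValueError("need at least one limit")
    binding = min(rates, key=rates.get)
    return rates[binding], binding


print(effective_ceiling({"RPM": 500, "TPM": 125}))                # (125, 'TPM')
print(effective_ceiling({"RPM": 1000, "ITPM": 40, "OTPM": 16}))   # (16, 'OTPM')
```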

The term that produced the minimum is the binding limit reported back to you. From there, a batch of N requests needs ceil(N / rpmEff) minutes of paced sending, rendered as days, hours, and minutes. A throughput target simply checks whether your desired requests-per-minute sits under the ceiling, with the headroom or overage. Where a provider also publishes a requests-per-day (RPD) cap — common on Gemini free tiers — the tool surfaces it, because RPD binds regardless of how slowly you pace. Each effective ceiling is cross-checked by an independent feasibility test: it must be achievable at that rate and impossible one request higher.
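
A rough Python sketch of the batch time, the throughput check, and the feasibility cross-check follows. All names are hypothetical. The duration format is modelled on the outputs shown on this page, and the feasibility test assumes a single combined TPM pool; the tool's actual implementation is not published here and may differ.

```python
def batch_minutes(n_requests: int, rpm_eff: int) -> int:
    """ceil(N / rpmEff) using integer arithmetic."""
    if rpm_eff <= 0:
        raise ValueError("effective ceiling is 0: the batch can never clear")
    return -(-n_requests // rpm_eff)


def format_duration(minutes: int) -> str:
    """Render minutes as days, hours, and minutes, skipping zero parts."""
    days, rem = divmod(minutes, 1440)
    hours, mins = divmod(rem, 60)
    parts = []
    if days:
        parts.append(f"{days}d")
    if hours:
        parts.append(f"{hours}h")
    if mins or not parts:
        parts.append(f"{mins}m")
    return " ".join(parts)


def headroom(target_rpm: int, rpm_eff: int) -> int:
    """Positive: spare requests/min. Negative: overage above the ceiling."""
    return rpm_eff - target_rpm


def is_feasible(rate: int, rpm_cap: int, tpm_cap: int, tokens_per_request: int) -> bool:
    """True if `rate` requests/min stays within both the RPM and TPM caps."""
    return rate <= rpm_cap and rate * tokens_per_request <= tpm_cap


def ceiling_verified(rpm_eff: int, rpm_cap: int, tpm_cap: int,
                     tokens_per_request: int) -> bool:
    """The ceiling must be feasible, and one request higher must not be."""
    return (is_feasible(rpm_eff, rpm_cap, tpm_cap, tokens_per_request)
            and not is_feasible(rpm_eff + 1, rpm_cap, tpm_cap, tokens_per_request))


print(format_duration(batch_minutes(8_000, 125)))   # 1h 4m
print(headroom(30, 15))                             # -15
print(ceiling_verified(125, 500, 200_000, 1_600))   # True
```

The feasibility check deliberately avoids the floor division used to derive the ceiling: it multiplies the rate back out and compares against the raw caps, which is what makes it an independent test.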

Worked examples

OpenAI Tier 1, gpt-4o-mini — the token limit binds

  1. Caps: RPM 500, TPM 200,000. Request = 1,200 in + 400 out = 1,600 tokens.
  2. tokenRpm = floor(200,000 / 1,600) = 125
  3. rpmEff = min(500, 125) = 125 → TPM binds (125 < 500)
  4. Batch of 8,000 = ceil(8,000 / 125) = 64 min = 1h 4m

Anthropic Tier 2, Claude 3.5 Sonnet — output tokens bind

  1. Caps: RPM 1,000, ITPM 80,000, OTPM 16,000. Request = 2,000 in + 1,000 out.
  2. inputRpm = floor(80,000 / 2,000) = 40
  3. outputRpm = floor(16,000 / 1,000) = 16
  4. rpmEff = min(1,000, 40, 16) = 16 → OTPM binds
  5. Batch of 1,000 = ceil(1,000 / 16) = 63 min = 1h 3m

Gemini free tier, 2.0 Flash — RPM and the daily cap bind

  1. Caps: RPM 15, TPM 1,000,000, RPD 1,500. Request = 4,000 in + 1,000 out = 5,000 tokens.
  2. tokenRpm = floor(1,000,000 / 5,000) = 200
  3. rpmEff = min(15, 200) = 15 → RPM binds
  4. Target 30 req/min → 30 ≤ 15 is false → over cap by 15
  5. Daily ceiling = 1,500 requests/day no matter how you pace
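
To see how quickly the daily cap bites, the Python sketch below works out how long it takes to use up the RPD allowance at the effective ceiling, and how many days a larger batch would span. The function names and the 5,000-request batch are hypothetical, and the sketch assumes the RPD allowance refills once per day.

```python
def minutes_until_daily_cap(rpd: int, rpm_eff: int) -> int:
    """Minutes of sending at rpm_eff before the RPD allowance is spent (rounded up)."""
    if rpm_eff <= 0:
        raise ValueError("effective ceiling must be positive")
    return -(-rpd // rpm_eff)


def days_for_batch(n_requests: int, rpd: int) -> int:
    """Calendar days a batch spans when at most `rpd` requests fit in each day."""
    if rpd <= 0:
        raise ValueError("daily cap must be positive")
    return -(-n_requests // rpd)


print(minutes_until_daily_cap(1_500, 15))   # 100
print(days_for_batch(5_000, 1_500))         # 4
```

At 15 requests per minute the 1,500-request allowance is gone in 100 minutes, so any batch larger than the RPD cap is bounded by days, not by the per-minute ceiling.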

Edge case — one request larger than the per-minute budget

  1. OpenAI Tier 1, gpt-4o (TPM 30,000). Request = 40,000 input tokens.
  2. tokenRpm = floor(30,000 / 40,000) = 0
  3. rpmEff = 0 → you cannot send even one request per minute under the cap
  4. Fix: shorten the prompt or move to a higher tier. Lowering max-output helps only when the input alone already fits under the TPM cap, which it does not here.
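
A pre-flight guard for this edge case could be sketched in Python as follows. `tokens_over_budget` is a hypothetical helper, and it assumes the requested maximum output counts against the same per-minute pool as the input, consistent with t = i + o above.

```python
def tokens_over_budget(tpm: int, input_tokens: int, max_output_tokens: int = 0) -> int:
    """Tokens by which a single request exceeds the per-minute budget (0 if it fits)."""
    return max(0, input_tokens + max_output_tokens - tpm)


print(tokens_over_budget(30_000, 40_000))          # 10000
print(tokens_over_budget(30_000, 20_000, 5_000))   # 0
```

A non-zero result means the request can never be sent under that cap, so it is worth checking before queuing the job instead of discovering it through repeated 429s.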

Sources & references

The limit tables were transcribed from the official documentation above and last verified on 2026-06-05. They are reviewed each quarter and whenever a provider announces a change. Providers grant per-account overrides on request, so your dashboard is always the final word. The tool runs entirely in your browser — no inputs leave your device.

Found a stale limit, a bug, or want another provider added?

Email me at [email protected] — most fixes ship within 24 hours.