What is TPM and RPM in OpenAI?

RPM is requests per minute — how many API calls you may start in a minute. TPM is tokens per minute — the combined input plus output tokens you may process in a minute. OpenAI rejects a call with a 429 error if it would push you over either cap, so your real ceiling is the lower of the two once you factor in tokens per request.

How do I calculate my OpenAI rate limit?

Take your tier's TPM and divide it by the tokens in one request (input plus output), rounding down — that is your token-bound requests per minute. Your effective ceiling is the smaller of that number and the RPM cap. This tool does the division for you and tells you which limit binds, for OpenAI, Anthropic, and Gemini.

Why am I getting a 429 error from the OpenAI API?

A 429 means you crossed a rate limit — usually TPM rather than RPM. Firing many calls at once, long prompts, or large max-output settings push your tokens-per-minute over the cap even when your request count looks modest. Pace requests to the effective ceiling shown above and retry with exponential backoff, honouring the Retry-After header.

How many requests per minute can I send to Claude?

Anthropic meters input and output tokens in separate buckets (ITPM and OTPM) alongside an RPM cap, so your ceiling is the smallest of RPM, ITPM ÷ input tokens, and OTPM ÷ output tokens. Output is usually the scarce bucket: on Tier 2 Claude 3.5 Sonnet with 1,000 output tokens per call, the 16,000 OTPM cap limits you to about 16 requests per minute regardless of the 1,000 RPM headline.

Do output tokens count toward the rate limit?

Yes. On OpenAI and Gemini, output tokens count toward the combined TPM along with input tokens. On Anthropic they count toward a separate output-tokens-per-minute (OTPM) bucket that is far smaller than the input bucket, which is why high-output workloads on Claude throttle sooner than a naive input-only estimate suggests.

What is the difference between RPM, TPM and RPD?

RPM caps requests per minute, TPM caps tokens per minute, and RPD caps requests per day. RPD binds independently of pacing — a Gemini free-tier app can sit comfortably under 15 RPM yet still stop at 1,500 requests for the day. The calculator surfaces the daily cap whenever a provider publishes one.

Are these rate-limit numbers accurate and current?

They are transcribed from each provider's official rate-limit documentation and dated 2026-06-05. Providers change tier limits without notice and grant per-account overrides on request, so treat the tables as a dated default snapshot and confirm against your own dashboard before sizing a production job. Every source is linked below.

How do I get the token counts to enter here?

This tool takes token counts as input rather than estimating them from text. Use the AI Token Counter or Tokens to Words Converter to measure a representative request, then enter the average input and output tokens here. A rough rule of thumb is tokens ≈ words × 1.33.
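
When no tokenizer is at hand, the rule of thumb above can be applied directly. The following is a minimal sketch, assuming whitespace-separated words and the 1.33 ratio quoted above; the function name `estimate_tokens` is hypothetical, and a real tokenizer will give different counts, especially for code or non-English text.

```python
import math

TOKENS_PER_WORD = 1.33  # rule of thumb from the text; real tokenizers vary


def estimate_tokens(text: str) -> int:
    """Rough token estimate from a whitespace word count (hypothetical helper)."""
    words = len(text.split())
    # Round up so the estimate errs on the side of using more of the budget.
    return math.ceil(words * TOKENS_PER_WORD)


print(estimate_tokens("Summarise the attached report in three bullet points"))
```

Treat the result as a first guess only and replace it with a measured count before sizing a real job.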

AI API Rate Limit Calculator

Will your workload hit a 429? Pick a provider, tier, and model, enter your tokens per request, and get your real maximum requests per minute (the lower of the RPM and token caps), the limit that binds, and how long a batch will take. Covers OpenAI, Claude, and Gemini.

By Induwara Ashinsana, Executive Director, Ryzera Technologies · Updated Jun 5, 2026

Rate limit & throughput

Provider

Usage tier

Model

Avg input tokens / request

Prompt + context sent per call

Avg output tokens / request

Tokens the model generates per call

Workload

Requests to process (N)

Total calls in the job

Published tier limits

RPM 500 · TPM 200,000 · RPD 10,000

Effective max requests / min

133

Token limit (TPM) binds first

Time to clear the batch

38m

At the effective ceiling of 133 req/min

Ceiling verified: it is feasible at this rate and infeasible one request higher.

Same workload, every provider

Provider · model	Max req/min	Binding
OpenAI Tier 1 · GPT-4o mini (selected)	133	TPM
Anthropic (Claude) Tier 1 · Claude 3.5 Sonnet	16	OTPM
Google Gemini Tier 1 · Gemini 2.0 Flash	2,000	RPM

How to avoid 429s

Add a client-side rate limiter that paces requests to the effective ceiling above, not to the headline RPM.
Retry 429s with exponential backoff and jitter, honouring the Retry-After header when present.
For large offline jobs, use each provider's Batch API — typically a 50% discount and a separate queue that does not eat your real-time RPM/TPM.
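
The first two tips can be sketched in a few lines. This is an illustration under stated assumptions, not any provider SDK's API: `RateLimitError` is a hypothetical stand-in for whatever exception your client raises on HTTP 429, `retry_after` is assumed to hold the Retry-After value already parsed to seconds, and `send` is any zero-argument callable that performs one request.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for a client's HTTP 429 error."""

    def __init__(self, retry_after=None):
        super().__init__("429 Too Many Requests")
        self.retry_after = retry_after  # seconds from Retry-After, if present


def pacing_interval(rpm_eff: float) -> float:
    """Seconds to wait between requests to stay at the effective ceiling."""
    return 60.0 / rpm_eff


def call_with_backoff(send, max_attempts=5, base_delay=1.0, max_delay=60.0,
                      sleep=time.sleep):
    """Call send(), retrying on RateLimitError with exponential backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return send()
        except RateLimitError as err:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the 429 to the caller
            if err.retry_after is not None:
                delay = err.retry_after  # the server's hint takes priority
            else:
                # "Full jitter": a random delay up to the capped exponential step.
                delay = random.uniform(0, min(max_delay, base_delay * 2 ** attempt))
            sleep(delay)
```

At an effective ceiling of 133 req/min, `pacing_interval(133)` comes to roughly 0.45 seconds between sends. The `sleep` parameter is injectable only so the retry logic can be exercised without real waiting.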

Sources cited: OpenAI, Anthropic, and Google Gemini rate-limit documentation, transcribed and dated 2026-06-05. Full links are in the Sources & references section below. Token counts are your input — get them from the AI Token Counter. The tool runs entirely in your browser; nothing leaves your device.

How it works

Every LLM provider enforces more than one rate limit at once, and a request is rejected with HTTP 429 the moment it would cross any of them. Your real ceiling is therefore not the headline requests-per-minute (RPM) number — it is the lower of the request cap and the token cap, once you account for how many tokens each request actually carries.

Let i be the average input tokens per request, o the average output tokens, and t = i + o the total. The calculator reads the published caps for your provider, tier, and model and applies the documented logic:

OpenAI / Gemini: tokenRpm = floor(TPM / t)
Anthropic: inputRpm = floor(ITPM / i)
Anthropic: outputRpm = floor(OTPM / o)

OpenAI and Gemini meter a single combined tokens-per-minute (TPM) pool, so the token-bound rate is the TPM divided by total tokens. Anthropic is different: it meters input and output tokens in separate buckets, input-tokens-per-minute (ITPM) and output-tokens-per-minute (OTPM), and the output bucket is much smaller. That is why a high-output Claude job throttles long before a naive “TPM ÷ total tokens” guess predicts.

The effective ceiling is then the smallest applicable cap:

OpenAI / Gemini: rpmEff = min(RPM, tokenRpm)
Anthropic: rpmEff = min(RPM, inputRpm, outputRpm)
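
The formulas above translate almost line for line into code. The sketch below is one possible rendering, not the calculator's actual source: the function names are hypothetical, a bucket with zero tokens per request is treated as non-binding, and ties are resolved in favour of RPM, which is an arbitrary choice.

```python
import math


def _token_bound_rpm(cap_per_minute: int, tokens_per_request: int) -> float:
    """Requests per minute a token bucket allows (floor division)."""
    if tokens_per_request <= 0:
        return math.inf  # assumption: a bucket the request never touches cannot bind
    return cap_per_minute // tokens_per_request


def _pick_binding(caps: dict) -> tuple:
    # min() returns the first key on ties, so RPM wins a tie (arbitrary).
    name = min(caps, key=caps.get)
    return int(caps[name]), name


def combined_pool_ceiling(rpm, tpm, input_tokens, output_tokens):
    """OpenAI / Gemini: rpmEff = min(RPM, floor(TPM / (i + o)))."""
    token_rpm = _token_bound_rpm(tpm, input_tokens + output_tokens)
    return _pick_binding({"RPM": rpm, "TPM": token_rpm})


def split_bucket_ceiling(rpm, itpm, otpm, input_tokens, output_tokens):
    """Anthropic: rpmEff = min(RPM, floor(ITPM / i), floor(OTPM / o))."""
    return _pick_binding({
        "RPM": rpm,
        "ITPM": _token_bound_rpm(itpm, input_tokens),
        "OTPM": _token_bound_rpm(otpm, output_tokens),
    })


print(combined_pool_ceiling(500, 200_000, 1_200, 400))            # (125, 'TPM')
print(split_bucket_ceiling(1_000, 80_000, 16_000, 2_000, 1_000))  # (16, 'OTPM')
```

Each function returns the effective ceiling together with the name of the cap that produced it, which is the binding limit.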

The term that produced the minimum is the binding limit reported back to you. From there, a batch of N requests needs ceil(N / rpmEff) minutes of paced sending, rendered as days, hours, and minutes. A throughput target checks whether your desired requests-per-minute sits under the ceiling and reports the headroom or overage. Where a provider also publishes a requests-per-day (RPD) cap, which is common on Gemini free tiers, the tool surfaces it, because RPD binds regardless of how slowly you pace. Each effective ceiling is cross-checked by an independent feasibility test: it must be achievable at that rate and impossible one request higher.
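
The batch-time formula and the feasibility cross-check can be sketched as follows for the combined-pool case (OpenAI / Gemini). The function names are hypothetical, and the feasibility test shown is only one plausible reading of "achievable at that rate and impossible one request higher"; the tool's own check may differ.

```python
def batch_minutes(n_requests: int, rpm_eff: int) -> int:
    """ceil(N / rpmEff) using integer arithmetic."""
    if rpm_eff <= 0:
        raise ValueError("effective ceiling is 0: the batch can never clear")
    return (n_requests + rpm_eff - 1) // rpm_eff


def format_duration(total_minutes: int) -> str:
    """Render minutes as days, hours, and minutes, e.g. 64 -> '1h 4m'."""
    days, rem = divmod(total_minutes, 24 * 60)
    hours, minutes = divmod(rem, 60)
    parts = []
    if days:
        parts.append(f"{days}d")
    if hours:
        parts.append(f"{hours}h")
    if minutes or not parts:
        parts.append(f"{minutes}m")
    return " ".join(parts)


def is_feasible(rate: int, rpm: int, tpm: int, tokens_per_request: int) -> bool:
    """A rate is feasible if it stays within both the request and token caps."""
    return rate <= rpm and rate * tokens_per_request <= tpm


def ceiling_is_tight(rpm_eff: int, rpm: int, tpm: int, tokens_per_request: int) -> bool:
    """Feasible at rpm_eff and infeasible one request higher."""
    return (is_feasible(rpm_eff, rpm, tpm, tokens_per_request)
            and not is_feasible(rpm_eff + 1, rpm, tpm, tokens_per_request))


print(format_duration(batch_minutes(8_000, 125)))   # 1h 4m
print(ceiling_is_tight(125, 500, 200_000, 1_600))   # True
```

Checking the ceiling against the raw caps, rather than re-running the same division, is what makes the test independent of the formula it verifies.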

Worked examples

OpenAI Tier 1, gpt-4o-mini — the token limit binds

Caps: RPM 500, TPM 200,000. Request = 1,200 in + 400 out = 1,600 tokens.
tokenRpm = floor(200,000 / 1,600) = 125
rpmEff = min(500, 125) = 125 → TPM binds (125 < 500)
Batch of 8,000 = ceil(8,000 / 125) = 64 min = 1h 4m

Anthropic Tier 2, Claude 3.5 Sonnet — output tokens bind

Caps: RPM 1,000, ITPM 80,000, OTPM 16,000. Request = 2,000 in + 1,000 out.
inputRpm = floor(80,000 / 2,000) = 40
outputRpm = floor(16,000 / 1,000) = 16
rpmEff = min(1,000, 40, 16) = 16 → OTPM binds
Batch of 1,000 = ceil(1,000 / 16) = 63 min = 1h 3m

Gemini free tier, 2.0 Flash — RPM and the daily cap bind

Caps: RPM 15, TPM 1,000,000, RPD 1,500. Request = 4,000 in + 1,000 out = 5,000 tokens.
tokenRpm = floor(1,000,000 / 5,000) = 200
rpmEff = min(15, 200) = 15 → RPM binds
Target 30 req/min → 30 ≤ 15 is false → over cap by 15
Daily ceiling = 1,500 requests/day no matter how you pace
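
To see how quickly the daily cap bites, the arithmetic can be sketched as below. The helper names are hypothetical, the code assumes the quota resets once per day, and the 8,000-request batch is an illustrative figure rather than part of the example above.

```python
def minutes_until_daily_cap(rpd: int, rate_per_min: int) -> int:
    """Minutes of sending at a steady rate before the daily quota is spent."""
    return -(-rpd // rate_per_min)  # integer ceiling division


def days_for_batch(n_requests: int, rpd: int) -> int:
    """Calendar days a batch needs when the daily cap is the binding limit."""
    return -(-n_requests // rpd)


print(minutes_until_daily_cap(1_500, 15))  # 100
print(days_for_batch(8_000, 1_500))        # 6
```

Sending at the full 15 req/min therefore exhausts the 1,500-request quota in 100 minutes, after which the job waits for the next day's allowance.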

Edge case — one request larger than the per-minute budget

OpenAI Tier 1, gpt-4o (TPM 30,000). Request = 40,000 input tokens.
tokenRpm = floor(30,000 / 40,000) = 0
rpmEff = 0 → you cannot send even one request per minute under the cap
Fix: shorten the prompt or move to a higher tier. Lowering max-output helps only once the input alone fits under the TPM cap.
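
A pre-flight check for this edge case might look like the sketch below, assuming a combined TPM pool as in the OpenAI example. Both helper names are hypothetical, and the 1,000-token output reservation is an illustrative value.

```python
def fits_in_one_minute(tpm: int, input_tokens: int, output_tokens: int) -> bool:
    """True if a single request fits inside one minute's token budget."""
    return input_tokens + output_tokens <= tpm


def max_input_tokens(tpm: int, output_tokens: int) -> int:
    """Largest input that still leaves room for the reserved output tokens."""
    return max(tpm - output_tokens, 0)


print(fits_in_one_minute(30_000, 40_000, 0))  # False
print(max_input_tokens(30_000, 1_000))        # 29000
```

Running such a check before queuing a job catches oversized prompts up front instead of discovering them through repeated 429s.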

Frequently asked questions

Sources & references

The limit tables were transcribed from the official documentation above and last verified on 2026-06-05. They are reviewed each quarter and whenever a provider announces a change. Providers grant per-account overrides on request, so your dashboard is always the final word. The tool runs entirely in your browser — no inputs leave your device.

Related tools

AI Audio Token Cost Calc

Convert an audio clip's duration (or a measured audio_tokens count) into the exact audio input tokens GPT-4o-audio and Gemini bill, then price it per request and per month in USD and LKR. Gemini's fixed 32 tokens/second rule is cited; compares all four models side by side. Runs in your browser, no signup.

AI Vision Token Calculator

Calculate how many tokens an image costs on GPT-4o, GPT-4o mini, Claude, and Gemini from its pixel dimensions — plus the per-image and total cost in USD and LKR, side by side. Runs entirely in your browser; the image is never uploaded.

AI Max Output Tokens

Look up the maximum output (completion) tokens for every current LLM — Claude, GPT-4o, Gemini, Llama and more — and check whether your desired response fits in a single API call or needs chunking. Per-model caps cited from vendor docs, separate from the context window.
