AI Model Max Output Tokens Lookup
The max output tokens for current Claude, GPT, Gemini and Llama models in one table. Pick a model, enter how long a reply you want, and see instantly whether it fits in a single API call or needs to be split into chunks. Sources are cited; no signup required.
How it works
Every large language model has two different token limits, and developers constantly confuse them. The context window is the total budget for one request: your prompt, any attached documents, and the model's reply all share it. The max output tokens limit is a separate, smaller cap on the completion alone. A model can have a one-million-token context window and still stop after 64,000 tokens in a single answer. This tool is about that second number, the one that produces a max_tokens stop reason and cuts your generation off mid-sentence.
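As a rough sketch of how the two limits interact, assuming you already know how many tokens your prompt uses: the reply has to fit under the output cap and also in whatever the prompt leaves of the context window. The function name `reply_budget` is hypothetical, and the 1,000,000 / 64,000 figures are the illustrative numbers from the paragraph above, not any particular model's limits.

```python
def reply_budget(context_window: int, max_output_tokens: int, prompt_tokens: int) -> int:
    """Largest completion a single request can plan for: the smaller of the
    output cap and the part of the shared context window the prompt leaves free."""
    remaining_context = max(0, context_window - prompt_tokens)
    return min(max_output_tokens, remaining_context)


# A 1,000,000-token window does not lift a 64,000-token output cap.
print(reply_budget(1_000_000, 64_000, prompt_tokens=20_000))   # 64000
# With a nearly full window, the context becomes the binding limit.
print(reply_budget(1_000_000, 64_000, prompt_tokens=990_000))  # 10000
```

Treat this as a planning check rather than a guarantee: some APIs may reject a request whose prompt plus requested output exceeds the window instead of truncating it.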
The numbers themselves are not estimates. Each model's output cap is taken from the vendor's own documentation (Anthropic's models overview, OpenAI's models reference, Google's Gemini API model list, and the Llama and Mistral model cards), and every row in the model table links straight to its source. Open-weight models (Llama, Mistral) publish no separate output cap: their only hard limit is the context window, so they are flagged "open", and the host you run them on usually applies its own lower default.
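For "open" rows, one way a tool could resolve which cap to check against is sketched here, assuming a model record where a missing output cap is stored as None. `ModelLimits`, `effective_output_cap`, `host_default` and the example figures are all hypothetical placeholders, not values from any vendor's documentation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelLimits:
    name: str
    context_window: int
    max_output_tokens: Optional[int]  # None = open weights, no published output cap


def effective_output_cap(model: ModelLimits, host_default: Optional[int] = None) -> int:
    """Cap to check a reply against: the vendor's published cap if there is one,
    else the host's own default (never above the context window), else the window."""
    if model.max_output_tokens is not None:
        return model.max_output_tokens
    if host_default is not None:
        return min(host_default, model.context_window)
    return model.context_window


# Hypothetical open-weight model: only the context window is known.
open_model = ModelLimits("example-open-model", context_window=128_000, max_output_tokens=None)
print(effective_output_cap(open_model))                      # 128000
print(effective_output_cap(open_model, host_default=4_096))  # 4096
```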
The fit check uses four small, deterministic steps:
- Convert your length to tokens. If you enter tokens, they are used as-is. If you enter words, they are converted with OpenAI's published average of about 1.33 tokens per word: tokens = ceil(words × 1.33).
- Compare to the cap. The reply fits in one call when requestedTokens ≤ maxOutputTokens.
- Count chunks when it does not fit. chunks = ceil(requestedTokens / maxOutputTokens), the number of calls you would split the job into.
- Show headroom. The bar shows min(100, round(requestedTokens / maxOutputTokens × 100))% of the cap used.
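The four steps can be sketched as one function. This is a hedged illustration, not the page's actual source: `check_fit` and `FitResult` are hypothetical names, the rounding is assumed to be half-up (as JavaScript's Math.round does) rather than Python's round-half-to-even, and the 64,000-token cap in the usage lines is illustrative.

```python
from dataclasses import dataclass


@dataclass
class FitResult:
    requested_tokens: int
    fits: bool         # True if one API call is enough
    chunks: int        # calls needed if the job is split at the cap
    percent_used: int  # headroom bar value, clamped to 100


def check_fit(length: int, unit: str, max_output_tokens: int) -> FitResult:
    if max_output_tokens <= 0:
        raise ValueError("max_output_tokens must be positive")
    # Step 1: tokens pass through; words use ceil(words * 1.33),
    # computed in integers as ceil(words * 133 / 100) to avoid float error.
    if unit == "tokens":
        requested = length
    elif unit == "words":
        requested = -(-length * 133 // 100)
    else:
        raise ValueError(f"unknown unit: {unit!r}")
    # Step 2: a single call suffices when the request is within the cap.
    fits = requested <= max_output_tokens
    # Step 3: chunks = ceil(requested / cap), again in integer arithmetic.
    chunks = -(-requested // max_output_tokens)
    # Step 4: percent of the cap used, rounded half up, then clamped to 100.
    percent = (200 * requested + max_output_tokens) // (2 * max_output_tokens)
    return FitResult(requested, fits, chunks, min(100, percent))


print(check_fit(50_000, "words", 64_000))
# FitResult(requested_tokens=66500, fits=False, chunks=2, percent_used=100)
print(check_fit(8_000, "tokens", 64_000))
# FitResult(requested_tokens=8000, fits=True, chunks=1, percent_used=13)
```

Integer arithmetic keeps the ceilings exact, so a word count that lands exactly on a whole number of tokens is never bumped up by one through floating-point noise.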
The "≈ words" columns apply the inverse ratio: words = floor(maxOutputTokens × 0.75). Only this word↔token bridge is an approximation (1.33 and 0.75 are close to, but not exactly, reciprocals), and exact counts depend on each model's tokeniser and your specific text. The fit verdict is computed independently in the token domain and in the word domain and the two are cross-checked, so the answer is consistent for realistic inputs. The output caps themselves are exact, cited figures.
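A minimal sketch of the word-domain inversion and the two-way cross-check, assuming both conversions are done in integer arithmetic; the function names are hypothetical. Because 1.33 × 0.75 = 0.9975 rather than 1, the two domains can disagree in a narrow band just under the cap, which the second usage line shows with an illustrative 64,000-token cap.

```python
def words_for_cap(max_output_tokens: int) -> int:
    # "≈ words" column: floor(maxOutputTokens * 0.75), done in integers.
    return max_output_tokens * 3 // 4


def tokens_for_words(words: int) -> int:
    # Forward conversion used by the fit check: ceil(words * 1.33).
    return -(-words * 133 // 100)


def fit_verdicts(words: int, max_output_tokens: int) -> tuple[bool, bool]:
    """Answer "does it fit?" once per domain: (token domain, word domain)."""
    token_domain = tokens_for_words(words) <= max_output_tokens
    word_domain = words <= words_for_cap(max_output_tokens)
    return token_domain, word_domain


print(words_for_cap(64_000))         # 48000
print(fit_verdicts(40_000, 64_000))  # (True, True)
print(fit_verdicts(48_100, 64_000))  # (True, False): inside the gap between the ratios
```

The word-domain check is the stricter of the two, so when they disagree the safer reading is that the reply may not fit.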
Sources & references
- Anthropic — Models overview (per-model max output & context window)
- OpenAI — Models reference (max_output_tokens per model)
- Google — Gemini API models (outputTokenLimit per model)
- Meta — Llama models (open weights; output bounded by context)
- OpenAI — Tokens and how to count them (1 token ≈ 0.75 words)
The 24 model caps on this page were last cross-checked against the vendor documentation above on 2026-06-13. LLM limits change often; the page is reviewed when new models ship or vendors revise their docs.
Comments & feedback
Spotted an output limit that has changed, or a model I should add?
Email me at [email protected] — most updates ship within 24 hours.