AI Model Max Output Tokens Lookup

The max output tokens for every current LLM — Claude, GPT, Gemini, Llama — in one table. Pick a model, enter how long a reply you want, and see instantly whether it fits in a single API call or needs chunking. Sources cited, no signup.

By Induwara Ashinsana · Updated Jun 13, 2026
Will my reply fit in one call? Claude Opus 4.8
Limits verified 2026-06-13

Max output: 128,000 tokens · context 1M.

Converted at ~1.33 tokens per word (OpenAI average).

Verdict: Fits in one call. The whole reply can be generated in a single API call.
Requested length: 2,660 tok (≈ 2,000 words)
Model max output: 128,000 tok (≈ 96,000 words per call)
Of the cap used: 2% (125,340 tokens to spare)
Output cap used: 2,660 / 128,000 tokens

2% of Claude Opus 4.8's single-call output cap.

Max output tokens by model

24 of 24 models can return your requested length in a single call.

| Model | Vendor | Max output | ≈ words | Context | Your request | Source |
|---|---|---|---|---|---|---|
| Claude Opus 4.8 | Anthropic | 128,000 | 96,000 | 1M | 2% | docs |
| Claude Opus 4.7 | Anthropic | 128,000 | 96,000 | 1M | 2% | docs |
| Claude Opus 4.6 | Anthropic | 128,000 | 96,000 | 1M | 2% | docs |
| Claude Sonnet 4.6 | Anthropic | 64,000 | 48,000 | 1M | 4% | docs |
| Claude Haiku 4.5 | Anthropic | 64,000 | 48,000 | 200K | 4% | docs |
| GPT-5 | OpenAI | 128,000 | 96,000 | 400K | 2% | docs |
| OpenAI o3 | OpenAI | 100,000 | 75,000 | 200K | 3% | docs |
| OpenAI o4-mini | OpenAI | 100,000 | 75,000 | 200K | 3% | docs |
| GPT-4.1 | OpenAI | 32,768 | 24,576 | 1M | 8% | docs |
| GPT-4.1 mini | OpenAI | 32,768 | 24,576 | 1M | 8% | docs |
| GPT-4o | OpenAI | 16,384 | 12,288 | 128K | 16% | docs |
| GPT-4o mini | OpenAI | 16,384 | 12,288 | 128K | 16% | docs |
| GPT-4 Turbo | OpenAI | 4,096 | 3,072 | 128K | 65% | docs |
| GPT-3.5 Turbo | OpenAI | 4,096 | 3,072 | 16K | 65% | docs |
| Gemini 2.5 Pro | Google | 65,536 | 49,152 | 1M | 4% | docs |
| Gemini 2.5 Flash | Google | 65,536 | 49,152 | 1M | 4% | docs |
| Gemini 2.0 Flash | Google | 8,192 | 6,144 | 1M | 32% | docs |
| Gemini 1.5 Pro | Google | 8,192 | 6,144 | 2M | 32% | docs |
| Gemini 1.5 Flash | Google | 8,192 | 6,144 | 1M | 32% | docs |
| Llama 4 Maverick (open) | Meta | 1,000,000 | 750,000 | 1M | 0% | docs |
| Llama 3.3 70B (open) | Meta | 128,000 | 96,000 | 128K | 2% | docs |
| DeepSeek-V3 | DeepSeek | 8,192 | 6,144 | 128K | 32% | docs |
| DeepSeek-R1 | DeepSeek | 8,192 | 6,144 | 128K | 32% | docs |
| Mistral Large 2 (open) | Mistral | 128,000 | 96,000 | 128K | 2% | docs |

24 models listed. "open" = open-weight model with no separate output cap; its limit is the context window, and the host may cap it lower.

Max-output caps are vendor-documented values verified on 2026-06-13 (each row links its source below). Only the word↔token bridge is an average (1 token ≈ 0.75 words ≈ 4 characters; 1 word ≈ 1.33 tokens, per OpenAI guidance). Max output is the single-call completion cap: it is separate from the context window and, for every model not flagged "open", smaller than it.
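As a rough illustration of the word↔token bridge, the sketch below estimates a token count from plain text in two ways, using only the averages quoted above. The function names are hypothetical, and the results are estimates; an exact count needs the model's own tokeniser.

```python
def tokens_from_words(word_count: int) -> int:
    """Estimate tokens at ~1.33 tokens per word, rounded up."""
    # Integer arithmetic (x133, then ceiling-divide by 100) avoids float rounding.
    return (word_count * 133 + 99) // 100


def tokens_from_chars(char_count: int) -> int:
    """Estimate tokens at ~4 characters per token, rounded up."""
    return (char_count + 3) // 4


def estimate_tokens(text: str) -> dict:
    """Return both estimates for a piece of text (words split on whitespace)."""
    return {
        "by_words": tokens_from_words(len(text.split())),
        "by_chars": tokens_from_chars(len(text)),
    }


print(tokens_from_words(2_000))  # 2660
print(estimate_tokens("Max output is the single-call completion cap."))
```

The two estimates rarely agree exactly, which is a reminder that the bridge is an average and not a measurement.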

How it works

Every large language model has two different token limits, and developers constantly confuse them. The context window is the total budget for one request: your prompt, any attached documents, and the model's reply all share it. The max output tokens limit is a separate, smaller cap on just the completion. A model can have a one-million-token context window and still refuse to write more than 64,000 tokens in a single answer. This tool is about that second number, the one that produces a max_tokens stop reason and cuts your generation off mid-sentence.
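To make the difference between the two limits concrete, here is a minimal sketch that works out the largest reply a request can receive when both apply. `ModelLimits` and `reply_budget` are hypothetical names for illustration; the figures are the Claude Sonnet 4.6 row from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelLimits:
    context_window: int     # total budget: prompt + attachments + reply
    max_output_tokens: int  # cap on the reply alone


def reply_budget(limits: ModelLimits, prompt_tokens: int) -> int:
    """Largest reply, in tokens, that satisfies both limits at once."""
    room_in_context = max(0, limits.context_window - prompt_tokens)
    return min(limits.max_output_tokens, room_in_context)


# Claude Sonnet 4.6 as listed in the table: 1M context, 64,000 max output.
sonnet = ModelLimits(context_window=1_000_000, max_output_tokens=64_000)

print(reply_budget(sonnet, prompt_tokens=10_000))   # 64000: the output cap binds
print(reply_budget(sonnet, prompt_tokens=990_000))  # 10000: the context window binds
```

With a short prompt the output cap is the binding limit, which is the case this tool checks. Only a prompt that nearly fills the window makes the context limit the tighter one.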

The numbers themselves are not estimated. Each model's output cap is taken from the vendor's own documentation — Anthropic's models overview, OpenAI's models reference, Google's Gemini API model list, and the Llama and Mistral model cards — and every row in the table above links straight to its source. Open-weight models (Llama, Mistral) publish no separate output cap at all: their only hard limit is the context window, so they are flagged "open" and the host you run on usually applies its own lower default.
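One way to read the "open" flag is as a minimum over whichever limits actually exist. The sketch below assumes a hypothetical `effective_output_cap` helper and a made-up host default of 4,096 tokens; neither comes from a vendor's documentation.

```python
from typing import Optional


def effective_output_cap(
    context_window: int,
    vendor_cap: Optional[int] = None,
    host_cap: Optional[int] = None,
) -> int:
    """Smallest of the limits that apply; None means 'no such limit published'."""
    caps = [context_window]
    if vendor_cap is not None:
        caps.append(vendor_cap)
    if host_cap is not None:
        caps.append(host_cap)
    return min(caps)


# Llama 3.3 70B as listed in the table: 128K context, no separate vendor cap.
print(effective_output_cap(128_000))                  # 128000
# Same model behind a host with an assumed (illustrative) 4,096-token default.
print(effective_output_cap(128_000, host_cap=4_096))  # 4096
```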

The fit check uses four small, deterministic steps:

  1. Convert your length to tokens. If you enter tokens, they are used as-is. If you enter words, they are converted with OpenAI's published average of about 1.33 tokens per word: tokens = ceil(words × 1.33).
  2. Compare to the cap. The reply fits in one call when requestedTokens ≤ maxOutputTokens.
  3. Count chunks when it does not fit. chunks = ceil(requestedTokens / maxOutputTokens) — how many calls you would split the job into.
  4. Show headroom. The bar reads min(100, round(requestedTokens / maxOutputTokens × 100))% of the cap used.
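The four steps above can be sketched as a small function. This is an illustration of the stated formulas, not the page's actual code: the names are hypothetical, and rounding half up in step 4 is an assumption, since the text only says round().

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FitResult:
    requested_tokens: int
    fits: bool
    chunks: int
    percent_used: int


def words_to_tokens(words: int) -> int:
    """Step 1: ceil(words x 1.33), done in integers to avoid float error."""
    return (words * 133 + 99) // 100


def check_fit(requested_tokens: int, max_output_tokens: int) -> FitResult:
    if requested_tokens < 0 or max_output_tokens <= 0:
        raise ValueError("requested_tokens must be >= 0 and max_output_tokens > 0")
    # Step 2: the boundary is inclusive.
    fits = requested_tokens <= max_output_tokens
    # Step 3: ceiling division; a request that fits needs exactly one call.
    chunks = 1 if fits else -(-requested_tokens // max_output_tokens)
    # Step 4: percent of the cap, rounded half up (assumed) and clamped to 100.
    percent = min(
        100,
        (200 * requested_tokens + max_output_tokens) // (2 * max_output_tokens),
    )
    return FitResult(requested_tokens, fits, chunks, percent)


# 2,000 words against a 128,000-token cap.
print(check_fit(words_to_tokens(2_000), 128_000))
# 133,000 tokens against the same cap.
print(check_fit(133_000, 128_000))
```

The first call reports 2,660 tokens, one call, and 2% of the cap used; the second reports two calls with the bar clamped at 100%.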

The "≈ words" columns invert the same ratio: words = floor(maxOutputTokens × 0.75). Only this word↔token bridge is an approximation — exact counts depend on each model's tokeniser and your specific text. The fit verdict is cross-checked two independent ways (in the token domain and the word domain) so the answer is consistent for realistic inputs. The output caps are exact, cited figures.
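A minimal sketch of the two-domain check follows, with hypothetical function names. Because 1.33 × 0.75 = 0.9975 and not exactly 1, the word-domain test is slightly stricter, so the two verdicts can differ for requests within roughly 0.25% of the cap; the third call below shows one such case.

```python
def fits_token_domain(words: int, max_output_tokens: int) -> bool:
    """Convert the request to tokens, then compare against the cap."""
    requested_tokens = (words * 133 + 99) // 100  # ceil(words x 1.33)
    return requested_tokens <= max_output_tokens


def fits_word_domain(words: int, max_output_tokens: int) -> bool:
    """Convert the cap to words, then compare against the request."""
    cap_words = max_output_tokens * 3 // 4        # floor(cap x 0.75)
    return words <= cap_words


def cross_check(words: int, max_output_tokens: int) -> tuple:
    """Return (token-domain verdict, word-domain verdict)."""
    return (
        fits_token_domain(words, max_output_tokens),
        fits_word_domain(words, max_output_tokens),
    )


print(cross_check(5_000, 16_384))     # (True, True)
print(cross_check(100_000, 128_000))  # (False, False)
print(cross_check(96_100, 128_000))   # (True, False): inside the narrow band
```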

Worked examples

5,000-word article on GPT-4o

Fits in one call

  1. Tokens: ceil(5,000 × 1.33) = 6,650
  2. GPT-4o max output: 16,384 tokens
  3. 6,650 ≤ 16,384 → fits in one call
  4. Headroom: round(6,650 / 16,384 × 100) = 41% used
  5. Spare: 16,384 − 6,650 = 9,734 tokens left

200-page book (~100,000 words) on Claude Opus 4.8

Too long — needs 2 calls

  1. Tokens: ceil(100,000 × 1.33) = 133,000
  2. Opus 4.8 max output: 128,000 tokens
  3. 133,000 > 128,000 → does not fit
  4. Chunks: ceil(133,000 / 128,000) = 2
  5. Even the highest vendor-documented cap in the table (128,000 tokens) needs at least 2 calls, so chunk it.
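Once the chunk count is known, the job still has to be divided between calls. The sketch below splits the request evenly, which is an assumption for illustration: the text above defines only the number of chunks, not how large each should be. `plan_chunks` is a hypothetical name.

```python
from typing import List


def plan_chunks(requested_tokens: int, max_output_tokens: int) -> List[int]:
    """Split a request into ceil(requested / cap) near-equal per-call budgets."""
    if max_output_tokens <= 0:
        raise ValueError("max_output_tokens must be positive")
    if requested_tokens <= 0:
        return []
    n = -(-requested_tokens // max_output_tokens)  # ceiling division
    base, extra = divmod(requested_tokens, n)
    # The first `extra` calls carry one additional token so the total is exact.
    return [base + 1 if i < extra else base for i in range(n)]


print(plan_chunks(133_000, 128_000))  # [66500, 66500]
print(plan_chunks(2_660, 128_000))    # [2660]
```

An even split keeps every call well under the cap, which leaves room for the token estimate being off; filling the first call to the limit would be an equally valid reading of the chunk count.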

At the boundary: 64,000 tokens on Claude Sonnet 4.6

Fits exactly

  1. Sonnet 4.6 max output: 64,000 tokens
  2. Request 64,000 tokens: 64,000 ≤ 64,000 → fits (boundary inclusive)
  3. Headroom: 100% used — zero tokens to spare
  4. Request 65,000 tokens: 65,000 > 64,000 → ceil(65,000 / 64,000) = 2 calls

Frequently asked questions

Sources & references

Spotted an output limit that has changed, or a model I should add?

Email me at [email protected] — most updates ship within 24 hours.