How many pages fit in a 128K token context window?

About 192 A4 pages of English prose. A 128,000-token window holds roughly 96,000 words (128,000 × 0.75), and a typical single-spaced A4 page is about 500 words, so 96,000 ÷ 500 ≈ 192 pages. Code uses more tokens per character than prose, so the same window holds less code than plain text.

How many words is Claude's 200K context window?

Around 150,000 words. Using the common estimate of 0.75 words per token, a 200,000-token window holds about 200,000 × 0.75 = 150,000 words, or roughly 300 A4 pages. That budget is shared between your input and the model's reply, so reserve room for the answer.

How big is Gemini's context window in pages?

Gemini 2.5 Pro's 1,000,000-token window holds about 750,000 words, which is roughly 1,500 A4 pages of prose. Gemini 1.5 Pro's 2,000,000-token window is about double that — near 3,000 pages. These are estimates; exact counts depend on the tokeniser and the text.

How many tokens does a page of text use?

About 667 tokens for a single-spaced A4 page of English prose (~500 words ÷ 0.75 words per token). Dense academic or formatted text runs higher; double-spaced or sparse pages run lower. This tool uses the 500-words-per-page average for its page estimates.
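The page arithmetic above can be sketched in a few lines of Python. This is a minimal illustration using the page's own averages (0.75 words per token, 500 words per A4 page); the function names are hypothetical and are not taken from the tool's source.

```python
WORDS_PER_TOKEN = 0.75   # average for English prose quoted on this page
WORDS_PER_PAGE = 500     # single-spaced A4 page, as assumed on this page


def pages_to_tokens(pages: float) -> int:
    """Estimate the tokens used by a number of A4 pages of prose."""
    return round(pages * WORDS_PER_PAGE / WORDS_PER_TOKEN)


def window_to_pages(window_tokens: int) -> int:
    """Estimate how many A4 pages of prose fill a context window."""
    return round(window_tokens * WORDS_PER_TOKEN / WORDS_PER_PAGE)


print(pages_to_tokens(1))          # 667
print(window_to_pages(128_000))    # 192
print(window_to_pages(1_000_000))  # 1500
```

Both directions use the same two averages, so dense or sparse pages shift the result by the same proportion either way.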

Will my document fit in the model's context window?

Estimate your document in tokens (words ÷ 0.75 is a good start), add the tokens you want to reserve for the reply, and compare to the model's window. If input plus reserved reply is under the window, it fits. This calculator does that for you and shows the percentage used.

Is the token estimate exact?

No — it is an average. Real tokenisation depends on the model's tokeniser, language, formatting and rare words. For an exact count of specific text, paste it into the AI Token Counter. This tool is a capacity planner for when you only know the rough size of your input.

Does the context window include the model's output?

For most models the window is a shared budget: your prompt and the generated reply both consume it. That is why this tool lets you reserve output tokens. A few APIs publish separate input and output limits; when in doubt, reserve generously so the reply is not cut off.

Why do code and prose give different token counts?

Code has more punctuation, symbols and short tokens, so it averages about 3.3 characters per token versus roughly 4 for English prose. Switching the content type to Code applies the denser ratio, and lines of code are estimated at about 10 tokens each.
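As a rough sketch of how the content type changes the estimate, the snippet below applies the ratios quoted on this page (4 characters per token for English prose, 3.5 for Claude, 3.3 for code, 10 tokens per line of code). The dictionary keys, the function names and the choice to round to the nearest token are assumptions made for illustration; the tool may round differently.

```python
# Characters per token, as quoted on this page. The key names are hypothetical.
CHARS_PER_TOKEN = {"prose": 4.0, "claude_prose": 3.5, "code": 3.3}
TOKENS_PER_CODE_LINE = 10


def tokens_from_chars(chars: int, content_type: str = "prose") -> int:
    """Estimate tokens from a character count for the given content type."""
    return round(chars / CHARS_PER_TOKEN[content_type])


def tokens_from_code_lines(lines: int) -> int:
    """Estimate tokens from a number of lines of code."""
    return lines * TOKENS_PER_CODE_LINE


# The same 13,200 characters cost more tokens as code than as prose.
print(tokens_from_chars(13_200, "prose"))  # 3300
print(tokens_from_chars(13_200, "code"))   # 4000
print(tokens_from_code_lines(8_000))       # 80000
```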

When were these context-window sizes last verified?

The model windows and token ratios on this page were last cross-checked against the vendors' documentation on 2026-06-09. Open-model windows (Llama, DeepSeek, Mistral) change per release, so treat those as the value verified on that date rather than a live figure.

AI Context Window Calculator

Pick a model — GPT, Claude, Gemini, Llama or DeepSeek — enter your text in words, pages, code lines or tokens, and instantly see whether it fits the context window, what percentage it uses, and how many tokens are left for the reply. No signup, sources cited.

By Induwara Ashinsana, Executive Director, Ryzera Technologies · Updated Jun 9, 2026

Will it fit? Claude Sonnet 4 · 200K

Windows verified 2026-06-09

Model

200,000-token context window.

Content type

Claude tokeniser: ~3.5 characters per token.

How much do you want to send?

Unit

English words of prose. Estimates only — for an exact count of real text, use the AI Token Counter.

Reserve for the model's reply (tokens)

Verdict: Fits. Input + reserved output fit inside the window.

Input tokens: 32,000 of 200,000 in the window

Window used: 16% (18% incl. reserved reply)

Tokens remaining: 164,000 (≈ 123,000 words · 246 A4 pages)

Window usage (200,000 tokens): Input · 32,000 | Reserved reply · 4,000 | Free · 164,000

What Claude Sonnet 4's full window holds

At 100% of the 200K window	Approximate capacity
Words	150,000
Characters	800,000
A4 pages (~500 words each)	300
Lines of code (~10 tokens each)	20,000
Average novels (~90,000 words each)	1.67
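The capacity table can be reproduced with a short sketch. It assumes the averages stated on this page (0.75 words per token, 500 words per page, 10 tokens per line of code, 90,000 words per novel). The 800,000-character figure corresponds to 4 characters per token, so that ratio is the default here; at Claude's ~3.5 characters per token the same window would come to about 700,000 characters. The function name and dictionary keys are hypothetical.

```python
def full_window_capacity(window_tokens: int, chars_per_token: float = 4.0) -> dict:
    """Approximate what a completely full context window holds."""
    words = window_tokens * 0.75                # 0.75 words per token
    return {
        "words": round(words),
        "characters": round(window_tokens * chars_per_token),
        "a4_pages": round(words / 500),         # ~500 words per A4 page
        "code_lines": window_tokens // 10,      # ~10 tokens per line of code
        "novels": round(words / 90_000, 2),     # ~90,000 words per novel
    }


print(full_window_capacity(200_000))
# {'words': 150000, 'characters': 800000, 'a4_pages': 300, 'code_lines': 20000, 'novels': 1.67}
```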

Conversation capacity

How many back-and-forth turns fit before the oldest messages are truncated.

System prompt (tokens)

Avg tokens per turn

Max turns: 398

Same input across every model

Model	Context window	Window used
GPT-5	400K	8%
GPT-4.1	1M	3.2%
GPT-4o	128K	25%
GPT-4 Turbo	128K	25%
OpenAI o3	200K	16%
GPT-3.5 Turbo	16K	195.3%
Claude Opus 4	200K	16%
Claude Sonnet 4	200K	16%
Claude Sonnet 4 (1M beta)	1M	3.2%
Claude Haiku 3.5	200K	16%
Gemini 2.5 Pro	1M	3.2%
Gemini 2.5 Flash	1M	3.2%
Gemini 1.5 Pro	2M	1.6%
Llama 4 Scout	10M	0.32%
Llama 4 Maverick	1M	3.2%
Llama 3.1 / 3.3	128K	25%
DeepSeek-V3	128K	25%
DeepSeek-R1	128K	25%
Mistral Large 2	128K	25%

19 models compared. Bars in red mean the input alone exceeds that model's window.
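A cross-model comparison like this is just input ÷ window repeated per model. The sketch below uses a handful of the windows shown on this page. The 16,385-token value for GPT-3.5 Turbo is an assumption: it is a figure consistent with the 195.3% shown for the "16K" label. The dictionary and function names are hypothetical.

```python
# A subset of the windows listed on this page (tokens).
# GPT-3.5 Turbo's exact 16,385 is assumed; the page only shows "16K".
MODEL_WINDOWS = {
    "GPT-4o": 128_000,
    "GPT-3.5 Turbo": 16_385,
    "Claude Sonnet 4": 200_000,
    "Gemini 2.5 Pro": 1_000_000,
    "Llama 4 Scout": 10_000_000,
}


def usage_across_models(input_tokens: int, windows: dict) -> list:
    """Return (model, percent of window used, input overflows?) per model."""
    rows = []
    for name, window in windows.items():
        percent = round(input_tokens / window * 100, 2)
        rows.append((name, percent, input_tokens > window))
    return rows


for name, percent, overflows in usage_across_models(32_000, MODEL_WINDOWS):
    flag = "  <- input alone exceeds the window" if overflows else ""
    print(f"{name}: {percent}%{flag}")
```

With a 32,000-token input only the GPT-3.5 Turbo row is flagged, which matches the single value above 100% in the comparison.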

Token figures are documented averages (OpenAI: 1 token ≈ 4 chars ≈ 0.75 words; Claude ≈ 3.5 chars/token). Context windows are vendor-published values verified on 2026-06-09. Exact counts depend on each model's tokeniser. Sources are listed in full below the tool.

How it works

A model's context window is the maximum number of tokens it can hold at once: your prompt, any documents or conversation history, and the reply all share that budget. This calculator answers one question: does your input fit the model you picked, and how much room is left? It does this in five steps, using the vendors' own documented averages.

1. Estimate your input in tokens. OpenAI's guidance is that one token is about four characters, or roughly three-quarters of an English word (100 tokens ≈ 75 words). So words convert with tokens = words ÷ 0.75 and characters with tokens = characters ÷ 4 (3.5 for Claude, 3.3 for code); an A4 page counts as ~500 words (≈ 667 tokens) and a line of code as about 10 tokens.
2. Read the model's window. Each window (GPT-4o's 128K, Claude's 200K, Gemini 2.5 Pro's 1M) is a vendor-published figure stored in this tool with a verification date.
3. Compute usage. The percentage used is input ÷ window. The verdict is Fits when input plus your reserved reply stays inside the window, Tight when the input fits but leaves no room for the reserved reply, and Too large when the input alone overflows the window.
4. Show the remaining budget. remaining = window − input − reserved, converted back to words (× 0.75) and A4 pages (÷ 667) so the number means something.
5. Estimate conversation length. Given a system prompt and an average tokens-per-turn, the tool reports floor((window − system) ÷ avgTurn), the number of back-and-forth turns that fit before the oldest messages must be dropped.
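The first four steps above can be sketched as a small function. This is an illustrative reading of the rules described on this page, not the tool's actual source; the names FitResult, words_to_tokens and check_fit are hypothetical, and clamping the remaining budget at zero is an assumption.

```python
from dataclasses import dataclass


@dataclass
class FitResult:
    verdict: str           # "Fits", "Tight" or "Too large"
    percent_used: float    # input / window, as a percentage
    remaining_tokens: int  # window - input - reserved, never below zero


def words_to_tokens(words: int) -> int:
    """Step 1: estimate tokens from English words (0.75 words per token)."""
    return round(words / 0.75)


def check_fit(input_tokens: int, window: int, reserved: int = 0) -> FitResult:
    """Steps 2 to 4: usage, verdict and remaining budget for one model."""
    if input_tokens > window:
        verdict = "Too large"   # the input alone overflows the window
    elif input_tokens + reserved > window:
        verdict = "Tight"       # input fits, the reserved reply does not
    else:
        verdict = "Fits"
    remaining = max(window - input_tokens - reserved, 0)
    return FitResult(verdict, round(input_tokens / window * 100, 1), remaining)


# A 24,000-word document into a 200K window with 4,000 tokens reserved.
result = check_fit(words_to_tokens(24_000), window=200_000, reserved=4_000)
print(result.verdict, result.percent_used, result.remaining_tokens)
# Fits 16.0 164000
print(round(result.remaining_tokens * 0.75), round(result.remaining_tokens / 667))
# 123000 246
```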

Every ratio is an average: real tokenisation depends on the model, the language and the exact text. When you need an exact count of specific text rather than a capacity estimate, use the AI Token Counter. For a plain words-to-tokens conversion, see the Tokens to Words converter, and for pricing and specs across models, the AI Model Comparison.

Worked examples

24,000-word lease into Claude Sonnet (200K)

Input: 24,000 words ÷ 0.75 = 32,000 tokens
Usage: 32,000 ÷ 200,000 = 16.0% of the window
Reserve 4,000 tokens for the reply: 32,000 + 4,000 = 36,000 ≤ 200,000 → ✅ Fits
Remaining: 200,000 − 32,000 − 4,000 = 164,000 tokens
That is ≈ 123,000 words ≈ 246 A4 pages still free

8,000-line codebase + 2,000-token system prompt into GPT-4o (128K)

Code: 8,000 lines × 10 = 80,000 tokens
Plus system prompt: 80,000 + 2,000 = 82,000 tokens
Usage: 82,000 ÷ 128,000 = 64.1% of the window
Reserve 8,000 for the reply: 82,000 + 8,000 = 90,000 ≤ 128,000 → ✅ Fits
Remaining: 128,000 − 82,000 − 8,000 = 38,000 tokens (≈ 28,500 words)
Same 82,000 tokens vs Gemini 2.5 Pro (1M) = 8.2%; vs Claude (200K) = 41.0%

Edge case — 500,000-token input into GPT-4o (128K)

Input: 500,000 tokens (e.g. a very large export)
500,000 > 128,000 → ❌ Too large: the input alone overflows the window
Remaining: 0 tokens — split the input or pick a 1M-token model
The cross-model strip shows it fits Gemini 2.5 Pro (50.0%) and Llama 4 Scout (5.0%)
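If you choose to split the input rather than switch models, a rough count of the pieces needed follows from the same budget arithmetic. The sketch below is an assumption-laden illustration: it supposes every piece needs its own reserved reply plus a fixed per-request overhead (for example a repeated system prompt), and the function name chunks_needed is hypothetical.

```python
import math


def chunks_needed(input_tokens: int, window: int,
                  reserved_reply: int = 0, overhead: int = 0) -> int:
    """Estimate how many requests are needed to send an oversized input.

    Assumes each request carries the same reserved reply and overhead.
    """
    per_chunk = window - reserved_reply - overhead
    if per_chunk <= 0:
        raise ValueError("reserved reply and overhead leave no room for input")
    return math.ceil(input_tokens / per_chunk)


# 500,000 tokens into a 128K window.
print(chunks_needed(500_000, 128_000))                        # 4
print(chunks_needed(500_000, 128_000, reserved_reply=8_000))  # 5
```

Reserving room for the reply in every request raises the count from four pieces to five, so it is worth settling the reply budget before splitting.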

Conversation capacity — Claude 200K, system 1,000, 500 per turn

Usable after system prompt: 200,000 − 1,000 = 199,000 tokens
Per turn: 500 tokens
Max turns: floor(199,000 ÷ 500) = floor(398.0) = 398 turns
After ~398 turns the oldest messages start to truncate
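The conversation-capacity formula, floor((window − system) ÷ avgTurn), translates directly into integer division. This is a minimal sketch with a hypothetical function name; returning zero when the system prompt alone fills the window is an assumption.

```python
def max_turns(window: int, system_prompt: int, avg_turn: int) -> int:
    """Turns that fit before the oldest messages must be dropped."""
    if avg_turn <= 0:
        raise ValueError("avg_turn must be a positive number of tokens")
    usable = window - system_prompt
    return max(usable // avg_turn, 0)


print(max_turns(200_000, 1_000, 500))  # 398
```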

Frequently asked questions

Sources & references

The model windows and token ratios on this page were last cross-checked against these sources on 2026-06-09. Vendor-published windows and ratios are averages; exact token counts depend on each model's tokeniser. Open-model windows (Llama, DeepSeek, Mistral) change per release and are stated as the value verified on that date.
