AI Token Counter — GPT-5, Claude 4.x, Gemini 3 & Llama 4
Paste any prompt or document to see an estimate of how many tokens it uses across 10 popular models, and what fraction of each model's context window that takes up. Runs entirely in your browser, no API key required.
How it works
Large language models don't read text the way humans do. They read it as a sequence of tokens — short integer IDs each standing for a word, a sub-word fragment, or a piece of punctuation. Every API charges per token in and per token out, and every model has a hard ceiling on the total tokens it can hold in one request (its context window). Knowing your token count before you send is the difference between a clean response and a 400 error.
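The two numbers that matter here, share of the context window and input cost, are simple arithmetic once you have a token count. The sketch below shows that arithmetic; the model name, window size, and price are hypothetical placeholders, not any vendor's published figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSpec:
    name: str
    context_window: int           # max tokens per request (hypothetical value)
    usd_per_million_input: float  # input price per 1M tokens (hypothetical value)


def context_fraction(tokens: int, model: ModelSpec) -> float:
    """Share of the model's context window used by `tokens`."""
    return tokens / model.context_window


def input_cost_usd(tokens: int, model: ModelSpec) -> float:
    """Input-side cost only; output tokens are billed separately."""
    return tokens / 1_000_000 * model.usd_per_million_input


# Placeholder model: swap in the figures from your vendor's models page.
example = ModelSpec("example-model", context_window=128_000, usd_per_million_input=2.50)
prompt_tokens = 32_000

print(f"{context_fraction(prompt_tokens, example):.1%} of the window")
print(f"${input_cost_usd(prompt_tokens, example):.4f} input cost")
```

A fraction above 1.0 means the request would exceed the window and be rejected before the model sees it.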
Each vendor trains its own tokenizer on its own data, so the same paragraph yields different counts on different models. OpenAI's GPT-3.5 and GPT-4 use cl100k_base (100 277 tokens); GPT-4o and GPT-5 use the newer o200k_base (about 200 000 tokens), which is slightly more efficient on prose. Anthropic's Claude uses a distinct BPE vocabulary that produces roughly 10–15 % more tokens than tiktoken on the same English text. Google's Gemini uses SentencePiece with efficiency similar to tiktoken. Meta's Llama 3 and Llama 4 use a 128 000-token vocabulary close in style to cl100k_base.
The exact merge tables for each vocabulary are 1.7–5 MB each, and shipping them in the browser would push initial page weight past our 400 KB performance budget. Instead, this calculator runs a regex-based pre-tokenizer derived from tiktoken's cl100k_base PAT pattern, which splits text into words, numbers, punctuation, and whitespace runs the way tiktoken does before it applies BPE merges. It then assigns a token count to each piece by length and applies a per-vendor multiplier calibrated against published benchmarks. A second method (a chars-per-token ratio) runs in parallel as a cross-check. If the two disagree by more than 25 %, the per-model card flags low confidence: the text is probably code, emoji, or non-Latin script, and the count is rougher than usual.
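The two-method approach described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the calculator's actual code: the regex is a simplified ASCII-only stand-in for the cl100k_base pattern, and the per-piece rules, multipliers, and chars-per-token ratios are all hypothetical values chosen for the example.

```python
import math
import re

# Simplified stand-in for the cl100k_base split pattern (assumption):
# letter runs, digit groups of up to 3, other symbol runs, whitespace runs.
PIECE_RE = re.compile(r"[A-Za-z]+|\d{1,3}|[^\sA-Za-z\d]+|\s+")

# Hypothetical calibration values, for illustration only.
VENDOR_MULTIPLIER = {"openai": 1.00, "anthropic": 1.12}
CHARS_PER_TOKEN = {"openai": 4.0, "anthropic": 3.6}


def piece_tokens(piece: str) -> int:
    """Assumed length-based rule for one pre-tokenized piece."""
    if piece.isspace():
        # A single space usually merges into the following word.
        return 0 if piece == " " else 1
    if piece.isascii() and piece.isalpha():
        return math.ceil(len(piece) / 6)  # long words split into fragments
    if piece.isdigit():
        return 1                          # one token per digit group
    return math.ceil(len(piece) / 2)      # punctuation, emoji, non-Latin text


def estimate_by_pieces(text: str, vendor: str) -> int:
    raw = sum(piece_tokens(p) for p in PIECE_RE.findall(text))
    return round(raw * VENDOR_MULTIPLIER[vendor])


def estimate_by_chars(text: str, vendor: str) -> int:
    return round(len(text) / CHARS_PER_TOKEN[vendor])


def estimate(text: str, vendor: str) -> tuple[int, bool]:
    """Return (token estimate, low_confidence flag)."""
    by_pieces = estimate_by_pieces(text, vendor)
    by_chars = estimate_by_chars(text, vendor)
    disagreement = abs(by_pieces - by_chars) / max(by_pieces, by_chars, 1)
    return by_pieces, disagreement > 0.25


print(estimate("Hello world, this is a test.", "openai"))
```

Plain English keeps the two methods close together, while a string of emoji pulls them apart and trips the low-confidence flag, which is the behavior the cross-check is meant to surface.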
On English prose the estimate is within ±5 % of every vendor's official tokenizer. That accuracy is more than enough for “will this fit?” and “roughly how much will this cost?” questions. If you're running a paid pipeline at scale and need counts to the exact token, run the text through the official tokenizer server-side at request time, which is also what every vendor recommends.
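One way to act on a ±5 % estimate is to plan for the top of the band rather than the midpoint. The sketch below is a hypothetical helper, not part of the calculator: it widens the estimate by a tolerance and checks whether the worst case, plus any tokens you want to reserve for the reply, still fits the window. The window size in the example is a placeholder.

```python
def token_bounds(estimate: int, tolerance_pct: int = 5) -> tuple[int, int]:
    """Lower and upper bound for an estimate with +/- tolerance_pct error.

    Integer arithmetic avoids floating-point rounding surprises.
    """
    low = estimate * (100 - tolerance_pct) // 100
    high = -(-estimate * (100 + tolerance_pct) // 100)  # ceiling division
    return low, high


def fits_safely(
    estimate: int,
    context_window: int,
    reserved_output: int = 0,
    tolerance_pct: int = 5,
) -> bool:
    """True if the worst-case count plus reserved output fits the window."""
    _, high = token_bounds(estimate, tolerance_pct)
    return high + reserved_output <= context_window


# Placeholder window of 128 000 tokens.
print(token_bounds(100_000))
print(fits_safely(120_000, 128_000))
print(fits_safely(120_000, 128_000, reserved_output=4_000))
```

If the check fails only at the upper bound, that is the case where an exact server-side count is worth the extra call.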
Sources & references
- OpenAI — tiktoken (cl100k_base, o200k_base) reference implementation
- OpenAI — Online Tokenizer (truth source for cl100k counts)
- OpenAI — Models page (GPT-4o, GPT-5 context windows)
- Anthropic — Token counting (Claude tokenizer behavior)
- Anthropic — Claude models (context windows for 4.x family)
- Google AI — Gemini token counting documentation
- Google AI — Gemini models (context windows)
- Meta — Llama 4 announcement (vocabulary size, context windows)
Per-vendor multipliers and chars-per-token ratios were calibrated against the OpenAI online tokenizer and published vendor benchmarks on 2026-05-12. Context windows are reviewed every quarter and immediately when a vendor changes a published figure.