GPT-4o Image Token & Vision Cost Calculator
Enter an image's width and height and see how many tokens it costs on GPT-4o, GPT-4o mini, Claude, and Gemini — plus the price per image and total in USD and LKR, side by side. Each provider's formula is taken from its own docs. The image never leaves your browser.
How it works
A vision model doesn't see your image as pixels — it converts the image into tokens, the same units it charges for text, and bills them at its input-token rate. Every provider counts those tokens with a different, published algorithm, so the same 1024×1024 photo can be 765 tokens on one model and 25,501 on another. This calculator reproduces each algorithm exactly from the source docs.
OpenAI (GPT-4o family, high detail). The image is first scaled to fit inside a 2048×2048 box, then scaled down so its shortest side is 768px. The result is covered with 512×512 tiles — tiles = ceil(w/512) × ceil(h/512) — and tokens are base + per_tile × tiles. GPT-4o uses base 85, per-tile 170; GPT-4o mini reports about 33.3× that (base 2,833, per-tile 5,667). Low detail skips tiling and charges a flat base. The breakdown panel in the tool shows every step.
Anthropic (Claude). Claude estimates image tokens as (width × height) / 750, rounded to the nearest whole token. If the longest edge is above 1568px, Claude resizes the image down proportionally first, so oversize images are capped before the division.
Google (Gemini). An image that is 384px or smaller on both edges is a flat 258 tokens. Anything larger is tiled into 768×768 crops at 258 tokens each: 258 × ceil(w/768) × ceil(h/768). The token count is identical across Gemini Flash and Pro — only the price per token differs.
Cost. For all providers, image inputs are billed as input tokens, so cost = tokens / 1,000,000 × input_rate, then the LKR figure is the USD cost times your exchange rate. Totals multiply by the number of images. Constants and rates are cross-checked on each build; the GPT-4o mini multiplier is verified against OpenAI's documented 33.3× factor so its large token counts are auditable rather than guessed.
Worked examples
Frequently asked questions
Sources & references
- OpenAI — Images and vision guide (high/low-detail tiling)
- OpenAI — API pricing (input-token rates)
- Anthropic — Vision docs (width × height ÷ 750, 1568px max edge)
- Anthropic — Pricing (Claude input-token rates)
- Google — Gemini image understanding and token counting
- Google — Gemini API pricing
Algorithms and per-1M input-token rates were last cross-checked against these sources on 2026-06-14. Providers revise vision pricing periodically — confirm against your latest invoice for exact billing.
Related tools
Comments & feedback
Spotted a bug or want an improvement? Tell us — our team reviews every comment, and good ideas get built. Comments are public and anonymous.
Found a bug, edge case, or want another model added?
Email me at [email protected] — most fixes ship within 24 hours.