induwara.lk
induwara.lkAI · Developer

GPT-4o Image Token & Vision Cost Calculator

Enter an image's width and height and see how many tokens it costs on GPT-4o, GPT-4o mini, Claude, and Gemini — plus the price per image and total in USD and LKR, side by side. Each provider's formula is taken from its own docs. The image never leaves your browser.

By Induwara AshinsanaUpdated Jun 14, 2026
Vision token & cost

The image's pixel width.

The image's pixel height.

How many images you'll send.

Affects GPT-4o models only.

Common sizes
Rs

CBSL indicative rate. Edit to match your bank.

Cheapest total
$0.000103
Gemini 2.0 Flash · Rs 0.03
Tokens / image
1,032
on Gemini 2.0 Flash
Images costed
1
1,024 × 1,024px each
ModelTokens/imageCost/imageTotalTotal LKR
Gemini 2.0 Flash Cheapest
Google · $0.10/1M in
1,032$0.000103$0.000103Rs 0.03
GPT-4o
OpenAI · $2.50/1M in
765$0.001913$0.001913Rs 0.58
GPT-4o mini
OpenAI · $0.15/1M in
25,501$0.003825$0.003825Rs 1
Claude Sonnet
Anthropic · $3.00/1M in
1,398$0.004194$0.004194Rs 1

OpenAI tiling breakdown

  1. Original: 1,024 × 1,024px
  2. Fit 2048²: 1,024 × 1,024px
  3. Shortest side → 768: 768 × 768px
  4. Tiles: ceil(768/512) × ceil(768/512) = 2 × 2 = 4
  5. Tokens = 85 + 170 × 4 = 765

Token formulas follow each provider's published vision docs. Image inputs bill as input tokens; output/prompt-text tokens are separate.

All math runs in your browser. Dropped images never leave your device.

How it works

A vision model doesn't see your image as pixels — it converts the image into tokens, the same units it charges for text, and bills them at its input-token rate. Every provider counts those tokens with a different, published algorithm, so the same 1024×1024 photo can be 765 tokens on one model and 25,501 on another. This calculator reproduces each algorithm exactly from the source docs.

OpenAI (GPT-4o family, high detail). The image is first scaled to fit inside a 2048×2048 box, then scaled down so its shortest side is 768px. The result is covered with 512×512 tiles — tiles = ceil(w/512) × ceil(h/512) — and tokens are base + per_tile × tiles. GPT-4o uses base 85, per-tile 170; GPT-4o mini reports about 33.3× that (base 2,833, per-tile 5,667). Low detail skips tiling and charges a flat base. The breakdown panel in the tool shows every step.

Anthropic (Claude). Claude estimates image tokens as (width × height) / 750, rounded to the nearest whole token. If the longest edge is above 1568px, Claude resizes the image down proportionally first, so oversize images are capped before the division.

Google (Gemini). An image that is 384px or smaller on both edges is a flat 258 tokens. Anything larger is tiled into 768×768 crops at 258 tokens each: 258 × ceil(w/768) × ceil(h/768). The token count is identical across Gemini Flash and Pro — only the price per token differs.

Cost. For all providers, image inputs are billed as input tokens, so cost = tokens / 1,000,000 × input_rate, then the LKR figure is the USD cost times your exchange rate. Totals multiply by the number of images. Constants and rates are cross-checked on each build; the GPT-4o mini multiplier is verified against OpenAI's documented 33.3× factor so its large token counts are auditable rather than guessed.

Worked examples

1024×1024 image — three providers compared

  1. GPT-4o (high): fit 2048² → 1024×1024; shortest side → 768 ⇒ 768×768
  2. Tiles: ceil(768/512) × ceil(768/512) = 2 × 2 = 4
  3. GPT-4o tokens = 85 + 170 × 4 = 765 → $0.0019125 (Rs 0.58 @305)
  4. Claude Sonnet: round(1024 × 1024 / 750) = 1,398 → $0.004194 (Rs 1.28)
  5. Gemini 2.0 Flash: 258 × (2 × 2) = 1,032 → $0.0001032 (Rs 0.03)
  6. Cheapest by far: Gemini 2.0 Flash

10,000 receipt photos at 1024×1024 (budget check)

  1. GPT-4o: 765 tokens × 10,000 = 7,650,000 tokens → $19.13 (Rs 5,834 @305)
  2. Claude Sonnet: 1,398 × 10,000 = 13,980,000 tokens → $41.94 (Rs 12,792)
  3. Gemini 2.0 Flash: 1,032 × 10,000 = 10,320,000 tokens → $1.03 (Rs 315)
  4. Only Gemini fits comfortably inside a $50 image-processing budget.

Edge case — a 4096×8192 banner on GPT-4o, high detail

  1. Longest edge 8192 > 2048 ⇒ scale ×0.25 → 1024×2048
  2. Shortest side 1024 → 768 ⇒ scale ×0.75 → 768×1536
  3. Tiles: ceil(768/512) × ceil(1536/512) = 2 × 3 = 6
  4. Tokens = 85 + 170 × 6 = 1,105 → $0.0027625 (Rs 0.84 @305)
  5. The 2048 fit + 768 short-side cap stop huge images from exploding in cost.

Frequently asked questions

Sources & references

Algorithms and per-1M input-token rates were last cross-checked against these sources on 2026-06-14. Providers revise vision pricing periodically — confirm against your latest invoice for exact billing.

Related tools

Rate this tool
Be the first to rate

Comments & feedback

Spotted a bug or want an improvement? Tell us — our team reviews every comment, and good ideas get built. Comments are public and anonymous.

Found a bug, edge case, or want another model added?

Email me at [email protected] — most fixes ship within 24 hours.