induwara.lk

AI Model Comparison — GPT, Claude, Gemini, Llama side by side

18 of the most-used LLMs in one table — context window, input and output pricing, vision, audio, function calling, training cutoff. Pick three to compare side by side and project the monthly cost at your workload. Every figure cites the vendor source.

By Induwara Ashinsana · Updated May 12, 2026
Compare AI models side by side · 18 models · 7 vendors

Side-by-side

GPT-5 (GPT) · Cheapest pick

Frontier general-purpose: hard reasoning, long context, coding.

Input $/M: $1.25
Output $/M: $10
Context: 400K tok
Output max: 128K tok
Released: 2025-08
Train cutoff: 2024-10
Vision · Audio · Tools · Reasoning · Open weights

Strengths

  • Strong on coding and math
  • 400K context window
  • Vision in the same model

Watch out

  • No audio input
  • Reasoning latency on hard prompts

At your workload: $2.63/mo ($0.0026 per call)
Vendor pricing

Claude Sonnet 4.5 (Claude)

Default pick for agentic coding (Claude Code, Cursor, Cline).

Input $/M: $3
Output $/M: $15
Context: 200K tok
Output max: 64K tok
Released: 2025-10
Train cutoff: 2024-10
Vision · Audio · Tools · Reasoning · Open weights

Strengths

  • Top-tier on real-world coding benchmarks
  • Computer-use API
  • 1M-token tier available at 2× price

Watch out

  • Output cap of 64K may force chunking on huge generations

At your workload: $4.50/mo ($0.0045 per call)
Vendor pricing

Gemini 2.5 Pro (Gemini) · Cheapest pick

Long-context workloads: 2M tokens of input in one shot.

Input $/M: $1.25
Output $/M: $10
Context: 2M tok
Output max: 65.5K tok
Released: 2025-03
Train cutoff: 2025-01
Vision · Audio · Tools · Reasoning · Open weights

Strengths

  • 2M-token context window
  • Native vision + audio + video
  • Built-in thinking

Watch out

  • Prices double above 200K input tokens

At your workload: $2.63/mo ($0.0026 per call)
Vendor pricing

Project the cost

1,000 tokens ≈ 750 words.

A 200-word reply ≈ 280 tokens.

One user, one chat. Multiply for fleets.

Best value across all 18 models: GPT-5 nano at $0.105/month.

Full comparison (18 of 18)

Model              Vendor      Input $/M   Output $/M
Claude Haiku 4.5   Anthropic   $1.00       $5.00
Claude Opus 4.5    Anthropic   $5.00       $25.00
Claude Sonnet 4.5  Anthropic   $3.00       $15.00
DeepSeek R1        DeepSeek    $0.55       $2.19
DeepSeek V3        DeepSeek    $0.27       $1.10
Gemini 2.0 Flash   Google      $0.10       $0.40
Gemini 2.5 Flash   Google      $0.30       $2.50
Gemini 2.5 Pro     Google      $1.25       $10.00
Llama 4 Maverick   Meta        $0.27       $0.85
Llama 4 Scout      Meta        $0.18       $0.59
Mistral Large 2    Mistral     $2.00       $6.00
GPT-4o             OpenAI      $2.50       $10.00
GPT-5              OpenAI      $1.25       $10.00
GPT-5 mini         OpenAI      $0.25       $2.00
GPT-5 nano         OpenAI      $0.05       $0.40
o1                 OpenAI      $15.00      $60.00
o3-mini            OpenAI      $1.10       $4.40
Grok 4             xAI         $3.00       $15.00

No proxy, no API key, no logging.

This page is a static comparison. Picking a model here does not send anything to any AI provider. Pricing and capability data is reviewed quarterly; every figure traces back to the vendor's own docs (see the Sources section below).

How it works

Picking an AI model is a multi-axis decision: input price, output price, context window, output cap, modalities, training cutoff, and whether the weights are downloadable. Most comparison pages on the web pick two of those axes and call it done. This page lays out all of them at once for the 18 models that almost every Sri Lankan developer, student, or startup ends up considering — drawn from 7 vendors.

1. The pricing formula

Every commercial LLM API charges separately for input and output tokens. The cost of one call is:

usd_per_call = (input_tokens ÷ 1,000,000) × input_$/M + (output_tokens ÷ 1,000,000) × output_$/M

Monthly cost is then per-call multiplied by the number of calls you make per month. The cost projection in the tool above uses this formula directly — no caching credits, no batch discounts, no enterprise rates. That is the published list price, which is what you actually pay on most plans.
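
As a minimal sketch, the formula above could be written in TypeScript like this. The names `Rates`, `usdPerCall`, and `usdPerMonth` are illustrative and not taken from the site's data module; the prices are the GPT-5 figures from the card at the top of the page.

```typescript
// Hypothetical shape for one model's list prices, in USD per million tokens.
interface Rates {
  inputPerM: number;
  outputPerM: number;
}

// Cost of a single call at list price (no caching credits, no batch discounts).
function usdPerCall(rates: Rates, inputTokens: number, outputTokens: number): number {
  return (inputTokens / 1_000_000) * rates.inputPerM
       + (outputTokens / 1_000_000) * rates.outputPerM;
}

// Monthly cost is the per-call cost times the number of calls per month.
function usdPerMonth(
  rates: Rates,
  inputTokens: number,
  outputTokens: number,
  callsPerMonth: number,
): number {
  return usdPerCall(rates, inputTokens, outputTokens) * callsPerMonth;
}

// GPT-5 list prices from the card above: $1.25 in, $10 out.
const gpt5: Rates = { inputPerM: 1.25, outputPerM: 10 };

// 500 input + 200 output tokens per call, 1,000 calls per month.
console.log(usdPerCall(gpt5, 500, 200).toFixed(6)); // 0.002625
console.log(usdPerMonth(gpt5, 500, 200, 1000));
```

At that workload the per-call figure lines up with the $0.0026 shown on the GPT-5 card, and the monthly figure with its $2.63/mo after rounding.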

2. Reasoning models bill hidden tokens

Models with extended thinking (OpenAI o1, o3-mini, Claude with extended thinking, DeepSeek R1) emit a chain-of-thought before the visible answer, and the vendor bills all of those tokens as output. A 200-word visible reply can consume 2,000–5,000 output tokens on a hard problem. When you compare a reasoning model to a chat model on this page, multiply the reasoning model's expected output tokens by 3–10× before drawing conclusions.
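
One way to apply that adjustment is sketched below. The helper names and the `hiddenMultiplier` parameter are assumptions made for illustration; the multiplier is something you pick yourself (the range suggested above is 3 to 10), not a value any vendor publishes. The prices are the o3-mini list prices from the comparison table.

```typescript
// Illustrative helper: estimate billed output tokens for a reasoning model.
// `hiddenMultiplier` is a guess you choose, not a vendor-published number.
function billedOutputTokens(visibleTokens: number, hiddenMultiplier: number): number {
  return Math.round(visibleTokens * hiddenMultiplier);
}

// Per-call cost with the hidden reasoning tokens folded into the output side.
function reasoningUsdPerCall(
  inputPerM: number,
  outputPerM: number,
  inputTokens: number,
  visibleOutputTokens: number,
  hiddenMultiplier: number,
): number {
  const billed = billedOutputTokens(visibleOutputTokens, hiddenMultiplier);
  return (inputTokens / 1_000_000) * inputPerM + (billed / 1_000_000) * outputPerM;
}

// o3-mini list prices from the table: $1.10 in, $4.40 out.
// Workload: 500 input tokens, 200 visible output tokens.
for (const multiplier of [1, 3, 10]) {
  const usd = reasoningUsdPerCall(1.1, 4.4, 500, 200, multiplier);
  console.log(`x${multiplier}: $${usd.toFixed(5)} per call`);
}
```

Running the loop with a few multipliers gives a cost range rather than a single number, which is usually the more honest way to budget for a reasoning model.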

3. Context window vs output cap

Context window is the size of the prompt the model can read. Output max is the size of the reply the model can write. These are different limits. Gemini 2.5 Pro reads 2M tokens but only emits ~65K. Llama 4 Maverick advertises 1M context with an 8K output cap. For long-form generation (article drafts, code refactors), the output cap is the one that bites first.
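
The two limits can be checked separately, as in this small sketch. The `Limits` type and both function names are hypothetical, and the code treats the limits as independent, as the paragraph above does; some vendors count the reply against the context window as well, so check the vendor docs before relying on it.

```typescript
// Hypothetical limits record; field names are illustrative.
interface Limits {
  contextTokens: number;   // how much the model can read
  outputMaxTokens: number; // how much the model can write in one reply
}

// Does the prompt fit in the context window?
function promptFits(limits: Limits, promptTokens: number): boolean {
  return promptTokens <= limits.contextTokens;
}

// How many separate replies are needed to emit `targetOutputTokens` in total?
function repliesNeeded(limits: Limits, targetOutputTokens: number): number {
  return Math.ceil(targetOutputTokens / limits.outputMaxTokens);
}

// Llama 4 Maverick as described above: 1M context, 8K output cap.
const maverick: Limits = { contextTokens: 1_000_000, outputMaxTokens: 8_000 };

console.log(promptFits(maverick, 300_000));   // true
console.log(repliesNeeded(maverick, 20_000)); // 3
```

A 300K-token prompt fits comfortably, but a 20K-token draft still has to be generated in three chunks, which is the sense in which the output cap bites first.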

4. Capability flags

Vision, audio, function calling, reasoning, and open weights are independent dimensions. A model can support any combination. The comparison table marks each capability with a chip; struck-through chips mean the model lacks that capability. The vendor docs URL on each row is the authoritative source — capabilities sometimes ship in API tiers behind allowlists or paid plans.
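
Because the flags are independent, a set per model is one natural way to represent them. The sketch below is an assumption about how such data could be modelled, not the site's actual schema, and the three rows are placeholders with made-up flags rather than real models.

```typescript
// The five capability dimensions named above.
type Capability = "vision" | "audio" | "tools" | "reasoning" | "openWeights";

// Hypothetical row shape: any combination of capabilities is allowed.
interface ModelRow {
  id: string;
  capabilities: ReadonlySet<Capability>;
}

// True when the row has every capability in `required`.
function supportsAll(row: ModelRow, required: readonly Capability[]): boolean {
  return required.every((c) => row.capabilities.has(c));
}

// Placeholder rows; the flags are invented for the example.
const rows: ModelRow[] = [
  { id: "model-a", capabilities: new Set<Capability>(["vision", "tools"]) },
  { id: "model-b", capabilities: new Set<Capability>(["tools", "reasoning", "openWeights"]) },
  { id: "model-c", capabilities: new Set<Capability>(["vision", "audio", "tools", "reasoning"]) },
];

const matchingIds = rows
  .filter((r) => supportsAll(r, ["vision", "tools"]))
  .map((r) => r.id);

console.log(matchingIds.join(", ")); // model-a, model-c
```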

5. Cross-check

The data module exports a deterministic verifyWorkedExamples() function that recomputes seven hand-derived test cases — including zero-input edges, the 1M-token boundary, and a 10⁹-token large input — to assert the cost math matches the file-header arithmetic to within a millionth of a cent. A second integrity check asserts unique ids, non-negative prices, positive context windows, and non-empty positioning notes. Both run at typecheck time. If a row drifts during a quarterly update, the build fails.
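
The site's own `verifyWorkedExamples()` is not shown on this page, so the following is only a hypothetical sketch of what checks of this kind might look like. Every type and function name here (`PricedModel`, `WorkedExample`, `checkData`) is an assumption, the tolerance of 1e-8 USD is one reading of "a millionth of a cent", and the sample row reuses the GPT-5 figures from the card above.

```typescript
// Hypothetical row and test-case shapes; not the site's real schema.
interface PricedModel {
  id: string;
  inputPerM: number;
  outputPerM: number;
  contextTokens: number;
  positioning: string;
}

interface WorkedExample {
  modelId: string;
  inputTokens: number;
  outputTokens: number;
  expectedUsd: number; // derived by hand
}

const TOLERANCE_USD = 1e-8; // assumed meaning of "a millionth of a cent"

function costUsd(m: PricedModel, inputTokens: number, outputTokens: number): number {
  return (inputTokens / 1_000_000) * m.inputPerM + (outputTokens / 1_000_000) * m.outputPerM;
}

// Returns a list of problems; an empty list means every check passed.
function checkData(models: PricedModel[], examples: WorkedExample[]): string[] {
  const problems: string[] = [];
  const seen = new Set<string>();
  for (const m of models) {
    if (seen.has(m.id)) problems.push(`duplicate id: ${m.id}`);
    seen.add(m.id);
    if (m.inputPerM < 0 || m.outputPerM < 0) problems.push(`negative price: ${m.id}`);
    if (m.contextTokens <= 0) problems.push(`non-positive context: ${m.id}`);
    if (m.positioning.trim() === "") problems.push(`empty positioning: ${m.id}`);
  }
  for (const ex of examples) {
    const m = models.find((x) => x.id === ex.modelId);
    if (m === undefined) {
      problems.push(`unknown model: ${ex.modelId}`);
      continue;
    }
    const got = costUsd(m, ex.inputTokens, ex.outputTokens);
    if (Math.abs(got - ex.expectedUsd) > TOLERANCE_USD) {
      problems.push(`cost drift: ${ex.modelId} (${ex.inputTokens} in, ${ex.outputTokens} out)`);
    }
  }
  return problems;
}

const sampleModels: PricedModel[] = [
  {
    id: "gpt-5",
    inputPerM: 1.25,
    outputPerM: 10,
    contextTokens: 400_000,
    positioning: "Frontier general-purpose: hard reasoning, long context, coding.",
  },
];

const sampleExamples: WorkedExample[] = [
  { modelId: "gpt-5", inputTokens: 500, outputTokens: 200, expectedUsd: 0.002625 },
  { modelId: "gpt-5", inputTokens: 0, outputTokens: 200, expectedUsd: 0.002 },       // zero-input edge
  { modelId: "gpt-5", inputTokens: 1_000_000, outputTokens: 0, expectedUsd: 1.25 },  // 1M-token boundary
  { modelId: "gpt-5", inputTokens: 1_000_000_000, outputTokens: 0, expectedUsd: 1250 }, // 10^9 tokens
];

const problems = checkData(sampleModels, sampleExamples);
if (problems.length > 0) throw new Error(problems.join("; "));
console.log("all checks passed");
```

Throwing when the problem list is non-empty is what would make a build step fail if a row drifted.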

Worked examples

Cheap support chatbot — 1,000 calls/month

A small team Slack bot. Short questions (≈500 input tokens) with concise replies (≈200 output tokens).

  1. Workload: 500 input + 200 output, 1,000 calls/month.
  2. GPT-5 nano: per call = 500/1M × $0.05 + 200/1M × $0.40 = $0.000025 + $0.00008 = $0.000105. Monthly: $0.11.
  3. Claude Haiku 4.5: per call = 500/1M × $1.00 + 200/1M × $5.00 = $0.0005 + $0.001 = $0.0015. Monthly: $1.50.
  4. Gemini 2.0 Flash: per call = 500/1M × $0.10 + 200/1M × $0.40 = $0.00005 + $0.00008 = $0.00013. Monthly: $0.13.
  5. Winner on price: GPT-5 nano at $0.11/month. Pick this unless you need vision (use Gemini 2.0 Flash) or are already in the Anthropic ecosystem (Haiku 4.5).
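
The comparison in the steps above can be reproduced by computing the monthly cost for each candidate and sorting. This is a sketch with illustrative names (`PriceRow`, `monthlyUsd`); the prices are copied from the comparison table.

```typescript
// Hypothetical row shape: list prices in USD per million tokens.
interface PriceRow {
  model: string;
  inputPerM: number;
  outputPerM: number;
}

function monthlyUsd(row: PriceRow, inputTokens: number, outputTokens: number, calls: number): number {
  const perCall =
    (inputTokens / 1_000_000) * row.inputPerM + (outputTokens / 1_000_000) * row.outputPerM;
  return perCall * calls;
}

// Prices from the comparison table.
const candidates: PriceRow[] = [
  { model: "Claude Haiku 4.5", inputPerM: 1.0, outputPerM: 5.0 },
  { model: "Gemini 2.0 Flash", inputPerM: 0.1, outputPerM: 0.4 },
  { model: "GPT-5 nano", inputPerM: 0.05, outputPerM: 0.4 },
];

// Workload: 500 input + 200 output tokens, 1,000 calls per month.
const ranked = candidates
  .map((row) => ({ model: row.model, usd: monthlyUsd(row, 500, 200, 1000) }))
  .sort((a, b) => a.usd - b.usd); // cheapest first

for (const r of ranked) {
  console.log(`${r.model}: $${r.usd.toFixed(3)}/month`);
}
```

Sorting only answers the price question; the capability caveats in step 5 still have to be applied by hand.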

Agentic coding assistant — 200 calls/day

Long prompts with files attached (≈8,000 input tokens), substantial replies (≈1,500 output tokens), 6,000 calls/month.

  1. Workload: 8,000 input + 1,500 output, 6,000 calls/month.
  2. Claude Sonnet 4.5: per call = 8000/1M × $3.00 + 1500/1M × $15.00 = $0.024 + $0.0225 = $0.0465. Monthly: $279.
  3. GPT-5: per call = 8000/1M × $1.25 + 1500/1M × $10.00 = $0.010 + $0.015 = $0.025. Monthly: $150.
  4. DeepSeek V3: per call = 8000/1M × $0.27 + 1500/1M × $1.10 = $0.00216 + $0.00165 = $0.00381. Monthly: $22.86.
  5. GPT-5 is roughly half the price of Sonnet 4.5. DeepSeek V3 is 12× cheaper than Sonnet but lacks tool-use parity for agentic work — quality test before switching.

Edge case — long-context summarisation, 1M-token doc

One-off: load a 1M-token codebase and ask for a summary. ≈1,000 output tokens.

  1. Workload: 1,000,000 input + 1,000 output, 1 call.
  2. Gemini 2.5 Pro (1M fits, 2× price above 200K input): per call ≈ 1,000,000/1M × $2.50 + 1,000/1M × $20.00 = $2.50 + $0.02 = $2.52.
  3. Claude Sonnet 4.5 (1M tier, 2× price above 200K): per call ≈ 1,000,000/1M × $6.00 + 1,000/1M × $30.00 = $6.00 + $0.03 = $6.03.
  4. GPT-5 (400K context, so it would need three calls plus a reduce step): too lossy to compare directly.
  5. For this workload Gemini 2.5 Pro is still the cheaper option: both vendors double their prices above 200K input tokens, but Gemini's long-context input rate ($2.50/M) stays well below Sonnet 4.5's ($6.00/M). Pick it for retrieval-on-a-whole-codebase prompts.
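
Long-context surcharges like the one in step 3 can be folded into the cost formula with a threshold and a multiplier. The sketch below assumes the whole request is billed at the higher rate once the input exceeds the threshold; that billing rule, and the names `TieredRates` and `tieredUsdPerCall`, are assumptions for illustration, so confirm the exact rule against the vendor's pricing page.

```typescript
// Hypothetical rate card with a long-context tier.
interface TieredRates {
  inputPerM: number;
  outputPerM: number;
  longContextThreshold: number;  // input tokens above which the higher rate applies
  longContextMultiplier: number; // e.g. 2 for "2x price"
}

// Assumes the multiplier applies to the whole call once the threshold is crossed.
function tieredUsdPerCall(r: TieredRates, inputTokens: number, outputTokens: number): number {
  const k = inputTokens > r.longContextThreshold ? r.longContextMultiplier : 1;
  return (inputTokens / 1_000_000) * r.inputPerM * k
       + (outputTokens / 1_000_000) * r.outputPerM * k;
}

// Claude Sonnet 4.5 as described in step 3: $3 / $15 base, 2x above 200K input.
const sonnet: TieredRates = {
  inputPerM: 3,
  outputPerM: 15,
  longContextThreshold: 200_000,
  longContextMultiplier: 2,
};

console.log(tieredUsdPerCall(sonnet, 1_000_000, 1_000).toFixed(2)); // 6.03
console.log(tieredUsdPerCall(sonnet, 100_000, 1_000).toFixed(3));   // 0.315
```

The same function covers any model whose rate card has a long-context tier; only the four numbers change.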

Sources & references

Spot a stale price, a missing model, or a misclaimed capability?

Email me at [email protected] — most fixes ship within 24 hours.