Question 1

Which AI model is the cheapest?

Accepted Answer

For short prompts and short replies, GPT-5 nano ($0.05/M input, $0.40/M output) and Gemini 2.0 Flash ($0.10/M input, $0.40/M output) are the cheapest closed-source options. Llama 4 Scout via Together AI ($0.18/M input, $0.59/M output) is the cheapest open-weight tier. Use the cost projection above with your real workload. Which model comes out cheapest depends on your ratio of input tokens (what you send) to output tokens (what the model generates).
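The projection applies one formula per model: (input tokens ÷ 1M) × input price + (output tokens ÷ 1M) × output price. Below is a minimal Python sketch that uses prices from the comparison table. The workloads are hypothetical. GPT-5 and Mistral Large 2 are paired on purpose: neither is cheaper on both rates, so the winner flips as the input/output mix changes.

```python
# Prices in USD per 1M tokens (input, output), copied from the comparison table.
PRICES = {
    "GPT-5 nano": (0.05, 0.40),
    "Gemini 2.0 Flash": (0.10, 0.40),
    "Llama 4 Scout": (0.18, 0.59),
    "GPT-5": (1.25, 10.00),
    "Mistral Large 2": (2.00, 6.00),
}


def monthly_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """USD cost of a workload: (tokens / 1M) * price, summed over input and output."""
    in_price, out_price = PRICES[model]
    return input_tokens / 1_000_000 * in_price + output_tokens / 1_000_000 * out_price


def cheapest(candidates: list[str], input_tokens: int, output_tokens: int) -> str:
    """Model with the lowest projected cost for this input/output mix."""
    return min(candidates, key=lambda m: monthly_cost(m, input_tokens, output_tokens))


if __name__ == "__main__":
    pair = ["GPT-5", "Mistral Large 2"]
    # Input-heavy workload (100M in, 1M out): GPT-5's cheaper input rate wins.
    print(cheapest(pair, 100_000_000, 1_000_000))
    # Output-heavy workload (1M in, 10M out): Mistral Large 2's cheaper output rate wins.
    print(cheapest(pair, 1_000_000, 10_000_000))
```

GPT-5 nano is the cheapest of the five on both rates, so it wins every mix. For models like that, the ratio only matters once you compare it against a model that is not cheaper on both rates.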

Question 2

What is a context window?

Accepted Answer

The context window is the total number of tokens the model can process in one request. A token is roughly three-quarters of an English word. The window covers your system prompt, conversation history, and any documents combined, and on many APIs the generated reply as well. If you exceed it, the API typically returns an error; some chat interfaces silently drop the oldest content instead. In this table, Llama 4 Scout lists the largest window at 10M tokens, followed by Gemini 2.5 Pro at 2M; most frontier models sit at 200K–400K.
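Below is a rough pre-flight check in Python, based on the three-quarters-of-a-word heuristic. It is only an estimate: a real tokenizer, such as the vendor's own, will give different counts, especially for code or non-English text. The function names and the reserved output budget are hypothetical. The sketch leaves room for the reply because some APIs count it against the same window.

```python
import math

WORDS_PER_TOKEN = 0.75  # English-text heuristic; real tokenizers vary


def estimate_tokens(text: str) -> int:
    """Approximate token count from whitespace-separated words."""
    words = len(text.split())
    return math.ceil(words / WORDS_PER_TOKEN)


def fits_in_context(parts: list[str], context_window: int, reserved_output: int = 0) -> bool:
    """True if system prompt + history + documents, plus room for the reply, fit the window."""
    prompt_tokens = sum(estimate_tokens(p) for p in parts)
    return prompt_tokens + reserved_output <= context_window


if __name__ == "__main__":
    system = "You are a helpful assistant."
    history = "user: hi assistant: hello " * 100
    document = "lorem " * 150_000  # ~150K words of pasted material
    window = 200_000  # e.g. the 200K window listed for Claude Sonnet 4.5
    print(estimate_tokens(document))  # 200000
    print(fits_in_context([system, history, document], window, reserved_output=8_000))  # False
```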

Question 3

What does training cutoff mean?

Accepted Answer

It is the date of the most recent data the model saw during pre-training. After that date the model has no knowledge of new events unless the application gives it tools (web search, RAG, file upload). Vendors state this on their model cards. A 2024-10 cutoff means the model will not, on its own, know about anything that happened after October 2024.

Question 4

Why do reasoning models cost more per answer?

Accepted Answer

Reasoning models (OpenAI o1/o3, Claude with extended thinking, DeepSeek R1) generate hidden chain-of-thought tokens before producing the visible answer. Vendors bill all of those tokens as output. A 200-word reply (roughly 270 visible tokens) can therefore consume 2,000–5,000 output tokens, depending on how hard the problem is. Some reasoning models also carry a higher output rate; o1, for example, lists $60/M. To compare fairly against a chat model, budget 3–10× more output tokens per answer.
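The effect on a bill is simple arithmetic: hidden reasoning tokens are added to the visible answer and charged at the output rate. Below is a minimal Python sketch using the o3-mini rates from the comparison table ($1.10/M input, $4.40/M output). The 500-token prompt and the 3,000 reasoning tokens are hypothetical. Real reasoning counts vary per request, and most APIs report the billed output count in the response's usage data.

```python
def request_cost(input_tokens: int, visible_output_tokens: int, reasoning_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """USD cost of one request; reasoning tokens are billed at the output rate."""
    billed_output = visible_output_tokens + reasoning_tokens
    return (input_tokens * input_price_per_m + billed_output * output_price_per_m) / 1_000_000


if __name__ == "__main__":
    # o3-mini rates from the table: $1.10/M input, $4.40/M output.
    # A ~200-word reply is roughly 270 visible tokens (200 / 0.75).
    visible = 270
    plain = request_cost(500, visible, 0, 1.10, 4.40)
    reasoning = request_cost(500, visible, 3_000, 1.10, 4.40)  # hypothetical hidden tokens
    print(f"without reasoning: ${plain:.5f}")
    print(f"with reasoning:    ${reasoning:.5f}  ({reasoning / plain:.1f}x)")
```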

Question 5

Are these prices accurate today?

Accepted Answer

They were as of 2026-05-12, when every row's input and output price was checked against the vendor's official pricing page. I review the figures quarterly and after any pricing announcement. If you spot a stale figure, email me and I'll fix it within 24 hours.

Question 6

What is the difference between vision, audio, and function calling?

Accepted Answer

Vision = the model can read images you attach. Audio = the model can take speech as input directly (for example GPT-4o via OpenAI's Realtime API, or Gemini via the Live API). Function calling = the model can return a structured JSON call to a tool or API you defined, instead of plain text. These capabilities are independent: a model may support one without the others. The capability chips on each row show which ones it supports.
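To make function calling concrete, here is a vendor-neutral Python sketch. The tool definition describes its parameters with JSON Schema, which is the common pattern. However, the exact request and response field names differ between providers, so the `get_exchange_rate` tool, the `{"name": ..., "arguments": ...}` call shape, and the stubbed rate are all hypothetical. The point it shows is that the model only emits the call; your code runs the function.

```python
import json

# Hypothetical tool definition, sent to the model with the request so it knows the tool exists.
TOOLS = [{
    "name": "get_exchange_rate",
    "description": "Look up the exchange rate between two currencies.",
    "parameters": {
        "type": "object",
        "properties": {
            "base": {"type": "string"},
            "quote": {"type": "string"},
        },
        "required": ["base", "quote"],
    },
}]


def get_exchange_rate(base: str, quote: str) -> float:
    """Stub: a real app would call a rates API. 300.0 is a placeholder, not a quote."""
    return {("USD", "LKR"): 300.0}.get((base, quote), 1.0)


REGISTRY = {"get_exchange_rate": get_exchange_rate}


def dispatch(tool_call_json: str) -> str:
    """Parse a model-emitted tool call and run the matching local function."""
    call = json.loads(tool_call_json)
    func = REGISTRY[call["name"]]
    result = func(**call["arguments"])
    # The result is sent back to the model in the next turn so it can write its answer.
    return json.dumps({"name": call["name"], "result": result})


if __name__ == "__main__":
    # Instead of prose, the model returns structured JSON like this:
    model_output = '{"name": "get_exchange_rate", "arguments": {"base": "USD", "quote": "LKR"}}'
    print(dispatch(model_output))
```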

Question 7

Should I pick the cheapest model?

Accepted Answer

Only if it actually solves your task. Cheap models save dollars but cost user time when they hallucinate, format poorly, or refuse. The rule of thumb: start with a frontier model (GPT-5, Claude Sonnet 4.5, Gemini 2.5 Pro), get the output quality right, then evaluate whether a cheaper tier produces equivalent results on your eval set. Lock in the cheapest tier that still passes.
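The "lock in the cheapest tier that still passes" step is a filter followed by a minimum. Below is a Python sketch. The pass rates, the 90% threshold, and the per-request costs are hypothetical placeholders for numbers from your own eval run.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EvalResult:
    model: str
    pass_rate: float         # fraction of eval cases passed, 0.0-1.0
    cost_per_request: float  # average USD per request on the eval set


def cheapest_passing(results: list[EvalResult], threshold: float) -> Optional[EvalResult]:
    """Lowest-cost model whose pass rate meets the threshold, or None if none do."""
    passing = [r for r in results if r.pass_rate >= threshold]
    return min(passing, key=lambda r: r.cost_per_request, default=None)


if __name__ == "__main__":
    # Hypothetical eval run: the frontier model sets the bar, cheaper tiers are tested against it.
    results = [
        EvalResult("GPT-5", 0.96, 0.0120),
        EvalResult("GPT-5 mini", 0.93, 0.0024),
        EvalResult("GPT-5 nano", 0.81, 0.0005),
    ]
    choice = cheapest_passing(results, threshold=0.90)
    print(choice.model if choice else "no tier passed; stay on the frontier model")
```

The threshold is the design decision that matters here. Setting it at the frontier model's own score demands strict parity, while a slightly lower bar trades some quality for cost.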

Question 8

Do open-weight models work the same way?

Accepted Answer

Largely the same API shape, with different running costs. You can call Llama 4, DeepSeek V3, and Mistral Large 2 through a hosted endpoint (Together AI, DeepSeek API, Mistral La Plateforme) at the prices shown here. Alternatively, you can download the weights and self-host on your own GPUs to avoid the per-token markup. As a rough rule of thumb, self-hosting only wins economically above ~100M tokens per day; below that, the hosted endpoints cost less than renting the GPUs.
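A threshold like ~100M tokens/day falls out of a break-even calculation: daily GPU rental divided by the blended hosted price per token. Below is a Python sketch. The $2/hour single-GPU rate and the 75% input share are hypothetical assumptions. The calculation also ignores whether that hardware can actually serve the volume, and it leaves out engineering and ops time, both of which push the real break-even higher.

```python
def blended_price_per_m(input_price: float, output_price: float, input_share: float) -> float:
    """Average USD per 1M tokens, given the share of tokens that are input (0.0-1.0)."""
    return input_share * input_price + (1 - input_share) * output_price


def break_even_tokens_per_day(gpu_cost_per_hour: float, num_gpus: int, price_per_m: float) -> float:
    """Daily token volume at which hosted per-token spend equals GPU rental."""
    daily_gpu_cost = gpu_cost_per_hour * 24 * num_gpus
    return daily_gpu_cost / price_per_m * 1_000_000


if __name__ == "__main__":
    # Llama 4 Scout hosted rates from the table ($0.18/M in, $0.59/M out), 75% input tokens.
    price = blended_price_per_m(0.18, 0.59, input_share=0.75)
    # Hypothetical rental: one GPU at $2.00/hour.
    tokens = break_even_tokens_per_day(2.00, 1, price)
    print(f"blended ${price:.4f}/M -> break-even ~{tokens / 1e6:.0f}M tokens/day")
```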

Question 9

How do I budget for AI API spend from Sri Lanka?

Accepted Answer

Per-token pricing is billed in USD. Convert the monthly USD total at your bank's USD/LKR rate, which typically carries a 1.5–3% FX margin above the CBSL indicative rate. Then add card-processing fees (2–3% on most LKR cards). For a 100 USD/month workload, budget roughly Rs 32,000 at current rates. Most providers also require a USD-denominated card; Sampath WealthMate, NTB Visa Platinum, and Wise USD cards all work.
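The same conversion in Python is shown below. It assumes the card fee is charged on the already-converted LKR amount, so the two percentages compound rather than add. The Rs 300/USD reference rate and the 3% defaults (the upper end of the ranges above) are assumptions for illustration; substitute your bank's actual rate and fee.

```python
def lkr_budget(monthly_usd: float, usd_lkr_rate: float,
               fx_margin: float = 0.03, card_fee: float = 0.03) -> float:
    """Estimated LKR charge for a USD-billed API invoice.

    fx_margin: bank markup over the reference rate (0.03 = 3%).
    card_fee: card-processing fee, assumed to apply to the converted LKR amount.
    """
    bank_rate = usd_lkr_rate * (1 + fx_margin)
    return monthly_usd * bank_rate * (1 + card_fee)


if __name__ == "__main__":
    # Assumed reference rate of Rs 300 per USD; check the current CBSL indicative rate.
    print(f"Rs {lkr_budget(100, 300.0):,.0f}")
```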

Question 10

When was this data last verified?

Accepted Answer

Prices, context windows, and capability flags were last cross-checked against the official vendor documentation on 2026-05-12. The dataset passes a deterministic integrity check at build time (unique ids, non-negative prices, positive context windows, non-empty positioning notes) — see verifyDatasetIntegrity() in the data module.
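The four build-time checks are straightforward to express in code. The sketch below is a Python illustration of the same rules, not the code of verifyDatasetIntegrity() itself. The record field names (model_id, input_price, output_price, context_window, positioning) and the sample notes are assumptions.

```python
from dataclasses import dataclass


@dataclass
class ModelRow:
    model_id: str
    input_price: float   # USD per 1M input tokens
    output_price: float  # USD per 1M output tokens
    context_window: int  # tokens
    positioning: str     # short positioning note shown on the row


def verify_dataset_integrity(rows: list[ModelRow]) -> list[str]:
    """Return a list of problems; an empty list means the dataset passes."""
    errors = []
    seen = set()
    for row in rows:
        if row.model_id in seen:
            errors.append(f"duplicate id: {row.model_id}")
        seen.add(row.model_id)
        if row.input_price < 0 or row.output_price < 0:
            errors.append(f"{row.model_id}: negative price")
        if row.context_window <= 0:
            errors.append(f"{row.model_id}: context window must be positive")
        if not row.positioning.strip():
            errors.append(f"{row.model_id}: empty positioning note")
    return errors


if __name__ == "__main__":
    rows = [
        ModelRow("gpt-5-nano", 0.05, 0.40, 400_000, "Cheapest OpenAI tier"),
        ModelRow("gpt-5-nano", 0.05, 0.40, 0, " "),  # deliberately broken row
    ]
    problems = verify_dataset_integrity(rows)
    # A build script would stop on a non-empty list (e.g. raise SystemExit); here it is printed.
    print(problems)
```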

Model	Vendor	Context	Input ($/M tokens)	Output ($/M tokens)	Date
Claude Haiku 4.5	Anthropic	200K	$1.00	$5.00	2025-10
Claude Opus 4.5	Anthropic	200K	$5.00	$25.00	2025-11
Claude Sonnet 4.5	Anthropic	200K	$3.00	$15.00	2025-10
DeepSeek R1	DeepSeek	64K	$0.55	$2.19	2025-01
DeepSeek V3	DeepSeek	64K	$0.27	$1.10	2024-12
Gemini 2.0 Flash	Google	1M	$0.10	$0.40	2024-12
Gemini 2.5 Flash	Google	1M	$0.30	$2.50	2025-03
Gemini 2.5 Pro	Google	2M	$1.25	$10.00	2025-03
Llama 4 Maverick	Meta	1M	$0.27	$0.85	2025-04
Llama 4 Scout	Meta	10M	$0.18	$0.59	2025-04
Mistral Large 2	Mistral	128K	$2.00	$6.00	2024-07
GPT-4o	OpenAI	128K	$2.50	$10.00	2024-05
GPT-5	OpenAI	400K	$1.25	$10.00	2025-08
GPT-5 mini	OpenAI	400K	$0.25	$2.00	2025-08
GPT-5 nano	OpenAI	400K	$0.05	$0.40	2025-08
o1	OpenAI	200K	$15.00	$60.00	2024-12
o3-mini	OpenAI	200K	$1.10	$4.40	2025-01
Grok 4	xAI	256K	$3.00	$15.00	2025-07

AI Model Comparison — GPT, Claude, Gemini, Llama side by side

How it works

1. The pricing formula

2. Reasoning models bill hidden tokens

3. Context window vs output cap

4. Capability flags

5. Cross-check

Worked examples

Frequently asked questions
