induwara.lk

Fastest LLM — Speed & Latency Comparison

Find the fastest LLM API for your workload. Pick 2–6 models, enter your output length, and rank them by estimated total response time — output throughput (tokens/sec) and time-to-first-token (TTFT), separated so you know what is fast to start versus fast to finish. Every figure is a cited Artificial Analysis median.

By Induwara Ashinsana · Updated Jun 24, 2026
Compare LLM speed · 16 endpoints · 9 hosts
Artificial Analysis medians · verified 2026-06-24
Pick models to compare (26) · 3 selected

Tap to add or remove a model. The same model on a different host is a separate row.

500 tokens is roughly 375 English words. A token is ~4 characters.

Fastest to finish
3.00 s
Gemini 3 Flash
Google AI Studio

1.27 s sooner than GPT-4o (30% faster).

Fast to start
0.45 s TTFT
GPT-4o

Lowest time-to-first-token — the typing indicator feels instant.

Fast to finish
200 tok/s
Gemini 3 Flash

Highest throughput — dominates total time on long outputs.

Heads up: the model that starts fastest isn't the one that finishes fastest at this length. Lower TTFT wins on short replies; higher throughput wins on long ones.

Results for 500-token output

Model · host                         Output    TTFT     ms/token   Total
Gemini 3 Flash · Google AI Studio    200 t/s   0.50 s   5.00       3.00 s
GPT-4o · OpenAI API                  131 t/s   0.45 s   7.63       4.27 s (1.42× slower)
Claude Sonnet 4.6 · Anthropic API    104 t/s   0.74 s   9.62       5.55 s (1.85× slower)

“Total” is the estimated wall-clock time for a streaming response of this length. Lower is faster. Figures are medians and exclude network round-trips from your own location.

Output speed and TTFT are independently-measured Artificial Analysis medians, last verified 2026-06-24. Real latency varies by provider, region, prompt length, and server load — these are medians, not guarantees. Estimated total time = TTFT + (tokens ÷ output speed).

How it works

This tool turns two independently-measured serving metrics into the one number you actually care about: how long your request takes from “send” to the last token. The two inputs per endpoint come from Artificial Analysis, which continuously measures every major hosted model across providers and publishes the median (not best-case) figure:

  • Output speed — median output throughput in tokens per second once the stream is flowing.
  • Time-to-first-token (TTFT) — median delay before the first token streams back, in seconds.

For an output of N tokens, the estimated total response time is computed with one exact formula:

totalSeconds = TTFT + (N ÷ output speed)

The first term is the latency before anything appears; the second is the time to stream the rest of the answer at the model’s median throughput. Per-token latency is shown as 1000 ÷ output speed milliseconds per token, and the relative-speed bar normalises every row against the fastest selected endpoint. To guard the arithmetic, the data module computes each total a second way — through the per-token latency path — and a self-check reconciles the two to one-millionth of a second before the page can build, the same belt-and-braces approach the site’s tax calculator uses against the IRD’s alternate formula.
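
The formula and the two-path self-check described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the site's actual data module: the function names total_seconds, ms_per_token, total_via_per_token, and self_check are hypothetical, and the 1e-6 tolerance simply mirrors the "one-millionth of a second" mentioned in the text.

```python
import math


def total_seconds(ttft_s: float, tokens: int, tokens_per_s: float) -> float:
    """Direct path: TTFT plus streaming time at the median throughput."""
    return ttft_s + tokens / tokens_per_s


def ms_per_token(tokens_per_s: float) -> float:
    """Per-token latency in milliseconds (1000 / output speed)."""
    return 1000.0 / tokens_per_s


def total_via_per_token(ttft_s: float, tokens: int, tokens_per_s: float) -> float:
    """Second path to the same total, going through per-token latency."""
    return ttft_s + tokens * ms_per_token(tokens_per_s) / 1000.0


def self_check(ttft_s: float, tokens: int, tokens_per_s: float,
               tol: float = 1e-6) -> bool:
    """True when both paths agree to within tol seconds (assumed tolerance)."""
    direct = total_seconds(ttft_s, tokens, tokens_per_s)
    alternate = total_via_per_token(ttft_s, tokens, tokens_per_s)
    return math.isclose(direct, alternate, rel_tol=0.0, abs_tol=tol)
```

With the GPT-4o figures from the results table (131 t/s, 0.45 s TTFT, 500 tokens), total_seconds rounds to 4.27 s and ms_per_token rounds to 7.63.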

Two deliberate design choices keep the answer honest. First, the host is part of each endpoint’s identity: the same weights served by Cerebras, Groq, Together, or a first-party API differ by up to an order of magnitude in throughput, so they appear as separate rows (16 endpoints across 9 hosts). Second, the “fastest to start” (lowest TTFT) and “fastest to finish” (highest throughput) verdicts are surfaced separately, because for short replies a low TTFT wins and for long ones throughput wins; a single “fastest model” label hides that trade-off. The figures are medians and exclude the network round-trip from your own location, so treat the ranking as a relative guide rather than a guarantee.
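
Both design choices can be illustrated with a small Python sketch. The Endpoint class and the two helper functions are hypothetical names chosen for this example, not the tool's real code; the three rows reuse the medians shown in the results table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    model: str
    host: str            # the host is part of the identity, not a label
    tokens_per_s: float  # median output speed
    ttft_s: float        # median time-to-first-token


ENDPOINTS = [
    Endpoint("Gemini 3 Flash", "Google AI Studio", 200.0, 0.50),
    Endpoint("GPT-4o", "OpenAI API", 131.0, 0.45),
    Endpoint("Claude Sonnet 4.6", "Anthropic API", 104.0, 0.74),
]


def fastest_to_start(endpoints):
    """Verdict 1: the endpoint with the lowest TTFT."""
    return min(endpoints, key=lambda e: e.ttft_s)


def fastest_to_finish(endpoints):
    """Verdict 2: the endpoint with the highest throughput."""
    return max(endpoints, key=lambda e: e.tokens_per_s)


print(fastest_to_start(ENDPOINTS).model)   # GPT-4o
print(fastest_to_finish(ENDPOINTS).model)  # Gemini 3 Flash
```

Because the dataclass compares every field, two Endpoint values with the same model but different hosts are distinct rows, which is the behaviour the first design choice calls for.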

Worked examples

Short chatbot reply — 500 tokens, default trio

  1. Formula: totalSeconds = TTFT + (N ÷ output speed), N = 500
  2. GPT-4o (131 t/s, 0.45 s): 0.45 + 500/131 = 0.45 + 3.817 = 4.27 s
  3. Claude Sonnet 4.6 (104 t/s, 0.74 s): 0.74 + 500/104 = 5.55 s
  4. Gemini 3 Flash (200 t/s, 0.50 s): 0.50 + 500/200 = 3.00 s ← winner
  5. Gap to runner-up GPT-4o: 4.27 − 3.00 = 1.27 s (≈30% faster)
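
The steps above can be reproduced with a short Python sketch. The rank helper and the TRIO list are illustrative names assumed for this example; the speeds and TTFT values are the ones quoted in the steps.

```python
def total_seconds(ttft_s, tokens, tokens_per_s):
    """Estimated total: TTFT plus tokens divided by output speed."""
    return ttft_s + tokens / tokens_per_s


# (label, tokens per second, TTFT in seconds), as listed in the example
TRIO = [
    ("GPT-4o", 131.0, 0.45),
    ("Claude Sonnet 4.6", 104.0, 0.74),
    ("Gemini 3 Flash", 200.0, 0.50),
]


def rank(models, tokens):
    """Return (label, total, slowdown vs. fastest), fastest first."""
    totals = sorted(
        ((name, total_seconds(ttft, tokens, speed)) for name, speed, ttft in models),
        key=lambda row: row[1],
    )
    best = totals[0][1]
    return [(name, total, total / best) for name, total in totals]


ranking = rank(TRIO, 500)
for name, total, slowdown in ranking:
    print(f"{name}: {total:.2f} s ({slowdown:.2f}x)")

# Gap between the winner and the runner-up, and the share of time saved.
gap = ranking[1][1] - ranking[0][1]
pct_less_time = 100 * gap / ranking[1][1]
print(f"gap {gap:.2f} s, {pct_less_time:.0f}% less time")
```

The totals come out as 3.00 s, 4.27 s, and 5.55 s. The 30% figure is the reduction in total time relative to GPT-4o, that is 1.27 ÷ 4.27.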

Tiny reply — 50 tokens, low TTFT wins

  1. GPT-4o (131 t/s, 0.45 s): 0.45 + 50/131 = 0.45 + 0.382 = 0.83 s ← winner
  2. GPT-5.5 Mini (168 t/s, 0.61 s): 0.61 + 50/168 = 0.61 + 0.298 = 0.91 s
  3. GPT-4o finishes 0.08 s sooner despite its lower throughput, because at 50 tokens the 0.16 s TTFT head start outweighs the speed difference.

Same pair, 500 tokens — throughput overtakes (edge case)

  1. GPT-4o (131 t/s, 0.45 s): 0.45 + 500/131 = 4.27 s
  2. GPT-5.5 Mini (168 t/s, 0.61 s): 0.61 + 500/168 = 3.59 s ← winner
  3. Mini now wins by 0.68 s even though it STARTS 0.16 s later.
  4. Crossover: 0.45 + N/131 = 0.61 + N/168 → N ≈ 95 tokens.
  5. Below ~95 tokens GPT-4o wins; above it, GPT-5.5 Mini wins.
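
The crossover in step 4 follows from setting the two totals equal and solving for N, which gives N = (TTFT_b − TTFT_a) ÷ (1/speed_a − 1/speed_b). A minimal Python sketch of that rearrangement, with crossover_tokens as a hypothetical helper name:

```python
def crossover_tokens(ttft_a, speed_a, ttft_b, speed_b):
    """Output length at which endpoints a and b finish at the same time.

    Solves ttft_a + N / speed_a == ttft_b + N / speed_b for N.
    Returns None when there is no positive crossover, meaning one
    endpoint is at least as fast at every output length.
    """
    rate_diff = 1.0 / speed_a - 1.0 / speed_b  # seconds per token, a minus b
    if rate_diff == 0:
        return None  # equal throughput: TTFT alone decides
    n = (ttft_b - ttft_a) / rate_diff
    return n if n > 0 else None


# GPT-4o (0.45 s, 131 t/s) versus GPT-5.5 Mini (0.61 s, 168 t/s)
n = crossover_tokens(0.45, 131.0, 0.61, 168.0)
print(round(n))  # 95
```

Below the returned length the lower-TTFT endpoint finishes first; above it, the higher-throughput endpoint does.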

Sources & references

Output-speed and TTFT medians were transcribed from Artificial Analysis on 2026-06-24. They are independently-measured medians that vary by provider, region, prompt length, and server load — no SLA is implied. Each row in the tool links to its Artificial Analysis source page.
