Fastest LLM — Speed & Latency Comparison
Find the fastest LLM API for your workload. Pick 2–6 models, enter your output length, and rank them by estimated total response time — output throughput (tokens/sec) and time-to-first-token (TTFT), separated so you know what is fast to start versus fast to finish. Every figure is a cited Artificial Analysis median.
How it works
This tool turns two independently-measured serving metrics into the one number you actually care about: how long your request takes from “send” to the last token. The two inputs per endpoint come from Artificial Analysis, which continuously measures every major hosted model across providers and publishes the median (not best-case) figure:
- Output speed — median output throughput in tokens per second once the stream is flowing.
- Time-to-first-token (TTFT) — median delay before the first token streams back, in seconds.
For an output of N tokens, the estimated total response time is computed with one exact formula:
totalSeconds = TTFT + (N ÷ output speed)
The first term is the latency before anything appears; the second is the time to stream the rest of the answer at the model’s median throughput. Per-token latency is shown as 1000 ÷ output speed milliseconds per token, and the relative-speed bar normalises every row against the fastest selected endpoint. To guard the arithmetic, the data module computes each total a second way — through the per-token latency path — and a self-check reconciles the two to one-millionth of a second before the page can build, the same belt-and-braces approach the site’s tax calculator uses against the IRD’s alternate formula.
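The arithmetic above can be sketched in a few lines of Python. This is a minimal illustration, not the site's actual data module: the function names (`total_seconds`, `total_seconds_via_latency`, `checked_total`, `relative_speed`) are hypothetical, and the numbers in the usage note are made up for the example rather than Artificial Analysis medians.

```python
import math


def total_seconds(ttft_s: float, output_tps: float, n_tokens: int) -> float:
    """Direct path: TTFT + (N / output speed)."""
    if output_tps <= 0:
        raise ValueError("output speed must be positive")
    return ttft_s + n_tokens / output_tps


def ms_per_token(output_tps: float) -> float:
    """Per-token latency in milliseconds: 1000 / output speed."""
    if output_tps <= 0:
        raise ValueError("output speed must be positive")
    return 1000.0 / output_tps


def total_seconds_via_latency(ttft_s: float, output_tps: float, n_tokens: int) -> float:
    """Alternate path: TTFT + N * (ms per token) / 1000."""
    return ttft_s + n_tokens * ms_per_token(output_tps) / 1000.0


def checked_total(ttft_s: float, output_tps: float, n_tokens: int,
                  tol: float = 1e-6) -> float:
    """Compute the total both ways and fail if they disagree by more than tol seconds."""
    direct = total_seconds(ttft_s, output_tps, n_tokens)
    alternate = total_seconds_via_latency(ttft_s, output_tps, n_tokens)
    if not math.isclose(direct, alternate, rel_tol=0.0, abs_tol=tol):
        raise AssertionError(f"self-check failed: {direct} vs {alternate}")
    return direct


def relative_speed(speeds: list[float]) -> list[float]:
    """Normalise each output speed against the fastest one (fastest = 1.0)."""
    fastest = max(speeds)
    return [s / fastest for s in speeds]
```

With an illustrative TTFT of 0.5 s, an output speed of 100 tokens/sec, and N = 500, `checked_total(0.5, 100.0, 500)` gives 0.5 + 500 ÷ 100 = 5.5 seconds, and `ms_per_token(100.0)` gives 10 ms per token.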
Two deliberate design choices keep the answer honest. First, the host is part of each endpoint’s identity: the same weights served by Cerebras, Groq, Together, or a first-party API differ by up to an order of magnitude in throughput, so they appear as separate rows (16 endpoints across 9 hosts). Second, the “fastest to start” (lowest TTFT) and “fastest to finish” (highest throughput) verdicts are surfaced separately, because a low TTFT wins for short replies and high throughput wins for long ones; a single “fastest model” label hides that trade-off. The figures are medians and exclude the network round-trip from your own location, so treat the ranking as a relative guide rather than a guarantee.
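The trade-off between starting fast and finishing fast can be made concrete with a small sketch. Everything here is an assumption for illustration: the `Endpoint` class, the `verdicts` and `crossover_tokens` helpers, and the two endpoints with their figures are hypothetical, not rows from the tool. The crossover length comes from setting the two totals equal and solving for N.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Endpoint:
    name: str          # hypothetical label, e.g. "model @ host"
    ttft_s: float      # median time-to-first-token, seconds
    output_tps: float  # median output speed, tokens per second

    def total_seconds(self, n_tokens: int) -> float:
        return self.ttft_s + n_tokens / self.output_tps


def verdicts(endpoints: list[Endpoint], n_tokens: int) -> dict[str, str]:
    """Report the three winners separately instead of one 'fastest' label."""
    return {
        "fastest_to_start": min(endpoints, key=lambda e: e.ttft_s).name,
        "fastest_to_finish": max(endpoints, key=lambda e: e.output_tps).name,
        "fastest_overall": min(endpoints, key=lambda e: e.total_seconds(n_tokens)).name,
    }


def crossover_tokens(a: Endpoint, b: Endpoint) -> Optional[float]:
    """Output length at which a and b have equal total time, or None if no
    positive crossover exists (one endpoint is faster at every length)."""
    per_token_gap = 1.0 / a.output_tps - 1.0 / b.output_tps
    if per_token_gap == 0:
        return None
    n = (b.ttft_s - a.ttft_s) / per_token_gap
    return n if n > 0 else None


# Made-up figures, chosen only to show the trade-off.
quick_start = Endpoint("quick_start", ttft_s=0.2, output_tps=50.0)
high_throughput = Endpoint("high_throughput", ttft_s=1.0, output_tps=400.0)
pair = [quick_start, high_throughput]

short_reply = verdicts(pair, n_tokens=20)    # low TTFT dominates
long_reply = verdicts(pair, n_tokens=1000)   # throughput dominates
break_even = crossover_tokens(quick_start, high_throughput)
```

For these made-up numbers the break-even sits between 45 and 46 output tokens: below that, the low-TTFT endpoint returns the full answer sooner; above it, the high-throughput endpoint does.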
Worked examples
Frequently asked questions
Sources & references
- Artificial Analysis — independent LLM serving benchmarks (median output speed & TTFT)
- Artificial Analysis — model leaderboard (per-model speed & latency pages)
- OpenAI — model identifiers & streaming docs
- Anthropic — Claude model reference
- Google — Gemini API model reference
Output-speed and TTFT medians were transcribed from Artificial Analysis on 2026-06-24. They are independently-measured medians that vary by provider, region, prompt length, and server load — no SLA is implied. Each row in the tool links to its Artificial Analysis source page.
Related tools
Speed is one axis. Compare model quality on benchmarks, project pricing and capabilities, or estimate self-hosted GPU throughput before you ship.