AI Hallucination Rate Comparison
Compare how often leading AI models invent facts. Pick two LLMs and a document count to see each one's hallucination rate, factual consistency, and the expected number of wrong summaries — every figure taken from the Vectara HHEM leaderboard. No signup, sources cited below.
How it works
Every number on this page comes from the Vectara Hughes Hallucination Evaluation Model (HHEM) leaderboard, the most-cited public benchmark for LLM factual faithfulness. It does not measure reasoning or coding skill — it measures one thing: when you hand a model a document and ask for a summary, how often does the summary contain a claim the document never made?
The protocol is fixed and reproducible. Each model is given the same corpus of 7,700+ articles (50–24,000 words each) and asked only to summarise. Vectara's HHEM model then scores every summary for factual consistency against its source. The published figures, scored with HHEM-2.3, are:
- Hallucination Rate: the share of answered summaries with at least one unsupported claim. Lower is better.
- Factual Consistency Rate: the complement, FCR = 100 − HallucinationRate. This page recomputes it from the hallucination rate as an internal cross-check, so the two columns can never silently disagree.
- Answer Rate: how often the model actually produced a summary instead of refusing. A low answer rate can flatter the hallucination rate, because skipped documents are never scored.
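The cross-check described above can be sketched in a few lines of Python. This is an illustrative sketch, not the page's actual code: the function names and the 0.05-point tolerance are assumptions, and the 4.1% input is only the example figure used elsewhere on this page.

```python
def factual_consistency_rate(hallucination_rate: float) -> float:
    """Return FCR = 100 - HallucinationRate, with both values in percent."""
    if not 0.0 <= hallucination_rate <= 100.0:
        raise ValueError("hallucination rate must be a percentage in [0, 100]")
    return 100.0 - hallucination_rate


def columns_agree(hallucination_rate: float, published_fcr: float,
                  tolerance: float = 0.05) -> bool:
    """Check a published FCR against the value recomputed from the hallucination rate.

    The tolerance (an assumed value) absorbs rounding in figures
    that are published to one decimal place.
    """
    recomputed = factual_consistency_rate(hallucination_rate)
    return abs(recomputed - published_fcr) <= tolerance


# Illustrative values only: a 4.1% hallucination rate implies a 95.9% FCR.
print(columns_agree(4.1, 95.9))  # True
print(columns_agree(4.1, 96.9))  # False
```

Recomputing one column from the other means only the hallucination rate needs to be transcribed by hand, so a typo in the complement would show up as a failed check rather than a silently wrong figure.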
From those cited values the tool derives just two figures, with plain arithmetic and no scoring weights:
- Expected wrong summaries for a batch of N documents: E = round((HallucinationRate ÷ 100) × N). This turns an abstract percentage into a concrete count for the scale you actually operate at.
- Relative hallucination reduction of the more accurate model over the other: ((HR_high − HR_low) ÷ HR_high) × 100. A drop from 4.1% to 3.1% is a 24% relative reduction, even though the absolute gap is just one percentage point.
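The two derived figures are simple enough to sketch in Python. This is a minimal illustration under stated assumptions, not the tool's own implementation: the function names are hypothetical, and since the page does not say how exact halves are rounded, the sketch uses Python's built-in `round`, which rounds halves to the nearest even integer.

```python
def expected_wrong_summaries(hallucination_rate: float, n_documents: int) -> int:
    """E = round((HallucinationRate / 100) * N), with the rate in percent."""
    if n_documents < 0:
        raise ValueError("document count must be non-negative")
    return round(hallucination_rate / 100 * n_documents)


def relative_reduction(hr_a: float, hr_b: float) -> float:
    """Relative reduction, in percent, of the lower rate over the higher one."""
    hr_high, hr_low = max(hr_a, hr_b), min(hr_a, hr_b)
    if hr_high == 0:
        return 0.0  # both rates are zero, so there is nothing to reduce
    return (hr_high - hr_low) / hr_high * 100


# The 4.1% and 3.1% rates are the example figures from the text above.
print(expected_wrong_summaries(4.1, 1000))   # 41
print(expected_wrong_summaries(3.1, 1000))   # 31
print(round(relative_reduction(4.1, 3.1)))   # 24
```

Taking the maximum and minimum inside `relative_reduction` makes the result independent of the order in which the two models are picked.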
Because the data is a dated snapshot (2026-05-11), the tool makes no network call and renders instantly. When Vectara refreshes the leaderboard, this snapshot and its LAST_VERIFIED date are updated together.
Sources & references
- Vectara — Hallucination Leaderboard (HHEM-2.3, snapshot 2026-05-11)
- Vectara — Hallucination Leaderboard (HuggingFace Space)
- Vectara — HHEM factual-consistency model card (metric definition)
The model rates on this page were last cross-checked against the Vectara HHEM leaderboard on 2026-06-21. Figures are a dated snapshot and are refreshed when Vectara updates the leaderboard. Vendor and model names are trademarks of their respective owners.