Word Error Rate (WER) & CER Calculator
Paste a reference transcript and a model hypothesis to get Word Error Rate, Character Error Rate, word accuracy, and a colour-coded alignment of every error — substitution, deletion, and insertion. Uses the NIST SCTK formula. No signup, no upload, runs entirely in your browser.
How it works
Word Error Rate is the standard accuracy metric for automatic speech recognition (ASR). It compares a system's output (the hypothesis) against a correct, human-verified transcript (the reference) and reports the number of word errors as a fraction of the reference length. The formula, defined by the NIST Speech Recognition Scoring Toolkit, is:
WER = (S + D + I) / N
Here S is substitutions (a word recognised as a different word), D is deletions (a reference word the system missed), I is insertions (an extra word the system added), and N is the total number of words in the reference. The three error types are not counted by eye — they come from an optimal alignment of the two transcripts:
- Normalise. Both transcripts are optionally lowercased, stripped of Unicode punctuation and symbols, and have their whitespace collapsed, then split into word tokens. Benchmarks normalise this way because a speech model should not be penalised for capitalisation or commas.
- Align. The reference and hypothesis word sequences are aligned with the minimum-edit-distance (Levenshtein) algorithm, which fills a dynamic-programming table where each cell holds the fewest edits needed to turn one prefix into the other. Each substitution, deletion, and insertion costs one; a match costs zero.
- Backtrace. Walking the table back from the bottom-right corner reconstructs the cheapest sequence of edits, which yields the exact S, D, and I counts and the word-by-word alignment shown in the diff view.
- Score. WER is the total errors divided by N. If the reference is empty, WER is undefined (division by zero), so the tool shows a guard message instead of a number.
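The four steps above can be sketched in Python as follows. This is a minimal illustration, not the tool's own source: the function names `normalise`, `align`, and `wer` are hypothetical, punctuation is assumed to be deleted rather than replaced by a space, and the tie-breaking order in the backtrace (match, substitution, deletion, insertion) is an assumption that can change the S/D/I split when several alignments are equally cheap.

```python
import unicodedata
from collections import Counter
from typing import Optional


def normalise(text: str) -> list[str]:
    """Lowercase, delete Unicode punctuation (P*) and symbols (S*), split on whitespace."""
    kept = "".join(
        ch for ch in text.lower()
        if unicodedata.category(ch)[0] not in "PS"
    )
    return kept.split()  # split() with no argument also collapses runs of whitespace


def align(ref: list[str], hyp: list[str]) -> list[tuple[str, Optional[str], Optional[str]]]:
    """Return a cheapest edit script as (op, ref_word, hyp_word) tuples."""
    n, m = len(ref), len(hyp)
    # dist[i][j] = fewest edits needed to turn ref[:i] into hyp[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i  # delete every reference word
    for j in range(1, m + 1):
        dist[0][j] = j  # insert every hypothesis word
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j - 1] + cost,  # match or substitution
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
            )

    # Backtrace from the bottom-right corner to recover the edits.
    ops = []
    i, j = n, m
    while i > 0 or j > 0:
        same = i > 0 and j > 0 and ref[i - 1] == hyp[j - 1]
        if same and dist[i][j] == dist[i - 1][j - 1]:
            ops.append(("match", ref[i - 1], hyp[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and j > 0 and not same and dist[i][j] == dist[i - 1][j - 1] + 1:
            ops.append(("sub", ref[i - 1], hyp[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            ops.append(("del", ref[i - 1], None))
            i -= 1
        else:
            ops.append(("ins", None, hyp[j - 1]))
            j -= 1
    ops.reverse()
    return ops


def wer(reference: str, hypothesis: str) -> Optional[dict]:
    """Return S, D, I, N and WER, or None when the reference is empty."""
    ref, hyp = normalise(reference), normalise(hypothesis)
    if not ref:
        return None  # WER is undefined: division by zero
    counts = Counter(op for op, _, _ in align(ref, hyp))
    s, d, i = counts["sub"], counts["del"], counts["ins"]
    return {"S": s, "D": d, "I": i, "N": len(ref), "WER": (s + d + i) / len(ref)}


print(wer("The quick brown fox jumps.", "the quick brown box jumps high"))
# {'S': 1, 'D': 0, 'I': 1, 'N': 5, 'WER': 0.4}
```

In this example "fox" is substituted by "box" and "high" is inserted, so two errors over five reference words give a WER of 0.4. The full table uses memory proportional to the product of the two lengths, which is the price of being able to backtrace.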
Character Error Rate (CER) repeats the same alignment at the character level, so CER = (S + D + I) / N with S, D, I, and N counted over characters. It is gentler on near-miss spellings and suits scripts without clear word boundaries. Word accuracy is max(0, 1 − WER); the clamp is needed because insertions can push WER above 1 when the hypothesis is much longer than the reference. As a cross-check, the tool computes the word-level edit distance a second time with an independent space-efficient pass; when that total matches the S + D + I count from the alignment backtrace, the result is marked cross-checked. This is the same definition used by the Python jiwer library and the HuggingFace evaluate WER metric, so numbers are directly comparable when normalisation matches.
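A rough Python sketch of the character-level score, the clamped word accuracy, and a space-efficient distance pass is given below. The names `edit_distance`, `cer`, and `word_accuracy` are hypothetical, and the sketch assumes CER is computed over the characters of already-normalised strings with spaces counted as characters; the tool may treat spaces differently.

```python
from typing import Optional, Sequence


def edit_distance(a: Sequence, b: Sequence) -> int:
    """Levenshtein distance that keeps only two rows of the table in memory."""
    prev = list(range(len(b) + 1))  # row for the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into an empty sequence takes i deletions
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j - 1] + (x != y),  # match (0) or substitution (1)
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> Optional[float]:
    """Character Error Rate, or None when the reference is empty."""
    if not reference:
        return None
    return edit_distance(reference, hypothesis) / len(reference)


def word_accuracy(wer: float) -> float:
    """Clamp at zero, since WER can exceed 1."""
    return max(0.0, 1.0 - wer)


ref, hyp = "recognise speech", "recognize speech"
print(edit_distance(ref.split(), hyp.split()) / len(ref.split()))  # 0.5
print(cer(ref, hyp))                                               # 0.0625
print(word_accuracy(1.5))                                          # 0.0
```

One misspelt letter costs a whole word at the word level (1 of 2 words, WER 0.5) but only one character in sixteen at the character level (CER 0.0625). Because `edit_distance` never stores the full table it cannot produce an alignment, only the total, which is what makes it usable as an independent check on the S + D + I count from a backtrace.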
Worked examples
Frequently asked questions
Sources & references
- NIST Speech Recognition Scoring Toolkit (SCTK / sclite) — the reference WER implementation
- HuggingFace evaluate — WER metric documentation
- HuggingFace evaluate — CER metric documentation
- Levenshtein, V. I. (1966) — Binary codes capable of correcting deletions, insertions, and reversals
The WER and CER formulas and the four worked examples on this page were last reconciled against the SCTK definition and the HuggingFace metric docs on 2026-06-09. The calculation module ships with a built-in assertion that re-runs every worked example, so a regression in the alignment math fails fast.