How is word error rate calculated?

Word Error Rate is calculated as WER = (S + D + I) / N, where S is the number of substitutions, D the number of deletions, I the number of insertions, and N the number of words in the reference transcript. S, D, and I come from a minimum-edit-distance (Levenshtein) alignment of the reference and hypothesis word sequences, the method defined by the NIST scoring toolkit (SCTK).
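
A minimal Python sketch of the formula follows. It assumes plain whitespace tokenisation with no normalisation, and the names word_edit_distance and wer are hypothetical, not this tool's code. The minimum edit distance already equals S + D + I for the cheapest alignment, so WER can be computed without separating the three counts.

```python
def word_edit_distance(ref_words, hyp_words):
    """Fewest unit-cost word substitutions, deletions and insertions
    that turn ref_words into hyp_words."""
    rows, cols = len(ref_words) + 1, len(hyp_words) + 1
    # d[i][j] = edits needed to turn the first i reference words into the first j hypothesis words
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i reference words
    for j in range(cols):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if ref_words[i - 1] == hyp_words[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # match or substitution
    return d[-1][-1]


def wer(reference, hypothesis):
    """WER = (S + D + I) / N with whitespace tokenisation and no normalisation."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("WER is undefined for an empty reference (N = 0)")
    return word_edit_distance(ref, hyp) / len(ref)


print(wer("the quick brown fox", "the quick brown box"))  # 0.25
```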

What is a good word error rate?

It depends on the domain. For clean read speech, modern systems reach 2–5% WER, which is near human parity. Conversational or accented audio, background noise, and specialist vocabulary commonly push WER to 10–25%. Below 10% is usually considered usable for transcription; under 5% is excellent.

What is the difference between WER and CER?

WER counts errors in whole words, so one wrong letter makes the entire word an error. CER (Character Error Rate) counts errors per character, so a near-miss like 'colour' vs 'color' costs one character, not one word. CER is more forgiving and is preferred for languages without clear word boundaries or for spelling-level comparison.
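
The 'colour' vs 'color' near-miss can be checked with a small Python sketch. The helpers edit_distance, wer and cer are hypothetical names; the sketch tokenises on whitespace only and, as an assumption, counts spaces as characters in CER (implementations differ on this).

```python
def edit_distance(ref, hyp):
    """Unit-cost Levenshtein distance between two sequences (strings or word lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # match (0) or substitution (1)
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    ref = reference.split()  # word tokens
    return edit_distance(ref, hypothesis.split()) / len(ref)


def cer(reference, hypothesis):
    return edit_distance(reference, hypothesis) / len(reference)  # character tokens


print(wer("colour", "color"))  # 1.0: the single reference word is wrong
print(cer("colour", "color"))  # about 0.1667: one deleted character out of six
```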

Can word error rate be greater than 100%?

Yes. Because insertions are counted against the reference word count N, a hypothesis with many extra words on a short reference can produce more than N errors, giving a WER above 100%. This tool shows the raw value and flags it, rather than capping it at 100%.

How do you calculate accuracy from WER?

Word accuracy = max(0, 1 − WER), shown as a percentage. For a 25% WER the accuracy is 75%. The max(0, …) guard keeps accuracy from going negative when WER exceeds 100%, which can happen with heavy over-insertion.
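
The two formulas from these answers can be sketched in Python working directly from alignment counts; the names wer_from_counts and word_accuracy are hypothetical. The second call reproduces an over-insertion case: reference "yes", hypothesis "yes yes yes" gives I = 2 on N = 1.

```python
def wer_from_counts(s, d, i, n):
    """WER = (S + D + I) / N from alignment counts; N is the reference word count."""
    if n == 0:
        raise ValueError("WER is undefined when the reference is empty")
    return (s + d + i) / n


def word_accuracy(wer_value):
    """Word accuracy = max(0, 1 - WER), clamped so it never goes negative."""
    return max(0.0, 1.0 - wer_value)


# One substitution on a four-word reference (the 'fox' -> 'box' example).
print(wer_from_counts(1, 0, 0, 4), word_accuracy(0.25))  # 0.25 0.75

# Over-insertion: reference "yes", hypothesis "yes yes yes" -> I = 2, N = 1.
heavy = wer_from_counts(0, 0, 2, 1)
print(heavy, word_accuracy(heavy))  # 2.0 0.0
```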

Should I lowercase and remove punctuation before scoring?

Most ASR benchmarks normalise text first: lowercase everything and strip punctuation, since a speech model is not graded on capitalisation or commas. This tool does both by default, with toggles to turn them off when case or punctuation genuinely matters for your comparison.
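
A minimal sketch of this kind of normalisation in Python, using the standard unicodedata module to drop every character whose Unicode category starts with P (punctuation) or S (symbol). The function normalise and its two flags are hypothetical stand-ins for the tool's toggles, not its actual code.

```python
import unicodedata


def normalise(text, lowercase=True, strip_punctuation=True):
    """Optionally lowercase, drop Unicode punctuation (P*) and symbol (S*) characters,
    then split into word tokens."""
    if lowercase:
        text = text.lower()
    if strip_punctuation:
        text = "".join(
            ch for ch in text
            if not unicodedata.category(ch).startswith(("P", "S"))
        )
    return text.split()  # split() with no argument also collapses runs of whitespace


print(normalise("Hello, World!  It's 5 o'clock."))
# ['hello', 'world', 'its', '5', 'oclock']
```

Deleting punctuation outright turns "it's" into "its" and "state-of-the-art" into "stateoftheart"; replacing it with a space instead would split those words. The choice changes WER, which is one reason scores differ between pipelines.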

Is my transcript uploaded anywhere?

No. The entire calculation — normalisation, alignment, and scoring — runs in your browser with JavaScript. Nothing is sent to a server, logged, or stored, so it is safe for confidential meeting notes, medical dictation, or unreleased product audio.

Does this match Python jiwer or the HuggingFace WER metric?

Yes, when normalisation matches. This tool uses the same (S + D + I) / N formula and unit-cost Levenshtein alignment as jiwer and the HuggingFace evaluate metric, which is built on jiwer. Small differences usually come from tokenisation or normalisation: jiwer does not lowercase text or strip punctuation by default, while this tool does, so set the case and punctuation toggles to match your pipeline.

Word Error Rate (WER) & CER Calculator

Paste a reference transcript and a model hypothesis to get Word Error Rate, Character Error Rate, word accuracy, and a colour-coded alignment of every error — substitution, deletion, and insertion. Uses the NIST SCTK formula. No signup, no upload, runs entirely in your browser.

By Induwara Ashinsana, Executive Director, Ryzera Technologies. Updated Jun 9, 2026.

Runs entirely in your browser — transcripts are never uploaded, logged, or stored. Method: minimum-edit-distance (Levenshtein) alignment, WER = (S + D + I) / N, per the NIST SCTK / sclite definition and the HuggingFace evaluate metric. Up to 100,000 characters per box.

How it works

Word Error Rate is the standard accuracy metric for automatic speech recognition (ASR). It compares a system's output (the hypothesis) against a correct, human-verified transcript (the reference) and reports the number of word errors relative to the number of reference words. The formula, defined by the NIST Speech Recognition Scoring Toolkit (SCTK), is:

WER = (S + D + I) / N

Here S is substitutions (a word recognised as a different word), D is deletions (a reference word the system missed), I is insertions (an extra word the system added), and N is the total number of words in the reference. The three error types are not counted by eye — they come from an optimal alignment of the two transcripts:

Normalise. Both transcripts are optionally lowercased, stripped of Unicode punctuation and symbols, and have their whitespace collapsed, then split into word tokens. Benchmarks normalise this way because a speech model should not be penalised for capitalisation or commas.
Align. The reference and hypothesis word sequences are aligned with the minimum-edit-distance (Levenshtein) algorithm, which fills a dynamic-programming table where each cell holds the fewest edits needed to turn one prefix into the other. Each substitution, deletion, and insertion costs one; a match costs zero.
Backtrace. Walking the table back from the bottom-right corner reconstructs the cheapest sequence of edits, which yields the exact S, D, and I counts and the word-by-word alignment shown in the diff view.
Score. WER is the total errors divided by N. If the reference is empty, WER is undefined (division by zero), so the tool shows a guard message instead of a number.
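
The align, backtrace, and score steps can be sketched in Python as below. The sketch assumes the tokens have already been normalised, breaks ties by preferring the diagonal (match or substitution) move, and uses the hypothetical names align and wer; it illustrates the technique and is not the tool's source.

```python
def align(ref, hyp):
    """Unit-cost Levenshtein alignment of two token lists, with a backtrace.

    Returns (S, D, I, ops); each op is (kind, ref_token, hyp_token) and kind is
    'match', 'sub', 'del' or 'ins'."""
    n, m = len(ref), len(hyp)
    # d[i][j] = fewest edits turning ref[:i] into hyp[:j]
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i
    for j in range(1, m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j - 1] + cost, d[i - 1][j] + 1, d[i][j - 1] + 1)

    # Backtrace from the bottom-right corner; on ties prefer the diagonal move.
    ops, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            kind = "match" if ref[i - 1] == hyp[j - 1] else "sub"
            ops.append((kind, ref[i - 1], hyp[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            ops.append(("del", ref[i - 1], None))
            i -= 1
        else:
            ops.append(("ins", None, hyp[j - 1]))
            j -= 1
    ops.reverse()
    counts = {kind: sum(op[0] == kind for op in ops) for kind in ("sub", "del", "ins")}
    return counts["sub"], counts["del"], counts["ins"], ops


def wer(ref, hyp):
    """Score step: (S + D + I) / N, refusing an empty reference."""
    if not ref:
        raise ValueError("WER is undefined for an empty reference (N = 0)")
    s, d, i, _ = align(ref, hyp)
    return (s + d + i) / len(ref)


ref = "she sells sea shells".split()
hyp = "she sell the sea shells".split()
s, d, i, ops = align(ref, hyp)
print(s, d, i)        # 1 0 1
print(wer(ref, hyp))  # 0.5
```

On the "she sells sea shells" example this returns S=1, D=0, I=1, but with diagonal-first tie-breaking it pairs 'sells' with 'the' and marks 'sell' as the insertion, whereas the worked example pairs 'sells' with 'sell'. Both alignments cost 2; tools that break ties differently can report the same total with a different S/D/I split.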

Character Error Rate (CER) repeats the same alignment at the character level — CER = (S + D + I) / N over characters — which is gentler on near-miss spellings and suits scripts without clear word boundaries. Word accuracy is max(0, 1 − WER). As a cross-check, the tool computes the word-level edit distance a second time with an independent space-efficient pass; when that total matches the count from the alignment backtrace, the result is marked cross-checked. This is the same definition used by the Python jiwer library and the HuggingFace evaluate WER metric, so numbers are directly comparable when normalisation matches.
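
A Python sketch of the CER, word-accuracy, and cross-check steps. The two distance functions are independent implementations of the same unit-cost recurrence: one keeps the whole table (as a backtrace needs), the other keeps only two rows. The names are hypothetical; in the tool the cross-check compares against the backtrace counts, for which the full-table distance stands in here, and computing CER over the raw string with spaces included is an assumption.

```python
def full_table_distance(ref, hyp):
    """Unit-cost edit distance from the complete DP table (the table a backtrace walks)."""
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
    return d[-1][-1]


def two_row_distance(ref, hyp):
    """Same recurrence keeping only the previous and current rows: O(len(hyp)) memory."""
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost)
        prev = curr
    return prev[-1]


def score(reference, hypothesis):
    """WER, CER (over raw characters, spaces included), word accuracy and a cross-check flag."""
    ref_words, hyp_words = reference.split(), hypothesis.split()
    if not ref_words:
        raise ValueError("WER and CER are undefined for an empty reference")
    word_errors = full_table_distance(ref_words, hyp_words)
    wer = word_errors / len(ref_words)
    cer = full_table_distance(reference, hypothesis) / len(reference)
    return {
        "wer": wer,
        "cer": cer,
        "accuracy": max(0.0, 1.0 - wer),
        # The independent two-row pass must agree with the full-table count.
        "cross_checked": word_errors == two_row_distance(ref_words, hyp_words),
    }


print(score("the quick brown fox", "the quick brown box"))
```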

Worked examples

One substitution → 25% WER

Reference: the quick brown fox
Hypothesis: the quick brown box

Reference words (N): the, quick, brown, fox → N = 4
Align: the=the, quick=quick, brown=brown, fox→box
fox→box is 1 substitution. S=1, D=0, I=0
WER = (1 + 0 + 0) / 4 = 0.25 → 25%
Word accuracy = max(0, 1 − 0.25) = 75%

Two deletions → 28.57% WER

Reference: I am going to the market today
Hypothesis: I going to market today

Reference words (N): I, am, going, to, the, market, today → N = 7
Align: 'am' and 'the' have no hypothesis match → 2 deletions
S=0, D=2, I=0
WER = (0 + 2 + 0) / 7 = 0.2857… → 28.57%
Word accuracy = 71.43%

Substitution + insertion → 50% WER

Reference: she sells sea shells
Hypothesis: she sell the sea shells

Reference words (N): she, sells, sea, shells → N = 4
Align: 'sells' becomes 'sell' (substitution), 'the' is extra (insertion)
S=1, D=0, I=1
WER = (1 + 0 + 1) / 4 = 0.50 → 50%
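Word accuracy = max(0, 1 − 0.50) = 50%

One character substitution → 33.33% CER

Reference: cat
Hypothesis: car

Reference characters (N): c, a, t → N = 3
Align: c=c, a=a, t→r
S=1, D=0, I=0
CER = (1 + 0 + 0) / 3 = 0.3333… → 33.33%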

Sources & references

The WER and CER formulas and the four worked examples on this page were last reconciled against the SCTK definition and the HuggingFace metric docs on 2026-06-09. The calculation module ships with a built-in assertion that re-runs every worked example, so a regression in the alignment math fails fast.
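
The page does not show that assertion. A hypothetical Python equivalent that re-runs the four worked examples could look like this; WORKED_EXAMPLES, error_rate and self_test are illustrative names only.

```python
def edit_distance(ref, hyp):
    """Unit-cost Levenshtein distance between two token sequences (two-row version)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (r != h)))
        prev = curr
    return prev[-1]


def error_rate(ref_tokens, hyp_tokens):
    """(S + D + I) / N, using the edit distance as the error total."""
    return edit_distance(ref_tokens, hyp_tokens) / len(ref_tokens)


# (reference, hypothesis, unit, expected rate) for each worked example on this page
WORKED_EXAMPLES = [
    ("the quick brown fox", "the quick brown box", "word", 1 / 4),
    ("I am going to the market today", "I going to market today", "word", 2 / 7),
    ("she sells sea shells", "she sell the sea shells", "word", 2 / 4),
    ("cat", "car", "char", 1 / 3),
]


def self_test(examples=WORKED_EXAMPLES, tolerance=1e-9):
    for ref, hyp, unit, expected in examples:
        # Word examples split on whitespace; character examples compare letter by letter.
        if unit == "word":
            ref_tokens, hyp_tokens = ref.split(), hyp.split()
        else:
            ref_tokens, hyp_tokens = list(ref), list(hyp)
        got = error_rate(ref_tokens, hyp_tokens)
        if abs(got - expected) > tolerance:
            raise AssertionError(f"{ref!r} vs {hyp!r}: got {got:.4f}, expected {expected:.4f}")


self_test()  # silent when every example still matches; raises on a regression
```

It raises AssertionError explicitly rather than relying on assert statements, which Python skips when run with the -O flag.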

Related tools

Brier Score Calculator

Compute the Brier score and Brier Skill Score for probabilistic predictions. Paste forecast probabilities and 0/1 outcomes to get the mean-squared-error of the probabilities, the skill score versus a baseline, the exact formula and a per-pair breakdown. Matches scikit-learn, runs in the browser.

Tokens to Words

Convert tokens to words, words to tokens, and characters for GPT, Claude, Gemini and Llama. See reading time, A4 pages, context-window use, and an approximate API cost in USD and LKR.

Precision@K & Recall@K

Compute Precision@K, Recall@K, F1@K and Hit Rate@K for a retriever, search ranker or RAG pipeline — from a ranked 0/1 relevance list or from raw counts. Shows a per-K breakdown and every step of the arithmetic, cross-checked and run entirely in your browser.

Comments & feedback

Found a bug, edge case, or want to suggest an improvement?

Email me at [email protected] — most fixes ship within 24 hours.