induwara.lk

METEOR Score Calculator

Paste a candidate translation and a reference to get the METEOR score with a full breakdown: unigram matches, precision, recall, the recall-weighted Fmean, the chunk-based fragmentation penalty, and the aligned tokens highlighted. Reconciled to NLTK's single_meteor_score, no signup, runs in your browser.

By Induwara Ashinsana · Updated Jun 12, 2026

Runs entirely in your browser — your text is never uploaded, logged, or stored. Method: one-to-one unigram alignment, recall-weighted Fmean, and a chunk-based fragmentation penalty, per Banerjee & Lavie (2005); reconciled to NLTK single_meteor_score. Up to 50,000 characters per box.

How it works

METEOR (Metric for Evaluation of Translation with Explicit ORdering), defined by Banerjee & Lavie (2005), scores a candidate against a reference by first aligning their words and then balancing how much content is shared against how well the word order is preserved. Unlike BLEU, it rewards recall heavily and adds an explicit penalty for jumbled output. The score is:

METEOR = Fmean · (1 − Pen)

It is built in four steps from the tokenised, optionally lowercased texts:

  1. Align unigrams. Build the largest one-to-one mapping between candidate and reference words. This tool uses exact matching and, optionally, a Porter-stem stage applied to the words left over after the exact pass, the same exact-then-stem ordering as NLTK's meteor_score. WordNet synonym matching is out of scope here.
  2. Precision and recall. With m mapped unigrams, candidate length c and reference length r, P = m/c and R = m/r.
  3. Fmean. A recall-weighted harmonic mean, Fmean = (P·R)/(α·P + (1 − α)·R). With the default α = 0.9 this equals 10·P·R/(R + 9·P), so recall pulls nine times harder than precision.
  4. Fragmentation penalty. Group the mapped unigrams into the fewest possible chunks, where a chunk is a run of matches that are adjacent, in the same order, in both the candidate and the reference. With ch chunks over m matches, Pen = γ·(ch/m)^β using γ = 0.5 and β = 3. Many short chunks (scrambled word order) drive the penalty up; one long chunk barely dents the score.
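The four steps above can be sketched in a few dozen lines of Python. This is a minimal sketch under stated assumptions, not the tool's source: it uses whitespace tokenisation, exact matching only (no stem stage), and a greedy alignment in which each candidate word takes the first unused identical reference word. Greedy alignment gives the intended result on the worked examples below, but with repeated words it is not guaranteed to find the fewest-chunk alignment. The helper names (`tokenize`, `align_exact`, `count_chunks`, `meteor`) are hypothetical.

```python
"""Minimal METEOR sketch: exact unigram matching only, no stemming or synonyms."""


def tokenize(text: str, lowercase: bool = True) -> list[str]:
    # Assumption: plain whitespace tokenisation; the tool's tokeniser may differ.
    return (text.lower() if lowercase else text).split()


def align_exact(cand: list[str], ref: list[str]) -> list[tuple[int, int]]:
    """Greedy one-to-one alignment (a simplification): each candidate word takes
    the first unused identical reference word. Returns (candidate_index,
    reference_index) pairs, already sorted by candidate index."""
    used: set[int] = set()
    pairs: list[tuple[int, int]] = []
    for i, word in enumerate(cand):
        for j, ref_word in enumerate(ref):
            if j not in used and ref_word == word:
                used.add(j)
                pairs.append((i, j))
                break
    return pairs


def count_chunks(pairs: list[tuple[int, int]]) -> int:
    """A chunk is a run of pairs adjacent in both the candidate and the reference."""
    if not pairs:
        return 0
    chunks = 1
    for (c0, r0), (c1, r1) in zip(pairs, pairs[1:]):
        if not (c1 == c0 + 1 and r1 == r0 + 1):
            chunks += 1  # either index failed to advance by one: new chunk
    return chunks


def meteor(candidate: str, reference: str,
           alpha: float = 0.9, beta: float = 3.0, gamma: float = 0.5) -> float:
    cand, ref = tokenize(candidate), tokenize(reference)
    pairs = align_exact(cand, ref)                      # step 1
    m = len(pairs)
    if m == 0:
        return 0.0  # this sketch scores "no shared unigrams" as 0
    p, r = m / len(cand), m / len(ref)                  # step 2
    fmean = (p * r) / (alpha * p + (1 - alpha) * r)     # step 3
    pen = gamma * (count_chunks(pairs) / m) ** beta     # step 4
    return fmean * (1 - pen)


if __name__ == "__main__":
    print(round(meteor("the cat is on the mat", "the cat sat on the mat"), 4))          # 0.8067
    print(round(meteor("the bird flew over a house", "a bird flew over the house"), 4))  # 0.8519
```

Keeping the alignment as (candidate, reference) index pairs sorted by candidate position makes chunk counting a single linear pass: a new chunk starts wherever either index fails to advance by exactly one.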

The α = 0.9, γ = 0.5 and β = 3 defaults are the values from Banerjee & Lavie (2005), confirmed against NLTK's single_meteor_score defaults; Lavie & Agarwal (2007) discuss tuning them per language. Fmean is computed as the direct ratio and independently re-derived as the reciprocal 1/(α/R + (1 − α)/P); when the two agree, the score is flagged “cross-checked”. One quirk worth knowing: two identical sentences score just under 1, not exactly 1, because a single chunk still incurs Pen = γ·(1/m)^β. For a six-word sentence that is 0.5·(1/6)³ ≈ 0.0023, giving 0.9977; the gap shrinks as the sentence gets longer. This is a correct, documented property of METEOR.
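Both the cross-check and the identical-sentence quirk are easy to reproduce. In this sketch (all function names are hypothetical), `fmean_ratio` and `fmean_reciprocal` are the two algebraically equivalent forms of Fmean, compared with `math.isclose`; `identical_sentence_score` shows that a perfect match scores 1 − γ·(1/m)^β, which approaches 1 only as the number of matched words m grows. The reciprocal form assumes P and R are both non-zero.

```python
import math


def fmean_ratio(p: float, r: float, alpha: float = 0.9) -> float:
    # Direct form: P*R / (alpha*P + (1 - alpha)*R)
    return (p * r) / (alpha * p + (1 - alpha) * r)


def fmean_reciprocal(p: float, r: float, alpha: float = 0.9) -> float:
    # Weighted harmonic form: 1 / (alpha/R + (1 - alpha)/P); needs P > 0 and R > 0
    return 1.0 / (alpha / r + (1 - alpha) / p)


def cross_checked(p: float, r: float, alpha: float = 0.9) -> bool:
    """True when both derivations of Fmean agree to floating-point tolerance."""
    return math.isclose(fmean_ratio(p, r, alpha), fmean_reciprocal(p, r, alpha),
                        rel_tol=1e-12)


def identical_sentence_score(m: int, beta: float = 3.0, gamma: float = 0.5) -> float:
    # Identical texts: P = R = Fmean = 1 and ch = 1, so only gamma*(1/m)**beta remains.
    return 1.0 - gamma * (1 / m) ** beta


if __name__ == "__main__":
    print(cross_checked(5 / 6, 5 / 6), cross_checked(0.5, 0.8))
    for m in (3, 6, 10, 20):
        print(f"m = {m:2d}: identical-sentence METEOR = {identical_sentence_score(m):.4f}")
```

With α = 0.9 the ratio form also equals 10·P·R/(R + 9·P), which is an easy extra assertion when testing an implementation.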

Worked examples

One substitution → 0.8067

Candidate
the cat is on the mat
Reference
the cat sat on the mat
  1. Matches: the, cat, on, the, mat → m = 5 (sat ≠ is)
  2. c = 6, r = 6 → P = R = 5/6 = 0.8333
  3. Fmean = 10·(5/6)² / ((5/6) + 9·(5/6)) = 0.8333
  4. Chunks {the, cat} and {on, the, mat} → ch = 2
  5. Pen = 0.5·(2/5)³ = 0.5·0.064 = 0.0320
  6. METEOR = 0.8333 · (1 − 0.0320) = 0.8067
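This example hides a subtlety: “the” appears twice on each side, so two different alignments both reach m = 5. Banerjee & Lavie (2005) break such ties by choosing the alignment with the fewest crossings. The brute-force sketch below (hypothetical helper names; exponential time, so short sentences only) enumerates every maximum-size exact alignment and uses the fewest chunks as its tie-break, which is an assumption of the sketch rather than the tool's documented rule.

```python
def max_alignments(cand: list[str], ref: list[str]) -> list[list[tuple[int, int]]]:
    """Every one-to-one exact alignment of maximum size, found by brute force.
    Exponential in the number of repeated words: fine for short sentences only."""
    best: list[list[tuple[int, int]]] = []
    best_size = -1

    def search(i: int, used: set[int], pairs: list[tuple[int, int]]) -> None:
        nonlocal best, best_size
        if i == len(cand):
            if len(pairs) > best_size:
                best, best_size = [list(pairs)], len(pairs)
            elif len(pairs) == best_size:
                best.append(list(pairs))
            return
        # Option 1: map cand[i] to any unused identical reference word.
        for j, word in enumerate(ref):
            if j not in used and word == cand[i]:
                used.add(j)
                pairs.append((i, j))
                search(i + 1, used, pairs)
                pairs.pop()
                used.remove(j)
        # Option 2: leave cand[i] unmatched.
        search(i + 1, used, pairs)

    search(0, set(), [])
    return best


def count_chunks(pairs: list[tuple[int, int]]) -> int:
    """Runs of pairs adjacent in both sentences; pairs are sorted by candidate index."""
    if not pairs:
        return 0
    return 1 + sum(1 for (c0, r0), (c1, r1) in zip(pairs, pairs[1:])
                   if not (c1 == c0 + 1 and r1 == r0 + 1))


if __name__ == "__main__":
    cand = "the cat is on the mat".split()
    ref = "the cat sat on the mat".split()
    for pairs in max_alignments(cand, ref):
        print(pairs, "chunks:", count_chunks(pairs))
    print("fewest-chunk alignment:", min(max_alignments(cand, ref), key=count_chunks))
```

Mapping the two “the”s straight across gives the two chunks used above. Crossing them gives five one-word chunks, so Pen = 0.5·(5/5)³ = 0.5 and the score would fall to 0.8333 · 0.5 ≈ 0.4167, which is why the tie-break matters.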

Same words, reordered → 0.8519

Candidate
the bird flew over a house
Reference
a bird flew over the house
  1. Every word matches → m = 6, P = R = 1, Fmean = 1
  2. Best alignment chunks: {the}, {bird, flew, over}, {a}, {house}
  3. ch = 4 over m = 6
  4. Pen = 0.5·(4/6)³ = 0.5·0.2963 = 0.1481
  5. METEOR = 1 · (1 − 0.1481) = 0.8519
  6. The words are exactly those of a perfect match, but swapping “the” and “a” costs ~0.15
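Once m, c, r and ch are known, the rest is arithmetic. The sketch below (the name `meteor_from_counts` is hypothetical) computes the score from those four counts; holding m = c = r = 6, as in this reordering example, and varying only ch isolates the fragmentation penalty, from one unbroken chunk up to six one-word chunks.

```python
def meteor_from_counts(m: int, c: int, r: int, ch: int,
                       alpha: float = 0.9, beta: float = 3.0, gamma: float = 0.5) -> float:
    """METEOR from the counts used in the worked examples (requires m > 0)."""
    precision, recall = m / c, m / r
    fmean = (precision * recall) / (alpha * precision + (1 - alpha) * recall)
    penalty = gamma * (ch / m) ** beta
    return fmean * (1 - penalty)


if __name__ == "__main__":
    # Perfect unigram overlap (m = c = r = 6); only the word order changes.
    for ch in range(1, 7):
        print(f"ch = {ch}: METEOR = {meteor_from_counts(6, 6, 6, ch):.4f}")
```

With ch = 4 this reproduces the 0.8519 above; the score runs from 0.9977 with one chunk down to 0.5 with six. The same function gives 0.8067 for (m, c, r, ch) = (5, 6, 6, 2) and 0.6389 for (3, 4, 4, 2), the counts from the other two examples.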

Porter-stem mode → 0.6389

Candidate
the cats are running
Reference
the cat is run
  1. Exact: the = the. Stem: cats → cat, running → run
  2. m = 3 (are/is do not match), c = r = 4
  3. P = R = 3/4 = 0.7500 → Fmean = 0.7500
  4. Chunks {the, cat} and {run} → ch = 2
  5. Pen = 0.5·(2/3)³ = 0.1481
  6. METEOR = 0.7500 · (1 − 0.1481) = 0.6389
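Porter-stem mode adds a second matching pass over the words the exact pass left unmatched. The sketch below takes the stemmer as a parameter so it runs without extra dependencies; `TOY_STEMS` is a hypothetical lookup that stands in for a Porter stemmer on just this example's words. With NLTK installed, `PorterStemmer().stem` from `nltk.stem.porter` could be passed instead. As in a simple greedy aligner, each pass takes the first unused matching reference word, which is a simplification.

```python
from typing import Callable


def align_two_stage(cand: list[str], ref: list[str],
                    stem: Callable[[str], str]) -> list[tuple[int, int]]:
    """Exact pass, then a stem pass over the words left unmatched.
    Returns (candidate_index, reference_index) pairs sorted by candidate index."""
    pairs: list[tuple[int, int]] = []
    used_c: set[int] = set()
    used_r: set[int] = set()
    for key in (lambda w: w, stem):          # stage 1: exact forms, stage 2: stems
        for i, word in enumerate(cand):
            if i in used_c:
                continue
            for j, ref_word in enumerate(ref):
                if j not in used_r and key(ref_word) == key(word):
                    used_c.add(i)
                    used_r.add(j)
                    pairs.append((i, j))
                    break
    return sorted(pairs)                      # chunking needs candidate order


def count_chunks(pairs: list[tuple[int, int]]) -> int:
    if not pairs:
        return 0
    return 1 + sum(1 for (c0, r0), (c1, r1) in zip(pairs, pairs[1:])
                   if not (c1 == c0 + 1 and r1 == r0 + 1))


def meteor_stem(candidate: str, reference: str, stem: Callable[[str], str],
                alpha: float = 0.9, beta: float = 3.0, gamma: float = 0.5) -> float:
    cand, ref = candidate.lower().split(), reference.lower().split()
    pairs = align_two_stage(cand, ref, stem)
    m = len(pairs)
    if m == 0:
        return 0.0
    p, r = m / len(cand), m / len(ref)
    fmean = (p * r) / (alpha * p + (1 - alpha) * r)
    return fmean * (1 - gamma * (count_chunks(pairs) / m) ** beta)


# Hypothetical stand-in for Porter stems, covering only this example's words.
TOY_STEMS = {"cats": "cat", "running": "run"}


def toy_stem(word: str) -> str:
    return TOY_STEMS.get(word, word)


if __name__ == "__main__":
    cand, ref = "the cats are running", "the cat is run"
    print(align_two_stage(cand.split(), ref.split(), toy_stem))  # [(0, 0), (1, 1), (3, 3)]
    print(round(meteor_stem(cand, ref, toy_stem), 4))            # 0.6389
    print(round(meteor_stem(cand, ref, lambda w: w), 4))         # exact only: 0.125
```

Without the stem pass only “the” matches (m = 1, P = R = 0.25, one chunk, Pen = 0.5), so the same pair scores 0.125; the stem stage is what lifts it to 0.6389.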


