induwara.lk

Hamming Distance Calculator

Find the Hamming distance between two equal-length inputs — binary codewords, text strings, or numeric vectors — as the count of positions that differ. See the normalised Hamming loss, the similarity, and exactly which positions mismatch. Runs entirely in your browser, no signup.

By Induwara Ashinsana · Updated Jun 10, 2026
Example result

Hamming distance: 2 (out of 7 bits)
Normalised (loss): 0.2857 (distance ÷ length)
Similarity: 71.43% (1 − normalised)
Length: 7 (bits per input)

Position-by-position

Position  A  B  Result
0         1  1  match
1         0  0  match
2         1  0  differ
3         1  1  match
4         1  0  differ
5         0  0  match
6         1  1  match

Differing positions

0-indexed: 2, 4

1-indexed: 3, 5

Sources cited: Hamming (1950), Error detecting and error correcting codes; scikit-learn hamming_loss; SciPy distance.hamming. Full list with links in the references section below.

How it works

The Hamming distance between two sequences A and B of equal length n is the number of positions at which their symbols differ. It was introduced by Richard Hamming in his 1950 paper on error-detecting and error-correcting codes, and it is one of the building blocks of coding theory: a block code whose codewords are all at least a Hamming distance d apart can detect up to d − 1 errors and correct up to ⌊(d − 1) / 2⌋.
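
The detection and correction bounds above can be checked on a small code. The following is a minimal sketch, not the tool's own implementation: the helper names `hamming` and `code_capability` are hypothetical, and the 3-bit repetition code is chosen only as an illustration.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Count the positions at which two equal-length words differ."""
    if len(a) != len(b):
        raise ValueError("inputs must have equal length")
    return sum(x != y for x, y in zip(a, b))


def code_capability(codewords) -> tuple:
    """Return (minimum distance d, detectable errors, correctable errors)."""
    # The minimum distance is taken over every pair of distinct codewords.
    d = min(hamming(a, b) for a, b in combinations(codewords, 2))
    return d, d - 1, (d - 1) // 2


# Illustrative code: the 3-bit repetition code {000, 111}.
print(code_capability(["000", "111"]))  # (3, 2, 1)
```

With d = 3, the repetition code detects up to 2 flipped bits and corrects 1, which agrees with d − 1 and ⌊(d − 1) / 2⌋.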

  1. Parse each input into a list of symbols. Binary mode keeps each 0/1 character (and rejects anything else); text mode keeps each Unicode character, optionally lower-cased; vector mode splits on a comma or space and parses each element as a number, so 2 and 2.0 compare equal.
  2. Length guard. If the two inputs parse to different lengths, the distance is undefined, so the tool stops and shows an error rather than padding the shorter one.
  3. Count the mismatches with the core formula, where [A_i ≠ B_i] is 1 when the symbols at position i differ and 0 otherwise:
    d_H(A, B) = Σ  [ A_i ≠ B_i ]   for i = 0 … n−1
  4. Normalise. The normalised Hamming distance, or Hamming loss, is d_H / n: the fraction of positions that differ, between 0 and 1. This matches scikit-learn's hamming_loss and SciPy's distance.hamming.
  5. Similarity is 1 − d_H / n, shown as a percentage — the share of positions that agree. When both inputs are empty the loss is defined as 0 and similarity as 100%, avoiding a division by zero.
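
The five steps above can be sketched in a few lines of Python. This is an assumption-laden illustration rather than the tool's actual source: the function names `parse` and `hamming_report`, the mode strings, and the `ignore_case` flag are all hypothetical.

```python
def parse(raw: str, mode: str, ignore_case: bool = False) -> list:
    """Step 1: turn raw input into a list of symbols (mode names are assumed)."""
    if mode == "binary":
        if any(ch not in "01" for ch in raw):
            raise ValueError("binary mode accepts only 0 and 1")
        return list(raw)
    if mode == "text":
        return list(raw.lower() if ignore_case else raw)
    if mode == "vector":
        # Split on commas or whitespace; floats make 2 and 2.0 compare equal.
        return [float(tok) for tok in raw.replace(",", " ").split()]
    raise ValueError(f"unknown mode: {mode}")


def hamming_report(a: list, b: list) -> dict:
    # Step 2: length guard, no padding of the shorter input.
    if len(a) != len(b):
        raise ValueError("Hamming distance is undefined for unequal lengths")
    n = len(a)
    # Step 3: count the positions where the symbols differ.
    distance = sum(x != y for x, y in zip(a, b))
    # Step 4: normalise; two empty inputs give a loss of 0 by definition.
    loss = distance / n if n else 0.0
    # Step 5: similarity is the share of positions that agree.
    return {"distance": distance, "loss": loss, "similarity": 1 - loss, "length": n}


report = hamming_report(parse("1011101", "binary"), parse("1001001", "binary"))
print(report["distance"], round(report["loss"], 4), f"{report['similarity']:.2%}")
# 2 0.2857 71.43%
```

Parsing vector elements as floats is one simple way to make 2 and 2.0 compare equal; the tool may handle numbers differently.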

The integer distance is exact with no rounding; only the normalised value and similarity are shown to a fixed number of decimals for display. Every result is independently cross-checked against a second, reduce-based implementation of the same count — if the two ever disagreed, the badge in the tool would flag it.
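
A cross-check of this kind might look like the sketch below. The page does not show its two implementations, so the loop-based primary count and the names `count_loop`, `count_reduce`, and `cross_checked` are assumptions made for illustration.

```python
from functools import reduce


def count_loop(a, b) -> int:
    """Primary count: an explicit loop over aligned positions."""
    total = 0
    for x, y in zip(a, b):
        if x != y:
            total += 1
    return total


def count_reduce(a, b) -> int:
    """Second count: fold the mismatch flags into a running sum."""
    return reduce(lambda acc, pair: acc + (pair[0] != pair[1]), zip(a, b), 0)


def cross_checked(a, b) -> tuple:
    """Return (distance, agreed), where agreed is True if both counts match."""
    if len(a) != len(b):
        raise ValueError("inputs must have equal length")
    primary = count_loop(a, b)
    return primary, primary == count_reduce(a, b)


print(cross_checked("karolin", "kathrin"))  # (3, True)
```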

Worked examples

Binary codewords: 1011101 vs 1001001

distance = 2 · normalised ≈ 0.2857 · similarity ≈ 71.43%

  1. Align the 7-bit words: 1 0 1 1 1 0 1 vs 1 0 0 1 0 0 1
  2. Position 0: 1 = 1 .......... match
  3. Position 1: 0 = 0 .......... match
  4. Position 2: 1 ≠ 0 .......... differ ✗
  5. Position 3: 1 = 1 .......... match
  6. Position 4: 1 ≠ 0 .......... differ ✗
  7. Positions 5–6: 0 = 0, 1 = 1 match
  8. 2 differing positions → d = 2. Loss = 2/7 ≈ 0.2857, similarity = (1 − 2/7) ≈ 71.43%
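
For binary words specifically, the same count can be obtained by XOR-ing the two words and counting the 1 bits in the result. This is a standard alternative to the position-by-position walk above, not necessarily what the tool does; the function name `hamming_xor` is hypothetical.

```python
def hamming_xor(a: str, b: str) -> int:
    """Hamming distance of two non-empty, equal-length bit strings via XOR."""
    if len(a) != len(b):
        raise ValueError("inputs must have equal length")
    # A bit of the XOR is 1 exactly where the two words differ.
    diff = int(a, 2) ^ int(b, 2)
    return bin(diff).count("1")


diff = int("1011101", 2) ^ int("1001001", 2)
print(format(diff, "07b"))                # 0010100
print(hamming_xor("1011101", "1001001"))  # 2
```

The two 1 bits in 0010100 sit at positions 2 and 4, the same positions found in the walkthrough.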

Text strings: karolin vs kathrin

distance = 3 · normalised ≈ 0.4286 · similarity ≈ 57.14%

  1. Align the 7 characters: k a r o l i n vs k a t h r i n
  2. Positions 0–1: k = k, a = a match
  3. Position 2: r ≠ t .......... differ ✗
  4. Position 3: o ≠ h .......... differ ✗
  5. Position 4: l ≠ r .......... differ ✗
  6. Positions 5–6: i = i, n = n match
  7. 3 differing positions → d = 3 (the textbook value). Loss = 3/7 ≈ 0.4286, similarity = (1 − 3/7) ≈ 57.14%
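
Listing which positions mismatch, in both 0-indexed and 1-indexed form, takes one pass over the aligned characters. A minimal sketch, with `differing_positions` as a hypothetical helper name:

```python
def differing_positions(a, b) -> list:
    """Return the 0-indexed positions at which a and b differ."""
    if len(a) != len(b):
        raise ValueError("inputs must have equal length")
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


zero_indexed = differing_positions("karolin", "kathrin")
print(zero_indexed)                    # [2, 3, 4]
print([i + 1 for i in zero_indexed])   # [3, 4, 5]
print(len(zero_indexed))               # 3
```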

Unequal length (edge case): 101 vs 1011

distance = undefined · the length guard fires

  1. Input A has 3 bits; Input B has 4 bits.
  2. Hamming distance is only defined for equal-length inputs (Hamming, 1950).
  3. The tool stops and shows a clear error instead of padding A to length 4.
  4. For different-length comparison, use the Levenshtein distance calculator instead.
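
The guard matters in practice because a naive count can silently return a misleading answer. The sketch below (hypothetical function names, not the tool's code) shows Python's `zip` stopping at the shorter input, so the extra bit in 1011 is never compared.

```python
def naive_count(a, b) -> int:
    # zip stops at the shorter input, so trailing symbols are ignored.
    return sum(x != y for x, y in zip(a, b))


def guarded_count(a, b) -> int:
    # Refuse unequal lengths instead of truncating or padding.
    if len(a) != len(b):
        raise ValueError(f"length mismatch: {len(a)} vs {len(b)}")
    return naive_count(a, b)


print(naive_count("101", "1011"))  # 0, which wrongly suggests identical inputs

try:
    guarded_count("101", "1011")
except ValueError as err:
    print(err)  # length mismatch: 3 vs 4
```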

Sources & references

The definition, formula, and worked examples on this page were last cross-checked against these sources on 2026-06-10. Every distance is deterministic and verified against an independent second implementation on each calculation.
