Jaccard Similarity Calculator
Find the Jaccard index between two sets, two texts, or two binary vectors, in your browser. See the similarity score, the Jaccard distance, the intersection and union members, and the substituted formula behind every result. No signup, nothing uploaded.
How it works
The Jaccard similarity coefficient, also called the Jaccard index or coefficient of community, measures how much two sets overlap as the size of their intersection divided by the size of their union. Two identical non-empty sets score 1; two sets with nothing in common score 0. The definition is the one used by scikit-learn's jaccard_score and goes back to Paul Jaccard's 1912 study of alpine flora.
For two sets A and B, the similarity is the number of shared members divided by the number of distinct members across both:
J(A, B) = |A ∩ B| / |A ∪ B| = |A ∩ B| / (|A| + |B| − |A ∩ B|)
The tool builds the two sets, then computes this in three steps:
- Intersection. The members present in both sets, A ∩ B. Its size is the numerator.
- Union. Every distinct member across both sets, A ∪ B. Its size is the denominator. If the union is empty (both sets empty), the result is defined as 0 rather than a divide-by-zero, matching scikit-learn's default behaviour.
- Divide, then derive distance. The similarity is |A ∩ B| / |A ∪ B|, and the Jaccard distance is 1 − J.
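The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not the calculator's actual source: the function name `jaccard` and its return shape are assumptions made for the example, and the empty-union case returns 0 as described above.

```python
def jaccard(a, b):
    """Return (similarity, distance, intersection, union) for two iterables.

    Hypothetical helper for illustration; not the calculator's own code.
    """
    set_a, set_b = set(a), set(b)

    # Step 1: intersection, the members present in both sets (numerator).
    intersection = set_a & set_b

    # Step 2: union, every distinct member across both sets (denominator).
    union = set_a | set_b

    # Step 3: divide, treating an empty union as similarity 0 by convention,
    # then derive the distance as 1 - J.
    similarity = len(intersection) / len(union) if union else 0.0
    return similarity, 1.0 - similarity, intersection, union


sim, dist, inter, union = jaccard(["a", "b", "c"], ["b", "c", "d"])
print(sim, dist, sorted(inter), sorted(union))
# prints: 0.5 0.5 ['b', 'c'] ['a', 'b', 'c', 'd']
```

Here two of the four distinct members are shared, so J = 2 / 4 = 0.5 and the distance is also 0.5. Under the same convention, two empty inputs give similarity 0 and therefore distance 1.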
The three input modes differ only in how the sets are built:
- Sets mode. Splits a list on your chosen separator, trims each item, and discards duplicates and ordering, because a set ignores both.
- Text mode. Tokenises each snippet either into a set of words or into a set of character n-grams (the contiguous length-n substrings, the shingling approach used in near-duplicate detection).
- Binary mode. Reads two equal-length 0/1 label vectors and takes each vector's “present” set to be the positions holding a 1, which is exactly how scikit-learn's jaccard_score treats binary indicator arrays.
As a credibility check, the calculator also computes the coefficient a second way, from the inclusion–exclusion identity |A ∪ B| = |A| + |B| − |A ∩ B|, and, in binary mode, compares the result against jaccard_score to confirm that all routes agree.
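The set-building rules for the three modes, plus the inclusion–exclusion cross-check, might look roughly like this in Python. All function names here are hypothetical, and a few details are assumptions not stated on this page: empty items are dropped in sets mode, words are split on whitespace and lowercased, and unequal-length vectors raise an error. The scikit-learn comparison is left out so the sketch needs only the standard library.

```python
def parse_set(text, sep=","):
    # Sets mode: split on the separator, trim each item, let the set
    # drop duplicates and order. Skipping empty items is an assumption.
    return {item.strip() for item in text.split(sep) if item.strip()}


def word_set(text):
    # Text mode (words): whitespace split and lowercasing are assumptions.
    return set(text.lower().split())


def char_ngrams(text, n=3):
    # Text mode (shingles): every contiguous substring of length n.
    # Texts shorter than n yield an empty set.
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def positions_of_ones(bits):
    # Binary mode: the "present" set is the positions holding a 1.
    return {i for i, bit in enumerate(bits) if bit == 1}


def binary_sets(u, v):
    # Binary mode requires equal-length vectors.
    if len(u) != len(v):
        raise ValueError("binary vectors must have equal length")
    return positions_of_ones(u), positions_of_ones(v)


def jaccard(a, b):
    # Direct route: |A ∩ B| / |A ∪ B|, with 0 for an empty union.
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def jaccard_via_sizes(a, b):
    # Second route: |A ∪ B| = |A| + |B| − |A ∩ B| (inclusion–exclusion).
    shared = len(a & b)
    union_size = len(a) + len(b) - shared
    return shared / union_size if union_size else 0.0


print(sorted(parse_set("apple, pear, apple , fig")))
# prints: ['apple', 'fig', 'pear']

a, b = char_ngrams("night", 2), char_ngrams("nacht", 2)
print(sorted(a & b), len(a | b))
# prints: ['ht'] 7

u, v = binary_sets([1, 1, 0, 1], [1, 0, 0, 1])
print(jaccard(u, v) == jaccard_via_sizes(u, v))
# prints: True
```

In the n-gram example, "night" and "nacht" share only the bigram "ht" out of seven distinct bigrams, so J = 1/7. In the binary example, the present sets are {0, 1, 3} and {0, 3}, giving J = 2/3 by either route.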
Worked examples
Frequently asked questions
Sources & references
- scikit-learn — sklearn.metrics.jaccard_score (definition and the empty-union J = 0 convention)
- P. Jaccard (1912) — The distribution of the flora in the alpine zone, New Phytologist 11(2):37–50
- Tan, Steinbach & Kumar — Introduction to Data Mining, Ch. 2: Jaccard coefficient and Jaccard distance = 1 − J
The formulas on this page were last cross-checked against these sources on 2026-06-10. The Jaccard index is a stable mathematical definition, so this tool needs no rate or schedule updates — only the worked examples are periodically re-reconciled.