What is a good Adjusted Rand Index score?

ARI runs from −0.5 to 1. A value of 1 means the two clusterings are identical (up to relabeling), and a value near 0 means the agreement is no better than random. There is no single official cut-off, but a useful reading is: above 0.90 the partitions are near-identical, 0.65–0.90 is strong agreement, 0.30–0.65 is moderate, below 0.30 is weak, and anything within about ±0.05 of zero is effectively random. Always state which reference partition an ARI was computed against, because the number means little without it.

What is the difference between the Rand Index and the Adjusted Rand Index?

The Rand Index is the share of item pairs that two clusterings treat the same way — together in both or apart in both — so it sits between 0 and 1. The problem is that even random labelings score a high Rand Index when there are many clusters. The Adjusted Rand Index subtracts the expected (chance) Rand value and rescales, so a random labeling scores about 0 and a perfect match scores 1. Use ARI when you need a chance-corrected comparison.
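The inflation is easy to reproduce. The sketch below uses only the Python standard library; the helper names, the seed, and the sizes (300 items, 50 clusters) are illustrative assumptions, not part of the tool. It draws two unrelated random labelings and computes both indices.

```python
import random
from collections import Counter
from math import comb

def pair_sums(labels_a, labels_b):
    """Return (Index, sumA, sumB, total pairs) for two label lists."""
    index = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    return index, sum_a, sum_b, comb(len(labels_a), 2)

def rand_index(labels_a, labels_b):
    index, sum_a, sum_b, total = pair_sums(labels_a, labels_b)
    # a = index (together in both), d = total - sumA - sumB + index (apart in both)
    return (total - sum_a - sum_b + 2 * index) / total

def adjusted_rand_index(labels_a, labels_b):
    # No guard for the 0/0 case; these random inputs never trigger it.
    index, sum_a, sum_b, total = pair_sums(labels_a, labels_b)
    expected = sum_a * sum_b / total
    max_index = (sum_a + sum_b) / 2
    return (index - expected) / (max_index - expected)

rng = random.Random(0)           # fixed seed so the run is repeatable
n_items, n_clusters = 300, 50    # illustrative sizes
labels_a = [rng.randrange(n_clusters) for _ in range(n_items)]
labels_b = [rng.randrange(n_clusters) for _ in range(n_items)]

ri = rand_index(labels_a, labels_b)
ari = adjusted_rand_index(labels_a, labels_b)
print(f"RI = {ri:.3f}, ARI = {ari:.3f}")
```

With this many clusters almost every pair is apart in both labelings, so the Rand Index should land well above 0.9 while ARI stays close to 0.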

Can the Adjusted Rand Index be negative?

Yes. ARI is negative when two clusterings agree less than random labeling would predict — they are systematically pulling items apart that the other groups together. Its theoretical minimum is −0.5, reached by maximally independent partitions such as A = [0,0,1,1] versus B = [0,1,0,1]. A negative ARI usually signals that the predicted clusters bear no real relationship to the reference labels.

How do you calculate the Adjusted Rand Index by hand?

Build the contingency table n_ij (items in cluster i of A and cluster j of B), with row sums a_i and column sums b_j. Compute Index = Σ C(n_ij,2), sumA = Σ C(a_i,2) and sumB = Σ C(b_j,2), where C(x,2) = x(x−1)/2. Then Expected = sumA·sumB / C(n,2) and Max = (sumA+sumB)/2. Finally ARI = (Index − Expected) / (Max − Expected). The pair-count table on this page shows each intermediate value.
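The same steps can be sketched in a few lines of standard-library Python. The function name and the returned dictionary keys are illustrative choices rather than anything the tool exposes, and this version deliberately omits the 0/0 guard for structureless partitions.

```python
from collections import Counter
from math import comb

def ari_by_hand(labels_a, labels_b):
    """Return every intermediate value of the ARI calculation as a dict."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both label lists must cover the same items")
    # Contingency cells n_ij, row sums a_i and column sums b_j.
    cells = Counter(zip(labels_a, labels_b))
    rows = Counter(labels_a)
    cols = Counter(labels_b)
    index = sum(comb(c, 2) for c in cells.values())
    sum_a = sum(comb(c, 2) for c in rows.values())
    sum_b = sum(comb(c, 2) for c in cols.values())
    total = comb(len(labels_a), 2)
    expected = sum_a * sum_b / total
    max_index = (sum_a + sum_b) / 2
    # Fails with ZeroDivisionError when both partitions are structureless.
    ari = (index - expected) / (max_index - expected)
    return {
        "Index": index, "sumA": sum_a, "sumB": sum_b, "C(n,2)": total,
        "Expected": expected, "Max": max_index, "ARI": ari,
    }

steps = ari_by_hand([0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 2, 2])
for name, value in steps.items():
    print(f"{name:>8} = {value}")
```

For these two six-item lists the intermediate values are Index = 2, sumA = 6, sumB = 3 and C(n,2) = 15, which gives ARI = 0.8/3.3 ≈ 0.2424.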

Is a higher or lower Adjusted Rand Index better?

Higher is better. ARI measures similarity, so 1.0 is a perfect match with your reference labels and 0 means no better than chance. When you compare several clustering algorithms — say k-means, DBSCAN and hierarchical — on the same ground truth, the one with the highest ARI reproduces the reference grouping most faithfully.

Does ARI care which labels I use for the clusters?

No. Clustering comparison is permutation-invariant: a partition is defined only by which items share a cluster, not by the cluster's name. So [0,0,1,1] and [5,5,9,9] describe the same grouping and score ARI = 1 against each other. That is why the two lists do not need to share a vocabulary — you can compare scikit-learn's integer cluster ids directly against species names or any other labels.
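One way to see the invariance is to throw the label names away and keep only the groups of item positions. The helper below is a hypothetical illustration, and the species names are made-up sample labels.

```python
def as_partition(labels):
    """Map a label list to the set of item-index groups it defines."""
    groups = {}
    for position, label in enumerate(labels):
        groups.setdefault(label, set()).add(position)
    # The label names are discarded; only the groups remain.
    return {frozenset(group) for group in groups.values()}

ids = as_partition([0, 0, 1, 1])
renamed = as_partition([5, 5, 9, 9])
species = as_partition(["setosa", "setosa", "virginica", "virginica"])

print(ids == renamed == species)
```

All three lists reduce to the same two groups, items {0, 1} and items {2, 3}, so any pair-based score computed between them sees identical partitions.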

Will this match scikit-learn's adjusted_rand_score?

Yes. The tool implements the same definitions as scikit-learn's adjusted_rand_score, rand_score and fowlkes_mallows_score. Pair counts are kept as exact integers and only the final ratios are floating point — the same arithmetic NumPy performs — so results match to full double precision. As a safeguard, every ARI is recomputed through an independent pair-count formula and the two must agree before a result is shown.

Why does my ARI show as 1.0 when every item is its own cluster?

When every item sits alone in both clusterings (all singletons), or every item is in one big cluster in both, there are no informative pairs to compare and the ARI formula becomes 0/0. scikit-learn returns 1.0 in this degenerate case by convention, treating the two structureless partitions as trivially identical, and this calculator mirrors that. The tool flags when the result comes from that convention rather than from real agreement.
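A guard for this case might look like the sketch below. The function name and the returned flag are assumptions for illustration, not the tool's actual code, and treating lists of fewer than two items the same way is also an assumption of this sketch.

```python
from collections import Counter
from math import comb

def ari_with_flag(labels_a, labels_b):
    """Return (ari, used_convention) for two equal-length label lists."""
    total = comb(len(labels_a), 2)
    index = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    # Max - Expected is zero only when both partitions are all singletons
    # (sumA = sumB = 0) or both are one big cluster (sumA = sumB = C(n,2)).
    if total == 0 or (sum_a == sum_b and sum_a in (0, total)):
        return 1.0, True
    expected = sum_a * sum_b / total
    max_index = (sum_a + sum_b) / 2
    return (index - expected) / (max_index - expected), False

print(ari_with_flag([0, 1, 2, 3], [7, 8, 9, 10]))  # all singletons in both
print(ari_with_flag([0, 0, 0, 0], [1, 1, 1, 1]))   # one cluster in both
print(ari_with_flag([0, 1, 2, 3], [0, 0, 0, 0]))   # singletons vs one cluster
```

The first two calls hit the convention and report it through the flag. The third is not degenerate: only one side is structureless, the formula is well defined, and it evaluates to 0.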

Does this calculator send my labels anywhere?

No. Everything runs in your browser with plain integer arithmetic — there is no model, no API call and no upload. Your label lists never leave your device, so it is safe for unpublished experiment results or data from a labelling contract.

Adjusted Rand Index (ARI) Calculator

Paste two clustering label lists — a clustering and its ground truth, or two clusterings — and get the Adjusted Rand Index, the raw Rand Index and the Fowlkes–Mallows index, with the full pair-count and contingency working shown. Matches scikit-learn. Free, no signup, runs in your browser.

By Induwara Ashinsana, Executive Director, Ryzera Technologies. Updated Jun 11, 2026

Adjusted Rand Index

Labels A (true / clustering 1)

One label per item, separated by commas, spaces or new lines. (6 items)

Labels B (predicted / clustering 2)

Must match the item count of list A. (6 items)

Adjusted Rand Index

Weak

0.2424

Weak agreement — only a little better than random.

Rand Index (RI)

0.6667

Share of pairs the two clusterings treat the same. Not chance-corrected.

Fowlkes–Mallows (FM)

0.4714

Geometric mean of pair precision and recall.

Pair-count breakdown

a	Together in both A and B	2
b	Together in A only	4
c	Together in B only	1
d	Separated in both	8
Total pairs C(6, 2)		15

Contingency table

A \ B	0	1	2	Row Σ
0	2	1	0	3
1	0	1	2	3
Col Σ	2	2	2	6

Rows = clusters of A, columns = clusters of B. Grand total n = 6 items.

Computed entirely in your browser — nothing is uploaded. Formulas per scikit-learn and Hubert & Arabie (1985); last verified 2026-06-11.

How it works

The Adjusted Rand Index compares two ways of grouping the same set of items — for example the clusters a k-means run produced versus the true class labels. It works at the level of pairs of items. For every one of the C(n,2) pairs, the two clusterings either place the pair in the same cluster or in different clusters, and the metric counts how often they make the same call.

Concretely, you build a contingency table where n_ij is the number of items in cluster i of partition A and cluster j of partition B. Writing the row sums as a_i, the column sums as b_j, and C(x,2) = x(x−1)/2, the calculation follows scikit-learn and Hubert & Arabie (1985):

Index = Σ C(n_ij, 2)
sumA = Σ C(a_i, 2), sumB = Σ C(b_j, 2)
Expected = sumA · sumB / C(n, 2)
Max = ½ (sumA + sumB)
ARI = (Index − Expected) / (Max − Expected)

The subtraction of Expected is what makes the index “adjusted”: it removes the agreement you would get from random labeling, so a chance clustering scores about 0 instead of the inflated value a raw Rand Index gives. The raw Rand Index itself is RI = (a + d) / C(n,2), where a = Index (pairs together in both) and d = C(n,2) − sumA − sumB + Index (pairs apart in both). The Fowlkes–Mallows index is the geometric mean of pair precision and recall, FM = Index / √(sumA · sumB).
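The four pair counts can also be obtained by brute force, classifying each of the C(n,2) pairs directly. The sketch below does that with the standard library; the function name is an illustrative choice, and the O(n²) loop is meant for checking small inputs rather than for speed.

```python
from itertools import combinations
from math import sqrt

def pair_counts(labels_a, labels_b):
    """Classify every item pair and return the counts (a, b, c, d)."""
    a = b = c = d = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        if same_a and same_b:
            a += 1      # together in both A and B
        elif same_a:
            b += 1      # together in A only
        elif same_b:
            c += 1      # together in B only
        else:
            d += 1      # separated in both
    return a, b, c, d

a, b, c, d = pair_counts([0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 2, 2])
rand_index = (a + d) / (a + b + c + d)
# sumA = a + b and sumB = a + c; FM is undefined if either is zero.
fowlkes_mallows = a / sqrt((a + b) * (a + c))
print(a, b, c, d, round(rand_index, 4), round(fowlkes_mallows, 4))
```

For this input the counts come out as a = 2, b = 4, c = 1, d = 8, giving RI = 10/15 and FM = 2/√18.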

ARI ranges from −0.5 to 1: 1 is a perfect match (identical partitions up to relabeling), 0 is what chance predicts, and negative values mean the two clusterings disagree more than random. Because the comparison only depends on which items share a cluster, it is permutation-invariant — renaming clusters changes nothing. One special case: when both partitions are structureless (every item alone, or all items together), the formula is 0/0 and scikit-learn defines the result as 1.0; this tool follows the same convention and flags it. All pair counts stay exact integers, so the results match scikit-learn to full double precision, and every ARI is independently re-derived from the four pair counts before it is shown.
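The page does not say which pair-count formula the tool uses for its cross-check. One algebraically equivalent form is ARI = 2(ad − bc) / ((a + b)(b + d) + (a + c)(c + d)), and the sketch below compares it against the Expected/Max form using exact fractions. The function names are illustrative, and neither function guards the 0/0 case.

```python
from collections import Counter
from fractions import Fraction
from math import comb

def sums(labels_a, labels_b):
    """Return (Index, sumA, sumB, C(n,2)) as exact integers."""
    index = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    return index, sum_a, sum_b, comb(len(labels_a), 2)

def ari_expected_max(labels_a, labels_b):
    index, sum_a, sum_b, total = sums(labels_a, labels_b)
    expected = Fraction(sum_a * sum_b, total)
    max_index = Fraction(sum_a + sum_b, 2)
    return (index - expected) / (max_index - expected)

def ari_pair_counts(labels_a, labels_b):
    index, sum_a, sum_b, total = sums(labels_a, labels_b)
    a = index                           # together in both
    b = sum_a - index                   # together in A only
    c = sum_b - index                   # together in B only
    d = total - sum_a - sum_b + index   # separated in both
    return Fraction(2 * (a * d - b * c), (a + b) * (b + d) + (a + c) * (c + d))

examples = [
    ([0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 2, 2]),
    ([0, 0, 1, 1], [1, 1, 0, 0]),
    ([0, 0, 1, 1], [0, 1, 0, 1]),
]
results = []
for labels_a, labels_b in examples:
    first = ari_expected_max(labels_a, labels_b)
    second = ari_pair_counts(labels_a, labels_b)
    assert first == second              # exact, no rounding involved
    results.append(first)
    print(first, float(first))
```

Because both routes use exact rational arithmetic, they agree exactly rather than to within a tolerance. The three inputs give 8/33, 1 and −1/2.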

Worked examples

Example 1 — partial agreement, n = 6

A = [0,0,0,1,1,1], B = [0,0,1,1,2,2]; contingency rows [2,1,0] and [0,1,2].
Index = C(2,2)+C(1,2)+C(1,2)+C(2,2) = 1+0+0+1 = 2
sumA = C(3,2)·2 = 6; sumB = C(2,2)·3 = 3; C(6,2) = 15
Expected = 6·3/15 = 1.2; Max = (6+3)/2 = 4.5
ARI = (2 − 1.2)/(4.5 − 1.2) = 0.8/3.3 = 0.2424
RI = (2 + 8)/15 = 0.6667; FM = 2/√18 = 0.4714

Example 2 — identical partitions under relabeling, n = 4

A = [0,0,1,1], B = [1,1,0,0] — same grouping, different cluster names.
Index = C(2,2)+C(2,2) = 2; sumA = 2; sumB = 2; C(4,2) = 6
Expected = 2·2/6 = 0.6667; Max = (2+2)/2 = 2
ARI = (2 − 0.6667)/(2 − 0.6667) = 1.0 → Near-identical
RI = 1.0; FM = 2/√4 = 1.0 — relabeling does not affect the score.

Example 3 — edge case, independent labelings (the ARI minimum), n = 4

A = [0,0,1,1], B = [0,1,0,1] — every cluster of A is split evenly across B.
Index = 0; sumA = 2; sumB = 2; C(4,2) = 6
Expected = 2·2/6 = 0.6667; Max = 2
ARI = (0 − 0.6667)/(2 − 0.6667) = −0.6667/1.3333 = −0.5 → Worse than random
RI = (0 + 2)/6 = 0.3333, yet ARI exposes that this is the worst possible case.

Frequently asked questions

Sources & references

Every formula on this page was cross-checked against these sources on 2026-06-11, and each ARI is verified against an independent pair-count formula inside the tool. Your label lists never leave your browser.
