Cohen's Kappa Calculator (Inter-Rater Reliability)
Paste your two-rater agreement matrix and get Cohen's kappa (κ) — the chance-corrected measure of how much two annotators really agree — with the observed and chance agreement, a 95% confidence interval, optional weighted kappa for ordinal scales, and the Landis & Koch band. Free, no signup, runs in your browser.
How it works
Cohen's kappa measures how much two raters agree when each independently sorts the same items into the same set of categories — and crucially, it corrects for the agreement you would expect from pure chance. You give the tool a k×k agreement matrix where nᵢⱼ is the number of items Rater A put in category i and Rater B put in category j. The diagonal holds the agreements; everything off the diagonal is a disagreement.
From the grand total N, the row marginals rᵢ and the column marginals cⱼ, the calculation follows Cohen (1960):
- Observed agreement: pₒ = (Σ nᵢᵢ) / N
- Chance agreement: pₑ = Σ (rᵢ / N)(cᵢ / N)
- Cohen's kappa: κ = (pₒ − pₑ) / (1 − pₑ)
κ = 1 is perfect agreement, κ = 0 is exactly what chance predicts, and κ < 0 means the raters disagree more than random labelling would. Because pₑ depends on the marginal totals, a lopsided task — where one category dominates — has a high chance agreement, which is why two raters can match on 80% of items yet earn only a modest kappa.
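The three formulas above can be sketched in a few lines of Python. This is only an illustrative sketch, not the tool's own code: the function name `cohen_kappa` and the example table are assumptions made up for this illustration. The table is deliberately lopsided, so the raters match on 80% of items while kappa stays modest.

```python
def cohen_kappa(matrix):
    """Return (p_o, p_e, kappa) for a square agreement matrix of counts.

    Hypothetical helper for illustration; matrix[i][j] is the number of
    items Rater A put in category i and Rater B put in category j.
    Assumes p_e < 1 (otherwise kappa is undefined).
    """
    k = len(matrix)
    n = sum(sum(row) for row in matrix)                       # grand total N
    row_totals = [sum(row) for row in matrix]                 # r_i
    col_totals = [sum(matrix[i][j] for i in range(k))         # c_j
                  for j in range(k)]
    p_o = sum(matrix[i][i] for i in range(k)) / n             # observed agreement
    p_e = sum((row_totals[i] / n) * (col_totals[i] / n)       # chance agreement
              for i in range(k))
    kappa = (p_o - p_e) / (1 - p_e)
    return p_o, p_e, kappa


# Made-up lopsided table: category 1 dominates for both raters.
table = [[70, 10],
         [10, 10]]
p_o, p_e, kappa = cohen_kappa(table)
print(f"p_o={p_o:.3f}  p_e={p_e:.3f}  kappa={kappa:.3f}")
# prints: p_o=0.800  p_e=0.680  kappa=0.375
```

Here both raters use the first category 80% of the time, so chance alone already yields pₑ = 0.68, and the 80% raw agreement shrinks to κ = 0.375.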
When the categories are ordered (say Low / Medium / High), a Medium-vs-High mix-up is a smaller error than Low-vs-High. Weighted kappa, from Cohen (1968), captures that with a weight matrix built from the category distance |i − j|: linear weights are wᵢⱼ = 1 − |i−j|/(k−1) and quadratic weights are wᵢⱼ = 1 − (i−j)²/(k−1)². Each cell then earns partial credit: the weighted observed agreement is pₒ(w) = Σ wᵢⱼ nᵢⱼ / N and the weighted chance agreement is pₑ(w) = Σ wᵢⱼ rᵢ cⱼ / N², both summed over all cells. The same formula runs on these weighted proportions: κ_w = (pₒ(w) − pₑ(w)) / (1 − pₑ(w)). For a 2×2 table every weighting scheme collapses to plain kappa, since the only off-diagonal distance is 1 and it always gets weight 0.
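As a rough sketch of the weighting step, the Python below builds the linear or quadratic weights from the category distance and applies the kappa formula to the weighted proportions. The function name `weighted_kappa`, the `scheme` argument and the 3×3 example table are assumptions for illustration only, not the tool's actual interface.

```python
def weighted_kappa(matrix, scheme="linear"):
    """Weighted kappa for a k x k count matrix with ordered categories.

    Hypothetical helper; scheme is "linear", "quadratic" or "none"
    (identity weights, i.e. plain kappa). Assumes k >= 2 and p_e(w) < 1.
    """
    k = len(matrix)
    n = sum(sum(row) for row in matrix)
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(k)) for j in range(k)]

    def weight(i, j):
        d = abs(i - j) / (k - 1)          # normalised category distance
        if scheme == "linear":
            return 1 - d
        if scheme == "quadratic":
            return 1 - d * d
        if scheme == "none":
            return 1.0 if i == j else 0.0
        raise ValueError(f"unknown scheme: {scheme}")

    cells = [(i, j) for i in range(k) for j in range(k)]
    # Weighted observed and chance agreement, summed over every cell.
    p_o_w = sum(weight(i, j) * matrix[i][j] for i, j in cells) / n
    p_e_w = sum(weight(i, j) * row_totals[i] * col_totals[j]
                for i, j in cells) / (n * n)
    return (p_o_w - p_e_w) / (1 - p_e_w)


# Made-up Low / Medium / High table; all disagreements are one step apart.
table3 = [[20, 5, 0],
          [5, 20, 5],
          [0, 5, 20]]
linear = weighted_kappa(table3, "linear")
quadratic = weighted_kappa(table3, "quadratic")
plain = weighted_kappa(table3, "none")
print(f"plain={plain:.3f}  linear={linear:.3f}  quadratic={quadratic:.3f}")
```

Because every disagreement in this table is between adjacent categories, the quadratic scheme forgives the most and the unweighted kappa forgives nothing, so the three values rise in that order.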
The 95% confidence interval uses Cohen's simplified large-sample standard error, SE = √(pₒ(1 − pₒ) / (N·(1 − pₑ)²)), giving κ ± 1.96·SE clamped to the valid −1…1 range. This normal approximation is dependable for N ≥ 30 and is flagged as indicative below that; the full asymptotic variance is given by Fleiss, Cohen & Everitt (1969). Finally the headline kappa is mapped to the Landis & Koch (1977) band — < 0.00 poor, 0.00–0.20 slight, 0.21–0.40 fair, 0.41–0.60 moderate, 0.61–0.80 substantial, 0.81–1.00 almost perfect — for a one-line verdict. All arithmetic is exact and runs in your browser; nothing is uploaded.
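The interval and the band lookup described above might be sketched as follows. The names `kappa_interval` and `landis_koch_band` are hypothetical, and the handling of values that fall between the published band edges (for example 0.205) is an assumption of this sketch: each band is treated as running up to and including its upper edge.

```python
import math


def kappa_interval(p_o, p_e, n, z=1.96):
    """Kappa with the simplified large-sample SE and a clamped interval.

    Hypothetical helper; uses SE = sqrt(p_o(1 - p_o) / (N (1 - p_e)^2)).
    """
    kappa = (p_o - p_e) / (1 - p_e)
    se = math.sqrt(p_o * (1 - p_o) / (n * (1 - p_e) ** 2))
    lower = max(-1.0, kappa - z * se)     # clamp to the valid -1..1 range
    upper = min(1.0, kappa + z * se)
    return kappa, se, lower, upper


def landis_koch_band(kappa):
    """Map kappa to a Landis & Koch (1977) label.

    Assumption: values between published edges (e.g. 0.205) fall into
    the higher band, since each band is closed at its upper edge only.
    """
    if kappa < 0:
        return "poor"
    for upper_edge, label in [(0.20, "slight"), (0.40, "fair"),
                              (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper_edge:
            return label
    return "almost perfect"


# Example inputs: 80% observed agreement, 68% chance agreement, N = 100.
kappa, se, lower, upper = kappa_interval(0.80, 0.68, 100)
print(f"kappa={kappa:.3f}  SE={se:.3f}  95% CI=({lower:.2f}, {upper:.2f})")
print(landis_koch_band(kappa))
```

With these inputs the standard error is 0.125, so the interval spans roughly 0.13 to 0.62: the point estimate lands in the "fair" band, but the interval reaches from "slight" into "substantial", which is a reminder to read the one-line verdict alongside the interval.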
Sources & references
- Cohen, J. (1960). A Coefficient of Agreement for Nominal Scales. Educational and Psychological Measurement, 20(1), 37–46 — the original κ definition and standard error.
- Cohen, J. (1968). Weighted kappa: Nominal scale agreement with provision for scaled disagreement. Psychological Bulletin, 70(4), 213–220 — linear/quadratic weighted kappa.
- Landis, J. R., & Koch, G. G. (1977). The Measurement of Observer Agreement for Categorical Data. Biometrics, 33(1), 159–174 — the strength-of-agreement bands.
- Fleiss, J. L., Cohen, J., & Everitt, B. S. (1969). Large sample standard errors of kappa and weighted kappa. Psychological Bulletin, 72(5), 323–327 — the full asymptotic variance.
Every formula on this page was cross-checked against these sources on 2026-06-10, and the unweighted result is verified against the direct one-line formula inside the tool. Your agreement matrix never leaves your browser.