How do you calculate F1 score from a confusion matrix?

F1 is the harmonic mean of precision and recall: F1 = 2·(P·R)/(P+R), which simplifies to the count form 2·TP/(2·TP + FP + FN). For TP=90, FP=10, FN=5 that is 180/195 = 0.9231. The harmonic mean punishes imbalance, so a high F1 needs both precision and recall to be high — one strong number cannot rescue a weak one.
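As a quick check that the two forms above agree, here is a minimal Python sketch. The function names `f1_from_counts` and `f1_from_rates` are illustrative, not part of the calculator, and the sketch assumes non-zero denominators.

```python
def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 in count form: 2*TP / (2*TP + FP + FN)."""
    return 2 * tp / (2 * tp + fp + fn)


def f1_from_rates(precision: float, recall: float) -> float:
    """F1 as the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


tp, fp, fn = 90, 10, 5
precision = tp / (tp + fp)  # 90 / 100
recall = tp / (tp + fn)     # 90 / 95

print(round(f1_from_counts(tp, fp, fn), 4))        # 0.9231
print(round(f1_from_rates(precision, recall), 4))  # 0.9231
```

Both routes give 180/195; the count form avoids computing precision and recall as intermediate values.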

What is the difference between precision and recall?

Precision = TP/(TP+FP) asks: of everything the model flagged as positive, how much was correct? Recall = TP/(TP+FN) asks: of all the real positives, how many did the model catch? A spam filter with high precision rarely flags good mail as spam; with high recall it rarely lets spam through. Tightening one usually loosens the other, which is why F1 combines both into a single number.
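The trade-off can be seen by moving a decision threshold over scored predictions. The sketch below uses ten made-up scores and labels (hypothetical data, not from the calculator) and assumes every threshold leaves at least one predicted positive.

```python
# Hypothetical classifier scores with their true labels (1 = positive).
SCORES = [0.95, 0.90, 0.80, 0.75, 0.60, 0.55, 0.40, 0.35, 0.20, 0.10]
LABELS = [1, 1, 1, 0, 1, 0, 0, 1, 0, 0]


def precision_recall_at(threshold, scores, labels):
    """Return (precision, recall) when scores >= threshold count as positive."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if s >= threshold and y == 1)
    fp = sum(1 for s, y in pairs if s >= threshold and y == 0)
    fn = sum(1 for s, y in pairs if s < threshold and y == 1)
    return tp / (tp + fp), tp / (tp + fn)


for threshold in (0.3, 0.5, 0.7):
    p, r = precision_recall_at(threshold, SCORES, LABELS)
    print(f"threshold={threshold}: precision={p:.4f} recall={r:.4f}")
# threshold=0.3: precision=0.6250 recall=1.0000
# threshold=0.5: precision=0.6667 recall=0.8000
# threshold=0.7: precision=0.7500 recall=0.6000
```

Raising the threshold makes the model pickier, so precision rises while recall falls.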

Why is accuracy misleading for imbalanced datasets?

Accuracy counts every correct call equally, so when one class dominates, a lazy model scores high by always predicting the majority. With 20 positives in 1,000 cases, predicting 'negative' every time gives 98% accuracy yet catches zero positives. A model that does try, with TP=5, FP=5, FN=15, TN=975, also shows 98% accuracy but only F1=0.33 and MCC=0.34: precision, recall, F1 and MCC expose the failure that accuracy hides.
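A short sketch of the paradox, using the counts from the answer above. The helper names are illustrative; the always-negative baseline is written out as the matrix TP=0, FP=0, FN=20, TN=980.

```python
def accuracy(tp, fp, fn, tn):
    """Share of all cases that were classified correctly."""
    return (tp + tn) / (tp + fp + fn + tn)


def recall(tp, fn):
    """Share of real positives that were caught (assumes TP + FN > 0)."""
    return tp / (tp + fn)


# Always predict 'negative' on 20 positives in 1,000 cases.
always_negative = {"tp": 0, "fp": 0, "fn": 20, "tn": 980}
# The weak classifier from the text.
weak_model = {"tp": 5, "fp": 5, "fn": 15, "tn": 975}

for name, m in (("always negative", always_negative), ("weak model", weak_model)):
    acc = accuracy(m["tp"], m["fp"], m["fn"], m["tn"])
    rec = recall(m["tp"], m["fn"])
    print(f"{name}: accuracy={acc:.2f} recall={rec:.2f}")
# always negative: accuracy=0.98 recall=0.00
# weak model: accuracy=0.98 recall=0.25
```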

What is a good F1 score?

It depends on the task and the class balance — there is no universal cut-off. As a rough guide on balanced data, above 0.9 is excellent, 0.8–0.9 is strong, 0.5–0.8 is moderate, and below 0.5 is weak. Always compare against a baseline: on a 95/5 split, even an F1 of 0.6 may beat the majority-class baseline of 0. Report F1 alongside precision and recall, never alone.

What is the Matthews correlation coefficient (MCC) and when should I use it?

MCC = (TP·TN − FP·FN) / √((TP+FP)(TP+FN)(TN+FP)(TN+FN)). It runs from −1 (total disagreement) through 0 (no better than chance) to +1 (perfect). Unlike F1 it uses all four cells, including true negatives, so it stays honest on imbalanced data. Many researchers treat MCC as the single most informative binary metric — use it when classes are skewed or when you want one balanced number.
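The formula translates directly into code. This is a minimal sketch, not the calculator's own implementation; returning `None` for a zero denominator is an assumption made here to keep the undefined case explicit.

```python
import math


def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient, or None when it is undefined."""
    denom = (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    if denom == 0:
        return None  # a row or column total is zero
    return (tp * tn - fp * fn) / math.sqrt(denom)


print(round(mcc(90, 10, 5, 95), 4))  # 0.8511
print(round(mcc(5, 5, 15, 975), 4))  # 0.3446
print(mcc(50, 0, 0, 50))             # 1.0  (perfect agreement)
print(mcc(0, 50, 50, 0))             # -1.0 (total disagreement)
```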

What does “undefined” mean for precision or MCC here?

A metric is undefined when its denominator is zero. Precision is TP/(TP+FP), so with no predicted positives (TP+FP=0) it is 0/0 — genuinely undefined, not zero. MCC is undefined when any row or column total is zero. The calculator shows 'undefined' in those cases instead of silently printing 0 or NaN, because reporting a fake 0 would misrepresent the model.
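One way to keep "undefined" distinct from 0 in code is to return a sentinel instead of dividing. The sketch below is an illustration of that idea; `safe_ratio` and `show` are hypothetical helpers, not the calculator's source.

```python
from typing import Optional


def safe_ratio(numerator: float, denominator: float) -> Optional[float]:
    """Return numerator / denominator, or None when the denominator is zero."""
    return None if denominator == 0 else numerator / denominator


def show(value: Optional[float]) -> str:
    """Format a metric for display, labelling the undefined case."""
    return "undefined" if value is None else f"{value:.4f}"


# No predicted positives: TP + FP = 0.
tp, fp, fn, tn = 0, 0, 10, 90
print(show(safe_ratio(tp, tp + fp)))  # undefined (precision is 0/0)
print(show(safe_ratio(tp, tp + fn)))  # 0.0000    (recall is 0/10)
```

Using `None` rather than `float("nan")` means the undefined case cannot slip silently through later arithmetic.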

Does this calculator work for multi-class confusion matrices?

Not yet — this version handles the binary 2×2 case only. For a multi-class problem, reduce it to one-vs-rest per class (treat the target class as positive and all others as negative) and run each 2×2 matrix through this tool, then average the results with macro or weighted averaging. Native multi-class input with macro/micro averaging is on the roadmap.
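The one-vs-rest reduction described above can be sketched as follows. The 3-class matrix is made-up example data, and the layout `matrix[actual][predicted]` is an assumption of this sketch (transpose your matrix first if yours is stored the other way round).

```python
def one_vs_rest(matrix, k):
    """Collapse a square matrix[actual][predicted] to (TP, FP, FN, TN) for class k."""
    total = sum(sum(row) for row in matrix)
    tp = matrix[k][k]
    fn = sum(matrix[k]) - tp                  # actual k, predicted something else
    fp = sum(row[k] for row in matrix) - tp   # predicted k, actually something else
    tn = total - tp - fn - fp
    return tp, fp, fn, tn


def f1(tp, fp, fn):
    """Count-form F1 (assumes 2*TP + FP + FN > 0)."""
    return 2 * tp / (2 * tp + fp + fn)


# Hypothetical 3-class confusion matrix.
MATRIX = [
    [50, 3, 2],
    [4, 40, 6],
    [1, 5, 44],
]

per_class = [one_vs_rest(MATRIX, k) for k in range(len(MATRIX))]
scores = [f1(tp, fp, fn) for tp, fp, fn, _ in per_class]
supports = [sum(row) for row in MATRIX]  # number of actual cases per class

macro_f1 = sum(scores) / len(scores)
weighted_f1 = sum(s * n for s, n in zip(scores, supports)) / sum(supports)

print(per_class)  # [(50, 5, 5, 95), (40, 8, 10, 97), (44, 8, 6, 97)]
print(round(macro_f1, 4), round(weighted_f1, 4))
```

Macro averaging treats every class equally; weighted averaging scales each class's F1 by how many actual cases it has.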

What is F-beta and how is the beta weight used?

F-beta generalises F1 by weighting recall β times as important as precision: F_β = (1+β²)·P·R / (β²·P + R). β=1 is plain F1. Use β=2 when missing a positive is costly (medical screening, fraud) so recall matters more; use β=0.5 when false alarms are costly so precision matters more. Set the β field above to recompute the score for your trade-off.
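To see how β shifts the score, the sketch below evaluates the formula at β = 0.5, 1 and 2 for a model with precision 0.5 and recall 0.25. The function name `f_beta` is illustrative, and the sketch assumes β²·P + R is non-zero.

```python
def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-beta from precision and recall: (1 + b^2) * P * R / (b^2 * P + R)."""
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


precision, recall = 0.5, 0.25
for beta in (0.5, 1.0, 2.0):
    print(f"beta={beta}: {f_beta(precision, recall, beta):.4f}")
# beta=0.5: 0.4167
# beta=1.0: 0.3333
# beta=2.0: 0.2778
```

Because recall is the weaker number here, weighting it more heavily (β = 2) pulls the score down, while weighting precision more heavily (β = 0.5) pulls it up.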

Confusion Matrix Calculator — Precision, Recall, F1 & MCC

Paste the four cells of a binary confusion matrix and get every standard classification metric at once — accuracy, precision, recall, specificity, F1, F-beta, balanced accuracy and the Matthews correlation coefficient — each shown with the exact formula it came from. Free, no signup, runs in your browser.

By Induwara Ashinsana, Executive Director, Ryzera Technologies · Updated Jun 7, 2026

Confusion matrix metrics

Enter the four counts

True Positives (TP)

Predicted positive, actually positive.

False Positives (FP)

Predicted positive, actually negative (Type I).

False Negatives (FN)

Predicted negative, actually positive (Type II).

True Negatives (TN)

Predicted negative, actually negative.

F-beta weight (β)

β = 1 is F1. β > 1 weights recall higher; β < 1 weights precision.

Positive class label

Cosmetic — names the positive class in the rendered matrix below.

Confusion matrix

	Actual
	Positive	Negative	Total
Pred. Positive	90	10	100
Pred. Negative	5	95	100
Total	95	105	200

Accuracy

0.9250

92.5% · (TP + TN) / N

Precision

0.9000

90% · TP / (TP + FP)

Recall

0.9474

94.74% · TP / (TP + FN)

F1 score

0.9231

92.31% · 2·TP / (2·TP + FP + FN)

MCC

0.8511

(TP·TN − FP·FN) / √(…)

Balanced acc.

0.9261

92.61% · (TPR + TNR) / 2

F1 = 0.9231

Excellent balance of precision and recall.

MCC = 0.8511

Strong positive correlation with the true labels.

All metrics

Metric	Value	%	Formula
Accuracy	0.9250	92.5%	(TP + TN) / N
Precision (PPV)	0.9000	90%	TP / (TP + FP)
Recall / Sensitivity (TPR)	0.9474	94.74%	TP / (TP + FN)
Specificity (TNR)	0.9048	90.48%	TN / (TN + FP)
F1 score	0.9231	92.31%	2·TP / (2·TP + FP + FN)
F-beta (β = 1)	0.9231	92.31%	(1 + β²)·P·R / (β²·P + R)
Balanced accuracy	0.9261	92.61%	(TPR + TNR) / 2
Matthews corr. coef. (MCC)	0.8511	—	(TP·TN − FP·FN) / √((TP+FP)(TP+FN)(TN+FP)(TN+FN))
Informedness (Youden's J)	0.8521	—	TPR + TNR − 1
Neg. predictive value (NPV)	0.9500	95%	TN / (TN + FN)
False positive rate (FPR)	0.0952	9.52%	FP / (FP + TN)
False negative rate (FNR)	0.0526	5.26%	FN / (FN + TP)
False discovery rate (FDR)	0.1000	10%	FP / (FP + TP)
Prevalence	0.4750	47.5%	(TP + FN) / N

“undefined” means the metric's denominator is zero (e.g. precision when TP + FP = 0) — reported honestly rather than shown as 0.

Computed entirely in your browser — nothing is uploaded. Formulas per scikit-learn and Wikipedia; last verified 2026-06-07.

How it works

A confusion matrix is the 2×2 table a binary classifier produces when you compare its predictions against the truth. It has four cells: true positives (TP) and true negatives (TN), where the model agreed with reality, and false positives (FP, a Type I error) and false negatives (FN, a Type II error), where it did not. Every metric on this page is derived from those four counts, with the sample size N = TP + FP + FN + TN.
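If you start from raw predictions rather than the four counts, the cells can be tallied as in this minimal sketch. The function name and the sample label lists are hypothetical, and `positive` names whichever label you treat as the positive class.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Tally (TP, FP, FN, TN) by comparing predictions against the truth."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1  # predicted positive, actually positive
            else:
                fp += 1  # predicted positive, actually negative (Type I)
        else:
            if actual == positive:
                fn += 1  # predicted negative, actually positive (Type II)
            else:
                tn += 1  # predicted negative, actually negative
    return tp, fp, fn, tn


y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
print(confusion_counts(y_true, y_pred))  # (3, 1, 1, 3)
```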

The headline metrics follow the standard definitions used by scikit-learn and the Wikipedia confusion-matrix table:

Accuracy = (TP + TN) / N
Precision (PPV) = TP / (TP + FP)
Recall / Sensitivity (TPR) = TP / (TP + FN)
Specificity (TNR) = TN / (TN + FP)
F1 = 2·TP / (2·TP + FP + FN)
F-beta = (1 + β²)·P·R / (β²·P + R)

F1 is the harmonic mean of precision and recall, which is why it drops hard when either one is weak. The F-beta form, from van Rijsbergen's 1979 information-retrieval text, lets you weight recall β times as heavily as precision — β > 1 favours recall, β < 1 favours precision. This calculator computes F-beta from the raw counts and cross-checks it against the precision/recall form so the two always agree to the last decimal.

The Matthews correlation coefficient takes the whole table into account:

MCC = (TP·TN − FP·FN) / √((TP+FP)(TP+FN)(TN+FP)(TN+FN))

Because MCC uses all four cells — including the true negatives that precision, recall and F1 ignore — it stays trustworthy when one class vastly outnumbers the other. It ranges from −1 to +1, where 0 means the predictions are no better than chance. Balanced accuracy, (TPR + TNR) / 2, and informedness (Youden's J), TPR + TNR − 1, are two other imbalance-aware summaries shown in the full table. Each metric is computed independently from the integer counts, never from rounded intermediates, so no rounding error compounds. When a denominator is zero the metric is genuinely undefined (for example precision when nothing is predicted positive), and the tool labels it “undefined” rather than printing a misleading 0.
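Balanced accuracy and informedness are simple to compute from the two rates. The sketch below (illustrative helper names, assuming both actual classes are non-empty) applies them to a balanced matrix and to an imbalanced one with 98% plain accuracy.

```python
def rates(tp, fp, fn, tn):
    """Return (TPR, TNR); assumes TP + FN > 0 and TN + FP > 0."""
    return tp / (tp + fn), tn / (tn + fp)


def balanced_accuracy(tp, fp, fn, tn):
    tpr, tnr = rates(tp, fp, fn, tn)
    return (tpr + tnr) / 2


def informedness(tp, fp, fn, tn):
    tpr, tnr = rates(tp, fp, fn, tn)
    return tpr + tnr - 1  # Youden's J


balanced = (90, 10, 5, 95)
imbalanced = (5, 5, 15, 975)  # plain accuracy is 0.98

print(round(balanced_accuracy(*balanced), 4), round(informedness(*balanced), 4))
# 0.9261 0.8521
print(round(balanced_accuracy(*imbalanced), 4), round(informedness(*imbalanced), 4))
# 0.6224 0.2449
```

On the imbalanced matrix, balanced accuracy drops to about 0.62 because the low recall (0.25) counts as much as the high specificity.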

Worked examples

Example 1 — balanced classifier (TP=90, FP=10, FN=5, TN=95)

N = 90 + 10 + 5 + 95 = 200
Accuracy = (90 + 95) / 200 = 0.9250
Precision = 90 / (90 + 10) = 0.9000
Recall = 90 / (90 + 5) = 0.9474
F1 = 2·90 / (2·90 + 10 + 5) = 180 / 195 = 0.9231
MCC = (90·95 − 10·5) / √(100·95·105·100) = 8500 / 9987.49 = 0.8511

Example 2 — imbalanced data, the accuracy paradox (TP=5, FP=5, FN=15, TN=975)

N = 5 + 5 + 15 + 975 = 1,000 (only 20 actual positives)
Accuracy = (5 + 975) / 1000 = 0.9800 ← looks excellent
Precision = 5 / (5 + 5) = 0.5000
Recall = 5 / (5 + 15) = 0.2500
F1 = 2·5 / (2·5 + 5 + 15) = 10 / 30 = 0.3333
MCC = (5·975 − 5·15) / √(10·20·980·990) = 4800 / 13929.8 = 0.3446
Verdict: 98% accuracy, but F1 and MCC reveal a weak classifier.

Example 3 — edge case, no predicted positives (TP=0, FP=0, FN=10, TN=90)

N = 0 + 0 + 10 + 90 = 100
Precision = 0 / (0 + 0) = 0/0 → undefined (nothing was predicted positive)
Recall = 0 / (0 + 10) = 0.0000
Accuracy = (0 + 90) / 100 = 0.9000 (high, but it never finds a positive)
F1 = 2·0 / (2·0 + 0 + 10) = 0 / 10 = 0.0000
MCC = (0·90 − 0·10) / √(0·…) = 0/0 → undefined
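The three worked examples above can be reproduced with a short script. This is a sketch under the page's stated formulas, not the tool's source; the `report` function and its use of `None` for undefined values are choices made for this illustration.

```python
import math


def report(tp, fp, fn, tn):
    """Compute the headline metrics; None marks a zero denominator."""
    def ratio(num, den):
        return None if den == 0 else num / den

    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": ratio(tp + tn, tp + fp + fn + tn),
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
        "mcc": ratio(tp * tn - fp * fn, mcc_den),
    }


EXAMPLES = {
    "balanced": (90, 10, 5, 95),
    "imbalanced": (5, 5, 15, 975),
    "no predicted positives": (0, 0, 10, 90),
}

for name, counts in EXAMPLES.items():
    shown = {
        metric: "undefined" if value is None else f"{value:.4f}"
        for metric, value in report(*counts).items()
    }
    print(name, shown)
```

For the third example the script prints "undefined" for precision and MCC, while recall and F1 are a genuine 0.0000.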

Frequently asked questions

Sources & references

Every formula on this page was cross-checked against the scikit-learn and Wikipedia definitions on 2026-06-07. The tool runs entirely in your browser — your counts never leave your device.
