How do you calculate the F1 score from precision and recall?

F1 is the harmonic mean of the two: F1 = 2 × (precision × recall) / (precision + recall). The harmonic mean punishes imbalance, so a model with 0.9 precision but 0.1 recall scores only 0.18, not 0.5. If both precision and recall are zero the formula is undefined and F1 is reported as 0.
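As a minimal sketch of the formula above (the function name `f1_from_pr` is illustrative and not part of the tool), the harmonic mean can be computed directly and contrasted with the plain average:

```python
def f1_from_pr(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        # The formula is 0/0 here, so 0 is reported by convention.
        return 0.0
    return 2 * precision * recall / (precision + recall)


p, r = 0.9, 0.1
print(round(f1_from_pr(p, r), 4))  # harmonic mean: 0.18
print(round((p + r) / 2, 4))       # arithmetic mean, for contrast: 0.5
```

The gap between the two printed values is the imbalance penalty described above.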

How do you compute F1 from a confusion matrix (TP, FP, FN)?

First precision = TP / (TP + FP) and recall = TP / (TP + FN), then take their harmonic mean. Equivalently, F1 = 2·TP / (2·TP + FP + FN) straight from the counts — this tool computes it both ways and checks they agree. True negatives (TN) never enter the F1 formula.
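For readers who want to see why the two forms agree, here is a short derivation. It assumes TP > 0 so that no denominator vanishes; P and R denote precision and recall as defined above.

```latex
\begin{aligned}
F_1 &= \frac{2PR}{P + R},
\qquad P = \frac{TP}{TP + FP},
\qquad R = \frac{TP}{TP + FN} \\[4pt]
F_1 &= \frac{\dfrac{2\,TP^2}{(TP + FP)(TP + FN)}}
            {\dfrac{TP\,(TP + FN) + TP\,(TP + FP)}{(TP + FP)(TP + FN)}}
     = \frac{2\,TP^2}{TP\,(2\,TP + FP + FN)}
     = \frac{2\,TP}{2\,TP + FP + FN}
\end{aligned}
```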

What is a good F1 score?

It depends on the task and class balance, so there is no universal cutoff. As a rough guide on balanced data: above 0.9 is excellent, 0.8–0.9 strong, 0.5–0.8 moderate, below 0.5 weak. On heavily imbalanced data, a model with a modest F1 can still be more useful than one with high accuracy, so always compare against a sensible baseline.

What is the difference between F1, F2 and F0.5 score?

They are all F-beta scores with different β, where β says how many times more important recall is than precision. F1 (β=1) weights the two equally. F2 (β=2) treats recall as twice as important as precision (β² = 4 is the factor that appears in the formula); use it when missing a positive is costly, as in disease screening. F0.5 (β=0.5) favours precision; use it when false alarms are costly, as in spam filtering.
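A small sketch of how β shifts the score, using the F-beta formula given on this page. The function name `fbeta_from_pr` is an assumption for illustration, and the inputs P = 0.5, R = 1.0 are taken from the page's own worked example:

```python
def fbeta_from_pr(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-beta = (1 + beta^2) * P * R / (beta^2 * P + R); 0.0 if undefined."""
    if beta <= 0:
        raise ValueError("beta must be positive")
    b2 = beta * beta
    denom = b2 * precision + recall
    if denom == 0:
        return 0.0  # both precision and recall are zero
    return (1 + b2) * precision * recall / denom


# Same precision and recall, three different weightings.
for beta in (0.5, 1.0, 2.0):
    print(beta, round(fbeta_from_pr(0.5, 1.0, beta), 4))
# 0.5 0.5556
# 1.0 0.6667
# 2.0 0.8333
```

Because recall is the stronger of the two numbers here, the recall-weighted F2 comes out highest and the precision-weighted F0.5 lowest.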

Why use F1 score instead of accuracy?

Accuracy is misleading on imbalanced data. If 99% of cases are negative, a model that always predicts negative scores 99% accuracy while catching zero positives. F1 only rewards correctly finding the positive class, so it exposes that failure where accuracy hides it.
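The always-negative scenario can be reproduced in a few lines. The data set below is hypothetical (1,000 cases, 1% positive) and the variable names are chosen for illustration only:

```python
# Hypothetical data: 990 negatives, 10 positives.
y_true = [0] * 990 + [1] * 10
# A "model" that always predicts negative.
y_pred = [0] * len(y_true)

pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)
fp = sum(1 for t, p in pairs if t == 0 and p == 1)
fn = sum(1 for t, p in pairs if t == 1 and p == 0)
tn = sum(1 for t, p in pairs if t == 0 and p == 0)

accuracy = (tp + tn) / len(y_true)
denom = 2 * tp + fp + fn
f1 = 2 * tp / denom if denom else 0.0

print(accuracy)  # 0.99
print(f1)        # 0.0
```

Accuracy is carried entirely by the 990 true negatives, which F1 never counts.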

What does this tool do when precision or recall is zero?

It reports the same values scikit-learn returns by default (zero_division="warn", which yields 0 and emits a warning). If TP+FP=0 precision is undefined and reported as 0; if TP+FN=0 recall is undefined and reported as 0; and if precision+recall=0 the F1 is reported as 0. Each case is flagged in the result so you can see it was forced, not a genuine score.
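One way such flagging could look is sketched below. This is not the tool's actual code: the function name `scores_with_flags` and the flag wording are assumptions made for illustration.

```python
def scores_with_flags(tp: int, fp: int, fn: int):
    """Return (precision, recall, f1, flags), forcing undefined values to 0."""
    flags = []

    if tp + fp == 0:
        precision = 0.0
        flags.append("precision forced to 0 (TP + FP = 0)")
    else:
        precision = tp / (tp + fp)

    if tp + fn == 0:
        recall = 0.0
        flags.append("recall forced to 0 (TP + FN = 0)")
    else:
        recall = tp / (tp + fn)

    if precision + recall == 0:
        f1 = 0.0
        flags.append("F1 forced to 0 (precision + recall = 0)")
    else:
        f1 = 2 * precision * recall / (precision + recall)

    return precision, recall, f1, flags


# The page's zero-division example: the model never predicts positive.
print(scores_with_flags(0, 0, 10))
```

For TP=0, FP=0, FN=10 the recall of 0 is a genuine value (0 / 10), so only precision and F1 are flagged.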

Does this match scikit-learn's f1_score?

Yes. For the same TP, FP and FN the value here equals sklearn.metrics.f1_score on the equivalent label vectors, and the F-beta matches fbeta_score. The worked example TP=80, FP=20, FN=10 gives F1 = 0.8421 in both. This is binary, single-class F1 — multi-class macro/micro averaging is a separate calculation.

Machine Learning · Metrics

F1 Score Calculator — Precision, Recall & F-beta

Compute the F1 score of a binary classifier in seconds — from confusion-matrix counts (TP, FP, FN) or straight from precision and recall. See precision, recall, F-beta and the full working, matching scikit-learn. No signup, no ads.

By Induwara Ashinsana, Executive Director, Ryzera Technologies · Updated Jun 10, 2026

F1 score calculator

True positives (TP)

Predicted positive, actually positive.

False positives (FP)

Predicted positive, actually negative (Type I).

False negatives (FN)

Predicted negative, actually positive (Type II).

True negatives (TN) are not needed — F1 ignores them.

F-beta weight (β)

β = 1 is F1. β > 1 weights recall higher; β < 1 weights precision.

Precision: 0.8000 (80%)
Recall: 0.8889 (88.89%)
F1 score: 0.8421 (84.21%)
F-beta (β = 1): 0.8421 (84.21%)

F-beta family

F0.5: 0.8163
F1: 0.8421
F2: 0.8696

Strong — a well-balanced classifier for most tasks.

Step-by-step working

Precision = TP / (TP + FP) = 80 / (80 + 20) = 0.8000
Recall = TP / (TP + FN) = 80 / (80 + 10) = 0.8889
F1 = 2 · P · R / (P + R) = 2 · 0.8000 · 0.8889 / (0.8000 + 0.8889) = 0.8421
Fβ = (1 + β²) · P · R / (β² · P + R) = (1 + 1.0000) · 0.8000 · 0.8889 / (1.0000 · 0.8000 + 0.8889) = 0.8421

Computed entirely in your browser — nothing is uploaded. Definitions per scikit-learn and Wikipedia; last verified 2026-06-10.

How it works

The F1 score answers one question about a binary classifier: how well does it find the positive class without raising too many false alarms? It combines two simpler metrics — precision and recall — into a single number, and this calculator shows every step using the definitions implemented by scikit-learn, the de-facto standard machine-learning library.

Step 1: Precision. Of all the cases the model labelled positive, how many really were positive?
P = TP / (TP + FP)
If the model never predicts positive (TP + FP = 0), precision is undefined. Matching what scikit-learn returns by default (zero_division="warn", which yields 0 and emits a warning), the tool reports it as 0 and flags that it was forced.

Step 2: Recall (sensitivity). Of all the cases that were really positive, how many did the model catch?
R = TP / (TP + FN)
True negatives never appear in either formula, which is exactly why F1 stays honest on imbalanced data where one class dwarfs the other.

Step 3: F1, the harmonic mean. F1 is not the simple average of precision and recall but their harmonic mean:
F1 = 2 · P · R / (P + R)
The harmonic mean sits close to the smaller of the two numbers, so a model cannot earn a high F1 by being strong on one and weak on the other. An equivalent count form, F1 = 2·TP / (2·TP + FP + FN), gives the same answer to the last digit. The calculator computes both and cross-checks them, which is what the “Formulas verified” badge means.

Step 4: F-beta, the general form. Sometimes precision and recall are not equally important. The F-beta score adds a weight β:
Fβ = (1 + β²) · P · R / (β² · P + R)
β > 1 weights recall more (F2 is common in medical screening, where a missed case is dangerous); β < 1 weights precision more (F0.5 suits spam filtering, where false alarms annoy users). At β = 1 this reduces exactly to F1. The weighting traces back to C. J. van Rijsbergen's 1979 effectiveness measure, the origin of the F-measure.
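Steps 1 to 3 and the count-form cross-check can be sketched as follows. The function name `f1_two_ways` is hypothetical, the code assumes TP > 0 so that no denominator is zero, and it compares the two results with a floating-point tolerance rather than exact equality:

```python
import math


def f1_two_ways(tp: int, fp: int, fn: int):
    """Compute F1 via the harmonic mean and via the count form (assumes tp > 0)."""
    precision = tp / (tp + fp)                      # Step 1
    recall = tp / (tp + fn)                         # Step 2
    harmonic = 2 * precision * recall / (precision + recall)  # Step 3
    counts = 2 * tp / (2 * tp + fp + fn)            # equivalent count form
    return precision, recall, harmonic, counts


p, r, f1_harmonic, f1_counts = f1_two_ways(80, 20, 10)

# Cross-check: the two routes should agree up to rounding error.
assert math.isclose(f1_harmonic, f1_counts, rel_tol=1e-12)

print(f"{p:.4f} {r:.4f} {f1_harmonic:.4f}")  # 0.8000 0.8889 0.8421
```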

Worked examples

Fraud-detection model from confusion counts (TP=80, FP=20, FN=10)

Precision = TP / (TP + FP) = 80 / (80 + 20) = 80 / 100 = 0.8000
Recall = TP / (TP + FN) = 80 / (80 + 10) = 80 / 90 = 0.8889
F1 = 2 × (0.8 × 0.8889) / (0.8 + 0.8889) = 1.42222 / 1.68889 = 0.8421
Cross-check: 2·TP / (2·TP + FP + FN) = 160 / 190 = 0.8421 — matches
scikit-learn f1_score on the equivalent labels also returns 0.8421
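To try the scikit-learn comparison yourself, the counts first need to be expanded into label vectors. The sketch below does that; the TN count of 890 is an arbitrary assumption (F1 ignores it), and the scikit-learn call only runs if the library is installed:

```python
tp, fp, fn = 80, 20, 10
tn = 890  # assumed value; any TN count yields the same F1

# One entry per case, grouped as TP, FP, FN, TN.
y_true = [1] * tp + [0] * fp + [1] * fn + [0] * tn
y_pred = [1] * tp + [1] * fp + [0] * fn + [0] * tn

f1_counts = 2 * tp / (2 * tp + fp + fn)
print(round(f1_counts, 4))  # 0.8421

try:
    from sklearn.metrics import f1_score
    print(round(f1_score(y_true, y_pred), 4))
except ImportError:
    print("scikit-learn not installed; skipping the library comparison")
```

Changing `tn` and rerunning is a quick way to confirm that true negatives do not affect the result.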

Perfect recall, half precision — entered directly (P=0.5, R=1.0)

F1 = 2 × (0.5 × 1.0) / (0.5 + 1.0) = 1.0 / 1.5 = 0.6667
F2 = 5 × (0.5 × 1.0) / (4 × 0.5 + 1.0) = 2.5 / 3.0 = 0.8333 (recall-weighted, higher)
F0.5 = 1.25 × (0.5 × 1.0) / (0.25 × 0.5 + 1.0) = 0.625 / 1.125 = 0.5556 (precision-weighted, lower)
Same precision and recall, three different F-beta verdicts depending on what you value

Zero-division edge case (TP=0, FP=0, FN=10)

TP + FP = 0 → precision is undefined, reported as 0 (zero_division=0)
Recall = 0 / (0 + 10) = 0
Precision + recall = 0 → F1 is reported as 0
The tool flags each forced 0 so you know it is not a genuine score

Sources & references

The formulas on this page were last cross-checked against scikit-learn and the Wikipedia F-score article on 2026-06-10. F1 is verified by computing it two independent ways (harmonic mean of precision and recall, and the count form) and confirming they agree.

