F1 Score Calculator — Precision, Recall & F-beta
Compute the F1 score of a binary classifier from confusion-matrix counts (TP, FP, FN) or directly from precision and recall. The calculator shows precision, recall, F-beta and the full working, using the same definitions as scikit-learn. No signup, no ads.
How it works
The F1 score answers one question about a binary classifier: how well does it find the positive class without raising too many false alarms? It combines two simpler metrics — precision and recall — into a single number, and this calculator shows every step using the definitions implemented by scikit-learn, the de-facto standard machine-learning library.
Step 1 — Precision. Of all the cases the model labelled positive, how many really were positive?
P = TP / (TP + FP)
If the model never predicts positive (TP + FP = 0), precision is undefined. scikit-learn's default here is zero_division="warn", which returns 0 and emits a warning; the tool likewise reports 0 and flags that the value was forced.
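The precision step, including the zero-division case, can be sketched in plain Python. This is a minimal illustration and not the calculator's actual code: the function name precision, the zero_division parameter and the returned "forced" flag are assumptions chosen to mirror the behaviour described above.

```python
from typing import Tuple


def precision(tp: int, fp: int, zero_division: float = 0.0) -> Tuple[float, bool]:
    """Return (precision, forced).

    forced is True when TP + FP = 0, so the ratio is undefined and the
    zero_division value was substituted instead (hypothetical flag).
    """
    predicted_positive = tp + fp
    if predicted_positive == 0:
        return zero_division, True
    return tp / predicted_positive, False


print(precision(80, 20))  # (0.8, False)
print(precision(0, 0))    # (0.0, True): the model never predicted positive
```

Returning the flag alongside the number keeps a forced 0 distinguishable from a genuine precision of 0 (for example TP = 0, FP = 5).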
Step 2 — Recall (sensitivity). Of all the cases that were really positive, how many did the model catch?
R = TP / (TP + FN)
True negatives never appear in either formula, so a large pool of correctly rejected negatives cannot inflate the score. That is why F1 is more informative than accuracy on imbalanced data where the negative class dwarfs the positive one.
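To see why leaving out true negatives matters, the sketch below computes recall alongside plain accuracy for the same TP, FP and FN while only the number of true negatives changes. The counts are made up for illustration, and returning zero_division when TP + FN = 0 is an assumption that mirrors the precision case.

```python
def recall(tp: int, fn: int, zero_division: float = 0.0) -> float:
    """Recall = TP / (TP + FN); falls back to zero_division if there are no positives."""
    actual_positive = tp + fn
    if actual_positive == 0:
        return zero_division
    return tp / actual_positive


def accuracy(tp: int, fp: int, fn: int, tn: int) -> float:
    """Accuracy does use true negatives, so it rises as TN grows."""
    return (tp + tn) / (tp + fp + fn + tn)


# Hypothetical counts: the classifier misses 12 of 20 real positives.
tp, fp, fn = 8, 2, 12
for tn in (10, 10_000):
    print(f"TN={tn}: recall={recall(tp, fn):.4f}, accuracy={accuracy(tp, fp, fn, tn):.4f}")
```

Recall is identical in both runs, while accuracy climbs towards 1 as negatives are added, even though the model is no better at finding positives.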
Step 3 — F1, the harmonic mean. F1 is not the simple average of precision and recall but their harmonic mean:
F1 = 2 · P · R / (P + R)
The harmonic mean sits close to the smaller of the two numbers, so a model cannot earn a high F1 by being strong on one and weak on the other. An equivalent count form, F1 = 2·TP / (2·TP + FP + FN), follows by substituting the definitions of P and R and gives the same answer to the last digit. The calculator computes both and cross-checks them, which is what the “Formulas verified” badge means.
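The two-way check described above can be sketched as follows. The function names, the example counts and the comparison tolerance are assumptions made for illustration; returning 0 when a denominator is 0 is likewise a choice made here, not a documented detail of the calculator.

```python
import math


def f1_from_pr(p: float, r: float) -> float:
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if p + r == 0:
        return 0.0
    return 2 * p * r / (p + r)


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """Count form: 2*TP / (2*TP + FP + FN) (0 when the denominator is 0)."""
    denom = 2 * tp + fp + fn
    if denom == 0:
        return 0.0
    return 2 * tp / denom


# Hypothetical counts.
tp, fp, fn = 80, 20, 40
p = tp / (tp + fp)
r = tp / (tp + fn)

harmonic = f1_from_pr(p, r)
counts = f1_from_counts(tp, fp, fn)

# The two forms are algebraically equal; compare with a tolerance because
# floating-point arithmetic can differ in the final bits.
assert math.isclose(harmonic, counts, rel_tol=1e-12)
print(f"F1 = {harmonic:.4f}")

# A lopsided model: the arithmetic mean of 0.9 and 0.1 is 0.5,
# but the harmonic mean stays near the weaker number.
print(f"F1 for P=0.9, R=0.1: {f1_from_pr(0.9, 0.1):.2f}")
```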
Step 4 — F-beta, the general form. Sometimes precision and recall are not equally important. The F-beta score adds a weight β:
Fβ = (1 + β²) · P · R / (β² · P + R)
β > 1 weights recall more (F2 is common in medical screening, where a missed case is dangerous); β < 1 weights precision more (F0.5 suits spam filtering, where false alarms annoy users). At β = 1 this reduces exactly to F1. The weighting traces back to C. J. van Rijsbergen's 1979 effectiveness measure, the origin of the F-measure.
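A small sketch of the F-beta formula shows how β shifts the score towards recall or precision. The function name fbeta, the example values P = 0.8 and R = 0.5, and the choice to reject non-positive β are assumptions for illustration only.

```python
import math


def fbeta(p: float, r: float, beta: float = 1.0) -> float:
    """F-beta = (1 + beta^2) * P * R / (beta^2 * P + R); 0 when the denominator is 0."""
    if beta <= 0:
        raise ValueError("beta must be positive")
    b2 = beta * beta
    denom = b2 * p + r
    if denom == 0:
        return 0.0
    return (1 + b2) * p * r / denom


# Hypothetical model with better precision than recall.
p, r = 0.8, 0.5
for beta in (0.5, 1.0, 2.0):
    print(f"beta={beta}: F = {fbeta(p, r, beta):.4f}")

# At beta = 1 the general form matches the F1 formula 2PR / (P + R).
assert math.isclose(fbeta(p, r, 1.0), 2 * p * r / (p + r), rel_tol=1e-12)
```

Because recall is the weaker number in this example, the score falls as β grows: F0.5 is the highest of the three and F2 the lowest.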
Worked examples
Frequently asked questions
Sources & references
- scikit-learn — f1_score API reference (precision, recall, F1, zero_division)
- scikit-learn — fbeta_score API reference (general F-beta)
- Wikipedia — F-score (harmonic-mean derivation and F-beta form)
- C. J. van Rijsbergen, Information Retrieval (2nd ed., 1979), ch. 7 — origin of the F-measure
The formulas on this page were last cross-checked against scikit-learn and the Wikipedia F-score article on 2026-06-10. F1 is verified by computing it two independent ways (harmonic mean of precision and recall, and the count form) and confirming they agree.