How do you calculate the Brier score?

Take each forecast probability f and its actual outcome o (1 if the event happened, 0 if not), square the difference, and average over all N forecasts: BS = (1/N) Σ (fᵢ − oᵢ)². For the default example [0.9, 0.8, 0.3, 0.6] against outcomes [1, 1, 0, 1], the squared errors are 0.01, 0.04, 0.09 and 0.16, summing to 0.30, so the Brier score is 0.30 / 4 = 0.0750.

What is a good Brier score?

For binary outcomes the Brier score runs from 0 (perfect) to 1 (always confidently wrong). There is no universal cut-off, because what counts as good depends on the base rate. A useful anchor is the base-rate baseline ō(1 − ō), the score of always predicting the event's overall frequency: if your model scores below that, the Brier Skill Score is positive. As a loose rule of thumb, weather and forecasting work often treats scores below about 0.10–0.20 as strong, but always compare against the baseline.

What is the difference between the Brier score and accuracy?

Accuracy grades hard labels: was the predicted class right or wrong? It ignores confidence. The Brier score grades the probabilities themselves, so a model that says 0.99 and is wrong is penalised far more than one that says 0.51 and is wrong. That makes the Brier score a measure of calibration and sharpness, not just correctness — two models with identical accuracy can have very different Brier scores.
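A small sketch can make the contrast concrete. The helper names (accuracy, brier_score) and the two made-up forecast lists are illustrative assumptions, not part of the tool; hard labels are taken by thresholding at 0.5.

```python
def accuracy(probs, outcomes, threshold=0.5):
    """Fraction of forecasts whose thresholded label matches the outcome."""
    hits = sum((p >= threshold) == (o == 1) for p, o in zip(probs, outcomes))
    return hits / len(outcomes)


def brier_score(probs, outcomes):
    """Mean squared difference between probability and 0/1 outcome."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(outcomes)


outcomes = [1, 1, 0, 0]
hedging = [0.55, 0.55, 0.45, 0.45]    # always right, barely committed
confident = [0.95, 0.95, 0.05, 0.05]  # always right, strongly committed

for name, probs in [("hedging", hedging), ("confident", confident)]:
    print(name, accuracy(probs, outcomes), round(brier_score(probs, outcomes), 4))

# Penalty for one wrong forecast when the event did not happen (o = 0):
print(round((0.51 - 0) ** 2, 4), round((0.99 - 0) ** 2, 4))
```

Both lists reach the same accuracy of 1.0, yet their Brier scores come out near 0.2025 and 0.0025. The last line shows the single-forecast penalty growing from about 0.26 to about 0.98 as a wrong forecast becomes more confident.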

How is the Brier Skill Score calculated?

The Brier Skill Score is BSS = 1 − BS / BS_ref, where BS_ref is the Brier score of a reference forecast — usually the base rate (the mean outcome) or a fixed constant like 0.5. A BSS above 0 means your model beats the baseline, 0 means it ties, and below 0 means it is worse than just predicting the reference every time. With the base-rate reference, BS_ref equals ō(1 − ō).
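The identity BS_ref = ō(1 − ō) follows in a few lines once you use oᵢ² = oᵢ for 0/1 outcomes. A short derivation, written with the same symbols as the formulas on this page:

```latex
\begin{aligned}
\mathrm{BS}_{\mathrm{ref}}
  &= \frac{1}{N}\sum_{i=1}^{N} (\bar{o} - o_i)^2 \\
  &= \bar{o}^2 - 2\bar{o}\,\frac{1}{N}\sum_{i=1}^{N} o_i + \frac{1}{N}\sum_{i=1}^{N} o_i^2 \\
  &= \bar{o}^2 - 2\bar{o}^2 + \bar{o}
     \qquad \bigl(o_i^2 = o_i,\ \tfrac{1}{N}\textstyle\sum_i o_i = \bar{o}\bigr) \\
  &= \bar{o}\,(1 - \bar{o}).
\end{aligned}
```

For the default example, ō = 0.75 gives 0.75 · 0.25 = 0.1875.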

What is the range of the Brier score?

For binary (two-outcome) forecasts the Brier score lies between 0 and 1. Zero means every probability was fully confident and correct; one means every probability was fully confident and wrong. This calculator is binary-only, matching scikit-learn's brier_score_loss. The original multi-category Brier score has a range of 0 to 2, but that variant is not computed here.
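For comparison only, here is a minimal sketch of the original multi-category form, which sums the squared error over every category before averaging over forecasts. The function name brier_multicategory and the one-hot row layout are assumptions made for illustration; the calculator itself does not compute this variant.

```python
def brier_multicategory(prob_rows, outcome_rows):
    """Sum squared errors across categories, then average over forecasts.

    prob_rows:    one list of category probabilities per forecast
    outcome_rows: matching one-hot lists (1 for the category that occurred)
    """
    total = sum(
        (f - o) ** 2
        for probs, outcomes in zip(prob_rows, outcome_rows)
        for f, o in zip(probs, outcomes)
    )
    return total / len(prob_rows)


# All probability on the wrong category: the worst case for two categories.
worst = brier_multicategory([[0.0, 1.0]], [[1, 0]])

# The binary forecast f = 0.9 for an event that happened, as two categories.
two_cat = brier_multicategory([[0.9, 0.1]], [[1, 0]])
binary = (0.9 - 1) ** 2

print(worst, round(two_cat, 4), round(binary, 4))
```

With two categories this form counts each error twice (once per category), which is why its upper bound is 2 while the binary score tops out at 1.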

Is a lower or higher Brier score better?

Lower is better. The Brier score is an error measure (the mean squared distance between your probabilities and what actually happened), so a smaller value means your probabilities sat closer to the outcomes. This is the opposite of the Brier Skill Score, where higher is better because a larger skill score means a bigger improvement over the baseline.

Does this match scikit-learn's brier_score_loss?

Yes, for the binary case. scikit-learn's brier_score_loss returns the mean of (probability − outcome)² over all samples, which is exactly the formula used here. This tool also computes the same number a second, independent way — splitting the sum by class into Σ(1 − f)² over positives plus Σf² over negatives — and checks the two agree, so the result is self-verified before you see it.
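If you want to check the agreement yourself, a sketch along these lines should do, assuming scikit-learn is installed. Here brier_score is a hypothetical plain-Python stand-in for the formula on this page, not the tool's actual JavaScript.

```python
def brier_score(probs, outcomes):
    """Mean of (probability - outcome)^2 over all forecasts."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(outcomes)


probs = [0.9, 0.8, 0.3, 0.6]
outcomes = [1, 1, 0, 1]
ours = brier_score(probs, outcomes)
print("plain Python:", round(ours, 4))

try:
    from sklearn.metrics import brier_score_loss
except ImportError:
    brier_score_loss = None  # scikit-learn not available; skip the comparison

if brier_score_loss is not None:
    # scikit-learn takes the true labels first, then the probabilities.
    theirs = brier_score_loss(outcomes, probs)
    print("scikit-learn:", round(theirs, 4))
    assert abs(ours - theirs) < 1e-12
```

Note the argument order: scikit-learn expects the outcomes before the probabilities, the reverse of how the inputs are listed on this page.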

Does this calculator send my predictions anywhere?

No. Parsing your probabilities and outcomes, computing the Brier score and skill score, and building the breakdown table all run in your browser with plain JavaScript. Nothing is uploaded, logged, or stored, and the page keeps working offline once loaded. You can paste validation-set predictions with no privacy concern.


Brier Score Calculator

Paste your forecast probabilities and the actual 0/1 outcomes to get the Brier score (the mean squared error of your probabilities), plus the Brier Skill Score versus a baseline, the formula, and a per-prediction breakdown. It matches scikit-learn's brier_score_loss, runs entirely in your browser, and needs no signup.

By Induwara Ashinsana, Executive Director, Ryzera Technologies

Updated Jun 11, 2026

Brier score calculator

Predicted probabilities

Each forecast as a probability in [0, 1]. Separate with commas, spaces, or new lines.

Actual outcomes

The realised outcome for each forecast: 1 if the event happened, 0 if not.

Skill-score baseline

The reference forecast the Brier Skill Score is measured against.

Presets

Brier score

0.0750

Mean squared error (0 best, 1 worst)

Skill score (BSS)

0.6000

1 − BS / BS_ref

Base rate

0.7500

Mean outcome = 75% positive

Pairs (N)

4

Reference Brier 0.1875

Brier score 0.0750 on a 0 (best) to 1 (worst) scale — better than the base-rate baseline of 0.1875. Skill score 0.6000: 60.0% better than the baseline (reference Brier 0.1875).

Decimals

Formulas

BS = (1/N) Σ (fᵢ − oᵢ)²
ō = (1/N) Σ oᵢ (base rate)
BS_ref = (1/N) Σ (r − oᵢ)²
BSS = 1 − BS / BS_ref

Cross-check. The direct mean-squared-error gives BS = 0.0750; the independent per-class split, (Σ(1−f)² over positives + Σf² over negatives) / N, gives 0.0750. They reconcile, as they must, so the result is verified.

Per-prediction breakdown

#	Probability f	Outcome o	Squared error (f − o)²
1	0.9000	1	0.0100
2	0.8000	1	0.0400
3	0.3000	0	0.0900
4	0.6000	1	0.1600
Brier score = mean squared error			0.0750

Method: BS = (1/N) Σ (fᵢ − oᵢ)² (Brier 1950, matching scikit-learn brier_score_loss), with BSS = 1 − BS / BS_ref (US National Weather Service). Sources cited below the calculator. No data leaves this page.

How it works

The Brier score grades probabilistic forecasts. Instead of asking whether a hard label was right, it measures how far each predicted probability sat from what actually happened. It was defined by Glenn Brier in 1950 for weather verification and is identical to the mean squared error of the probabilities — the same quantity scikit-learn returns from brier_score_loss.

With N forecasts, each a probability fᵢ ∈ [0, 1] of a binary event whose actual outcome is oᵢ ∈ {0, 1}:

BS = (1/N) Σ (fᵢ − oᵢ)²

Validate. Every probability must lie in [0, 1], every outcome must be 0 or 1, and the two lists must be the same length. Bad input gets a specific message, never a silent NaN.
Score. Square each gap (fᵢ − oᵢ)² and average them. For binary outcomes this lands in [0, 1]; 0 is a perfect, fully-confident-and-correct forecaster.
Baseline. Compute the reference Brier score for a constant forecast r:
BS_ref = (1/N) Σ (r − oᵢ)²
With the base rate r = ō (the mean outcome) this simplifies to the outcome variance ō(1 − ō) — the score of a climatology forecaster that always predicts the long-run frequency.
Skill. The Brier Skill Score rescales the Brier score against that baseline:
BSS = 1 − BS / BS_ref
Above 0 the model beats the baseline; 0 ties it; below 0 it is worse than just predicting the reference. When the baseline is itself perfect (BS_ref = 0, every outcome identical) the skill score is undefined and the tool shows “—” rather than dividing by zero.
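The four steps above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the tool's own JavaScript: the function names, the error wording, and the choice to return None for an undefined skill score are all hypothetical.

```python
def validate(probs, outcomes):
    """Raise ValueError with a specific message instead of yielding NaN."""
    if len(probs) != len(outcomes):
        raise ValueError(
            f"{len(probs)} probabilities but {len(outcomes)} outcomes; "
            "lengths must match."
        )
    if not probs:
        raise ValueError("At least one forecast is required.")
    for i, (p, o) in enumerate(zip(probs, outcomes), start=1):
        if not 0.0 <= p <= 1.0:  # this comparison also rejects NaN
            raise ValueError(f"Probability #{i} ({p}) is outside [0, 1].")
        if o not in (0, 1):
            raise ValueError(f"Outcome #{i} ({o}) must be 0 or 1.")


def brier_score(probs, outcomes):
    """BS = (1/N) * sum((f_i - o_i)^2)."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(outcomes)


def reference_brier(outcomes, r=None):
    """Brier score of a constant forecast r; r=None uses the base rate."""
    if r is None:
        r = sum(outcomes) / len(outcomes)
    return sum((r - o) ** 2 for o in outcomes) / len(outcomes)


def brier_skill_score(probs, outcomes, r=None):
    """BSS = 1 - BS / BS_ref, or None when BS_ref is 0 (undefined)."""
    validate(probs, outcomes)
    bs_ref = reference_brier(outcomes, r)
    if bs_ref == 0:
        return None  # every outcome identical: nothing to divide by
    return 1 - brier_score(probs, outcomes) / bs_ref


probs = [0.9, 0.8, 0.3, 0.6]
outcomes = [1, 1, 0, 1]
print(round(brier_score(probs, outcomes), 4))
print(round(brier_skill_score(probs, outcomes), 4))         # base-rate reference
print(round(brier_skill_score(probs, outcomes, r=0.5), 4))  # fixed 0.5 reference
```

Run on the Demo preset, this should reproduce the figures on this page: a Brier score of 0.0750, a skill score of 0.6000 against the base rate, and 0.7000 against a fixed 0.5 reference.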

As an internal correctness gate the tool also recomputes the Brier score a second way, splitting the sum by class into Σ(1 − fᵢ)² over the positive cases plus Σfᵢ² over the negatives, and asserts the two agree to floating-point precision. The two forms are algebraically identical because oᵢ² = oᵢ for binary outcomes: each term (fᵢ − oᵢ)² is (1 − fᵢ)² when oᵢ = 1 and fᵢ² when oᵢ = 0. Any disagreement would therefore signal a bug.
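A hedged sketch of such a cross-check is shown here. The names brier_direct and brier_by_class and the tolerance passed to math.isclose are assumptions for illustration; the page does not state how the tool compares the two values internally.

```python
import math


def brier_direct(probs, outcomes):
    """Mean squared error over all forecast/outcome pairs."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(outcomes)


def brier_by_class(probs, outcomes):
    """Same quantity, summed separately over positives and negatives."""
    positives = sum((1 - p) ** 2 for p, o in zip(probs, outcomes) if o == 1)
    negatives = sum(p ** 2 for p, o in zip(probs, outcomes) if o == 0)
    return (positives + negatives) / len(outcomes)


probs = [0.9, 0.8, 0.3, 0.6]
outcomes = [1, 1, 0, 1]

direct = brier_direct(probs, outcomes)
split = brier_by_class(probs, outcomes)

# The two routes should differ by rounding error at most.
assert math.isclose(direct, split, rel_tol=1e-12, abs_tol=1e-12)
print(round(direct, 4), round(split, 4))
```

Comparing with a tolerance rather than == matters because the two routes add the same terms in a different order, so they can differ in the last floating-point bit.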

Worked examples

Four forecasts — Brier 0.0750, skill 0.6000 (the Demo preset)

Probabilities f = [0.9, 0.8, 0.3, 0.6], outcomes o = [1, 1, 0, 1]. N = 4
Squared errors: (0.9−1)²=0.01, (0.8−1)²=0.04, (0.3−0)²=0.09, (0.6−1)²=0.16
Sum = 0.30 → Brier score = 0.30 / 4 = 0.0750
Base rate ō = 3/4 = 0.75; BS_ref = ō(1−ō) = 0.75·0.25 = 0.1875
Brier Skill Score = 1 − 0.0750 / 0.1875 = 1 − 0.40 = 0.6000
Read-out: 60% better than always predicting the base rate

Custom baseline 0.5 on the same data — skill 0.7000

Same forecasts, but the reference is a fixed 0.5 (a coin flip), not the base rate
BS_ref = mean((0.5 − o)²) = (0.25·3 + 0.25·1) / 4 = 0.25
Brier Skill Score = 1 − 0.0750 / 0.25 = 1 − 0.30 = 0.7000
Against an uninformed 0.5 forecaster the model looks even stronger

The bounds — perfect 0.0000 and worst 1.0000

Perfect: f = [1, 0, 1] against o = [1, 0, 1] → every squared error 0 → Brier = 0.0000
Worst (binary): f = [0, 1] against o = [1, 0] → (0−1)² + (1−0)² = 2
Brier = 2 / 2 = 1.0000 — the maximum a binary forecaster can score
Edge case: outcomes all identical (e.g. all 1) make BS_ref = 0, so the skill score is shown as '—' rather than dividing by zero

Frequently asked questions

Sources & references

The formulas on this page were last cross-checked against these sources on 2026-06-11. The Brier score is a stable mathematical definition, so this tool needs no rate or schedule updates — only the worked examples are periodically re-reconciled against scikit-learn.

