How do you calculate cross-entropy loss by hand?

For each sample take the negative natural log of the probability the model gave to the correct class. In binary form that is L = −[y·ln(p) + (1−y)·ln(1−p)] with p = P(class 1); in multi-class form it is simply L = −ln(p of the true class). Average the per-sample losses to get the standard log-loss metric. Example: a true label of 1 with predicted p = 0.9 gives −ln(0.9) ≈ 0.1054.
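The hand calculation above can be sketched in a few lines of Python. This is a minimal illustration rather than the calculator's own code: the function name `binary_cross_entropy` is made up for this example, and it applies no clipping, so a predicted probability of exactly 0 or 1 would raise an error.

```python
import math

def binary_cross_entropy(y_true, p_pred):
    """Mean binary cross-entropy in nats; p_pred holds P(class 1).

    Hypothetical helper for illustration: no epsilon clipping is applied.
    """
    losses = []
    for y, p in zip(y_true, p_pred):
        # For a hard 0/1 label only one of the two terms is non-zero.
        losses.append(-(y * math.log(p) + (1 - y) * math.log(1 - p)))
    return sum(losses) / len(losses)

single = binary_cross_entropy([1], [0.9])
print(round(single, 4))  # 0.1054
```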

What is the difference between log loss and cross-entropy?

They are the same quantity. "Cross-entropy" is the information-theory name for −Σ y·ln(p) between the true distribution y and the predicted distribution p; "log loss" is the machine-learning name for its average over a dataset, used by scikit-learn's log_loss. Binary cross-entropy (BCE) and categorical cross-entropy are just the two-class and K-class versions of the same formula.

Why is cross-entropy loss never negative?

Each term is −ln(p) where p is a probability in (0, 1]. The natural log of a number ≤ 1 is ≤ 0, so its negative is ≥ 0. The loss is 0 only when the model assigns probability 1 to the correct class, and it grows without bound as that probability approaches 0. Averaging non-negative numbers keeps the mean non-negative.

What is a good cross-entropy loss value?

Lower is better, and the floor is 0 (a perfect classifier). A useful reference is ln(K) for K classes, the loss of a model that guesses uniformly: about 0.693 for 2 classes, 1.099 for 3, and 2.303 for 10. A loss well below ln(K) means the model has learned something beyond uniform guessing; on imbalanced data, a model that only predicts the class frequencies already scores below ln(K), so compare against that baseline too. The matching perplexity = exp(loss) reads as the effective number of classes the model is still unsure between.
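To make the reference points concrete, here is a small sketch that reproduces the uniform-guess baseline and the matching perplexity. The helper names `uniform_baseline` and `perplexity` are assumptions introduced for this example.

```python
import math

def uniform_baseline(num_classes):
    """Loss in nats of a model that assigns 1/K to every class."""
    return math.log(num_classes)

def perplexity(mean_loss_nats):
    """Effective number of equally likely classes, exp(mean loss)."""
    return math.exp(mean_loss_nats)

for k in (2, 3, 10):
    # Prints 0.693, 1.099 and 2.303 for K = 2, 3 and 10.
    print(k, round(uniform_baseline(k), 3))
```

Feeding the baseline back into `perplexity` returns K itself (up to floating-point rounding), which is why perplexity reads as an effective class count.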

How does binary cross-entropy differ from categorical cross-entropy?

Binary cross-entropy scores one probability per sample, P(class 1), and uses both the y·ln(p) and (1−y)·ln(1−p) terms. Categorical cross-entropy scores a full probability vector over K classes per sample, and only the true class's log-probability contributes. Binary cross-entropy is mathematically the 2-class case of categorical cross-entropy with the distribution [1−p, p].
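The equivalence can be checked numerically. The sketch below uses two illustrative helpers (`bce` and `categorical_ce`, both named for this example only) and confirms that they agree when the binary probability p is expanded to the two-class vector [1−p, p].

```python
import math

def bce(y, p):
    """Per-sample binary cross-entropy, p = P(class 1)."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def categorical_ce(true_class, probs):
    """Per-sample categorical cross-entropy for an integer label."""
    return -math.log(probs[true_class])

p = 0.8
for y in (0, 1):
    two_class = [1 - p, p]  # index 0 is class 0, index 1 is class 1
    assert math.isclose(bce(y, p), categorical_ce(y, two_class))
```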

Why clip probabilities with an epsilon?

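Because ln(0) is −∞. If a model assigns exactly 0 to the true class, the raw loss is infinite. scikit-learn's log_loss clips predicted probabilities to [eps, 1−eps] so that a 0 becomes a large but finite loss; with the long-standing eps = 1e-15 that is −ln(1e-15) ≈ 34.5. Recent scikit-learn releases derive eps from the floating-point precision of the predictions instead of a fixed 1e-15, so check the version you compare against. This calculator clips with eps = 1e-15 by default; you can lower the epsilon or disable clipping to see the unclipped behaviour.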
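A minimal sketch of the clipping step, assuming the fixed eps = 1e-15 described above. The function name `clipped_loss` is hypothetical and takes the probability assigned to the true class.

```python
import math

def clipped_loss(p_true, eps=1e-15):
    """-ln(p) after clipping p into [eps, 1 - eps]."""
    p = min(max(p_true, eps), 1 - eps)
    return -math.log(p)

print(round(clipped_loss(0.0), 4))  # 34.5388
```

Without the clip, `math.log(0.0)` raises a ValueError in Python, which is the programmatic counterpart of the infinite loss.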

Should I pass probabilities or logits?

If your numbers already lie in [0, 1] and (for multi-class) each row sums to ~1, use Probabilities. If you have raw model outputs before the activation, choose Logits: the tool applies a sigmoid for binary or a numerically-stable softmax for multi-class first, exactly like PyTorch's BCEWithLogitsLoss and CrossEntropyLoss, then shows the converted probabilities.
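The conversion from logits can be sketched as follows. This is an illustrative version written for this page, not the tool's source: `sigmoid` branches on the sign of its input and `softmax` subtracts the row maximum, two common ways to keep `exp` from overflowing.

```python
import math

def sigmoid(z):
    """Logistic function, branching so exp() never gets a large positive input."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def softmax(logits):
    """Softmax over one row of logits, shifted by the row maximum."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Large logits would overflow a naive exp(z); the shifted version is fine.
probs = softmax([1000.0, 1001.0, 1002.0])
```

Because softmax is unchanged by adding a constant to every logit, `probs` equals `softmax([0.0, 1.0, 2.0])`.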

Does this calculator send my data anywhere?

No. Parsing your labels and probabilities, taking the logs, and averaging all run in your browser with plain JavaScript. Nothing is uploaded, logged, or stored, and the page keeps working offline once loaded. You can paste validation-set predictions with no privacy concern.


Cross-Entropy Loss Calculator

Compute cross-entropy (log) loss for binary and multi-class classification in your browser. Paste true labels and predicted probabilities or logits to get per-sample loss, the mean log-loss metric, perplexity, and the full step-by-step working — matching log_loss and PyTorch CrossEntropyLoss.

By Induwara Ashinsana, Executive Director, Ryzera Technologies. Updated Jun 10, 2026

Cross-entropy loss calculator

True labels (0 or 1)

One label per sample — 0 for the negative class, 1 for the positive. Comma, space, or newline separated.

Predicted P(class 1)

One probability in [0, 1] per sample — the model's confidence in class 1.

Options: Input type, Reduction, Log base, Epsilon clipping, Epsilon (0 ≤ eps < 0.5), Presets

Mean loss (nats): 0.1976 (the standard log-loss metric)

Sum (nats): 0.7905 (mean × N, with N = 4)

Perplexity: 1.2185 (exp(mean loss in nats))

Worst sample: loss 0.3567 nats

Formula & first sample

Lᵢ = −[ yᵢ·ln(pᵢ) + (1 − yᵢ)·ln(1 − pᵢ) ]

Sample #1: true class 1, p(true) = 0.9000 → L₁ = −ln(0.9000) = 0.1054 nats

Cross-check. Summing logs gives a mean of 0.1976 nats; the independent product form −ln((∏ p)^(1/N)) gives 0.1976. They reconcile, as they must. (Shown for up to 50 samples, where the raw product doesn't underflow.)
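The cross-check can be reproduced with the true-class probabilities from the per-sample table. This is a sketch of the idea, not the tool's implementation; the variable names are chosen for this example.

```python
import math

# Probability assigned to the true class for each of the four samples.
p_true = [0.9, 0.8, 0.7, 0.9]

# Route 1: average the negative logs.
mean_from_logs = -sum(math.log(p) for p in p_true) / len(p_true)

# Route 2: negative log of the geometric mean of the probabilities.
geo_mean = math.prod(p_true) ** (1 / len(p_true))
mean_from_product = -math.log(geo_mean)

print(round(mean_from_logs, 4), round(mean_from_product, 4))  # 0.1976 0.1976
```

The product route is only safe for short inputs: multiplying many probabilities below 1 eventually underflows to 0, which is why summing logs is the usual choice.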

Per-sample loss

Sample	True class	p(true class)	Loss (nats)
#1	1	0.9000	0.1054
#2	0	0.8000	0.2231
#3	1	0.7000	0.3567
#4	0	0.9000	0.1054
Sum (nats)			0.7905

Method: binary L = −[y·ln p + (1−y)·ln(1−p)], multi-class L = −ln p(true class), with eps clipping to [eps, 1−eps] — scikit-learn log_loss; PyTorch BCELoss / CrossEntropyLoss. No data leaves this page.

How it works

Cross-entropy loss — also called log loss — measures how far a classifier's predicted probabilities sit from the true labels. It is the negative log-likelihood of the correct class, averaged over the dataset. The definition comes from information theory (Goodfellow, Bengio & Courville, Deep Learning, Ch. 3) and is the loss returned by scikit-learn's log_loss and PyTorch's CrossEntropyLoss.

For binary classification with true label yᵢ ∈ {0, 1} and predicted positive-class probability pᵢ, the per-sample loss is:

Lᵢ = −[ yᵢ·ln(pᵢ) + (1 − yᵢ)·ln(1 − pᵢ) ]

For multi-class classification with K classes and integer true class cᵢ, only the correct class's probability contributes:

Lᵢ = −ln( pᵢ(cᵢ) ), where pᵢ(c) is the probability sample i assigns to class c

1. Convert, if needed. When you pass logits, a sigmoid (binary) or a numerically stable softmax (multi-class, subtracting the row maximum) maps them to probabilities first, the same pipeline as PyTorch's BCEWithLogitsLoss and CrossEntropyLoss.
2. Clip. Each probability is clipped to [eps, 1 − eps] with eps = 1e-15 (the long-standing scikit-learn default; recent releases derive eps from the dtype's precision instead) so a predicted 0 yields a large finite loss instead of ln 0 = −∞.
3. Score. Apply the formula above to each sample to get the per-sample Lᵢ.
4. Reduce. Mean, L = (1/N)·Σ Lᵢ, is the standard log-loss metric (scikit-learn default). Sum gives Σ Lᵢ; None returns the per-sample vector unchanged.
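The final reduction step is simple enough to sketch directly. The function `reduce_losses` and its `reduction` argument are names assumed for this example; they mirror the Mean / Sum / None options described above rather than any particular library's signature.

```python
def reduce_losses(losses, reduction="mean"):
    """Collapse per-sample losses according to the chosen reduction."""
    if reduction == "mean":
        return sum(losses) / len(losses)
    if reduction == "sum":
        return sum(losses)
    if reduction == "none":
        return list(losses)  # per-sample vector, unchanged
    raise ValueError(f"unknown reduction: {reduction!r}")

per_sample = [0.1, 0.2, 0.3]
print(reduce_losses(per_sample, "none"))  # [0.1, 0.2, 0.3]
```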

Natural log gives the loss in nats (the ML convention, matching PyTorch); base-2 gives bits, where L_bits = L_nats / ln 2. The related perplexity is exp(mean loss in nats), read as the effective number of equally likely classes the model is still unsure between. As a sanity check, the tool also recomputes the mean by the independent product form −ln((∏ p)^(1/N)) and confirms the two agree to floating-point precision. Every step is plain double-precision arithmetic in your browser; nothing is uploaded.
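The unit conversion is a single division. A short sketch, with `nats_to_bits` as an illustrative helper name and the mean loss taken from the binary worked example on this page:

```python
import math

def nats_to_bits(loss_nats):
    """Convert a loss from nats (natural log) to bits (log base 2)."""
    return loss_nats / math.log(2)

mean_nats = 0.197635
print(round(nats_to_bits(mean_nats), 4))  # 0.2851
```

A uniform two-class guess costs ln 2 ≈ 0.693 nats, which converts to exactly 1 bit.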

Worked examples

Binary, mean — y = [1, 0, 1, 0], p = [0.9, 0.2, 0.7, 0.1]

Sample 0 (y=1): −ln(0.9) = 0.105361
Sample 1 (y=0): −ln(1 − 0.2) = −ln(0.8) = 0.223144
Sample 2 (y=1): −ln(0.7) = 0.356675
Sample 3 (y=0): −ln(1 − 0.1) = −ln(0.9) = 0.105361
Sum = 0.790541; mean = 0.790541 / 4 = 0.197635 nats
Worst sample = Sample 2 (0.356675), listed as #3 in the 1-based per-sample table; matches sklearn log_loss ≈ 0.1976 ✓

Multi-class, integer labels, mean — true [0, 2, 1]

Rows: [0.7,0.2,0.1], [0.1,0.3,0.6], [0.2,0.5,0.3]
Sample 0 (true 0): −ln(0.7) = 0.356675
Sample 1 (true 2): −ln(0.6) = 0.510826
Sample 2 (true 1): −ln(0.5) = 0.693147
Sum = 1.560648; mean = 1.560648 / 3 = 0.520216 nats
Perplexity = e^0.520216 = 1.682391
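The multi-class example can be reproduced with a short script. This is a sketch under the same inputs as above; `categorical_cross_entropy` is a name introduced here, and it assumes integer labels and rows that already hold probabilities (no clipping, no softmax).

```python
import math

def categorical_cross_entropy(true_classes, prob_rows):
    """Mean of -ln(p of the true class) over all samples, in nats."""
    losses = [-math.log(row[c]) for c, row in zip(true_classes, prob_rows)]
    return sum(losses) / len(losses)

rows = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6], [0.2, 0.5, 0.3]]
mean_loss = categorical_cross_entropy([0, 2, 1], rows)

print(round(mean_loss, 6))            # 0.520216
print(round(math.exp(mean_loss), 4))  # 1.6824
```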

Edge case — predicted 0 with epsilon clipping (y = [1], p = [0])

Raw loss would be −ln(0) = +∞, which cannot be averaged or compared
Clip p to eps = 1e-15: p(true) = 1e-15
L = −ln(1e-15) = 15 × ln(10) = 34.538776 nats
Finite and large — the model is heavily penalised, not broken
Turn clipping off to see the ∞ behaviour and a warning

Frequently asked questions

Sources & references

The formulas on this page were last cross-checked against these sources on 2026-06-10. Cross-entropy is a stable mathematical definition, so this tool needs no rate or schedule updates — only the worked examples are periodically re-reconciled.
