induwara.lk

Cross-Entropy Loss Calculator

Compute cross-entropy (log) loss for binary and multi-class classification in your browser. Paste true labels and predicted probabilities or logits to get the per-sample loss, the mean log-loss metric, perplexity, and the full step-by-step working. Results match scikit-learn's log_loss and PyTorch's CrossEntropyLoss.

By Induwara Ashinsana · Updated Jun 10, 2026

One label per sample — 0 for the negative class, 1 for the positive. Comma, space, or newline separated.

One probability in [0, 1] per sample — the model's confidence in class 1.

Mean loss (nats): 0.1976 (the standard log-loss metric)
Sum (nats): 0.7905 (mean × N, with N = 4)
Perplexity: 1.2185 (exp of the mean loss in nats)
Worst sample: #3 (loss 0.3567 nats)

Formula & first sample

Lᵢ = −[ yᵢ·ln(pᵢ) + (1 − yᵢ)·ln(1 − pᵢ) ]

Sample #1: true class 1, p(true) = 0.9000 → L₁ = −ln(0.9000) = 0.1054 nats

Cross-check. Summing logs gives a mean of 0.1976 nats; the independent product form −ln((∏ p)^(1/N)) gives 0.1976 nats. The two agree, as they must. (Shown for up to 50 samples, where the raw product doesn't underflow.)

Per-sample loss

Sample   True class   p(true class)   Loss (nats)
#1       1            0.9000          0.1054
#2       0            0.8000          0.2231
#3       1            0.7000          0.3567
#4       0            0.9000          0.1054
Sum (nats)                            0.7905

Method: binary L = −[y·ln p + (1−y)·ln(1−p)]; multi-class L = −ln p(true class); probabilities are clipped to [eps, 1−eps]. This follows the conventions of scikit-learn's log_loss and PyTorch's BCELoss / CrossEntropyLoss. No data leaves this page.

How it works

Cross-entropy loss — also called log loss — measures how far a classifier's predicted probabilities sit from the true labels. It is the negative log-likelihood of the correct class, averaged over the dataset. The definition comes from information theory (Goodfellow, Bengio & Courville, Deep Learning, Ch. 3) and is the loss returned by scikit-learn's log_loss and PyTorch's CrossEntropyLoss.

For binary classification with true label yᵢ ∈ {0, 1} and predicted positive-class probability pᵢ, the per-sample loss is:

Lᵢ = −[ yᵢ·ln(pᵢ) + (1 − yᵢ)·ln(1 − pᵢ) ]

For multi-class classification with K classes and integer true class cᵢ, only the correct class's probability contributes:

Lᵢ = −ln( pᵢ[cᵢ] ), where pᵢ[cᵢ] is the probability that sample i's prediction assigns to its true class cᵢ

  1. Convert, if needed. When you pass logits, a sigmoid (binary) or a numerically stable softmax (multi-class, with the row maximum subtracted first) maps them to probabilities, the same pipeline as PyTorch's BCEWithLogitsLoss and CrossEntropyLoss.
  2. Clip. Each probability is clipped to [eps, 1 − eps] with eps = 1e-15 (the long-standing scikit-learn default; recent releases derive eps from the input dtype instead), so a predicted 0 yields a large finite loss instead of ln 0 = −∞.
  3. Score each sample with the formula above to get the per-sample Lᵢ.
  4. Reduce. Mean — L = (1/N)·Σ Lᵢ — is the standard log-loss metric (scikit-learn default). Sum gives Σ Lᵢ; None returns the per-sample vector unchanged.
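
The four steps above can be sketched in plain Python. This is a minimal illustration, not the calculator's actual source: the function names (sigmoid, softmax, clip, binary_losses, multiclass_losses, reduce_losses) and the from_logits flag are assumptions made for this example.

```python
import math

EPS = 1e-15  # clipping bound quoted in the text


def sigmoid(z):
    # Branch on the sign so exp() never receives a large positive argument.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def softmax(row):
    # Subtracting the row maximum keeps every exponent <= 0.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def clip(p, eps=EPS):
    # Step 2: keep p inside [eps, 1 - eps] so the logs stay finite.
    return min(max(p, eps), 1.0 - eps)


def binary_losses(y_true, scores, from_logits=False, eps=EPS):
    # Steps 1-3 for binary labels; scores are probabilities or logits.
    losses = []
    for y, s in zip(y_true, scores):
        p = clip(sigmoid(s) if from_logits else s, eps)
        losses.append(-(y * math.log(p) + (1 - y) * math.log(1.0 - p)))
    return losses


def multiclass_losses(y_true, rows, from_logits=False, eps=EPS):
    # Steps 1-3 for integer class labels; only the true class contributes.
    losses = []
    for c, row in zip(y_true, rows):
        probs = softmax(row) if from_logits else row
        losses.append(-math.log(clip(probs[c], eps)))
    return losses


def reduce_losses(losses, reduction="mean"):
    # Step 4: "mean", "sum", or anything else to return the vector unchanged.
    if reduction == "mean":
        return sum(losses) / len(losses)
    if reduction == "sum":
        return sum(losses)
    return losses
```

For example, reduce_losses(binary_losses([1, 0, 1, 0], [0.9, 0.2, 0.7, 0.1])) evaluates to about 0.1976 nats.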

Natural log gives the loss in nats (the ML convention, matching PyTorch); base-2 gives bits, where L_bits = L_nats / ln 2. The related perplexity is exp(mean loss in nats), read as the effective number of equally likely classes the model is still unsure between. As a credibility check, the tool also recomputes the mean by the independent product form −ln((∏ p)^(1/N)) and confirms the two agree to floating-point precision. Every step is plain double-precision arithmetic in your browser — nothing is uploaded.
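
The unit conversion, the perplexity, and the product-form cross-check described above can be reproduced in a few lines. This is a sketch under the assumption that p_true already holds each sample's probability for its true class; the variable names are illustrative only.

```python
import math

# Probability assigned to the true class for each sample (illustrative values).
p_true = [0.9, 0.8, 0.7, 0.9]
n = len(p_true)

# Mean loss in nats, computed by summing logs.
mean_nats = -sum(math.log(p) for p in p_true) / n

# The same loss in bits: L_bits = L_nats / ln 2.
mean_bits = mean_nats / math.log(2)

# Perplexity is exp of the mean loss in nats.
perplexity = math.exp(mean_nats)

# Independent product form: -ln of the geometric mean of p_true.
# The raw product can underflow for long inputs, so this is only a check.
product_form = -math.log(math.prod(p_true) ** (1.0 / n))

print(round(mean_nats, 4), round(mean_bits, 4), round(perplexity, 4))
print(math.isclose(mean_nats, product_form, rel_tol=1e-12))
```

Summing logs is the form to rely on for long inputs; the product form is only useful as a second opinion on short ones.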

Worked examples

Binary, mean — y = [1, 0, 1, 0], p = [0.9, 0.2, 0.7, 0.1]

  1. Sample 0 (y=1): −ln(0.9) = 0.105361
  2. Sample 1 (y=0): −ln(1 − 0.2) = −ln(0.8) = 0.223144
  3. Sample 2 (y=1): −ln(0.7) = 0.356675
  4. Sample 3 (y=0): −ln(1 − 0.1) = −ln(0.9) = 0.105361
  5. Sum = 0.790540; mean = 0.790540 / 4 = 0.197635 nats
  6. Worst sample = Sample 2 (0.356675), listed as #3 in the calculator's 1-based table; the mean matches sklearn log_loss ≈ 0.1976 ✓
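
The binary example can be checked by hand in Python. This is an illustrative sketch using only the standard library, with 0-based sample indices as in the steps above; clipping is omitted because no probability here is 0 or 1.

```python
import math

y = [1, 0, 1, 0]
p = [0.9, 0.2, 0.7, 0.1]

# Per-sample loss: -ln(p) when y = 1, -ln(1 - p) when y = 0.
losses = [-math.log(pi if yi == 1 else 1.0 - pi) for yi, pi in zip(y, p)]

total = sum(losses)
mean = total / len(losses)

# 0-based index of the sample with the largest loss.
worst = max(range(len(losses)), key=losses.__getitem__)

print([round(loss, 6) for loss in losses])
print(round(mean, 6), worst)
```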

Multi-class, integer labels, mean — true [0, 2, 1]

  1. Rows: [0.7,0.2,0.1], [0.1,0.3,0.6], [0.2,0.5,0.3]
  2. Sample 0 (true 0): −ln(0.7) = 0.356675
  3. Sample 1 (true 2): −ln(0.6) = 0.510826
  4. Sample 2 (true 1): −ln(0.5) = 0.693147
  5. Sum = 1.560648; mean = 1.560648 / 3 = 0.520216 nats
  6. Perplexity = e^0.520216 = 1.682391
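
The multi-class example follows the same pattern: pick out the probability of the true class in each row, then average the negative logs. A minimal sketch, assuming the rows are already probabilities (no softmax needed) and using illustrative variable names:

```python
import math

true_classes = [0, 2, 1]
rows = [
    [0.7, 0.2, 0.1],
    [0.1, 0.3, 0.6],
    [0.2, 0.5, 0.3],
]

# Only the probability of the true class enters each sample's loss.
losses = [-math.log(row[c]) for c, row in zip(true_classes, rows)]

mean = sum(losses) / len(losses)
perplexity = math.exp(mean)

print([round(loss, 6) for loss in losses])
print(round(mean, 6), round(perplexity, 4))
```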

Edge case — predicted 0 with epsilon clipping (y = [1], p = [0])

  1. The raw loss −ln(0) diverges to +∞, so no finite value exists
  2. Clip p to eps = 1e-15: p(true) = 1e-15
  3. L = −ln(1e-15) = 15 × ln(10) = 34.538776 nats
  4. Finite and large — the model is heavily penalised, not broken
  5. Turn clipping off to see the ∞ behaviour and a warning
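
The effect of clipping can be seen in a short sketch. The helper loss_for_true_class and its clip_probs flag are hypothetical names for this illustration and do not describe how the calculator implements its toggle or its warning.

```python
import math


def loss_for_true_class(p_true, eps=1e-15, clip_probs=True):
    # With clipping on, p_true is forced into [eps, 1 - eps] first.
    if clip_probs:
        p_true = min(max(p_true, eps), 1.0 - eps)
    # math.log(0.0) raises ValueError, so the unclipped case is handled here.
    if p_true == 0.0:
        return math.inf
    return -math.log(p_true)


clipped = loss_for_true_class(0.0)                      # large but finite
unclipped = loss_for_true_class(0.0, clip_probs=False)  # infinite

print(round(clipped, 6), unclipped)
```

Returning math.inf explicitly is a design choice for the sketch: Python's math.log does not return −∞ for a zero argument, so the infinite case has to be special-cased.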

Sources & references

The formulas on this page were last cross-checked against these sources on 2026-06-10. Cross-entropy is a stable mathematical definition, so this tool needs no rate or schedule updates — only the worked examples are periodically re-reconciled.
