How do you calculate Shannon entropy by hand?

Turn your counts into probabilities (pᵢ = countᵢ ÷ total). For each outcome compute −pᵢ·log₂(pᵢ); a zero-probability outcome contributes 0. Add up all the terms — that sum is the entropy in bits. For [9, 5]: p = [0.643, 0.357], H = −(0.643·log₂0.643 + 0.357·log₂0.357) = 0.940 bits.
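The same hand calculation can be sketched in a few lines of Python using only the standard library. The function name `entropy_bits` is an illustrative choice here, not something the calculator exposes.

```python
import math

def entropy_bits(counts):
    """Shannon entropy in bits from raw counts (illustrative helper)."""
    total = sum(counts)
    h = 0.0
    for c in counts:
        if c == 0:
            continue  # convention: a zero-probability outcome contributes 0
        p = c / total             # turn the count into a probability
        h -= p * math.log2(p)     # add the term -p * log2(p)
    return h

print(f"{entropy_bits([9, 5]):.3f}")  # 0.940
```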

What is the difference between entropy in bits and nats?

They measure the same uncertainty in different units: bits use log base 2, nats use the natural log (base e), and dits use base 10. Information theory and coding usually use bits; machine-learning loss functions and SciPy default to nats. Convert with 1 nat = 1/ln2 ≈ 1.4427 bits. This calculator shows all three at once.
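As a rough illustration of the unit conversion, the sketch below computes the [9, 5] example in all three bases and checks that converting from bits gives the same numbers. The helper `entropy` is a hypothetical name for this example only.

```python
import math

def entropy(probs, base=2.0):
    """Entropy of a probability list, in the unit set by `base`."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)

p = [9 / 14, 5 / 14]
h_bits = entropy(p, 2)
h_nats = entropy(p, math.e)
h_dits = entropy(p, 10)

# Converting instead of recomputing: bits * ln(2) = nats, bits * log10(2) = dits.
assert math.isclose(h_nats, h_bits * math.log(2))
assert math.isclose(h_dits, h_bits * math.log10(2))

print(f"{h_bits:.4f} bits, {h_nats:.4f} nats, {h_dits:.4f} dits")
# 0.9403 bits, 0.6518 nats, 0.2831 dits
```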

What is the maximum possible entropy for n outcomes?

The maximum is logₐ(n), reached only by the uniform distribution where every outcome is equally likely. In bits that is log₂(n): 1 bit for a fair coin (n=2), 2 bits for a fair 4-sided die, log₂(6) ≈ 2.585 bits for a fair 6-sided die. Efficiency (H ÷ Hmax) tells you how close your distribution is to that maximum.

How do you calculate information gain in a decision tree?

Information gain is the parent set's entropy minus the weighted average entropy of the children after a split: Gain(S, A) = H(S) − Σ (|Sᵥ|/|S|)·H(Sᵥ). Turn on the information-gain panel, paste each branch's class counts, and the tool returns the parent entropy, each child entropy and weight, and the gain. This is the attribute-selection metric of ID3; C4.5 builds on it with the gain ratio.

Why is the entropy of a pure (single-class) set zero?

Entropy measures uncertainty. If one outcome has probability 1 and the rest 0, there is nothing to be uncertain about — you always know the result, so it carries no information. Mathematically the single non-zero term is −1·log(1) = 0 and every other term is 0·log0 = 0, so H = 0. Try the “Pure set → 0” preset.

Does this match scipy.stats.entropy and sklearn?

Yes. In nats mode the entropy equals scipy.stats.entropy(pk) with its default base e, using the same normalise-to-sum-1 and 0·log0 = 0 conventions. In bits mode, the information-gain entropies match the impurity that sklearn's DecisionTreeClassifier(criterion='entropy') computes per node (it uses log base 2), so you can reconcile its feature splits.

Do my probabilities need to add up to exactly 1?

No. In counts mode the values are divided by their total automatically. In probabilities mode, if your numbers don't sum to 1 they are renormalised (and you'll see a notice), exactly as scipy.stats.entropy does. Entropy is invariant to rescaling the input, so [9, 5] and [0.643, 0.357] give the same answer to three decimals (the small remaining difference comes from rounding the probabilities).
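A small sketch of why rescaling does not matter, assuming the normalise-then-sum approach described on this page. The helper name `entropy_bits` is illustrative.

```python
import math

def entropy_bits(values):
    """Normalise non-negative values to sum to 1, then return entropy in bits."""
    total = sum(values)
    return -sum((v / total) * math.log2(v / total) for v in values if v > 0)

a = entropy_bits([9, 5])            # raw counts
b = entropy_bits([18, 10])          # same proportions, doubled
c = entropy_bits([9 / 14, 5 / 14])  # exact probabilities
d = entropy_bits([0.643, 0.357])    # probabilities rounded to 3 decimals

assert math.isclose(a, b) and math.isclose(a, c)
assert abs(a - d) < 1e-3  # rounding the inputs shifts the result slightly
```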

What does efficiency (normalised entropy) tell me?

Efficiency is H ÷ Hmax, a number from 0 to 1. A value of 1 means a perfectly uniform, maximally uncertain source; 0 means a certain, single-outcome source. Redundancy is 1 − efficiency. It's a quick way to compare distributions over different alphabet sizes on the same scale.
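For illustration, efficiency and redundancy can be computed as below. Two choices here are assumptions of this sketch rather than documented calculator behaviour: n is taken as the number of listed outcomes (including any zeros), and a single-outcome input is reported with efficiency 0.

```python
import math

def efficiency(counts):
    """Return (H, Hmax, H / Hmax) in bits for a list of counts."""
    total = sum(counts)
    h = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    h_max = math.log2(len(counts))  # assumption: n = number of listed outcomes
    # Assumption: with a single outcome Hmax is 0, so report efficiency 0.
    eff = h / h_max if h_max > 0 else 0.0
    return h, h_max, eff

h, h_max, eff = efficiency([9, 5])
print(f"H = {h:.4f}, Hmax = {h_max:.4f}, "
      f"efficiency = {eff:.2%}, redundancy = {1 - eff:.2%}")
# H = 0.9403, Hmax = 1.0000, efficiency = 94.03%, redundancy = 5.97%
```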

Shannon Entropy & Information Gain Calculator

Compute Shannon entropy H(X) for any discrete distribution — from counts or probabilities — in bits, nats, or dits, with the full per-term working, the maximum entropy and efficiency. Flip on information-gain mode to reconcile a decision-tree split step by step. All in your browser, no signup, matched to SciPy.

By Induwara Ashinsana, Executive Director, Ryzera Technologies. Updated Jun 14, 2026

Shannon entropy calculator

Distribution

1,000 outcomes max. Comma, space, or newline separated. Counts are normalised to probabilities.


Entropy H(X): 0.9403 bits (average uncertainty)
Max entropy: 1.0000 bits (log₂(n), n = 2)
Efficiency (H / Hmax): 94.03% (1 = uniform, 0 = certain)
Outcomes (n): 2
Total mass: 14

H(X) in every base: 0.9403 bits, 0.6518 nats, 0.2831 dits (1 nat = 1/ln2 ≈ 1.4427 bits)

Cross-check. Summing the per-term contributions gives 0.9403 bits; the independent counts-form identity logₐ(N) − (1/N)·Σ cᵢ·logₐ(cᵢ) gives 0.9403 bits. They reconcile, as they must — two groupings of the same sum.

Per-term working — H(X)

i	value	pᵢ	−log₂(pᵢ)	−pᵢ·log₂(pᵢ)
1	9.0000	0.6429	0.6374	0.4098
2	5.0000	0.3571	1.4854	0.5305
H(X) total (bits)				0.9403

Compute decision-tree information gain (ID3)

Method: H(X) = −Σ pᵢ·logₐ(pᵢ), base 2 → bits, base e → nats, base 10 → dits, with 0·log0 = 0. Information gain uses Gain = H(S) − Σ (|Sᵥ|/|S|)·H(Sᵥ) (Quinlan 1986). No data leaves this page.

How it works

Shannon entropy, introduced by Claude Shannon in his 1948 paper A Mathematical Theory of Communication, measures the average uncertainty — equivalently, the average information content — of a discrete random source. For a distribution over n outcomes with probabilities pᵢ it is:

H(X) = −Σᵢ pᵢ · logₐ(pᵢ)

The base a of the logarithm sets the unit: base 2 gives bits (Shannon's original choice), base e gives nats, and base 10 gives dits. They convert with 1 nat = 1/ln2 ≈ 1.4427 bits. The calculation follows the conventions used by Shannon and by SciPy:

Parse your numbers and reject anything invalid — a negative count or probability, a non-number, or an all-zero distribution — with a specific message rather than a silent NaN.
Normalise to probabilities: divide each value by the total so they sum to 1. This is exactly what scipy.stats.entropy does, and it lets you paste raw counts like 9, 5 directly.
For each outcome compute the term tᵢ = −pᵢ·logₐ(pᵢ), using the convention 0·log0 = 0 so a zero-probability outcome contributes nothing.
Sum the terms to get H(X). The maximum entropy is Hmax = logₐ(n), reached by the uniform distribution, and efficiency = H/Hmax places your distribution on a 0-to-1 scale (1 = uniform, 0 = certain).
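The four steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the calculator's actual source: the function name, the error messages, and the return shape are all hypothetical.

```python
import math

def shannon_entropy(values, base=2.0):
    """Validate, normalise, and sum the per-term contributions."""
    # Step 1: reject invalid input with a specific message.
    for v in values:
        if not isinstance(v, (int, float)) or math.isnan(v):
            raise ValueError(f"not a number: {v!r}")
        if v < 0:
            raise ValueError(f"negative value: {v!r}")
    total = sum(values)
    if total == 0:
        raise ValueError("all-zero distribution")
    # Step 2: normalise so the probabilities sum to 1.
    probs = [v / total for v in values]
    # Step 3: per-term contributions, with 0 * log(0) taken as 0.
    terms = [-p * math.log(p, base) if p > 0 else 0.0 for p in probs]
    # Step 4: sum the terms; Hmax is the log of the number of outcomes.
    h = sum(terms)
    h_max = math.log(len(values), base)
    return h, terms, h_max

h, terms, h_max = shannon_entropy([9, 5])
print([round(t, 4) for t in terms], round(h, 4))  # [0.4098, 0.5305] 0.9403
```

Raising on bad input, rather than returning NaN, mirrors the "specific message" behaviour described in the first step.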

As a built-in correctness check, the tool also computes entropy a second, algebraically equivalent way, using the counts form H = logₐ(N) − (1/N)·Σ cᵢ·logₐ(cᵢ), and shows the two results agreeing to floating-point precision.
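For readers who want to see why the counts form agrees with the per-term sum, here is a short derivation. It assumes only pᵢ = cᵢ/N and Σ cᵢ = N, with N the total count.

```latex
\begin{aligned}
H &= -\sum_i \frac{c_i}{N}\,\log_a\frac{c_i}{N} \\
  &= -\frac{1}{N}\sum_i c_i\,\bigl(\log_a c_i - \log_a N\bigr) \\
  &= \frac{\log_a N}{N}\sum_i c_i \;-\; \frac{1}{N}\sum_i c_i \log_a c_i \\
  &= \log_a N \;-\; \frac{1}{N}\sum_i c_i \log_a c_i
\end{aligned}
```

The last step uses Σ cᵢ = N. Terms with cᵢ = 0 drop out under the 0·log0 = 0 convention.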

For decision trees, the information-gain panel uses Quinlan's 1986 ID3 definition. Given a parent set S split by an attribute A into children S₁…Sₖ:

Gain(S, A) = H(S) − Σᵥ (|Sᵥ| / |S|) · H(Sᵥ)

You enter each child's class counts; the parent distribution is the column-wise sum of the children, so the split partitions S exactly. The tool returns the parent entropy, every child entropy and its weight |Sᵥ|/|S|, the weighted child entropy, and the gain — the same impurity drop that scikit-learn's entropy criterion uses to pick splits.

Worked examples

Biased coin (bits)

p = [0.9, 0.1], base 2

t₀ = −0.9·log₂(0.9) = −0.9·(−0.152003) = 0.136803
t₁ = −0.1·log₂(0.1) = −0.1·(−3.321928) = 0.332193
H = 0.136803 + 0.332193 = 0.4690 bits
Hmax = log₂2 = 1, efficiency = 0.4690 → far from uniform

Fair 4-sided die — maximum entropy (bits)

counts = [1, 1, 1, 1] → p = [0.25]×4

each tᵢ = −0.25·log₂(0.25) = −0.25·(−2) = 0.5
H = 4 × 0.5 = 2.0000 bits
Hmax = log₂4 = 2 → H = Hmax, efficiency = 1.0 (uniform)
Cross-check (counts form): log₂4 − (1/4)·Σ(1·log₂1) = 2 − 0 = 2.0000 ✓

Information gain — ID3 “Play Tennis” / Outlook (bits)

children Sunny [2,3], Overcast [4,0], Rain [3,2]

Parent = column sums = [9, 5], |S| = 14
H(S) = −((9/14)·log₂(9/14) + (5/14)·log₂(5/14)) = 0.9403 bits
H(Sunny[2,3]) = 0.9710; H(Overcast[4,0]) = 0 (pure); H(Rain[3,2]) = 0.9710
Weighted = (5/14)·0.9710 + (4/14)·0 + (5/14)·0.9710 = 0.6935
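Gain(Outlook) = H(S) − Weighted = 0.2467 bits, computed from the unrounded values 0.940286 − 0.693536 (subtracting the rounded figures would give 0.2468); this matches the textbook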
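The Outlook split above can be reproduced with a short Python sketch. The function names are illustrative. The parent distribution is built as the column-wise sum of the children, following the definition given on this page.

```python
import math

def entropy_bits(counts):
    """Entropy in bits from class counts, with 0 * log(0) taken as 0."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def information_gain(children):
    """Return (parent entropy, weighted child entropy, gain) in bits."""
    parent = [sum(col) for col in zip(*children)]  # column-wise sum
    n = sum(parent)
    h_parent = entropy_bits(parent)
    weighted = sum(sum(ch) / n * entropy_bits(ch) for ch in children)
    return h_parent, weighted, h_parent - weighted

# Sunny [2, 3], Overcast [4, 0], Rain [3, 2]
h_parent, weighted, gain = information_gain([[2, 3], [4, 0], [3, 2]])
print(f"{h_parent:.4f} {weighted:.4f} {gain:.4f}")  # 0.9403 0.6935 0.2467
```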

Frequently asked questions

Sources & references

The formulas and conventions on this page were last cross-checked against Shannon's paper, SciPy, and the ID3 definition on 2026-06-10. Worked examples reproduce to the displayed precision against scipy.stats.entropy and the canonical decision-tree textbook values.

Related tools

Cross-Entropy Loss Calc

Compute cross-entropy (log) loss for binary and multi-class classification from labels and predicted probabilities or logits. Shows per-sample loss, the mean log-loss metric, perplexity and full step-by-step working — matches scikit-learn log_loss and PyTorch CrossEntropyLoss, entirely in the browser.

Gini Impurity Calculator

Compute the Gini impurity of a decision-tree node from class counts or proportions, with the full 1 − Σ pₖ² working, a Shannon-entropy comparison, and the Gini gain of a candidate split. Matches scikit-learn, runs in the browser.

Perplexity Calculator

Compute language-model perplexity from token probabilities, cross-entropy loss, or log-likelihood, with nats and bits-per-token conversions. Step-by-step, matches PyTorch, runs entirely in the browser.

