Shannon Entropy & Information Gain Calculator
Compute Shannon entropy H(X) for any discrete distribution — from counts or probabilities — in bits, nats, or dits, with the full per-term working, the maximum entropy and efficiency. Flip on information-gain mode to reconcile a decision-tree split step by step. All in your browser, no signup, matched to SciPy.
How it works
Shannon entropy, introduced by Claude Shannon in his 1948 paper A Mathematical Theory of Communication, measures the average uncertainty — equivalently, the average information content — of a discrete random source. For a distribution over n outcomes with probabilities pᵢ it is:
H(X) = −Σᵢ pᵢ · logₐ(pᵢ)
The base a of the logarithm sets the unit: base 2 gives bits (Shannon's original choice), base e gives nats, and base 10 gives dits. They convert with 1 nat = 1/ln2 ≈ 1.4427 bits. The calculation follows the conventions used by Shannon and by SciPy:
- Parse your numbers and reject anything invalid — a negative count or probability, a non-number, or an all-zero distribution — with a specific message rather than a silent NaN.
- Normalise to probabilities: divide each value by the total so they sum to 1. This is exactly what scipy.stats.entropy does, and it lets you paste raw counts like 9, 5 directly.
- For each outcome compute the term tᵢ = −pᵢ·logₐ(pᵢ), using the convention 0·log0 = 0 so a zero-probability outcome contributes nothing.
- Sum the terms to get H(X). The maximum entropy is Hmax = logₐ(n), reached by the uniform distribution, and efficiency = H/Hmax places your distribution on a 0-to-1 scale (1 = uniform, 0 = certain).
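The four steps above can be sketched in plain Python. This is a minimal illustration, not the tool's actual source: the function name `shannon_entropy`, the error messages, and the choice to return `None` for efficiency when there is only one outcome (where Hmax = 0) are all assumptions made for the example.

```python
import math

def shannon_entropy(values, base=2.0):
    """Return (H, H_max, efficiency, terms) for counts or probabilities.

    Hypothetical helper for illustration; names and messages are assumptions.
    """
    # Step 1: reject invalid input with a specific message.
    if len(values) == 0:
        raise ValueError("enter at least one value")
    for v in values:
        if isinstance(v, bool) or not isinstance(v, (int, float)) or not math.isfinite(v):
            raise ValueError(f"not a finite number: {v!r}")
        if v < 0:
            raise ValueError(f"negative value: {v!r}")
    total = sum(values)
    if total == 0:
        raise ValueError("all values are zero")

    # Step 2: normalise so the probabilities sum to 1.
    probs = [v / total for v in values]

    # Step 3: per-outcome terms, with the convention 0 * log(0) = 0.
    log_base = math.log(base)
    terms = [-p * math.log(p) / log_base if p > 0 else 0.0 for p in probs]

    # Step 4: sum the terms, then compare against the uniform maximum.
    h = sum(terms)
    h_max = math.log(len(probs)) / log_base
    # Assumption: efficiency is left undefined when there is a single outcome.
    efficiency = h / h_max if h_max > 0 else None
    return h, h_max, efficiency, terms

# Raw counts 9, 5 in bits, then the same distribution in nats.
h_bits, h_max, eff, terms = shannon_entropy([9, 5])
h_nats, *_ = shannon_entropy([9, 5], base=math.e)
```

For the counts 9, 5 this gives roughly 0.9403 bits against a maximum of 1 bit, and dividing the nats result by ln 2 recovers the bits result, which is the unit conversion described earlier.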
As a built-in correctness check the tool also computes entropy a second way, from the algebraically equivalent counts form H = logₐ(N) − (1/N)·Σ cᵢ·logₐ(cᵢ), where cᵢ are the raw counts and N is their total, and shows the two results agreeing to floating-point precision.
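The counts form follows from substituting pᵢ = cᵢ/N into the definition and expanding the logarithm of the quotient. A rough sketch of the cross-check, with hypothetical function names and zero counts skipped under the 0·log0 = 0 convention, might look like this:

```python
import math

def entropy_from_probs(counts, base=2.0):
    """Definition form: H = -sum(p * log(p)) with p = c / N."""
    total = sum(counts)
    nats = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return nats / math.log(base)

def entropy_counts_form(counts, base=2.0):
    """Counts form: H = log(N) - (1/N) * sum(c * log(c))."""
    total = sum(counts)
    weighted = sum(c * math.log(c) for c in counts if c > 0)
    return (math.log(total) - weighted / total) / math.log(base)

counts = [9, 5]
h_definition = entropy_from_probs(counts)
h_counts = entropy_counts_form(counts)
# The two routes should differ only by floating-point rounding.
difference = abs(h_definition - h_counts)
```

Because the two functions take different arithmetic routes to the same quantity, a large `difference` would point to a bug in one of them rather than to a property of the data.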
For decision trees, the information-gain panel uses Quinlan's 1986 ID3 definition. Given a parent set S split by an attribute A into children S₁…Sₖ:
Gain(S, A) = H(S) − Σᵥ (|Sᵥ| / |S|) · H(Sᵥ)
You enter each child's class counts; the parent distribution is the column-wise sum of the children, so the split partitions S exactly. The tool returns the parent entropy, every child entropy and its weight |Sᵥ|/|S|, the weighted child entropy, and the gain — the same impurity drop that scikit-learn's entropy criterion uses to pick splits.
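As an illustration of the gain formula, the sketch below builds the parent as the column-wise sum of the children and subtracts the weighted child entropy. The function names are hypothetical, and the child counts are illustrative values chosen so that the parent comes out as 9, 5; skipping empty children is an assumption of this example, not a documented behaviour of the tool.

```python
import math

def entropy(counts, base=2.0):
    """Shannon entropy of a list of class counts (0 * log 0 treated as 0)."""
    total = sum(counts)
    nats = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return nats / math.log(base)

def information_gain(children, base=2.0):
    """Return (gain, parent_counts, weighted_child_entropy).

    `children` is a list of per-child class-count lists, all the same length.
    """
    # Parent distribution: column-wise sum of the children's class counts.
    parent = [sum(column) for column in zip(*children)]
    n = sum(parent)
    # Weighted child entropy: each child weighted by |S_v| / |S|.
    # Assumption: children with no samples are skipped.
    weighted = sum(
        (sum(child) / n) * entropy(child, base)
        for child in children
        if sum(child) > 0
    )
    return entropy(parent, base) - weighted, parent, weighted

# Three children, two classes each (illustrative counts).
children = [[2, 3], [4, 0], [3, 2]]
gain, parent, weighted = information_gain(children)
```

With these counts the parent is `[9, 5]`, the pure middle child contributes zero entropy, and the gain is about 0.247 bits.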
Worked examples
Frequently asked questions
Sources & references
- C. E. Shannon (1948), “A Mathematical Theory of Communication”, Bell System Technical Journal 27
- SciPy — scipy.stats.entropy (reference implementation, normalise to 1, base e default)
- J. R. Quinlan (1986), “Induction of Decision Trees”, Machine Learning 1(1):81–106 — ID3 information gain
- scikit-learn — decision-tree entropy criterion (mathematical formulation)
The formulas and conventions on this page were last cross-checked against Shannon's paper, SciPy, and the ID3 definition on 2026-06-10. Worked examples reproduce to the displayed precision against scipy.stats.entropy and the canonical decision-tree textbook values.