induwara.lk

Shannon Entropy & Information Gain Calculator

Compute Shannon entropy H(X) for any discrete distribution, from counts or probabilities, in bits, nats, or dits, with the full per-term working, the maximum entropy, and the efficiency. Switch on information-gain mode to reconcile a decision-tree split step by step. Everything runs in your browser with no signup, and results are matched to scipy.stats.entropy.

By Induwara Ashinsana · Updated Jun 14, 2026
Shannon entropy calculator

Enter up to 1,000 outcomes, separated by commas, spaces, or newlines. Counts are normalised to probabilities.

Example result for counts 9, 5:
Entropy H(X): 0.9403 bits (average uncertainty)
Max entropy Hmax: 1.0000 bits (log₂(n), n = 2)
Efficiency (H / Hmax): 94.03% (1 = uniform, 0 = certain)
Outcomes (n): 2; total mass: 14
H(X) in every base: 0.9403 bits, 0.6518 nats, 0.2831 dits (1 nat = 1/ln2 ≈ 1.4427 bits)

Cross-check. Summing the per-term contributions gives 0.9403 bits; the counts-form identity logₐ(N) − (1/N)·Σ cᵢ·logₐ(cᵢ), computed separately, also gives 0.9403 bits. They reconcile, as they must: the two are algebraically equivalent groupings of the same sum.

Per-term working for H(X), base 2:

i | value | pᵢ | −log₂(pᵢ) | −pᵢ·log₂(pᵢ)
1 | 9.0000 | 0.6429 | 0.6374 | 0.4098
2 | 5.0000 | 0.3571 | 1.4854 | 0.5305
H(X) total (bits): 0.9403

Method: H(X) = −Σ pᵢ·logₐ(pᵢ), base 2 → bits, base e → nats, base 10 → dits, with 0·log0 = 0. Information gain uses Gain = H(S) − Σ (|Sᵥ|/|S|)·H(Sᵥ) (Quinlan 1986). No data leaves this page.
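The three units differ only by a constant factor, because changing the log base rescales every term −pᵢ·logₐ(pᵢ) by the same amount (logᵦ(x) = logₐ(x) / logₐ(b)). A minimal standard-library Python sketch of that change of base; the helper name convert_entropy is hypothetical, not part of the tool:

```python
import math

def convert_entropy(h: float, from_base: float, to_base: float) -> float:
    """Re-express an entropy computed with log base `from_base` in log base `to_base`.

    Change of base: log_to(x) = log_from(x) / log_from(to_base), so every term of
    the sum, and therefore H itself, is divided by the same factor.
    """
    return h / math.log(to_base, from_base)

h_bits = 0.9402859586  # H(X) in bits for counts 9, 5
print(round(convert_entropy(h_bits, 2, math.e), 4))  # 0.6518 (nats)
print(round(convert_entropy(h_bits, 2, 10), 4))      # 0.2831 (dits)
```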

How it works

Shannon entropy, introduced by Claude Shannon in his 1948 paper A Mathematical Theory of Communication, measures the average uncertainty — equivalently, the average information content — of a discrete random source. For a distribution over n outcomes with probabilities pᵢ it is:

H(X) = −Σᵢ pᵢ · logₐ(pᵢ)

The base a of the logarithm sets the unit: base 2 gives bits (Shannon's original choice), base e gives nats, and base 10 gives dits. They convert with 1 nat = 1/ln2 ≈ 1.4427 bits. The calculation follows the conventions used by Shannon and by SciPy:

  1. Parse your numbers and reject anything invalid — a negative count or probability, a non-number, or an all-zero distribution — with a specific message rather than a silent NaN.
  2. Normalise to probabilities: divide each value by the total so they sum to 1. This is exactly what scipy.stats.entropy does, and it lets you paste raw counts like 9, 5 directly.
  3. For each outcome compute the term tᵢ = −pᵢ·logₐ(pᵢ), using the convention 0·log0 = 0 so a zero-probability outcome contributes nothing.
  4. Sum the terms to get H(X). The maximum entropy is Hmax = logₐ(n), reached by the uniform distribution, and efficiency = H/Hmax places your distribution on a 0-to-1 scale (1 = uniform, 0 = certain).
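The four steps can be sketched in plain Python as below. This is an illustrative reimplementation, not the tool's own code, and the function name shannon_entropy is hypothetical. It assumes the input has already been parsed into numbers, counts every listed outcome (zeros included) towards n, and returns NaN efficiency when n = 1, because Hmax = 0 there.

```python
import math
from typing import Sequence

def shannon_entropy(values: Sequence[float], base: float = 2.0) -> dict:
    """Validate, normalise, compute per-term -p*log(p), then H, Hmax and efficiency."""
    # Step 1: reject invalid input with a specific message instead of a silent NaN.
    if len(values) == 0:
        raise ValueError("no outcomes given")
    for v in values:
        if not math.isfinite(v):
            raise ValueError(f"not a finite number: {v!r}")
        if v < 0:
            raise ValueError(f"negative value: {v!r}")
    total = math.fsum(values)
    if total == 0:
        raise ValueError("all-zero distribution")
    # Step 2: normalise counts (or unnormalised weights) so they sum to 1.
    probs = [v / total for v in values]
    # Step 3: per-term contribution, using the convention 0*log(0) = 0.
    terms = [-p * math.log(p, base) if p > 0 else 0.0 for p in probs]
    # Step 4: sum the terms; Hmax = log_a(n) is reached by the uniform distribution.
    h = math.fsum(terms)
    h_max = math.log(len(values), base)
    efficiency = h / h_max if h_max > 0 else float("nan")  # undefined for n = 1
    return {"probs": probs, "terms": terms, "H": h, "Hmax": h_max, "efficiency": efficiency}

r = shannon_entropy([9, 5])
print([round(t, 4) for t in r["terms"]])                           # [0.4098, 0.5305]
print(round(r["H"], 4), round(r["Hmax"], 4), round(r["efficiency"], 4))  # 0.9403 1.0 0.9403
```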

As a built-in correctness check, the tool also computes entropy a second way, from the algebraically equivalent counts form H = logₐ(N) − (1/N)·Σ cᵢ·logₐ(cᵢ), where N = Σ cᵢ is the total mass, and shows that the two results agree to floating-point precision.
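A sketch of that cross-check in Python, assuming counts as input (probabilities also work, with N = 1). The counts form follows from the definition by writing log(cᵢ/N) = log(cᵢ) − log(N) and using Σ cᵢ/N = 1. Both helper names are hypothetical.

```python
import math
from typing import Sequence

def entropy_counts_form(counts: Sequence[float], base: float = 2.0) -> float:
    """H = log_a(N) - (1/N) * sum(c_i * log_a(c_i)), with N = sum(c_i); zero counts add 0."""
    n_total = math.fsum(counts)
    s = math.fsum(c * math.log(c, base) for c in counts if c > 0)
    return math.log(n_total, base) - s / n_total

def entropy_term_sum(counts: Sequence[float], base: float = 2.0) -> float:
    """Direct form: normalise, then sum -p_i * log_a(p_i)."""
    n_total = math.fsum(counts)
    return -math.fsum((c / n_total) * math.log(c / n_total, base) for c in counts if c > 0)

counts = [9, 5]
a, b = entropy_term_sum(counts), entropy_counts_form(counts)
print(f"{a:.4f} {b:.4f}")  # 0.9403 0.9403
assert math.isclose(a, b, rel_tol=1e-12)  # the two groupings reconcile
```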

For decision trees, the information-gain panel uses Quinlan's 1986 ID3 definition. Given a parent set S split by an attribute A into children S₁…Sₖ:

Gain(S, A) = H(S) − Σᵥ (|Sᵥ| / |S|) · H(Sᵥ)

You enter each child's class counts; the parent distribution is the column-wise sum of the children, so the split partitions S exactly. The tool returns the parent entropy, every child entropy and its weight |Sᵥ|/|S|, the weighted child entropy, and the gain. This is the same impurity drop that scikit-learn's criterion="entropy" uses to score candidate splits, although scikit-learn evaluates binary splits only.
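A hedged sketch of the ID3 gain computation in Python, with hypothetical helper names. It assumes every child lists its class counts in the same order, so the parent can be formed by a column-wise sum.

```python
import math
from typing import Sequence

def entropy(counts: Sequence[int], base: float = 2.0) -> float:
    """Shannon entropy of a class-count vector, with 0*log(0) = 0."""
    total = sum(counts)
    return -math.fsum((c / total) * math.log(c / total, base) for c in counts if c > 0)

def information_gain(children: Sequence[Sequence[int]], base: float = 2.0) -> dict:
    """ID3 gain: H(S) - sum_v (|S_v|/|S|) * H(S_v)."""
    parent = [sum(col) for col in zip(*children)]   # column-wise sum of the children
    size = sum(parent)                              # |S|
    weights = [sum(child) / size for child in children]  # |S_v| / |S|
    child_h = [entropy(child, base) for child in children]
    weighted = math.fsum(w * h for w, h in zip(weights, child_h))
    h_parent = entropy(parent, base)
    return {"parent": parent, "H_parent": h_parent, "weights": weights,
            "H_children": child_h, "weighted": weighted, "gain": h_parent - weighted}

print(round(information_gain([[9, 0], [0, 5]])["gain"], 4))  # 0.9403: pure children
print(information_gain([[2, 1], [2, 1]])["gain"])            # 0.0: same class mix as parent
```

The two calls bracket the possible range: a split into pure children recovers all of H(S), and a split whose children repeat the parent's class proportions gains nothing.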

Worked examples

Biased coin (bits)

p = [0.9, 0.1], base 2

  1. t₀ = −0.9·log₂(0.9) = −0.9·(−0.152003) = 0.136803
  2. t₁ = −0.1·log₂(0.1) = −0.1·(−3.321928) = 0.332193
  3. H = 0.136803 + 0.332193 = 0.4690 bits
  4. Hmax = log₂2 = 1, efficiency = 0.4690 → far from uniform
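The coin example can be reproduced in a few lines of standard-library Python. If SciPy is installed, scipy.stats.entropy([0.9, 0.1], base=2) should return the same value; this snippet avoids that dependency.

```python
import math

p = [0.9, 0.1]
terms = [-pi * math.log2(pi) for pi in p]   # t_i = -p_i * log2(p_i)
h = sum(terms)                              # H(X) in bits
h_max = math.log2(len(p))                   # log2(2) = 1 bit
print([round(t, 6) for t in terms])         # [0.136803, 0.332193]
print(f"H = {h:.4f} bits, efficiency = {h / h_max:.4f}")  # H = 0.4690 bits, efficiency = 0.4690
```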

Fair 4-sided die — maximum entropy (bits)

counts = [1, 1, 1, 1] → p = [0.25]×4

  1. each tᵢ = −0.25·log₂(0.25) = −0.25·(−2) = 0.5
  2. H = 4 × 0.5 = 2.0000 bits
  3. Hmax = log₂4 = 2 → H = Hmax, efficiency = 1.0 (uniform)
  4. Cross-check (counts form): log₂4 − (1/4)·Σ(1·log₂1) = 2 − 0 = 2.0000 ✓
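To see that the uniform distribution sits at the maximum, this sketch compares a few four-outcome count vectors against Hmax = log₂4 = 2 bits. The count vectors are illustrative, not taken from the tool, and zero counts still count towards n = 4.

```python
import math

def entropy_bits(counts):
    """Entropy in bits from raw counts; zero counts are skipped (0*log2(0) = 0)."""
    n_total = sum(counts)
    return sum(-(c / n_total) * math.log2(c / n_total) for c in counts if c > 0)

h_max = math.log2(4)  # four outcomes, so Hmax = 2 bits
for counts in ([1, 1, 1, 1], [2, 1, 1, 0], [7, 1, 1, 1], [4, 0, 0, 0]):
    h = entropy_bits(counts)
    print(counts, f"H = {h:.4f} bits", f"efficiency = {h / h_max:.4f}")
```

Only [1, 1, 1, 1] reaches efficiency 1; the certain outcome [4, 0, 0, 0] gives H = 0.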

Information gain — ID3 “Play Tennis” / Outlook (bits)

children Sunny [2,3], Overcast [4,0], Rain [3,2]

  1. Parent = column sums = [9, 5], |S| = 14
  2. H(S) = −(9/14·log₂(9/14) + 5/14·log₂(5/14)) = 0.9403 bits
  3. H(Sunny[2,3]) = 0.9710; H(Overcast[4,0]) = 0 (pure); H(Rain[3,2]) = 0.9710
  4. Weighted = (5/14)·0.9710 + (4/14)·0 + (5/14)·0.9710 = 0.6935
  5. Gain(Outlook) = 0.94028596 − 0.69353614 = 0.24674982 ≈ 0.2467 bits, matching the textbook (subtracting the rounded values 0.9403 − 0.6935 would give 0.2468)
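This sketch recomputes the Outlook split without rounding any intermediate value, which is why the gain displays as 0.2467 rather than the 0.2468 obtained by subtracting 4-decimal figures. The helper h_bits is hypothetical.

```python
import math

def h_bits(counts):
    """Entropy in bits of a class-count vector, with 0*log2(0) = 0."""
    total = sum(counts)
    return sum(-(c / total) * math.log2(c / total) for c in counts if c > 0)

children = {"Sunny": [2, 3], "Overcast": [4, 0], "Rain": [3, 2]}
parent = [sum(col) for col in zip(*children.values())]  # [9, 5]
size = sum(parent)                                       # |S| = 14
h_parent = h_bits(parent)
weighted = sum(sum(c) / size * h_bits(c) for c in children.values())
gain = h_parent - weighted
print(f"H(S)={h_parent:.8f} weighted={weighted:.8f} gain={gain:.8f}")
# H(S)=0.94028596 weighted=0.69353614 gain=0.24674982
```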


Sources & references

The formulas and conventions on this page were last cross-checked against Shannon's paper, SciPy, and the ID3 definition on 2026-06-10. Worked examples reproduce to the displayed precision against scipy.stats.entropy and the canonical decision-tree textbook values.
