induwara.lk

KL Divergence Calculator

Compute the Kullback–Leibler divergence D(P‖Q) between two discrete distributions, in bits or nats, with the full per-term working. It also shows the reverse divergence D(Q‖P), the cross-entropy H(P,Q) and the entropy H(P). Everything runs in your browser with no signup, and the results match SciPy.

By Induwara Ashinsana · Updated Jun 10, 2026
KL divergence calculator

P: two or more non-negative numbers, separated by commas, spaces, or newlines. Auto-normalised to sum to 1.

Q: the same number of categories as P. KL measures the cost of using Q to encode P.

Options: log base, auto-normalise, presets.

Example result for P = [0.1, 0.4, 0.5], Q = [0.8, 0.15, 0.05], base 2 (3 categories, each distribution sums to 1 as entered):
D(P‖Q), forward divergence: 1.9270 bits = 1.3357 nats (1 nat = 1/ln2 ≈ 1.4427 bits)
D(Q‖P), reverse divergence: 2.0216 bits (usually different from the forward value)
Cross-entropy H(P,Q): 3.2879 bits
Entropy H(P): 1.3610 bits

Cross-check. Summing the per-term divergences gives 1.9270 bits; the independent identity H(P,Q) − H(P) = 3.2879 − 1.3610 = 1.9270 bits gives the same value. They reconcile, as they must.

Per-term working — D(P‖Q)

i | pᵢ     | qᵢ     | pᵢ/qᵢ   | log₂(pᵢ/qᵢ) | pᵢ·log₂(pᵢ/qᵢ)
1 | 0.1000 | 0.8000 |  0.1250 | −3.0000     | −0.3000
2 | 0.4000 | 0.1500 |  2.6667 |  1.4150     |  0.5660
3 | 0.5000 | 0.0500 | 10.0000 |  3.3219     |  1.6610
D(P‖Q) total (bits): 1.9270

Method: D(P‖Q) = Σ pᵢ·log(pᵢ/qᵢ), base-2 → bits, base-e → nats, with 0·log(0/q) = 0 and a +∞ result when qᵢ = 0 while pᵢ > 0 — the SciPy rel_entr / stats.entropy convention. No data leaves this page.

How it works

KL divergence — also called relative entropy or the directed divergence — measures how many extra bits (or nats) you need to encode samples drawn from a distribution P when you use a code built for a different distribution Q. It was introduced by Kullback and Leibler in 1951 and is defined for two discrete distributions over the same set of categories as:

D(P‖Q) = Σᵢ pᵢ · log_b(pᵢ / qᵢ)
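The formula can be evaluated directly in a few lines. This is a minimal sketch, assuming both distributions are already normalised and every pᵢ and qᵢ is strictly positive, so none of the zero-handling conventions are needed yet. The function name kl_divergence is illustrative, not part of any library. Evaluating the same sum in base 2 and base e shows that the base only rescales the result.

```python
import math

def kl_divergence(p, q, base=2.0):
    """Direct evaluation of D(P||Q) = sum_i p_i * log_base(p_i / q_i).

    Assumes p and q have equal length, each sums to 1, and every entry is > 0,
    so no zero-handling convention is needed.
    """
    return sum(pi * math.log(pi / qi, base) for pi, qi in zip(p, q))

p = [0.1, 0.4, 0.5]
q = [0.8, 0.15, 0.05]

bits = kl_divergence(p, q, base=2)       # base 2 -> bits
nats = kl_divergence(p, q, base=math.e)  # base e -> nats

# Changing the base only rescales the sum: dividing nats by ln 2 gives bits.
print(f"{bits:.4f} bits, {nats:.4f} nats, {nats / math.log(2):.4f} bits via nats")
```

For this example pair it reports about 1.9270 bits and 1.3357 nats, the same figures as the calculator output.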

The base of the logarithm sets the unit: base 2 gives bits, base e (the natural log) gives nats, and they convert with 1 nat = 1/ln2 ≈ 1.4427 bits. The calculation follows the convention used by Cover & Thomas and by SciPy:

  1. Parse P and Q into numeric arrays. Reject the input if the lengths differ, if either has fewer than two categories, or if any entry is negative — a probability mass cannot be below zero.
  2. If auto-normalise is on, divide each vector by its own sum so that ΣP = ΣQ = 1. This is exactly what scipy.stats.entropy does, and it lets you paste raw counts like 1, 2, 3.
  3. For each category i, compute the term tᵢ = pᵢ·log_b(pᵢ/qᵢ), using the convention 0·log(0/q) = 0 — a category with no mass in P adds nothing.
  4. If any qᵢ = 0 while pᵢ > 0, the divergence is +∞: Q gives no probability to an outcome P expects, which costs infinitely many bits. The tool reports +∞ rather than a silent NaN.
  5. Sum the terms to get D(P‖Q). Swap the roles of P and Q to get the reverse divergence D(Q‖P) — almost always a different number, because KL is asymmetric.
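The five steps above can be sketched as one function. This is an illustrative reimplementation under the stated conventions, not the page's own source code; the name kl_divergence and the normalise flag are hypothetical. It mirrors the rules described: reject bad input, optionally divide by each sum, skip terms with pᵢ = 0, and return +∞ as soon as qᵢ = 0 while pᵢ > 0.

```python
import math

def kl_divergence(p, q, base=2.0, normalise=True):
    """Follow steps 1-5: validate, normalise, and sum p_i * log(p_i / q_i)."""
    p = [float(x) for x in p]
    q = [float(x) for x in q]

    # Step 1: same length, at least two categories, no negative mass.
    if len(p) != len(q):
        raise ValueError("P and Q must have the same number of categories")
    if len(p) < 2:
        raise ValueError("at least two categories are required")
    if any(x < 0 for x in p + q):
        raise ValueError("probabilities cannot be negative")

    # Step 2: divide each vector by its own sum (lets raw counts be used).
    if normalise:
        sp, sq = sum(p), sum(q)
        if sp == 0 or sq == 0:
            raise ValueError("each distribution needs some positive mass")
        p = [x / sp for x in p]
        q = [x / sq for x in q]

    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue          # Step 3: 0 * log(0/q) = 0 contributes nothing
        if qi == 0:
            return math.inf   # Step 4: P expects an outcome that Q rules out
        total += pi * math.log(pi / qi, base)
    return total              # Step 5: the summed terms are D(P||Q)

# Raw counts that normalise to [0.1, 0.4, 0.5] and [0.8, 0.15, 0.05].
P_counts = [1, 4, 5]
Q_counts = [16, 3, 1]
forward = kl_divergence(P_counts, Q_counts)
reverse = kl_divergence(Q_counts, P_counts)  # swap roles for D(Q||P)
print(f"D(P||Q) = {forward:.4f} bits, D(Q||P) = {reverse:.4f} bits")
print(kl_divergence([0.5, 0.5], [1, 0]))     # inf
```

Returning math.inf explicitly, rather than letting a division by zero or a NaN surface, matches the behaviour described in step 4.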

The page also reports the cross-entropy H(P,Q) = −Σ pᵢ·log_b(qᵢ) and the entropy H(P) = −Σ pᵢ·log_b(pᵢ). These satisfy the identity H(P,Q) = H(P) + D(P‖Q), so the calculator independently recomputes the divergence as H(P,Q) − H(P) and shows the two methods reconciling — a built-in correctness check. Two more facts double as sanity signals: KL divergence is always ≥ 0 (Gibbs' inequality), and it equals 0 if and only if P = Q.
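The identity H(P,Q) = H(P) + D(P‖Q) can be checked numerically. This sketch uses hypothetical helper names (entropy, cross_entropy) and the same 0·log 0 = 0 convention; it computes both quantities for the example pair and compares their difference with a direct evaluation of D(P‖Q).

```python
import math

def entropy(p, base=2.0):
    """H(P) = -sum_i p_i * log(p_i), using 0 * log 0 = 0."""
    return -sum(pi * math.log(pi, base) for pi in p if pi > 0)

def cross_entropy(p, q, base=2.0):
    """H(P,Q) = -sum_i p_i * log(q_i); +inf if some q_i = 0 where p_i > 0."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue
        if qi == 0:
            return math.inf
        total -= pi * math.log(qi, base)
    return total

p = [0.1, 0.4, 0.5]
q = [0.8, 0.15, 0.05]

h_p = entropy(p)
h_pq = cross_entropy(p, q)
d_direct = sum(pi * math.log(pi / qi, 2) for pi, qi in zip(p, q))

# Identity check: H(P,Q) - H(P) should equal D(P||Q) up to rounding.
print(f"H(P) = {h_p:.4f}  H(P,Q) = {h_pq:.4f}")
print(f"H(P,Q) - H(P) = {h_pq - h_p:.4f}  direct D(P||Q) = {d_direct:.4f}")
```

Because D(P‖Q) ≥ 0 (Gibbs' inequality), the cross-entropy can never fall below the entropy, which makes H(P,Q) ≥ H(P) another quick sanity check.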

Worked examples

Asymmetric divergence (bits)

P = [0.1, 0.4, 0.5], Q = [0.8, 0.15, 0.05], base 2

  1. t₁ = 0.1·log₂(0.1/0.8) = 0.1·log₂(0.125) = 0.1·(−3) = −0.3000
  2. t₂ = 0.4·log₂(0.4/0.15) = 0.4·(1.41504) = 0.56601
  3. t₃ = 0.5·log₂(0.5/0.05) = 0.5·log₂(10) = 0.5·(3.32193) = 1.66096
  4. D(P‖Q) = −0.3000 + 0.56601 + 1.66096 = 1.9270 bits
  5. Reverse: D(Q‖P) = 0.8·log₂(8) + 0.15·log₂(0.375) + 0.05·log₂(0.1) = 2.4 − 0.21226 − 0.16610 = 2.0216 bits, so D(Q‖P) ≠ D(P‖Q)
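The per-term arithmetic in this example, in both directions, can be reproduced with a short loop. This is a sketch that assumes all entries are strictly positive, as they are here; per_term_working is an illustrative name. Each printed row has the same columns as the per-term working table: i, pᵢ, qᵢ, pᵢ/qᵢ, log₂(pᵢ/qᵢ) and the term.

```python
import math

def per_term_working(p, q, base=2.0):
    """Rows of (i, p_i, q_i, p_i/q_i, log(p_i/q_i), p_i*log(p_i/q_i)).

    Assumes every p_i and q_i is > 0, as in this worked example.
    """
    rows = []
    for i, (pi, qi) in enumerate(zip(p, q), start=1):
        ratio = pi / qi
        log_ratio = math.log(ratio, base)
        rows.append((i, pi, qi, ratio, log_ratio, pi * log_ratio))
    return rows

P = [0.1, 0.4, 0.5]
Q = [0.8, 0.15, 0.05]

# Forward uses (P, Q); reverse simply swaps the arguments.
for label, a, b in (("D(P||Q)", P, Q), ("D(Q||P)", Q, P)):
    rows = per_term_working(a, b)
    for i, ai, bi, ratio, log_ratio, term in rows:
        print(f"{i}  {ai:.4f}  {bi:.4f}  {ratio:8.4f}  {log_ratio:8.4f}  {term:8.4f}")
    print(f"{label} = {sum(row[-1] for row in rows):.4f} bits\n")
```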

Fair vs biased coin (nats)

P = [0.5, 0.5], Q = [0.9, 0.1], base e

  1. 0.5·ln(0.5/0.9) = 0.5·(−0.58779) = −0.29389
  2. 0.5·ln(0.5/0.1) = 0.5·(1.60944) = 0.80472
  3. D(P‖Q) = −0.29389 + 0.80472 = 0.5108 nats
  4. In bits: 0.5108 / ln2 = 0.7370 bits
  5. Matches scipy.stats.entropy([0.5,0.5],[0.9,0.1]) = 0.5108
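The coin example can be checked with the standard library alone. This sketch sums the two terms with the natural log, giving nats (the default unit of scipy.stats.entropy), and divides by ln 2 for bits. For this pair the terms collapse algebraically to 0.5·ln(25/9) = ln(5/3), which gives an exact reference value.

```python
import math

p = [0.5, 0.5]  # fair coin
q = [0.9, 0.1]  # biased coin

# Natural log gives nats (also SciPy's default unit for stats.entropy).
d_nats = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
d_bits = d_nats / math.log(2)  # 1 nat = 1/ln2 bits

print(f"{d_nats:.4f} nats = {d_bits:.4f} bits")
```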

Zero in Q where P > 0 (edge case)

P = [0.5, 0.5], Q = [1, 0], base 2

  1. t₁ = 0.5·log₂(0.5/1) = 0.5·(−1) = −0.5
  2. t₂ = 0.5·log₂(0.5/0): the ratio pᵢ/qᵢ grows without bound as qᵢ → 0, so t₂ = +∞
  3. D(P‖Q) = +∞ — Q assigns no mass to category 2 but P does
  4. Fix: smooth Q (give every category a small positive mass, then renormalise) to keep the divergence finite
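One common way to smooth Q is additive smoothing: add a small constant eps to every qᵢ and renormalise. This is a sketch of that technique, assumed here rather than taken from the calculator; the helper names smooth and kl_bits and the eps values are illustrative. It shows the trade-off: the divergence becomes finite, but its size depends strongly on the chosen eps and grows as eps shrinks.

```python
import math

def smooth(q, eps=1e-6):
    """Additive smoothing: add eps to every category, then renormalise.

    eps is an illustrative choice, not a recommended constant.
    """
    shifted = [qi + eps for qi in q]
    total = sum(shifted)
    return [x / total for x in shifted]

def kl_bits(p, q):
    """D(P||Q) in bits with the 0*log(0/q) = 0 and +inf conventions."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue            # no mass in P: term is 0
        if qi == 0:
            return math.inf     # mass in P where Q has none
        total += pi * math.log2(pi / qi)
    return total

p = [0.5, 0.5]
q = [1.0, 0.0]
print(kl_bits(p, q))  # inf
for eps in (1e-2, 1e-4, 1e-6):
    print(eps, round(kl_bits(p, smooth(q, eps)), 4))
```

Because the smoothed value is driven largely by eps, it is worth reporting the eps used alongside any smoothed divergence.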
