KL Divergence Calculator
Compute the Kullback–Leibler divergence D(P‖Q) between two discrete distributions, in bits or nats, with the full per-term working. It also shows the reverse divergence D(Q‖P), the cross-entropy H(P,Q) and the entropy H(P) — all in your browser, no signup, matched to SciPy.
How it works
KL divergence, also called relative entropy or the directed divergence, measures the expected number of extra bits (or nats) per sample that you need to encode samples drawn from a distribution P when you use a code optimised for a different distribution Q. It was introduced by Kullback and Leibler in 1951 and is defined for two discrete distributions over the same set of categories as:
D(P‖Q) = Σᵢ pᵢ · log_b(pᵢ / qᵢ)
The base of the logarithm sets the unit: base 2 gives bits, base e (the natural log) gives nats, and they convert with 1 nat = 1/ln2 ≈ 1.4427 bits. The calculation follows the convention used by Cover & Thomas and by SciPy:
- Parse P and Q into numeric arrays. Reject the input if the lengths differ, if either has fewer than two categories, or if any entry is negative — a probability mass cannot be below zero.
- If auto-normalise is on, divide each vector by its own sum so that ΣP = ΣQ = 1. This is exactly what scipy.stats.entropy does, and it lets you paste raw counts like 1, 2, 3.
- For each category i, compute the term tᵢ = pᵢ·log_b(pᵢ/qᵢ), using the convention 0·log(0/q) = 0: a category with no mass in P adds nothing.
- If any qᵢ = 0 while pᵢ > 0, the divergence is +∞: Q gives no probability to an outcome P expects, which costs infinitely many bits. The tool reports +∞ rather than a silent NaN.
- Sum the terms to get D(P‖Q). Swap the roles of P and Q to get the reverse divergence D(Q‖P) — almost always a different number, because KL is asymmetric.
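The steps above can be sketched in a few lines of plain Python. This is an illustrative sketch, not the page's actual source: the function name `kl_divergence` and its `base` and `normalise` parameters are hypothetical, and rejecting a vector that sums to zero is an added assumption the steps do not spell out.

```python
import math


def kl_divergence(p, q, base=2.0, normalise=True):
    """Return (D(P||Q), per-term list). Hypothetical helper mirroring the steps above."""
    # Validate the input: equal lengths, at least two categories, no negatives.
    if len(p) != len(q):
        raise ValueError("P and Q must have the same number of categories")
    if len(p) < 2:
        raise ValueError("need at least two categories")
    if any(x < 0 for x in p) or any(x < 0 for x in q):
        raise ValueError("entries must be non-negative")

    # Optional auto-normalise: divide each vector by its own sum.
    if normalise:
        sum_p, sum_q = sum(p), sum(q)
        if sum_p == 0 or sum_q == 0:
            # Assumption: an all-zero vector cannot be normalised, so reject it.
            raise ValueError("cannot normalise a vector that sums to zero")
        p = [x / sum_p for x in p]
        q = [x / sum_q for x in q]

    # Per-term working, with the 0*log(0/q) = 0 and q = 0 < p conventions.
    terms = []
    for p_i, q_i in zip(p, q):
        if p_i == 0:
            terms.append(0.0)        # no mass in P adds nothing
        elif q_i == 0:
            terms.append(math.inf)   # Q rules out an outcome P expects
        else:
            terms.append(p_i * math.log(p_i / q_i, base))

    # Sum the terms to get the divergence.
    return sum(terms), terms


forward, _ = kl_divergence([0.5, 0.5], [0.25, 0.75])   # about 0.2075 bits
reverse, _ = kl_divergence([0.25, 0.75], [0.5, 0.5])   # about 0.1887 bits
print(forward, reverse)
```

The two printed values differ, which illustrates the asymmetry. Passing `base=math.e` returns the same divergence in nats, which is the bits value multiplied by ln 2.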
The page also reports the cross-entropy H(P,Q) = −Σ pᵢ·log_b(qᵢ) and the entropy H(P) = −Σ pᵢ·log_b(pᵢ). These satisfy the identity H(P,Q) = H(P) + D(P‖Q), so the calculator independently recomputes the divergence as H(P,Q) − H(P) and shows the two methods reconciling — a built-in correctness check. Two more facts double as sanity signals: KL divergence is always ≥ 0 (Gibbs' inequality), and it equals 0 if and only if P = Q.
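The reconciliation described above might look like the following sketch. The helper names (`entropy`, `cross_entropy`, `kl_direct`) are hypothetical, the inputs are assumed to be already normalised, and the comparison uses a small floating-point tolerance because the two routes round differently.

```python
import math


def entropy(p, base=2.0):
    """H(P) = -sum p_i log_b p_i, skipping zero-mass categories."""
    return -sum(p_i * math.log(p_i, base) for p_i in p if p_i > 0)


def cross_entropy(p, q, base=2.0):
    """H(P,Q) = -sum p_i log_b q_i; infinite if Q gives zero mass where P does not."""
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0:
            continue
        if q_i == 0:
            return math.inf
        total -= p_i * math.log(q_i, base)
    return total


def kl_direct(p, q, base=2.0):
    """D(P||Q) summed term by term, with the same zero conventions."""
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0:
            continue
        if q_i == 0:
            return math.inf
        total += p_i * math.log(p_i / q_i, base)
    return total


p = [0.5, 0.5]
q = [0.25, 0.75]

d_direct = kl_direct(p, q)
d_via_identity = cross_entropy(p, q) - entropy(p)   # H(P,Q) - H(P)

# The two routes should agree up to rounding, and the result is never negative.
assert math.isclose(d_direct, d_via_identity, abs_tol=1e-12)
assert d_direct >= 0
print(entropy(p), cross_entropy(p, q), d_direct)
```

For this pair, H(P) is 1 bit and H(P,Q) is about 1.2075 bits, so both routes give about 0.2075 bits.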
Worked examples
Frequently asked questions
Sources & references
- Kullback & Leibler (1951), “On Information and Sufficiency”, Ann. Math. Stat. 22(1):79–86
- Cover & Thomas, Elements of Information Theory (2nd ed., 2006), ch. 2
- SciPy — scipy.stats.entropy(pk, qk) (reference implementation, base e default)
- SciPy — scipy.special.rel_entr (per-term relative entropy, 0·log(0/q)=0 convention)
The formulas and conventions on this page were last cross-checked against SciPy and Cover & Thomas on 2026-06-10. Worked examples reproduce to the displayed precision against scipy.stats.entropy.