How do you calculate KL divergence between two distributions?

For two discrete distributions P and Q over the same categories, KL divergence is D(P‖Q) = Σ pᵢ·log(pᵢ/qᵢ). For each category, divide pᵢ by qᵢ, take the log (base 2 for bits, base e for nats), multiply by pᵢ, then add up every term. A category with pᵢ = 0 contributes nothing.
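As a minimal sketch of the recipe above, the function below uses only the Python standard library. The name `kl_divergence` and its `base` parameter are illustrative choices, not part of the calculator, and the inputs are assumed to be already normalised.

```python
import math

def kl_divergence(p, q, base=2.0):
    """Sum of p_i * log_base(p_i / q_i); assumes p and q each sum to 1."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue  # convention: 0 * log(0 / q) = 0
        if qi == 0:
            return math.inf  # P has mass where Q has none
        total += pi * math.log(pi / qi, base)
    return total

p = [0.1, 0.4, 0.5]
q = [0.8, 0.15, 0.05]
print(f"{kl_divergence(p, q, base=2.0):.4f} bits")     # 1.9270 bits
print(f"{kl_divergence(p, q, base=math.e):.4f} nats")
```

Passing `base=2.0` gives bits and `base=math.e` gives nats; nothing else changes between the two units.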

Why is KL divergence not symmetric (D(P‖Q) ≠ D(Q‖P))?

Because each term is weighted by pᵢ, the distribution you treat as “true.” Swapping P and Q re-weights the same log-ratios by different masses, so the totals differ. For P = [0.1, 0.4, 0.5] and Q = [0.8, 0.15, 0.05], D(P‖Q) = 1.9270 bits but D(Q‖P) = 2.0216 bits. KL is a divergence, not a true distance metric — it fails symmetry and the triangle inequality.
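The re-weighting can be made concrete with a short sketch. It computes the log-ratios once and then weights them by P for the forward direction and by Q (with the sign flipped) for the reverse. Variable names are illustrative, and all entries are assumed to be strictly positive.

```python
import math

p = [0.1, 0.4, 0.5]
q = [0.8, 0.15, 0.05]

# log2(q_i / p_i) is just -log2(p_i / q_i), so both directions share these values.
log_ratios = [math.log2(pi / qi) for pi, qi in zip(p, q)]

forward = sum(pi * r for pi, r in zip(p, log_ratios))     # weighted by P
reverse = sum(qi * (-r) for qi, r in zip(q, log_ratios))  # weighted by Q

print(f"D(P||Q) = {forward:.4f} bits")  # 1.9270
print(f"D(Q||P) = {reverse:.4f} bits")  # 2.0216
```

The first category dominates the reverse direction because Q puts 0.8 of its mass there, while P gives it only 0.1.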

What is the difference between KL divergence and cross-entropy?

Cross-entropy H(P,Q) = −Σ pᵢ·log(qᵢ) is the total cost of encoding P with a code built for Q. KL divergence is the extra cost beyond the unavoidable minimum: D(P‖Q) = H(P,Q) − H(P), where H(P) is P's own entropy. So cross-entropy = entropy + KL divergence. The calculator shows all three and verifies the identity.
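A small numerical check of the identity, as a sketch: the helper names `entropy`, `cross_entropy` and `kl` are hypothetical, and the code assumes qᵢ > 0 wherever pᵢ > 0 so that every logarithm is finite.

```python
import math

def entropy(p):
    """H(P) in bits."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    """H(P,Q) in bits; assumes q_i > 0 wherever p_i > 0."""
    return -sum(pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)

def kl(p, q):
    """D(P||Q) in bits, computed directly from the definition."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.1, 0.4, 0.5]
q = [0.8, 0.15, 0.05]

h_p, h_pq, d = entropy(p), cross_entropy(p, q), kl(p, q)
print(f"H(P) = {h_p:.4f}, H(P,Q) = {h_pq:.4f}, D(P||Q) = {d:.4f}")
print("identity holds:", math.isclose(h_pq - h_p, d, abs_tol=1e-12))
```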

Can KL divergence be negative?

No. By Gibbs' inequality, D(P‖Q) ≥ 0 for any valid pair of distributions, and it equals 0 only when P = Q exactly. A negative value means an input was not a proper distribution: for example, it contained negative entries, or P and Q were not both normalised to sum to 1.
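To illustrate how a negative number can appear, the sketch below applies the raw sum to a P vector that sums to 0.4 instead of 1, then repeats it after normalising. The helpers `raw_sum` and `normalise` are illustrative names, not the calculator's code.

```python
import math

def raw_sum(p, q):
    """Sum of p_i * log2(p_i / q_i) with no normalisation or validation."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def normalise(v):
    """Scale a non-negative vector so it sums to 1."""
    s = sum(v)
    return [x / s for x in v]

p_bad = [0.2, 0.2]  # sums to 0.4, so not a proper distribution
q = [0.5, 0.5]

print(raw_sum(p_bad, q))             # negative: the inputs are on different scales
print(raw_sum(normalise(p_bad), q))  # 0.0 once P is normalised, since P now equals Q
```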

What does a KL divergence of 0 mean?

It means P and Q are identical after normalisation — Q is a perfect model of P, with no extra encoding cost. Try the “Identical → 0” preset: P = Q = [0.25, 0.25, 0.25, 0.25] gives exactly 0 in both bits and nats.

Why does the result show +∞ sometimes?

KL divergence is +∞ when some category has qᵢ = 0 while pᵢ > 0: Q assigns no probability where P has some, so encoding that outcome costs infinitely many bits. A finite result requires P to be absolutely continuous with respect to Q, meaning qᵢ = 0 only where pᵢ = 0. Smoothing Q (adding a small mass to every category and renormalising, as in Laplace smoothing) keeps the divergence finite.

Should I use bits or nats?

They measure the same quantity in different units: bits use log base 2, nats use natural log (base e). Information theory and coding usually use bits; machine-learning loss functions (and SciPy's default) use nats. Convert with 1 nat = 1/ln2 ≈ 1.4427 bits. The calculator shows both at once.
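The conversion is a single division. The sketch below computes a divergence in nats for a fair coin P against a biased coin Q and converts it to bits; the variable names are illustrative.

```python
import math

p = [0.5, 0.5]
q = [0.9, 0.1]

# Natural log gives nats.
nats = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# 1 nat = 1 / ln 2 bits, so divide by ln 2 to convert.
bits = nats / math.log(2)

print(f"{nats:.4f} nats = {bits:.4f} bits")  # 0.5108 nats = 0.7370 bits
```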

Does this match SciPy and PyTorch?

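Yes. The nats result equals scipy.stats.entropy(pk, qk), which normalises pk and qk itself and uses the same 0·log(0/q) = 0 convention. It also equals the sum of scipy.special.rel_entr(pk, qk), but rel_entr does not normalise, so pass it vectors that already sum to 1. PyTorch's F.kl_div takes log qᵢ as its first argument and pᵢ as the target, and computes the same Σ pᵢ·log(pᵢ/qᵢ) when called with reduction="sum"; the default reduction averages over elements instead.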
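The argument order is the part that usually trips people up. The sketch below does not import PyTorch; it is a pure-Python stand-in (the name `kl_div_like` is hypothetical) that mirrors the pointwise formula target · (log(target) − input) with a sum reduction, where the first argument holds log-probabilities of Q.

```python
import math

def kl_div_like(log_q, p):
    """Stand-in for a sum-reduced KL loss: input is log Q, target is P (nats)."""
    return sum(pi * (math.log(pi) - lq) for pi, lq in zip(p, log_q) if pi > 0)

p = [0.5, 0.5]
q = [0.9, 0.1]
log_q = [math.log(qi) for qi in q]  # the model side is passed as log-probabilities

nats = kl_div_like(log_q, p)
print(f"{nats:.4f} nats")  # 0.5108 nats
```

With PyTorch installed, the comparable call would be along the lines of F.kl_div(torch.tensor(q).log(), torch.tensor(p), reduction="sum"); treat that as an assumption to check against the version you have.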

KL Divergence Calculator

Compute the Kullback–Leibler divergence D(P‖Q) between two discrete distributions, in bits or nats, with the full per-term working. It also shows the reverse divergence D(Q‖P), the cross-entropy H(P,Q) and the entropy H(P) — all in your browser, no signup, matched to SciPy.

By Induwara Ashinsana, Executive Director, Ryzera Technologies. Updated Jun 10, 2026.

Distribution P (reference / “true”)

Two or more non-negative numbers. Comma, space, or newline separated. Auto-normalised to sum to 1.

Distribution Q (model / approximation)

Same number of categories as P. KL measures the cost of using Q to encode P.

Log base: 2 (bits) or e (nats)

Auto-normalise: scale each vector to sum to 1

D(P‖Q), forward divergence: 1.9270 bits

D(Q‖P), reverse (usually different): 2.0216 bits

Cross-entropy H(P,Q): 3.2879 bits

Entropy H(P): 1.3610 bits

Per-term working — D(P‖Q)

i	pᵢ	qᵢ	pᵢ/qᵢ	log₂(pᵢ/qᵢ)	pᵢ·log₂(pᵢ/qᵢ)
1	0.1000	0.8000	0.1250	-3.0000	-0.3000
2	0.4000	0.1500	2.6667	1.4150	0.5660
3	0.5000	0.0500	10.0000	3.3219	1.6610
D(P‖Q) total (bits)					1.9270

Method: D(P‖Q) = Σ pᵢ·log(pᵢ/qᵢ), base-2 → bits, base-e → nats, with 0·log(0/q) = 0 and a +∞ result when qᵢ = 0 while pᵢ > 0 — the SciPy rel_entr / stats.entropy convention. No data leaves this page.

How it works

KL divergence — also called relative entropy or the directed divergence — measures how many extra bits (or nats) you need to encode samples drawn from a distribution P when you use a code built for a different distribution Q. It was introduced by Kullback and Leibler in 1951 and is defined for two discrete distributions over the same set of categories as:

D(P‖Q) = Σᵢ pᵢ · log_b(pᵢ / qᵢ)

The base of the logarithm sets the unit: base 2 gives bits, base e (the natural log) gives nats, and they convert with 1 nat = 1/ln2 ≈ 1.4427 bits. The calculation follows the convention used by Cover & Thomas and by SciPy:

1. Parse P and Q into numeric arrays. Reject the input if the lengths differ, if either has fewer than two categories, or if any entry is negative, since a probability mass cannot be below zero.
2. If auto-normalise is on, divide each vector by its own sum so that ΣP = ΣQ = 1. This is what scipy.stats.entropy does, and it lets you paste raw counts like 1, 2, 3.
3. For each category i, compute the term tᵢ = pᵢ·log_b(pᵢ/qᵢ), using the convention 0·log(0/q) = 0: a category with no mass in P adds nothing.
4. If any qᵢ = 0 while pᵢ > 0, the divergence is +∞: Q gives no probability to an outcome P expects, which costs infinitely many bits. The tool reports +∞ rather than a silent NaN.
5. Sum the terms to get D(P‖Q). Swap the roles of P and Q to get the reverse divergence D(Q‖P), which is almost always a different number because KL is asymmetric.
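The steps above can be sketched end to end in plain Python. This is an illustration under stated assumptions, not the page's actual source: `parse_vector` and `kl_pipeline` are hypothetical names, and rejecting an all-zero vector before normalising is an added assumption the text does not spell out.

```python
import math
import re

def parse_vector(text):
    """Split on commas, spaces, or newlines and convert each token to a float."""
    return [float(tok) for tok in re.split(r"[,\s]+", text.strip()) if tok]

def kl_pipeline(p_text, q_text, base=2.0, auto_normalise=True):
    p, q = parse_vector(p_text), parse_vector(q_text)

    # Validate: equal lengths, at least two categories, no negative mass.
    if len(p) != len(q):
        raise ValueError("P and Q must have the same number of categories")
    if len(p) < 2:
        raise ValueError("need at least two categories")
    if any(x < 0 for x in p + q):
        raise ValueError("entries must be non-negative")

    # Normalise each vector by its own sum (assumption: reject all-zero input).
    if auto_normalise:
        sp, sq = sum(p), sum(q)
        if sp == 0 or sq == 0:
            raise ValueError("cannot normalise an all-zero vector")
        p = [x / sp for x in p]
        q = [x / sq for x in q]

    # Per-term values, with 0*log(0/q) = 0 and +inf when q_i = 0 < p_i.
    terms = []
    for pi, qi in zip(p, q):
        if pi == 0:
            terms.append(0.0)
        elif qi == 0:
            terms.append(math.inf)
        else:
            terms.append(pi * math.log(pi / qi, base))

    return sum(terms)

print(f"{kl_pipeline('0.1, 0.4, 0.5', '0.8 0.15 0.05'):.4f}")  # 1.9270
print(kl_pipeline("1, 2, 3", "1 2 3"))  # raw counts, identical after normalising: 0.0
print(kl_pipeline("0.5 0.5", "1 0"))    # inf
```

Swapping the two arguments gives the reverse divergence, so no separate code path is needed for D(Q‖P).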

The page also reports the cross-entropy H(P,Q) = −Σ pᵢ·log_b(qᵢ) and the entropy H(P) = −Σ pᵢ·log_b(pᵢ). These satisfy the identity H(P,Q) = H(P) + D(P‖Q), so the calculator independently recomputes the divergence as H(P,Q) − H(P) and shows the two methods reconciling — a built-in correctness check. Two more facts double as sanity signals: KL divergence is always ≥ 0 (Gibbs' inequality), and it equals 0 if and only if P = Q.
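The identity follows in a few lines from the definitions above. As a worked derivation, using the same symbols:

```latex
\begin{aligned}
H(P,Q) - H(P)
  &= -\sum_i p_i \log_b q_i + \sum_i p_i \log_b p_i \\
  &= \sum_i p_i \left( \log_b p_i - \log_b q_i \right) \\
  &= \sum_i p_i \log_b \frac{p_i}{q_i} \\
  &= D(P \,\|\, Q)
\end{aligned}
```

Rearranging gives H(P,Q) = H(P) + D(P‖Q), which is why computing the divergence both ways should produce the same number up to rounding.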

Worked examples

Asymmetric divergence (bits)

P = [0.1, 0.4, 0.5], Q = [0.8, 0.15, 0.05], base 2

t₁ = 0.1·log₂(0.1/0.8) = 0.1·log₂(0.125) = 0.1·(−3) = −0.3000
t₂ = 0.4·log₂(0.4/0.15) = 0.4·(1.41504) = 0.56601
t₃ = 0.5·log₂(0.5/0.05) = 0.5·log₂(10) = 0.5·(3.32193) = 1.66096
D(P‖Q) = −0.3000 + 0.56601 + 1.66096 = 1.9270 bits
Reverse: D(Q‖P) = 2.4 − 0.21226 − 0.16610 = 2.0216 bits → asymmetric
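The same per-term working can be reproduced with a short sketch. `per_term_rows` is an illustrative helper, and it assumes every pᵢ and qᵢ is strictly positive, as in this example.

```python
import math

def per_term_rows(p, q, base=2.0):
    """Return (i, p_i, q_i, ratio, log ratio, term) for each category."""
    rows = []
    for i, (pi, qi) in enumerate(zip(p, q), start=1):
        ratio = pi / qi
        log_ratio = math.log(ratio, base)
        rows.append((i, pi, qi, ratio, log_ratio, pi * log_ratio))
    return rows

rows = per_term_rows([0.1, 0.4, 0.5], [0.8, 0.15, 0.05])
for i, pi, qi, ratio, log_ratio, term in rows:
    print(f"{i}\t{pi:.4f}\t{qi:.4f}\t{ratio:.4f}\t{log_ratio:.4f}\t{term:.4f}")

total = sum(row[5] for row in rows)
print(f"D(P||Q) total (bits)\t{total:.4f}")  # 1.9270
```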

Fair vs biased coin (nats)

P = [0.5, 0.5], Q = [0.9, 0.1], base e

0.5·ln(0.5/0.9) = 0.5·(−0.58779) = −0.29389
0.5·ln(0.5/0.1) = 0.5·(1.60944) = 0.80472
D(P‖Q) = −0.29389 + 0.80472 = 0.5108 nats
= 0.5108 / ln2 = 0.7370 bits
Matches scipy.stats.entropy([0.5,0.5],[0.9,0.1]) = 0.5108

Zero in Q where P > 0 (edge case)

P = [0.5, 0.5], Q = [1, 0], base 2

t₁ = 0.5·log₂(0.5/1) = 0.5·(−1) = −0.5
t₂ = 0.5·log₂(0.5/0) = 0.5·log₂(∞) = +∞
D(P‖Q) = +∞ — Q assigns no mass to category 2 but P does
Fix: smooth Q (give every category a tiny mass) to keep it finite
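One way to apply that fix is additive smoothing followed by renormalisation. In the sketch below, the helper `smooth` and the value `eps=1e-6` are assumptions chosen for illustration; the finite result depends on the size of `eps`, so report it alongside the number.

```python
import math

def smooth(q, eps=1e-6):
    """Add eps to every category, then renormalise so the vector sums to 1."""
    shifted = [qi + eps for qi in q]
    total = sum(shifted)
    return [x / total for x in shifted]

def kl_bits(p, q):
    """D(P||Q) in bits, returning +inf when q_i = 0 while p_i > 0."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue
        if qi == 0:
            return math.inf
        total += pi * math.log2(pi / qi)
    return total

p = [0.5, 0.5]
q = [1.0, 0.0]

print(kl_bits(p, q))          # inf
print(kl_bits(p, smooth(q)))  # finite, but large because category 2 is still very unlikely under Q
```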

Sources & references

The formulas and conventions on this page were last cross-checked against SciPy and Cover & Thomas on 2026-06-10. Worked examples reproduce to the displayed precision against scipy.stats.entropy.
