induwara.lk

Perplexity Calculator

Compute language-model perplexity in your browser — from a list of token probabilities, a cross-entropy / NLL loss, or a total log-likelihood. See the cross-entropy in nats and bits-per-token, the average token probability, and the exact formula behind every result.

By Induwara Ashinsana · Updated Jun 10, 2026
Perplexity calculator

The model's probability for each observed token, each in (0, 1]. Separate with commas, spaces, or new lines.

Perplexity (PP): 2.8284 (lower is better; 1 is a perfect model)
Cross-entropy (nats): 1.0397 (ln PP; matches PyTorch loss)
Bits per token: 1.5000 (log₂ PP)
Avg token probability: 0.3536 (1 / PP)

Formula used

PP = exp( −(1/N) · Σ ln pᵢ )

On average the model is as uncertain as choosing uniformly among 2.83 equally likely tokens, over N = 4 tokens.

Cross-check. The exponential form gives PP = 2.8284; the independent product form (∏pᵢ)^(−1/N) gives 2.8284. They reconcile, as they must. (Shown for up to 50 tokens, where the raw product doesn't underflow.)

Per-token contributions

| Token | Probability pᵢ | ln pᵢ |
|---|---|---|
| #1 | 0.5000 | −0.6931 |
| #2 | 0.2500 | −1.3863 |
| #3 | 0.2500 | −1.3863 |
| #4 | 0.5000 | −0.6931 |
| Σ ln pᵢ | | −4.1589 |

Method: PP = exp(−(1/N)·Σ ln pᵢ) = exp(H_nats) = 2^(H_bits), with H_bits = H_nats / ln 2. Sources: Jurafsky & Martin, Speech and Language Processing (3rd ed.), Ch. 3; Hugging Face perplexity guide; PyTorch CrossEntropyLoss. No data leaves this page.

How it works

Perplexity measures how well a probability model predicts a sample of text: it is the model's average uncertainty per token, read as the number of equally likely options it is effectively choosing between. Lower is better. The definition comes from Jurafsky & Martin's Speech and Language Processing, Chapter 3.

For a test sequence of N tokens, where the model assigns probability pᵢ to the i-th observed token in context, perplexity is the inverse geometric mean of those probabilities:

PP = ( ∏ pᵢ )^(−1/N) = exp( −(1/N) · Σ ln pᵢ )
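
The two forms are equal because the logarithm turns the product into a sum. A short derivation, using only the symbols defined above, may make the step explicit:

```latex
\begin{align*}
\mathrm{PP}
  &= \Bigl(\prod_{i=1}^{N} p_i\Bigr)^{-1/N} \\
  &= \exp\!\Bigl(-\tfrac{1}{N}\,\ln \prod_{i=1}^{N} p_i\Bigr) \\
  &= \exp\!\Bigl(-\tfrac{1}{N}\sum_{i=1}^{N} \ln p_i\Bigr)
\end{align*}
```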

The exponent −(1/N)·Σ ln pᵢ is the average cross-entropy (equivalently, the mean negative log-likelihood) H, in nats. So perplexity is simply the exponential of the cross-entropy, and the two are interchangeable:

  1. From probabilities. Sum the natural logs of the per-token probabilities, average and negate to get H = −(1/N)·Σ ln pᵢ, then PP = exp(H). If you enter log-probabilities directly, the logs are already taken.
  2. From loss. An average cross-entropy / NLL loss already is H. If the loss is in nats, as reported by PyTorch CrossEntropyLoss and TensorFlow, PP = exp(loss). If it is in bits, PP = 2^loss.
  3. From log-likelihood. Given a total Σ log P and token count N, the per-token cross-entropy is H = −(Σ log P)/N in the chosen unit, and PP is its exponential (base e for nats, base 2 for bits).
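
The three input modes above can be sketched in a few lines of Python. This is a minimal illustration, not the tool's actual code: the function names and the `unit` parameter are hypothetical, chosen here only for readability.

```python
import math

def perplexity_from_probs(probs):
    """Mode 1: per-token probabilities, each in (0, 1]."""
    if not probs:
        raise ValueError("need at least one probability")
    if any(not (0.0 < p <= 1.0) for p in probs):
        raise ValueError("each probability must be in (0, 1]")
    # H = -(1/N) * sum(ln p_i), in nats
    h_nats = -sum(math.log(p) for p in probs) / len(probs)
    return math.exp(h_nats)

def perplexity_from_loss(loss, unit="nats"):
    """Mode 2: an average cross-entropy / NLL loss, in nats or bits."""
    if unit == "nats":
        return math.exp(loss)
    if unit == "bits":
        return 2.0 ** loss
    raise ValueError("unit must be 'nats' or 'bits'")

def perplexity_from_log_likelihood(total_log_p, n_tokens, unit="nats"):
    """Mode 3: total log-likelihood (non-positive) and a token count."""
    if n_tokens <= 0:
        raise ValueError("n_tokens must be positive")
    # Per-token cross-entropy in the chosen unit, then exponentiate.
    return perplexity_from_loss(-total_log_p / n_tokens, unit)

print(perplexity_from_probs([0.5, 0.25, 0.25, 0.5]))
print(perplexity_from_loss(2.3))
print(perplexity_from_log_likelihood(-9000, 1000, unit="bits"))
```

Routing mode 3 through mode 2 reflects the point made in the list: a total log-likelihood divided by N and negated is the same quantity as an average loss.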

Units convert with H_bits = H_nats / ln 2, so log₂ PP is the bits-per-token figure and 1/PP = exp(−H_nats) is the geometric-mean per-token probability (the figure the tool labels "Avg token probability"). All three input modes converge on the same (PP, nats, bits) triple, which is why the tool can cross-check a probabilities-mode result against the independent product form (∏ pᵢ)^(−1/N) and have them agree to floating-point precision. Probabilities of 0 or below are rejected: ln 0 = −∞ would send perplexity to infinity, and the logarithm of a negative number is undefined. Everything is plain double-precision arithmetic in your browser.
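
The cross-check between the two forms, and the reason the raw product is only usable for short inputs, can be illustrated with a small sketch. The helper names `pp_log_form` and `pp_product_form` are hypothetical, and the 2000-token input is an arbitrary example chosen so that the product falls below the double-precision range.

```python
import math

def pp_log_form(probs):
    # exp(-(1/N) * sum(ln p)): works on sums of logs, so long inputs are fine
    return math.exp(-sum(math.log(p) for p in probs) / len(probs))

def pp_product_form(probs):
    # (prod p)^(-1/N): the raw product can underflow to 0.0 for long inputs
    product = math.prod(probs)
    if product == 0.0:
        # Underflow: the true product is tiny but nonzero, so the result
        # is not trustworthy. Report infinity instead of raising.
        return math.inf
    return product ** (-1.0 / len(probs))

short = [0.5, 0.25, 0.25, 0.5]
long_run = [0.1] * 2000  # true product is 1e-2000, far below the double range

print(pp_log_form(short), pp_product_form(short))        # the two forms agree
print(pp_log_form(long_run), pp_product_form(long_run))  # product form breaks down
```

For the long input the log form still returns a perplexity of about 10, while the product form has nothing left to work with.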

Worked examples

From token probabilities — p = [0.5, 0.25, 0.25, 0.5], N = 4

  1. Σ ln p = −0.693147 − 1.386294 − 1.386294 − 0.693147 = −4.158883
  2. H (nats) = −(−4.158883) / 4 = 1.039721
  3. PP = exp(1.039721) = 2.828427
  4. Cross-check: ∏p = 0.015625, 0.015625^(−1/4) = 64^(1/4) = 2.828427 ✓
  5. Bits/token = 1.039721 / ln 2 = 1.5; avg token prob = 1/2.828427 = 0.353553
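
The steps of this first example can be reproduced with a short script. It is a sketch that only re-does the arithmetic above with Python's standard `math` module; the variable names are illustrative.

```python
import math

probs = [0.5, 0.25, 0.25, 0.5]

# Per-token contributions: ln p_i for each observed token.
log_probs = [math.log(p) for p in probs]
for i, (p, lp) in enumerate(zip(probs, log_probs), start=1):
    print(f"#{i}  p = {p:.4f}  ln p = {lp:.6f}")

total = sum(log_probs)                          # step 1: sum of ln p
h_nats = -total / len(probs)                    # step 2: cross-entropy in nats
pp = math.exp(h_nats)                           # step 3: perplexity
pp_check = math.prod(probs) ** (-1.0 / len(probs))  # step 4: product form
h_bits = h_nats / math.log(2.0)                 # step 5: bits per token

print(f"sum ln p = {total:.6f}")
print(f"H = {h_nats:.6f} nats = {h_bits:.6f} bits")
print(f"PP = {pp:.6f} (product form: {pp_check:.6f})")
print(f"1 / PP = {1.0 / pp:.6f}")
```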

From cross-entropy loss — PyTorch CrossEntropyLoss = 2.3 (nats)

  1. Loss is already the per-token cross-entropy H = 2.3 nats
  2. PP = e^2.3 = 9.974182
  3. Bits/token = 2.3 / ln 2 = 3.318199
  4. Avg token prob = 1 / 9.974182 = 0.100259
  5. Same input as bits: 2.3 / ln 2 ≈ 3.318199 bits → 2^(2.3 / ln 2) = 9.974182 ✓
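
The nats-to-bits round trip in this example can be checked with a few lines of Python. The helpers `nats_to_bits` and `bits_to_nats` are hypothetical names introduced for this sketch; the loss value 2.3 is the one used in the example.

```python
import math

LN2 = math.log(2.0)

def nats_to_bits(h_nats):
    # H_bits = H_nats / ln 2
    return h_nats / LN2

def bits_to_nats(h_bits):
    # H_nats = H_bits * ln 2
    return h_bits * LN2

loss_nats = 2.3                      # average cross-entropy in nats
loss_bits = nats_to_bits(loss_nats)  # same quantity expressed in bits

pp_from_nats = math.exp(loss_nats)   # base e for nats
pp_from_bits = 2.0 ** loss_bits      # base 2 for bits

print(f"H = {loss_nats} nats = {loss_bits:.6f} bits")
print(f"PP from nats: {pp_from_nats:.6f}")
print(f"PP from bits: {pp_from_bits:.6f}")
print(f"1 / PP = {1.0 / pp_from_nats:.6f}")
```

Keeping the bits value at full precision, rather than re-entering a rounded figure, is what lets the two perplexities agree to floating-point precision.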

From log-likelihood in bits — total log₂P = −9000 over N = 1000

  1. H (bits) = −(−9000) / 1000 = 9 bits per token
  2. H (nats) = 9 × ln 2 = 6.238325
  3. PP = 2^9 = exp(6.238325) = 512
  4. Equivalent to a uniform model over a 512-token vocabulary
  5. Avg token prob = 1 / 512 = 0.001953 ✓
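
The link to a uniform model in step 4 can be made concrete. The sketch below assumes a hypothetical model that assigns probability 1/512 to every token, builds the total log-likelihood from it, and recovers the perplexity; the variable names are illustrative.

```python
import math

vocab_size = 512
n_tokens = 1000

# A uniform model gives every observed token probability 1 / vocab_size.
log2_p_per_token = math.log2(1.0 / vocab_size)
total_log2_p = n_tokens * log2_p_per_token   # total log-likelihood in bits

h_bits = -total_log2_p / n_tokens            # per-token cross-entropy in bits
h_nats = h_bits * math.log(2.0)              # same quantity in nats
pp = 2.0 ** h_bits                           # perplexity

print(f"total log2 P = {total_log2_p}")
print(f"H = {h_bits} bits = {h_nats:.6f} nats")
print(f"PP = {pp}, 1 / PP = {1.0 / pp:.6f}")
```

For a uniform model the perplexity equals the vocabulary size, which is why perplexity reads as an effective number of equally likely choices.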


Sources & references

The formulas on this page were last cross-checked against these sources on 2026-06-10. Perplexity is a stable mathematical definition, so this tool needs no rate or schedule updates — only the worked examples are periodically re-reconciled.
