induwara.lk

Softmax Calculator (with Temperature)

Turn raw logits into probabilities in your browser. Paste your scores, adjust the temperature, and read each class probability, the predicted argmax, the entropy, and a step-by-step, numerically-stable breakdown — verified against PyTorch.

By Induwara Ashinsana · Updated Jun 10, 2026
Softmax calculator

Separate values with commas, spaces, or new lines. Up to 100 numbers; negatives and decimals are fine.

Temperature T = 1.00

T > 1 flattens the distribution; T < 1 sharpens it. Range 0.1–5.

Comma-separated; must match the number of logits.

Predicted class (argmax)
Class 1
p = 0.6590
Probability sum
1.0000
Always 1 — that's what softmax guarantees
Entropy
1.222 bits
Max 1.585 (uniform)
Sharpness
23%
Spread — the classes are close to each other.

Distribution

Class 1
0.6590
Class 2
0.2424
Class 3
0.0986

Per-class breakdown

Class    Logit z  z / T  exp(z/T − max)  Probability  %
Class 1  2        2      1.0000          0.6590       65.90%
Class 2  1        1      0.3679          0.2424       24.24%
Class 3  0.1      0.1    0.1496          0.0986       9.86%
Sum                                      1.0000       100.00%

Step-by-step

  1. Scale by T = 1.00: sᵢ = zᵢ / 1.00
  2. Subtract max(s) = 2 from every sᵢ (stability shift)
  3. Exponentiate each shifted score: eᵢ = exp(sᵢ − max)
  4. Denominator: Σ eᵢ = 1.5174
  5. Divide: pᵢ = eᵢ / 1.5174
  Result: argmax = Class 1 (0.6590)

Method: temperature-scaled, max-shifted softmax (Goodfellow, Bengio & Courville, Deep Learning (2016), §4.1 and §6.2.2); temperature per Hinton et al. (2015). Entropy is Shannon H = −Σ pᵢ log₂ pᵢ. Everything runs in your browser; no data leaves this page.
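
Written out, the method amounts to the following. This is a sketch in LaTeX notation; the symbol m for the maximum scaled score is shorthand introduced here, not the page's own notation. The middle equality shows why the shift is harmless: the factor e^{-m} appears in both numerator and denominator and cancels.

```latex
\[
s_i = \frac{z_i}{T}, \qquad m = \max_k s_k, \qquad
p_i = \frac{e^{s_i - m}}{\sum_{j=1}^{n} e^{s_j - m}}
    = \frac{e^{-m}\, e^{s_i}}{e^{-m} \sum_{j=1}^{n} e^{s_j}}
    = \frac{e^{s_i}}{\sum_{j=1}^{n} e^{s_j}}
\]
\[
H = -\sum_{i=1}^{n} p_i \log_2 p_i
\]
```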

How it works

The softmax function converts a vector of real-valued scores, called logits, into a probability distribution: every output is positive and the outputs sum to 1. It is the last layer of most classification neural networks and the function behind the temperature knob in language-model sampling. This calculator implements the standard, numerically stable definition from Goodfellow, Bengio & Courville's Deep Learning (MIT Press, 2016).

For logits z = [z₁ … zₙ] and a temperature T, it runs these steps:

  1. Temperature scaling. Each logit is divided by the temperature: sᵢ = zᵢ / T. At T = 1 this is the plain softmax (Hinton, Vinyals & Dean, 2015).
  2. Stability shift. Subtract the maximum scaled score, dᵢ = sᵢ − max(s). This stops exp() from overflowing on large inputs. The constant cancels in the next division, so the answer is unchanged (Deep Learning §4.1).
  3. Exponentiate. eᵢ = exp(dᵢ). Because every shifted score is ≤ 0, each exponential lands in (0, 1].
  4. Normalise. pᵢ = eᵢ / Σⱼ eⱼ. This is the softmax probability for class i (Deep Learning §6.2.2; PyTorch softmax reference).
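
The four steps above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the calculator's actual source; the function name `softmax` and its `temperature` parameter are assumptions chosen for the example.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled, max-shifted softmax over a list of floats."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]   # step 1: s_i = z_i / T
    peak = max(scaled)
    shifted = [s - peak for s in scaled]         # step 2: d_i = s_i - max(s), so d_i <= 0
    exps = [math.exp(d) for d in shifted]        # step 3: every e_i is at most 1
    total = sum(exps)                            # at least 1, because the max term is exp(0)
    return [e / total for e in exps]             # step 4: p_i = e_i / sum

probs = softmax([2.0, 1.0, 0.1])
print(", ".join(f"{p:.4f}" for p in probs))      # 0.6590, 0.2424, 0.0986
```

Because the largest shifted score is always 0, the denominator is never smaller than 1, so the final division cannot divide by zero.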

The predicted class is the argmax, the index with the highest probability. It is temperature-invariant because dividing by a positive T never reorders the scores. As a sharpness readout the tool also computes the Shannon entropy H = −Σ pᵢ log₂ pᵢ in bits, whose maximum of log₂ n marks a perfectly uniform distribution. As a correctness check, the module compares the stable result against the naive (no-shift) softmax: on well-conditioned inputs the two agree to roughly 1e-15, and on extreme inputs only the stable path survives, which is exactly why the shift exists. Everything is plain double-precision arithmetic; nothing is sent to a server.
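
The entropy readout and the stable-versus-naive comparison described above might look like this in Python. It is a sketch under assumptions: the helper names (`stable_softmax`, `naive_softmax`, `entropy_bits`) are hypothetical, and the page's own module may be organised differently.

```python
import math

def stable_softmax(logits, temperature=1.0):
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]  # shifted, so no overflow
    total = sum(exps)
    return [e / total for e in exps]

def naive_softmax(logits, temperature=1.0):
    # No stability shift: only safe on small, well-conditioned inputs.
    exps = [math.exp(z / temperature) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy_bits(probs):
    # Shannon entropy in bits; terms with p = 0 contribute 0 by convention.
    return -sum(p * math.log2(p) for p in probs if p > 0)

logits = [2.0, 1.0, 0.1]
stable = stable_softmax(logits)
naive = naive_softmax(logits)
max_gap = max(abs(a - b) for a, b in zip(stable, naive))

print(f"entropy = {entropy_bits(stable):.3f} bits (max {math.log2(len(logits)):.3f})")
print(f"largest stable-vs-naive gap = {max_gap:.1e}")
```

For the textbook vector the entropy should come out at about 1.222 bits against a maximum of 1.585, matching the readout shown by the calculator.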

Worked examples

Textbook vector — [2.0, 1.0, 0.1], T = 1

  1. Subtract max (2.0): [0, −1.0, −1.9]
  2. Exponentiate: [1, 0.367879, 0.149569]
  3. Sum = 1.517448
  4. Divide: [0.6590, 0.2424, 0.0986]
  5. Sum check: 0.6590 + 0.2424 + 0.0986 = 1.0000 ✓
  6. Argmax = Class 1 (0.6590) — matches PyTorch softmax([2,1,0.1])

Same logits, T = 2 — temperature flattens it

  1. Scale by T: [1.0, 0.5, 0.05]
  2. Subtract max (1.0): [0, −0.5, −0.95]
  3. Exponentiate: [1, 0.606531, 0.386741]
  4. Sum = 1.993272
  5. Divide: [0.5017, 0.3043, 0.1940] (sum 1.0000 ✓)
  6. Top class dropped 0.659 → 0.502: higher T = more uniform
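
To see the flattening effect across several temperatures, and that the argmax never moves, a short sweep like the one below can help. It is an illustrative sketch; the temperature values are arbitrary choices, and `softmax` is a hypothetical helper redefined here so the snippet runs on its own.

```python
import math

def softmax(logits, temperature=1.0):
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]
for t in (0.5, 1.0, 2.0, 5.0):
    probs = softmax(logits, t)
    top = max(range(len(probs)), key=probs.__getitem__)  # index of the largest probability
    print(f"T = {t}: top index {top}, p = {probs[top]:.4f}")
```

The top index stays at 0 (Class 1) for every positive T, while its probability shrinks toward 1/3 as T grows.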

Numerical-stability edge case — [1e9, 0], T = 1

  1. Naive softmax: exp(1e9) = Infinity, so the first probability is Infinity / Infinity = NaN (broken)
  2. Stable path subtracts max (1e9): [0, −1e9]
  3. Exponentiate: [1, 0] (exp(−1e9) underflows to 0)
  4. Divide: [1, 0] (sum 1.0000 ✓)
  5. This is why production softmax always subtracts the max
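
The edge case can be reproduced in Python with one caveat: `math.exp` raises `OverflowError` on overflow instead of returning Infinity as browser JavaScript does. The hypothetical `exp_ieee` helper below is an assumption added to mimic that IEEE-754 behaviour; everything else follows the steps listed above.

```python
import math

def exp_ieee(x):
    # Mimic IEEE-754 overflow (Infinity) where math.exp would raise OverflowError.
    try:
        return math.exp(x)
    except OverflowError:
        return math.inf

def naive_softmax(logits):
    exps = [exp_ieee(z) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]             # inf / inf evaluates to nan

def stable_softmax(logits):
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]  # exp(-1e9) underflows to 0.0
    total = sum(exps)
    return [e / total for e in exps]

print(naive_softmax([1e9, 0.0]))    # [nan, 0.0]
print(stable_softmax([1e9, 0.0]))   # [1.0, 0.0]
```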


Sources & references

The formulas on this page were last cross-checked against these sources and PyTorch on 2026-06-10. Softmax is a stable mathematical definition, so this tool needs no rate or schedule updates; only the worked examples are periodically re-checked.
