Softmax Calculator (with Temperature)
Turn raw logits into probabilities in your browser. Paste your scores, adjust the temperature, and read each class probability, the predicted argmax, the entropy, and a step-by-step, numerically-stable breakdown — verified against PyTorch.
How it works
The softmax function converts a vector of real-valued scores, called logits, into a probability distribution: every output is positive and the outputs sum to 1. It is typically the final layer of a multi-class classification network, and it is the function behind the temperature knob in modern language-model sampling. This calculator implements the standard, numerically stable definition from Goodfellow, Bengio & Courville's Deep Learning (MIT Press, 2016).
For logits z = [z₁ … zₙ] and a temperature T > 0, the calculator runs these steps:
- Temperature scaling. Each logit is divided by the temperature: sᵢ = zᵢ / T. At T = 1 this is the plain softmax (Hinton, Vinyals & Dean, 2015).
- Stability shift. Subtract the maximum scaled score, dᵢ = sᵢ − max(s). This stops exp() from overflowing on large inputs. The constant cancels in the next division, so the answer is unchanged (Deep Learning §4.1).
- Exponentiate. eᵢ = exp(dᵢ). Because every shifted score is ≤ 0, each exponential lands in (0, 1].
- Normalise. pᵢ = eᵢ / Σⱼ eⱼ. This is the softmax probability for class i (Deep Learning §6.2.2; PyTorch softmax reference).
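The four steps above can be sketched in a few lines of plain Python. This is an illustrative sketch, not the calculator's own source: the function name `softmax`, its `temperature` parameter, and the input checks are assumptions made for this example.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with temperature (illustrative sketch)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    if not logits:
        raise ValueError("logits must be non-empty")

    scaled = [z / temperature for z in logits]   # s_i = z_i / T
    peak = max(scaled)
    shifted = [s - peak for s in scaled]         # d_i = s_i - max(s), so every d_i <= 0
    exps = [math.exp(d) for d in shifted]        # e_i = exp(d_i), never overflows
    total = sum(exps)                            # at least 1: the max term is exp(0) = 1
    return [e / total for e in exps]             # p_i = e_i / sum_j e_j


probs = softmax([2.0, 1.0, 0.1])
# probs is approximately [0.659, 0.242, 0.099]

# Large logits are handled because only the differences from the maximum are exponentiated.
big = softmax([1000.0, 1001.0, 1002.0])
```

Because the shift subtracts the same constant from every scaled score, `softmax([1000.0, 1001.0, 1002.0])` returns the same probabilities as `softmax([0.0, 1.0, 2.0])`.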
The predicted class is the argmax, the index with the highest probability. It is temperature-invariant because dividing by a positive T never reorders the scores. As a sharpness readout, the tool also computes the Shannon entropy H = −Σ pᵢ log₂ pᵢ in bits; it reaches its maximum of log₂ n for a perfectly uniform distribution and is 0 when all the probability sits on a single class. As a correctness check, the module compares the stable result against the naive (no-shift) softmax: on well-conditioned inputs the two agree to roughly 1e-15, and on extreme inputs only the stable path survives, because exp() overflows in double precision once its argument exceeds roughly 709.78. That is exactly why the shift exists. Everything is plain double-precision arithmetic; nothing is sent to a server.
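The argmax, entropy, and stable-versus-naive comparison described above can be reproduced with a short Python sketch. All function names here (`stable_softmax`, `naive_softmax`, `entropy_bits`, `argmax`) are hypothetical and chosen for this example; they are not taken from the calculator's code. Note one environment difference: Python's `math.exp` raises `OverflowError` on overflow, whereas other runtimes may instead return infinity and then produce NaN after the division.

```python
import math


def stable_softmax(logits, temperature=1.0):
    # Shifted version: exponentiate differences from the maximum scaled score.
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def naive_softmax(logits, temperature=1.0):
    # No shift: math.exp raises OverflowError once its argument is too large.
    exps = [math.exp(z / temperature) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy_bits(probs):
    # Shannon entropy in bits; zero-probability terms contribute 0 by convention.
    return -sum(p * math.log2(p) for p in probs if p > 0.0)


def argmax(values):
    # Index of the largest value (the first one if there are ties).
    return max(range(len(values)), key=values.__getitem__)


logits = [2.0, 1.0, 0.1]

# Well-conditioned input: the two paths agree to within floating-point rounding.
max_gap = max(abs(a - b) for a, b in zip(stable_softmax(logits), naive_softmax(logits)))

# Temperature changes the sharpness but not the predicted class.
sharp = stable_softmax(logits, temperature=0.5)
flat = stable_softmax(logits, temperature=5.0)
same_class = argmax(sharp) == argmax(flat)
entropy_sharp = entropy_bits(sharp)
entropy_flat = entropy_bits(flat)   # larger: a higher T flattens the distribution

# A uniform distribution over 4 classes has the maximum entropy, log2(4) = 2 bits.
uniform_entropy = entropy_bits(stable_softmax([0.0, 0.0, 0.0, 0.0]))

# Extreme input: the naive path overflows, the stable path does not.
try:
    naive_softmax([1000.0, 1001.0, 1002.0])
    naive_survived = True
except OverflowError:
    naive_survived = False
stable_extreme = stable_softmax([1000.0, 1001.0, 1002.0])
```

The entropy helper skips zero probabilities so that an underflowed term does not trigger a `log2(0)` error; that convention is a choice made for this sketch.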
Worked examples
Frequently asked questions
Sources & references
- Goodfellow, Bengio & Courville — Deep Learning (MIT Press, 2016), §4.1 & §6.2.2
- PyTorch documentation — torch.nn.Softmax (reference implementation)
- Hinton, Vinyals & Dean — Distilling the Knowledge in a Neural Network (2015), §2 (temperature)
The formulas on this page were last cross-checked against these sources and PyTorch on 2026-06-10. Softmax is a stable mathematical definition, so this tool needs no rate or schedule updates — only the worked examples are periodically re-reconciled.