How do you calculate softmax by hand?

For logits z = [z₁ … zₙ]: (1) find the largest value m = max(z); (2) subtract it from each, giving dᵢ = zᵢ − m; (3) take eᵢ = exp(dᵢ); (4) sum them, S = Σ eᵢ; (5) divide, pᵢ = eᵢ / S. The probabilities are positive and sum to 1. Subtracting the max is optional on paper but prevents overflow in code.
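The five steps above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the calculator's own source; the function name `softmax` is an assumption chosen for the example.

```python
import math

def softmax(logits):
    """Max-shifted softmax over a list of floats (illustrative sketch)."""
    m = max(logits)                         # (1) largest logit
    shifted = [z - m for z in logits]       # (2) d_i = z_i - m
    exps = [math.exp(d) for d in shifted]   # (3) e_i = exp(d_i)
    total = sum(exps)                       # (4) S = sum of e_i
    return [e / total for e in exps]        # (5) p_i = e_i / S

probs = softmax([2.0, 1.0, 0.1])
print([round(p, 4) for p in probs])  # [0.659, 0.2424, 0.0986]
```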

What does temperature do in the softmax function?

Temperature T rescales the logits before softmax: softmax(zᵢ / T). T = 1 is the plain softmax. T > 1 divides the scores down, so the gaps shrink and the distribution flattens toward uniform — more random sampling. T < 1 magnifies the gaps, so the top class dominates — sharper, more confident output. It is the same temperature knob in the OpenAI and Anthropic sampling APIs.
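To see the flattening and sharpening effect numerically, here is a small sketch that runs the same logits at three temperatures. The function name and the chosen T values are assumptions for illustration only.

```python
import math

def softmax(logits, T=1.0):
    """Temperature-scaled, max-shifted softmax (illustrative sketch)."""
    if T <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / T for z in logits]          # s_i = z_i / T
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]
# Top-class probability at each temperature
top = {T: max(softmax(logits, T)) for T in (0.5, 1.0, 2.0)}
for T, p in top.items():
    print(f"T = {T}: top probability = {p:.4f}")
```

The top-class probability should fall as T rises: about 0.659 at T = 1 and about 0.502 at T = 2 for these logits.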

Why subtract the max before applying softmax?

exp() overflows quickly: in double precision anything above about exp(709.78) is already infinity, so exp(1000) is infinity, and the division Infinity / Infinity then yields NaN. Subtracting the largest logit makes the biggest exponent exp(0) = 1 and all others smaller, so nothing overflows. Because the same constant is subtracted from every term, it cancels in the division, so the probabilities are mathematically identical to the naive formula.
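The failure is easy to reproduce. The sketch below compares a naive and a max-shifted softmax on a large logit; both function names are assumptions for the example. Note one Python-specific detail: `math.exp` raises `OverflowError` on overflow instead of returning infinity, whereas environments that follow IEEE 754 silently (JavaScript, NumPy) return Infinity and end up with NaN.

```python
import math

def softmax_naive(logits):
    exps = [math.exp(z) for z in logits]     # no shift: may overflow
    total = sum(exps)
    return [e / total for e in exps]

def softmax_stable(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]  # largest exponent is exp(0) = 1
    total = sum(exps)
    return [e / total for e in exps]

big = [1000.0, 0.0]
try:
    naive_result = softmax_naive(big)
except OverflowError:
    naive_result = None   # pure Python raises rather than producing inf/NaN

stable_result = softmax_stable(big)
print(naive_result, stable_result)
```

Here the stable path returns [1.0, 0.0], because exp(−1000) underflows quietly to 0.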

What is the difference between softmax and sigmoid?

Sigmoid maps a single number to one probability in (0, 1) and is used for binary or independent multi-label problems. Softmax maps a whole vector to a probability distribution that sums to 1, used when classes are mutually exclusive (one winner). For two classes the two coincide: softmax over [z₁, z₂] gives class 1 the probability sigmoid(z₁ − z₂). This tool computes softmax only; sigmoid is a separate activation.
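The two-class relationship can be checked directly: the softmax probability of the first class equals the sigmoid of the difference between the two logits. A minimal sketch, with arbitrary example logits `a` and `b`:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

a, b = 2.0, 0.5                      # arbitrary example logits
p_softmax = softmax([a, b])[0]       # probability of class 1
p_sigmoid = sigmoid(a - b)           # sigmoid of the logit difference
print(p_softmax, p_sigmoid)
```

The two printed values should agree to within floating-point rounding.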

Do softmax outputs always sum to 1?

Yes. Softmax divides each exponential by the sum of all exponentials, so by construction Σ pᵢ = S / S = 1 for any input vector and any temperature T > 0. This calculator displays the sum on every run so you can confirm it reads 1.0000. Tiny floating-point error (around 1e-16) can appear far past the visible decimals but never changes the rounded value.

Does temperature change which class is predicted?

No. Temperature is a positive divisor, so it rescales every logit by the same factor and never reorders them. The argmax — the predicted class — is therefore the same for any T > 0. Temperature only changes how confident the distribution looks (how peaked or flat), not the winner. That is why this tool shows the same predicted class as you slide T.
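A quick way to convince yourself is to compute the argmax over a range of temperatures. The sketch below uses the tool's stated slider range of 0.1 to 5; the helper names are assumptions for illustration.

```python
import math

def softmax(logits, T=1.0):
    scaled = [z / T for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def argmax(values):
    # Index of the largest value (first one wins on ties)
    return max(range(len(values)), key=values.__getitem__)

logits = [2.0, 1.0, 0.1]
winners = {T: argmax(softmax(logits, T)) for T in (0.1, 0.5, 1.0, 2.0, 5.0)}
print(winners)
```

Every temperature should map to index 0 (Class 1); only the size of its probability changes.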

Is this softmax calculator the same as PyTorch's?

Yes, for a single vector. It implements the numerically-stable softmax that PyTorch's torch.nn.functional.softmax uses internally, and the worked examples on this page reconcile to PyTorch to four decimals. This v1 handles one vector at a time; it does not do batched or multi-axis (2-D) softmax, log-softmax, or the softmax Jacobian used in backprop.

What does the entropy readout mean?

Entropy H = −Σ pᵢ log₂ pᵢ (in bits) measures how spread out the distribution is. It is 0 when one class has all the probability and reaches its maximum of log₂(n) when every class is equally likely. The 'sharpness' figure is 1 minus the normalised entropy, that is, sharpness = 1 − H / log₂(n). Higher means a more confident, peaked distribution.
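Both readouts are short to compute. The sketch below applies the formulas to the probabilities from the textbook example on this page; the function names, and the choice to return 1.0 for a single-class input, are assumptions made for the example.

```python
import math

def entropy_bits(probs):
    # Terms with p = 0 contribute nothing (0 * log 0 is taken as 0)
    return -sum(p * math.log2(p) for p in probs if p > 0.0)

def sharpness(probs):
    n = len(probs)
    if n < 2:
        return 1.0   # assumption: a single class counts as fully peaked
    return 1.0 - entropy_bits(probs) / math.log2(n)

probs = [0.6590, 0.2424, 0.0986]
H = entropy_bits(probs)
s = sharpness(probs)
print(f"entropy = {H:.3f} bits, max = {math.log2(len(probs)):.3f}, sharpness = {s:.0%}")
```

For these probabilities the entropy comes out near 1.222 bits against a maximum of 1.585, giving a sharpness of roughly 23%.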

Softmax Calculator (with Temperature)

Turn raw logits into probabilities in your browser. Paste your scores, adjust the temperature, and read each class probability, the predicted argmax, the entropy, and a step-by-step, numerically-stable breakdown — verified against PyTorch.

By Induwara Ashinsana, Executive Director, Ryzera Technologies. Updated Jun 10, 2026

Softmax calculator

Logits (raw scores)

Separate values with commas, spaces, or new lines. Up to 100 numbers; negatives and decimals are fine.

Temperature (T): 1.00

T > 1 flattens the distribution; T < 1 sharpens it. Range 0.1–5.

Class labels (optional)

Comma-separated; must match the number of logits.

Predicted class (argmax)

Class 1

p = 0.6590

Probability sum

1.0000

Always 1 — that's what softmax guarantees

Entropy

1.222 bits

Max 1.585 (uniform)

Sharpness

23%

Spread — the classes are close to each other.

Distribution

Class 1

0.6590

Class 2

0.2424

Class 3

0.0986

Per-class breakdown

Class	Logit z	z / T	exp(z/T − max)	Probability	%
Class 1	2	2	1.0000	0.6590	65.90%
Class 2	1	1	0.3679	0.2424	24.24%
Class 3	0.1	0.1	0.1496	0.0986	9.86%
Sum				1.0000	100.00%

Step-by-step

1. Scale by T = 1.00: sᵢ = zᵢ / 1.00
2. Subtract max(s) = 2 from every sᵢ (stability shift)
3. Exponentiate each shifted score: eᵢ = exp(sᵢ − max)
4. Denominator: Σ eᵢ = 1.5174
5. Divide: pᵢ = eᵢ / 1.5174 → argmax = Class 1 (0.6590)

Method: temperature-scaled, max-shifted softmax (Goodfellow, Bengio & Courville, Deep Learning (2016), §4.1 and §6.2.2); temperature per Hinton et al. (2015). Entropy is Shannon H = −Σ pᵢ log₂ pᵢ. Everything runs in your browser; no data leaves this page.

How it works

The softmax function converts a vector of real-valued scores, called logits, into a probability distribution: every output is positive and the outputs sum to 1. It is the usual output layer of multi-class classification networks, and the function behind the temperature knob in modern language-model sampling. This calculator implements the standard, numerically stable definition from Goodfellow, Bengio & Courville's Deep Learning (MIT Press, 2016).

For logits z = [z₁ … zₙ] and a temperature T, it runs these steps:

1. Temperature scaling. Each logit is divided by the temperature: sᵢ = zᵢ / T. At T = 1 this is the plain softmax (Hinton, Vinyals & Dean, 2015).
2. Stability shift. Subtract the maximum scaled score, dᵢ = sᵢ − max(s). This stops exp() from overflowing on large inputs. The constant cancels in the final division, so the answer is unchanged (Deep Learning §4.1).
3. Exponentiate. eᵢ = exp(dᵢ). Because every shifted score is ≤ 0, each exponential lands in (0, 1].
4. Normalise. pᵢ = eᵢ / Σⱼ eⱼ. This is the softmax probability for class i (Deep Learning §6.2.2; PyTorch softmax reference).
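The claim that the stability shift leaves the answer unchanged follows from a one-line identity. Writing m = max(s) for the subtracted constant, with sᵢ = zᵢ / T as defined in the steps above:

```latex
p_i
  = \frac{e^{s_i - m}}{\sum_{j} e^{s_j - m}}
  = \frac{e^{-m}\, e^{s_i}}{e^{-m} \sum_{j} e^{s_j}}
  = \frac{e^{s_i}}{\sum_{j} e^{s_j}},
\qquad s_i = \frac{z_i}{T}, \quad m = \max_j s_j .
```

The factor e^{−m} appears in both numerator and denominator, so it cancels for any constant m; choosing the maximum is what keeps every exponent at or below zero.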

The predicted class is the argmax, the index with the highest probability, which is temperature-invariant because dividing by a positive T never reorders the scores. For a sharpness readout the tool also computes the Shannon entropy H = −Σ pᵢ log₂ pᵢ in bits, whose maximum of log₂ n marks a perfectly uniform distribution. As a correctness check, the module compares the stable result against the naive (no-shift) softmax: on well-conditioned inputs the two agree to roughly 1e-15, and on extreme inputs only the stable path survives, which is exactly why the shift exists. Everything is plain double-precision arithmetic; nothing is sent to a server.
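A cross-check of this kind might look like the sketch below. It is not the tool's actual module; the function names and the example input are assumptions, and it only covers the well-conditioned case where the naive path does not overflow.

```python
import math

def softmax_naive(scores):
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_stable(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def max_abs_diff(scores):
    """Largest per-class gap between the naive and stable results."""
    naive = softmax_naive(scores)
    stable = softmax_stable(scores)
    return max(abs(a - b) for a, b in zip(naive, stable))

diff = max_abs_diff([2.0, 1.0, 0.1])
print(f"max |naive - stable| = {diff:.2e}")
```

On small, well-behaved logits like these the gap should sit at the level of double-precision rounding.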

Worked examples

Textbook vector — [2.0, 1.0, 0.1], T = 1

Subtract max (2.0): [0, −1.0, −1.9]
Exponentiate: [1, 0.367879, 0.149569]
Sum = 1.517448
Divide: [0.6590, 0.2424, 0.0986]
Sum check: 0.6590 + 0.2424 + 0.0986 = 1.0000 ✓
Argmax = Class 1 (0.6590) — matches PyTorch softmax([2,1,0.1])

Same logits, T = 2 — temperature flattens it

Scale by T: [1.0, 0.5, 0.05]
Subtract max (1.0): [0, −0.5, −0.95]
Exponentiate: [1, 0.606531, 0.386741]
Sum = 1.993272
Divide: [0.5017, 0.3043, 0.1940] (sum 1.0000 ✓)
Top class dropped 0.659 → 0.502: higher T = more uniform
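The intermediate values in this example can be reproduced with a small tracing function. This is a sketch under the same definitions used on this page; the function name and the dictionary keys are assumptions for illustration.

```python
import math

def softmax_trace(logits, T=1.0):
    """Return every intermediate quantity of the temperature-scaled softmax."""
    if T <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / T for z in logits]           # scale by T
    m = max(scaled)
    shifted = [s - m for s in scaled]          # subtract max
    exps = [math.exp(d) for d in shifted]      # exponentiate
    total = sum(exps)                          # denominator
    probs = [e / total for e in exps]          # divide
    return {"scaled": scaled, "shifted": shifted,
            "exps": exps, "total": total, "probs": probs}

trace = softmax_trace([2.0, 1.0, 0.1], T=2.0)
for name, value in trace.items():
    print(name, value)
```

With T = 2 the printed denominator should be close to 1.993272 and the probabilities close to [0.5017, 0.3043, 0.1940], matching the hand calculation.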

Numerical-stability edge case — [1e9, 0], T = 1

Naive softmax: exp(1e9) overflows to Infinity, so the first probability becomes Infinity / Infinity = NaN (broken)
Stable path subtracts max (1e9): [0, −1e9]
Exponentiate: [1, 0] (exp(−1e9) underflows to 0)
Divide: [1, 0] (sum 1.0000 ✓)
This is why production softmax always subtracts the max

Frequently asked questions

Sources & references

The formulas on this page were last cross-checked against these sources and PyTorch on 2026-06-10. Softmax is a stable mathematical definition, so this tool needs no rate or schedule updates — only the worked examples are periodically re-reconciled.
