Frequently asked questions

What does temperature 0 mean in an LLM?

Temperature 0 is greedy decoding: the model always picks the single highest-probability token, so the same logits always produce the same choice. In this tool, set T = 0 and the top token jumps to 100% while every other bar drops to zero. Most APIs treat temperature 0 as near-deterministic, but repeated calls can still differ slightly because of floating-point and batching effects on the provider's side.
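
A minimal Python sketch of greedy decoding as an argmax over the logits. The helper names `greedy_pick` and `greedy_distribution` are hypothetical, and sending ties to the lowest index is an assumption of this sketch (Python's `max` returns the first maximal item), not documented provider behavior.

```python
def greedy_pick(logits: list[float]) -> int:
    """Index of the highest logit (temperature 0 / greedy decoding).

    Ties go to the lowest index because max() returns the first maximal item.
    """
    return max(range(len(logits)), key=lambda i: logits[i])


def greedy_distribution(logits: list[float]) -> list[float]:
    """All probability on the greedy token, zero everywhere else."""
    winner = greedy_pick(logits)
    return [1.0 if i == winner else 0.0 for i in range(len(logits))]


tokens = ["cat", "dog", "bird", "fish"]
logits = [2.0, 1.0, 0.5, 0.0]
# Same logits in, same token out, on every call.
print(tokens[greedy_pick(logits)])  # cat
print(greedy_distribution(logits))  # [1.0, 0.0, 0.0, 0.0]
```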

Is it better to change temperature or top_p?

OpenAI's guidance is to alter one or the other, not both, because they interact in ways that are hard to reason about. Temperature rescales every probability; top_p clips the long tail. Pick temperature when you want overall randomness control, and top_p when you want to cut off unlikely tokens with a cutoff that adapts to how confident the model is.

What is a good temperature for ChatGPT or Claude?

For factual or code answers, 0 to 0.3 keeps output focused. For balanced chat, 0.7 to 1.0 is common. For brainstorming or creative writing, values around 1.0 and above add variety where the provider allows it (Anthropic caps temperature at 1). There is no universal best: load a preset here, watch the entropy readout, and match it to how deterministic you need the result.

How is the softmax probability of a token calculated?

Softmax turns raw logits into probabilities: p_i = exp(z_i / T) / Σ_j exp(z_j / T). Each logit is divided by the temperature T, exponentiated, then normalized so the values sum to 1. Lower T sharpens the distribution toward the top token; higher T flattens it. The step table shows this column by column.
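
Taking the ratio of any two probabilities from the formula above shows why T controls sharpness, and why subtracting the same constant from every scaled logit leaves the result unchanged (the basis of the log-sum-exp trick):

```latex
\begin{aligned}
\frac{p_i}{p_j} &= \frac{\exp(z_i / T)}{\exp(z_j / T)} = \exp\!\left(\frac{z_i - z_j}{T}\right) \\
z_i > z_j \;&\Rightarrow\; \lim_{T \to 0^+} \frac{p_i}{p_j} = \infty, \qquad \lim_{T \to \infty} \frac{p_i}{p_j} = 1 \\
p_i &= \frac{\exp(z_i / T - m)}{\sum_j \exp(z_j / T - m)}, \qquad m = \max_j \frac{z_j}{T}
\end{aligned}
```

With the worked-example logits, cat and dog differ by 1, so their ratio is e ≈ 2.718 at T = 1 (57.93% / 21.31%) and e² ≈ 7.389 at T = 0.5 (83.10% / 11.25%).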

Does top_k or top_p apply first?

In the common reference order — used by Hugging Face's generation code and mirrored here — temperature scaling happens first, then top-k keeps the k highest tokens, then top-p keeps the smallest set whose cumulative probability reaches p, and the survivors are renormalized once at the end. Providers can differ in edge cases, which is why the order is stated openly.
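
To see why the order matters, here is a minimal Python sketch that applies top-p either after temperature (the reference order) or before it, on the same four-token example used elsewhere on this page. The helpers `softmax` and `nucleus_mask` are hypothetical; top-p here keeps the smallest highest-first set whose cumulative probability reaches p.

```python
import math


def softmax(logits, temperature=1.0):
    """Tempered softmax, subtracting the max scaled logit for numerical stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def nucleus_mask(probs, p):
    """True for tokens in the smallest highest-first set whose mass reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = set(), 0.0
    for i in order:
        keep.add(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    return [i in keep for i in range(len(probs))]


logits = [2.0, 1.0, 0.5, 0.0]  # cat, dog, bird, fish
T, p = 0.5, 0.8

# Reference order: temperature first, then top-p on the tempered probabilities.
temp_first = nucleus_mask(softmax(logits, T), p)

# Reversed order: top-p on the T = 1 probabilities; temperature would only
# reweight these survivors afterwards, so the kept set is already fixed.
top_p_first = nucleus_mask(softmax(logits, 1.0), p)

print(temp_first)   # [True, False, False, False]
print(top_p_first)  # [True, True, True, False]
```

With temperature first, cat alone already holds 83.10% at T = 0.5, so the nucleus is a single token; applying top-p first keeps cat, dog, and bird.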

What is top-p (nucleus) sampling?

Top-p, introduced by Holtzman et al. (2019), sorts tokens by probability and keeps the smallest group whose probabilities add up to at least p (say 0.9), discarding the rest. It adapts the cutoff to the shape of the distribution — keeping few tokens when the model is confident and more when it is unsure — unlike a fixed top-k.
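
A minimal Python sketch of that adaptivity, assuming two made-up logit vectors (one confident, one flat) and a hypothetical `nucleus_size` helper. Top-p with p = 0.9 keeps a different number of tokens for each, whereas a fixed top-k would keep the same count for both.

```python
import math


def softmax(logits):
    """Plain softmax at T = 1, max-subtracted for numerical stability."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def nucleus_size(probs, p):
    """Number of tokens in the smallest highest-first set whose mass reaches p."""
    cumulative = 0.0
    for count, prob in enumerate(sorted(probs, reverse=True), start=1):
        cumulative += prob
        if cumulative >= p:
            return count
    return len(probs)  # guard against rounding: keep everything


confident = [8.0, 2.0, 1.0, 0.5, 0.0]  # illustrative: one token dominates
flat = [1.0, 0.9, 0.8, 0.7, 0.6]       # illustrative: the model is unsure

for name, logits in [("confident", confident), ("flat", flat)]:
    print(name, nucleus_size(softmax(logits), 0.9))
```

This prints 1 for the confident case and 5 for the flat one. A top-k of 3 would instead keep two near-zero tokens in the confident case and cut two plausible ones (about 18% and 16%) in the flat case.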

What is the entropy readout for?

Entropy, H = −Σ p·log₂ p, is a one-number measure of how random the final distribution is, in bits. Near 0 means the model is effectively certain (one token dominates). Higher values mean the probability is spread across many tokens, so sampled output will vary more. It updates live as you move the sliders.
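
A minimal Python sketch of the readout, assuming the final probabilities are already renormalized. The name `entropy_bits` is hypothetical; zero-probability tokens are skipped because p·log₂ p tends to 0 as p tends to 0. The last line uses the default distribution's rounded probabilities.

```python
import math


def entropy_bits(probs):
    """Shannon entropy in bits: sum of p * log2(1/p), skipping p = 0."""
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)


print(entropy_bits([1.0, 0.0, 0.0, 0.0]))      # 0.0: one token is certain
print(entropy_bits([0.25, 0.25, 0.25, 0.25]))  # 2.0: uniform over 4 tokens = log2(4)
print(round(entropy_bits([0.5793, 0.2131, 0.1293, 0.0784]), 2))  # 1.6
```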

Does this tool run a real model or call an API?

No. Everything runs in your browser with the logits you supply or a preset — no API key, no cost, no data leaves the page, and it works offline after load. It is a teaching and intuition tool for the sampling math, not a live model. For token counting use the AI Token Counter; this visualizes a single next-token step.

LLM Temperature & Top-p Sampling Visualizer

Drag the temperature, top-p, and top-k sliders and watch a language model's next-token probabilities sharpen, flatten, and get truncated in real time — with the exact softmax math, token by token. No API key, no cost, runs in your browser.

By Induwara Ashinsana, Executive Director, Ryzera Technologies. Updated Jun 11, 2026.

Sampling visualizer

Probabilities sum to 1.0 ✓

Provider mode: no provider clamp, so you can explore the full math with temperature 0–2 and top-k on.

Distribution preset

Tokens & logits (one label and logit per line): 2–12 tokens, logits from −20 to 20. These are illustrative scores, not a real model run.

Temperature (T)

Top-p (nucleus): off (no truncation)

Top-k: off (keep all)

Most likely next token: cat (57.93%)

Effective choices: 4 tokens with non-zero probability

Entropy: 1.6 bits (spread out)

Final sampling probability: cat 57.93%, dog 21.31%, bird 12.93%, fish 7.84%

Step-by-step math

Token	Logit	Softmax @ T=1	@ current T	Top-k	Top-p	Final prob
cat	2	57.93%	57.93%	✓	✓	57.93%
dog	1	21.31%	21.31%	✓	✓	21.31%
bird	0.5	12.93%	12.93%	✓	✓	12.93%
fish	0	7.84%	7.84%	✓	✓	7.84%

Order: temperature → top-k → top-p → renormalize.

How it works

A language model does not pick the next word directly: it outputs a raw score (a logit) for every token in its vocabulary. Sampling parameters turn those logits into a probability distribution and then decide how adventurously to draw from it. This tool applies the three most common sampling parameters in the reference order used by Hugging Face's generation code: temperature → top-k → top-p → renormalize.

Tempered softmax. Each logit is divided by the temperature T, then run through softmax: p_i = exp(z_i / T) / Σ_j exp(z_j / T). Lower T makes the top token more dominant (sharper, more confident); higher T flattens the distribution (more random). The formula is undefined at T = 0 itself, but as T approaches 0 it converges to greedy decoding (for a unique top logit), so T = 0 is handled as that limit: the highest-logit token gets probability 1. We subtract the maximum scaled logit before exponentiating (the log-sum-exp trick), so large logits or tiny temperatures never overflow.
Top-k truncation. If k > 0, keep only the k highest-probability tokens and zero the rest (Fan et al., 2018). The cutoff is fixed, regardless of how confident the model is.
Top-p (nucleus) truncation. If p < 1, sort the surviving tokens by probability and keep the smallest prefix whose cumulative probability reaches p (Holtzman et al., 2019). Unlike top-k, the cutoff adapts to the distribution's shape.
Renormalize. Divide the survivors by their sum so they total 1 again — these are the actual probabilities the sampler draws from. The badge in the calculator confirms they reconcile to 1.0.
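
The four steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the tool's source: the function name `sampling_distribution` is hypothetical, `top_k=0` and `top_p=1.0` mean off, ties at T = 0 go to the first token, and the top-p cumulative sum runs over the tempered probabilities of the top-k survivors without an intermediate renormalization, following the renormalize-once order. Some implementations renormalize after top-k, which can change the top-p cutoff when both are on.

```python
import math


def sampling_distribution(logits, temperature=1.0, top_k=0, top_p=1.0):
    """Temperature -> top-k -> top-p -> renormalize; returns final probabilities."""
    n = len(logits)
    # 1. Tempered softmax. T = 0 is treated as greedy decoding (the T -> 0 limit).
    if temperature == 0:
        winner = max(range(n), key=lambda i: logits[i])
        probs = [1.0 if i == winner else 0.0 for i in range(n)]
    else:
        scaled = [z / temperature for z in logits]
        m = max(scaled)  # log-sum-exp trick: subtract the max before exp()
        exps = [math.exp(s - m) for s in scaled]
        total = sum(exps)
        probs = [e / total for e in exps]

    # Rank tokens once, highest probability first (stable for ties).
    order = sorted(range(n), key=lambda i: probs[i], reverse=True)
    keep = set(order)

    # 2. Top-k: keep only the k highest-probability tokens.
    if top_k > 0:
        keep = set(order[:top_k])

    # 3. Top-p: smallest highest-first prefix of the survivors reaching p.
    if top_p < 1.0:
        nucleus, cumulative = set(), 0.0
        for i in order:
            if i not in keep:
                continue
            nucleus.add(i)
            cumulative += probs[i]
            if cumulative >= top_p:
                break
        keep = nucleus

    # 4. Renormalize the survivors so they sum to 1 again.
    kept_mass = sum(probs[i] for i in keep)
    return [probs[i] / kept_mass if i in keep else 0.0 for i in range(n)]


final = sampling_distribution([2.0, 1.0, 0.5, 0.0], temperature=1.0, top_p=0.8)
print([round(p, 4) for p in final])                    # [0.6285, 0.2312, 0.1402, 0.0]
print(sum(p > 0 for p in final), "effective choices")  # 3 effective choices
```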

The summary chips add an entropy readout, H = −Σ p·log₂ p in bits, as a single "how random is this" number, and an effective choices count: how many tokens still carry non-zero probability after truncation. The provider toggle clamps the slider ranges to each API's documented limits (OpenAI temperature 0–2 with no public top_k; Anthropic temperature 0–1 with optional top_k). Providers can differ in edge-case clamping and tie-breaking, so this tool states its reference order rather than claiming to mirror any one backend exactly.
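
A minimal sketch of the provider clamp in Python. The limits table mirrors only the ranges stated above (no clamp: temperature 0–2 with top-k on; OpenAI: temperature 0–2, no top_k; Anthropic: temperature 0–1, optional top_k). The `ProviderLimits` record, the `LIMITS` table and its keys, and the `clamp_params` helper are hypothetical and not part of either provider's SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProviderLimits:
    """Hypothetical limits record; values mirror the ranges stated on this page."""
    max_temperature: float
    supports_top_k: bool


# Hypothetical table, not an SDK structure.
LIMITS = {
    "none": ProviderLimits(max_temperature=2.0, supports_top_k=True),
    "openai": ProviderLimits(max_temperature=2.0, supports_top_k=False),
    "anthropic": ProviderLimits(max_temperature=1.0, supports_top_k=True),
}


def clamp_params(provider, temperature, top_p, top_k):
    """Clamp slider values to a provider's range; top_k = 0 means 'off'."""
    limits = LIMITS[provider]
    temperature = min(max(temperature, 0.0), limits.max_temperature)
    top_p = min(max(top_p, 0.0), 1.0)  # a probability threshold stays in [0, 1]
    top_k = max(top_k, 0) if limits.supports_top_k else 0
    return temperature, top_p, top_k


print(clamp_params("anthropic", 1.5, 0.9, 40))  # (1.0, 0.9, 40)
print(clamp_params("openai", 1.5, 0.9, 40))     # (1.5, 0.9, 0)
```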

Worked examples

A · Temperature only

cat=2.0, dog=1.0, bird=0.5, fish=0.0

exp(z) @ T=1 = [7.389, 2.718, 1.649, 1.000], Σ = 12.756
T = 1.0 → cat 57.93%, dog 21.31%, bird 12.93%, fish 7.84%
T = 0.5 (z/T = [4,2,1,0]) → cat 83.10%, dog 11.25%, bird 4.14%, fish 1.52%
T = 0 (greedy) → cat 100%, the rest 0 — same answer every run

B · Top-p truncation

T = 1.0, p = 0.8

Sorted cumulative: 0.5793 (<0.8) → +0.2131 = 0.7924 (<0.8) → +0.1293 = 0.9216 (≥0.8) stop
Keep cat, dog, bird; drop fish
Renormalize over 0.9216 → cat 62.85%, dog 23.12%, bird 14.02%, fish 0%
Effective choices = 3

C · Top-k truncation

T = 1.0, k = 2

Keep the two highest: cat, dog; drop bird, fish
Renormalize over 0.5793 + 0.2131 = 0.7924
→ cat 73.11%, dog 26.89%
Effective choices = 2, entropy ≈ 0.84 bits
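
The three worked examples can be replayed as reconciliation checks. This Python sketch is an illustration rather than the tool's own test suite: the `final_probs` and `check` helpers are hypothetical, and the checks use the 1e-4 tolerance given under Sources & references, comparing against the two-decimal percentages shown above.

```python
import math

LOGITS = {"cat": 2.0, "dog": 1.0, "bird": 0.5, "fish": 0.0}
TOL = 1e-4  # tolerance quoted for the fixtures


def final_probs(temperature, top_k=0, top_p=1.0):
    """Hypothetical helper: temperature -> top-k -> top-p -> renormalize."""
    if temperature == 0:  # greedy: all mass on the highest logit
        best = max(LOGITS, key=LOGITS.get)
        return {t: float(t == best) for t in LOGITS}
    m = max(LOGITS.values()) / temperature  # max scaled logit (stability)
    weights = {t: math.exp(z / temperature - m) for t, z in LOGITS.items()}
    total = sum(weights.values())
    probs = {t: w / total for t, w in weights.items()}
    ranked = sorted(probs, key=probs.get, reverse=True)
    if top_k > 0:
        ranked = ranked[:top_k]
    if top_p < 1.0:
        cumulative, nucleus = 0.0, []
        for t in ranked:
            nucleus.append(t)
            cumulative += probs[t]
            if cumulative >= top_p:
                break
        ranked = nucleus
    mass = sum(probs[t] for t in ranked)
    return {t: (probs[t] / mass if t in ranked else 0.0) for t in LOGITS}


def check(got, expected_percent):
    """Assert every token matches its expected percentage within TOL."""
    for token, pct in expected_percent.items():
        assert abs(got[token] - pct / 100) <= TOL, (token, got[token], pct)


# Example A: temperature only
check(final_probs(1.0), {"cat": 57.93, "dog": 21.31, "bird": 12.93, "fish": 7.84})
check(final_probs(0.5), {"cat": 83.10, "dog": 11.25, "bird": 4.14, "fish": 1.52})
check(final_probs(0), {"cat": 100, "dog": 0, "bird": 0, "fish": 0})
# Example B: top-p = 0.8 leaves 3 effective choices
b = final_probs(1.0, top_p=0.8)
check(b, {"cat": 62.85, "dog": 23.12, "bird": 14.02, "fish": 0})
assert sum(p > 0 for p in b.values()) == 3
# Example C: top-k = 2 leaves 2 effective choices
c = final_probs(1.0, top_k=2)
check(c, {"cat": 73.11, "dog": 26.89, "bird": 0, "fish": 0})
assert sum(p > 0 for p in c.values()) == 2
print("all worked examples reconcile")
```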

Sources & references

Parameter ranges and semantics were last cross-checked against the OpenAI and Anthropic API references on 2026-06-11. The sampling math is standard; the worked examples above double as the tool's reconciliation fixtures (tolerance 1e-4).

Related tools

Top-p & Top-k Calc

Interactive calculator for LLM top-k and top-p (nucleus) sampling. Drag top_k or top_p on a next-token distribution and watch which candidates survive, get zeroed, and renormalize, with the exact math token by token. Runs in your browser, no API key, sources cited.

AI Min-p Sampling Calc

Interactive calculator for LLM min-p and locally-typical sampling. Drag min_p or typical_p and watch which next-token candidates survive, get zeroed, and renormalize, with the exact math token by token. Runs in your browser, no API key, sources cited.

AI Chatbot Cost Calculator

Estimate the monthly API cost of a multi-turn AI chatbot across Claude, GPT, and Gemini. Models the quadratic context re-sending that single-call calculators miss, with and without prompt caching, in USD and LKR.
