LLM Temperature & Top-p Sampling Visualizer
Drag the temperature, top-p, and top-k sliders and watch a language model's next-token probabilities sharpen, flatten, and get truncated in real time — with the exact softmax math, token by token. No API key, no cost, runs in your browser.
How it works
A language model does not pick the next word directly — it outputs a raw score (a logit) for every token in its vocabulary. Sampling parameters turn those logits into a probability distribution and then decide how adventurously to draw from it. This tool applies the three parameters people actually search for, in the common reference order used by Hugging Face's generation code: temperature → top-k → top-p → renormalize.
- Tempered softmax. Each logit is divided by the temperature T, then run through softmax: p_i = exp(z_i / T) / Σ_j exp(z_j / T). Lower T makes the top token more dominant (sharper, more confident); higher T flattens the distribution (more random). At T = 0 the math collapses to greedy decoding: the single highest token gets probability 1. We subtract the maximum scaled logit before exponentiating (the log-sum-exp trick) so large logits or tiny temperatures never overflow.
- Top-k truncation. If k > 0, keep only the k highest-probability tokens and zero the rest (Fan et al., 2018). A fixed cutoff, regardless of how confident the model is.
- Top-p (nucleus) truncation. If p < 1, sort the surviving tokens by probability and keep the smallest prefix whose cumulative probability reaches p (Holtzman et al., 2019). Unlike top-k, the cutoff adapts to the distribution's shape.
- Renormalize. Divide the survivors by their sum so they total 1 again. These are the actual probabilities the sampler draws from. The badge in the calculator confirms they reconcile to 1.0.
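The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not the tool's actual source: the function name `sample_distribution` and its parameter names are hypothetical, T = 0 is handled as a special case (the greedy limit) rather than by dividing by zero, and the top-p cutoff is measured on the tempered probabilities before any renormalization, which is one reading of the order described above. Implementations that renormalize after top-k may place the nucleus cutoff differently.

```python
import math

def sample_distribution(logits, temperature=1.0, top_k=0, top_p=1.0):
    """Return renormalized probabilities after temperature, top-k, and top-p."""
    n = len(logits)

    # Assumption: T <= 0 is treated as greedy decoding (all mass on the argmax).
    if temperature <= 0:
        best = max(range(n), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(n)]

    # Tempered softmax, subtracting the max scaled logit for numerical stability.
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Token indices ordered from most to least probable.
    keep = sorted(range(n), key=lambda i: probs[i], reverse=True)

    # Top-k: fixed-size cutoff.
    if top_k > 0:
        keep = keep[:top_k]

    # Top-p: smallest prefix whose cumulative probability reaches top_p.
    if top_p < 1.0:
        cumulative = 0.0
        nucleus = []
        for i in keep:
            nucleus.append(i)
            cumulative += probs[i]
            if cumulative >= top_p:
                break
        keep = nucleus

    # Renormalize the survivors so they sum to 1; everything else is zero.
    kept = set(keep)
    mass = sum(probs[i] for i in kept)
    return [probs[i] / mass if i in kept else 0.0 for i in range(n)]


if __name__ == "__main__":
    # Three made-up logits, purely for illustration.
    print(sample_distribution([2.0, 1.0, 0.0], temperature=0.7, top_k=2, top_p=0.95))
```

The function returns the distribution rather than drawing a token, which keeps the sketch deterministic and makes each stage easy to inspect.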
The summary chips add an entropy readout, H = −Σ p·log₂ p in bits, as a single "how random is this" number, and an "effective choices" count: how many tokens still carry non-zero probability after truncation. The provider toggle clamps the slider ranges to each API's documented limits (OpenAI temperature 0–2 with no public top_k; Anthropic temperature 0–1 with optional top_k). Providers can differ in edge-case clamping and tie-breaking, so this tool states its reference order rather than claiming to mirror any one backend exactly.
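As a rough sketch of the two summary numbers and the provider clamp, assuming the final probabilities are available as a plain list: the helper names and the `TEMPERATURE_RANGE` table below are hypothetical, and the ranges are simply the ones quoted in the paragraph above.

```python
import math

def entropy_bits(probs):
    """Shannon entropy in bits; zero-probability tokens contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def effective_choices(probs):
    """Number of tokens that still carry non-zero probability."""
    return sum(1 for p in probs if p > 0)

# Hypothetical lookup table; the limits are those stated in the text above.
TEMPERATURE_RANGE = {"openai": (0.0, 2.0), "anthropic": (0.0, 1.0)}

def clamp_temperature(provider, value):
    """Clamp a requested temperature into the provider's stated range."""
    lo, hi = TEMPERATURE_RANGE[provider]
    return min(max(value, lo), hi)
```

A uniform distribution over four surviving tokens gives 2 bits of entropy and an effective-choices count of 4, while a greedy (one-hot) distribution gives 0 bits and a count of 1.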
Worked examples
Frequently asked questions
Sources & references
- OpenAI API Reference — Chat Completions temperature & top_p
- Anthropic Messages API Reference — temperature, top_p, top_k
- Holtzman et al. (2019) — The Curious Case of Neural Text Degeneration (top-p / nucleus sampling)
- Fan, Lewis & Dauphin (2018) — Hierarchical Neural Story Generation (top-k sampling)
- Goodfellow, Bengio & Courville (2016) — Deep Learning (tempered softmax)
Parameter ranges and semantics were last cross-checked against the OpenAI and Anthropic API references on 2026-06-11. The sampling math is standard; the worked examples above double as the tool's reconciliation fixtures (tolerance 1e-4).