
Temperature

Definition

Temperature is a sampling parameter that scales the probability distribution over the vocabulary before the model selects its next token. At temperature=0, the model always picks the highest-probability token (greedy decoding), producing fully deterministic output. Between 0 and 1, the distribution is sharpened, concentrating probability on the likeliest tokens. At temperature=1, the distribution is unchanged and sampling follows the model's raw probabilities. Above 1, the distribution is flattened (low-probability tokens become relatively more likely), increasing variety and occasionally producing surprising or incoherent outputs. Temperature=0.7-1.0 is a common sweet spot for conversational AI, balancing coherence with naturalness; temperature below 0.3 is preferred for factual, technical, or structured outputs.

Why It Matters

Temperature is the most impactful parameter for controlling chatbot personality and reliability. A customer support chatbot answering factual product questions should use temperature=0 or 0.1—consistent, predictable answers reduce user confusion and support team overhead. A creative writing assistant or brainstorming tool benefits from temperature=1.0-1.5, producing varied and imaginative outputs. For 99helpers customers deploying AI chatbots, temperature tuning is often the first optimization step after initial deployment: reducing temperature for factual support bots reduces hallucination and response variance, while increasing it for engagement tools produces more conversational, natural-feeling exchanges.

How It Works

Mathematically: adjusted_logit[i] = logit[i] / temperature. After dividing all logits by the temperature, softmax converts them to probabilities. When temperature approaches 0, the highest logit dominates exponentially; probabilities concentrate on the top token. When temperature=1, softmax is applied directly to raw logits. When temperature=2, logit differences are halved, flattening the distribution. In practice, temperature=0 is implemented as argmax (take highest logit) to avoid division-by-zero. Temperature interacts with top-p and top-k: in most APIs, temperature is applied first, then top-p or top-k filtering narrows the candidate pool.
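This scaling can be sketched in a few lines of pure Python (the logits below are illustrative, not taken from any real model):

```python
import math

def softmax_with_temperature(logits, temperature):
    """Divide each logit by the temperature, then apply softmax."""
    if temperature == 0:
        # Greedy decoding: all probability on the argmax token,
        # avoiding the division by zero.
        top = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == top else 0.0 for i in range(len(logits))]
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]                      # illustrative raw logits
print(softmax_with_temperature(logits, 0))    # [1.0, 0.0, 0.0]
print(softmax_with_temperature(logits, 1.0))  # raw softmax probabilities
print(softmax_with_temperature(logits, 2.0))  # flatter distribution
```

At temperature 2 the gap between the top token and the rest shrinks, which is exactly the flattening described above.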

Temperature — Token Probability Distribution (illustrative)

Token   T = 0 (greedy)   T = 1 (as trained)   T = 2 (flattened)
cat     100%             55%                  30%
dog     0%               20%                  23%
bird    0%               12%                  19%
fish    0%               8%                   16%
bear    0%               5%                   12%

  • T = 0 — always picks the top token: deterministic, factual, repetitive
  • T = 1 — samples as trained: default balance of quality & variety
  • T = 2 — highly random output: creative, unpredictable, may be incoherent
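Starting from the chart's illustrative T = 1 probabilities, a short pure-Python sketch shows how re-scaling sharpens or flattens the distribution (the chart's T = 2 percentages are illustrative, so this sketch won't reproduce them exactly):

```python
import math

# Illustrative T = 1 probabilities for cat/dog/bird/fish/bear (from the chart)
probs_t1 = [0.55, 0.20, 0.12, 0.08, 0.05]
logits = [math.log(p) for p in probs_t1]  # recover logits (up to a constant)

def apply_temperature(logits, t):
    """Rescale logits by 1/t and renormalize with softmax."""
    exps = [math.exp(l / t) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

print(apply_temperature(logits, 2.0))  # flattened: top token falls below 55%
print(apply_temperature(logits, 0.5))  # sharpened: top token rises above 55%
```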

Real-World Example

A 99helpers customer service chatbot initially deploys with temperature=0.8. Users report inconsistent answers—the same question about refund policy sometimes receives 3 different responses across sessions, creating customer confusion and support escalations. Lowering temperature to 0.1 makes responses nearly deterministic for the same query: the refund policy question always returns the same accurate, policy-grounded answer. Separately, their marketing team's content generation tool uses temperature=1.2 to produce varied product descriptions from the same feature list, preventing identical-sounding outputs across different customers' generated content.
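The two deployments above differ only in one setting. A minimal sketch, assuming an OpenAI-style chat completions payload (the model name and helper functions are hypothetical placeholders):

```python
# Hypothetical request builders; only the "temperature" values reflect the
# tuning described above. "example-model" is a placeholder, not a real model.
def support_bot_request(question: str) -> dict:
    return {
        "model": "example-model",
        "temperature": 0.1,  # near-deterministic, policy-grounded answers
        "messages": [{"role": "user", "content": question}],
    }

def content_tool_request(brief: str) -> dict:
    return {
        "model": "example-model",
        "temperature": 1.2,  # varied, creative product descriptions
        "messages": [{"role": "user", "content": brief}],
    }

print(support_bot_request("What is your refund policy?")["temperature"])  # 0.1
```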

Common Mistakes

  • Using temperature=1.0 for all use cases without considering whether the application is factual (needs low temp) or creative (benefits from higher temp).
  • Expecting temperature=0 to eliminate all non-determinism—system-level batching and floating-point non-determinism can still produce occasional variation even at temp=0.
  • Combining very high temperature (>1.5) with no other output constraints—the model may generate grammatically correct but nonsensical content.

