Greedy Decoding
Definition
Greedy decoding is the simplest generation strategy: at every step, the model picks the token with the highest probability from its output distribution and appends it to the sequence. This is equivalent to setting temperature=0 or top-k=1. The name 'greedy' reflects its myopic optimization—it makes the locally optimal choice at each step without considering whether that choice leads to the globally optimal complete sequence. Greedy decoding is fully deterministic (the same input always produces the same output), computationally efficient (no sampling or beam expansion), and often produces high-quality outputs for simple, factual queries where there is one clear best next token at each step.
Why It Matters
Greedy decoding is the implicit baseline for LLM outputs and helps explain certain failure modes. When an LLM gets 'stuck' generating the same phrase repeatedly, it is often caught in a greedy loop: the highest-probability next token is one that reinforces a pattern, which makes the same continuation more likely, creating a cycle. Understanding this explains why repetition penalties and sampling help—they break the deterministic greedy cycle. For 99helpers support chatbots where answer consistency matters, temperature=0 (greedy) is often the right choice; for content generation where variety is valued, sampling outperforms greedy.
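The greedy loop failure mode can be made concrete with a toy sketch. The logits table below is invented for illustration (it is not a real model): token 1 makes token 2 the argmax, and token 2 makes token 1 the argmax, so greedy decoding alternates between them forever.

```python
# Toy greedy repetition loop (hypothetical logits table, not a real model).
# TABLE[last_token] -> logits over a 3-token vocabulary.
TABLE = {
    1: [0.0, 0.1, 2.0],  # after token 1, token 2 has the highest logit
    2: [0.0, 2.0, 0.1],  # after token 2, token 1 has the highest logit
}

def decode_greedy(start, steps):
    ids = [start]
    for _ in range(steps):
        logits = TABLE[ids[-1]]
        # argmax: always take the single highest-logit token
        ids.append(max(range(len(logits)), key=lambda i: logits[i]))
    return ids

print(decode_greedy(1, 6))  # [1, 2, 1, 2, 1, 2, 1] -- stuck in a cycle
```

Sampling (or a repetition penalty that lowers the logits of recently used tokens) breaks this cycle by sometimes selecting a token other than the argmax.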
How It Works
Greedy decoding process: (1) compute logits over the vocabulary from a model forward pass; (2) apply softmax to convert logits to probabilities (optional for greedy, since the argmax over the logits selects the same token as the argmax over the probabilities); (3) select the argmax (the token with the highest probability); (4) append it to the sequence; (5) run the forward pass again on the extended sequence; (6) repeat until a stop token. The computational cost per token is one forward pass through the model, identical to sampling, so greedy is no faster than sampling at the per-token level. What distinguishes greedy is that it eliminates the sampling step and its associated randomness. In code: next_token = logits.argmax(-1) instead of torch.multinomial(probabilities, num_samples=1).
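The six steps above can be sketched as a plain-Python loop. The toy_logits table, vocabulary, and token ids below are made up for illustration; in a real system, logits_fn would be a model forward pass.

```python
# Minimal greedy decoding sketch over a fake 4-token vocabulary.
# VOCAB and the logits table are hypothetical, purely for illustration.
VOCAB = ["<eos>", "hello", "world", "!"]

def toy_logits(ids):
    """Fake 'forward pass': maps the last token id to fixed next-token logits."""
    table = {
        None: [0.0, 3.0, 1.0, 0.5],  # empty prompt -> "hello"
        1:    [0.0, 0.5, 3.0, 1.0],  # "hello" -> "world"
        2:    [0.5, 0.0, 0.0, 3.0],  # "world" -> "!"
        3:    [3.0, 0.0, 0.0, 0.0],  # "!" -> <eos>
    }
    return table[ids[-1] if ids else None]

def greedy_decode(logits_fn, prompt_ids, eos_id=0, max_new_tokens=10):
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        logits = logits_fn(ids)                                   # step 1
        next_id = max(range(len(logits)), key=lambda i: logits[i])  # step 3: argmax
        ids.append(next_id)                                        # step 4
        if next_id == eos_id:                                      # step 6
            break
    return ids

print(greedy_decode(toy_logits, []))  # [1, 2, 3, 0] -> "hello world ! <eos>"
```

Because every step is an argmax, running this twice on the same prompt always yields the same token ids, which is the determinism property described above.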
Greedy Decoding — Argmax Token Selection
Greedy decoding always picks the single highest-probability token. Fast, deterministic, but can miss globally better sequences that require taking a lower-probability first token.
Real-World Example
A 99helpers chatbot is configured with temperature=0 (greedy decoding) for its FAQ answering feature. During testing, a developer notices that when asked 'What does 99helpers do?' the bot responds identically on every run, which is ideal for a use case where consistency builds user trust. However, when the same setting is used for a creative product description generator, each product always gets the same description, and the phrasing feels formulaic across products. The team switches to temperature=0.9 for creative generation while keeping temperature=0 for the structured FAQ, applying greedy decoding and sampling to the use cases each suits.
Common Mistakes
- ✕Thinking greedy decoding is always 'best' because it selects the highest-probability token—locally optimal choices can lead to globally suboptimal sequences.
- ✕Expecting greedy decoding to always produce the same output across different model serving infrastructure—hardware and batching differences can occasionally produce different floating-point results.
- ✕Using greedy decoding for tasks requiring creative variation—greedy decoding by definition produces the least diverse possible output.
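The first mistake can be made concrete with a toy two-step example (the probabilities below are invented for illustration): greedy takes the higher-probability first token, but the sequence through the lower-probability first token is more likely overall.

```python
# Hypothetical two-step search tree: greedy's first pick loses globally.
# P(A)=0.6 then P(x|A)=0.3 -> P(A,x)=0.18
# P(B)=0.4 then P(y|B)=0.9 -> P(B,y)=0.36
step1 = {"A": 0.6, "B": 0.4}
step2 = {"A": {"x": 0.3}, "B": {"y": 0.9}}

greedy_first = max(step1, key=step1.get)  # greedy picks "A" (locally optimal)
greedy_seq_p = step1[greedy_first] * max(step2[greedy_first].values())

# The globally best sequence considers joint probability over both steps
best_seq_p = max(step1[t] * max(step2[t].values()) for t in step1)

print(greedy_first, round(greedy_seq_p, 2), round(best_seq_p, 2))
# greedy picks "A" (joint 0.18), but the "B" path has joint 0.36
```

This is exactly the gap that beam search (see Related Terms) is designed to close: by keeping several candidate sequences alive, it can recover the "B" path that greedy discards at step one.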
Related Terms
Temperature
Temperature is an LLM parameter (0-2) that controls output randomness: low values produce focused, deterministic responses while high values produce more varied, creative outputs.
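How temperature reshapes the distribution can be seen by dividing the logits by t before the softmax (the standard formulation; the logit values below are arbitrary):

```python
import math

def softmax_with_temperature(logits, t):
    """Softmax over logits / t. As t -> 0 the distribution sharpens toward
    the argmax (approaching greedy); large t flattens it toward uniform."""
    scaled = [x / t for x in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]
probs_sharp = softmax_with_temperature(logits, 0.1)  # nearly all mass on token 0
probs_flat = softmax_with_temperature(logits, 2.0)   # mass spread across tokens
print(probs_sharp, probs_flat)
```

With a very low temperature the top token's probability approaches 1, which is why temperature=0 is treated as greedy decoding in practice.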
Beam Search
Beam search is a decoding algorithm that maintains multiple candidate sequences (beams) in parallel during generation, selecting the overall most probable complete sequence rather than the locally optimal token at each step.
Top-K Sampling
Top-K sampling restricts token generation to the K most probable next tokens at each step, preventing the model from selecting rare or unlikely tokens while maintaining diversity within the top-K candidates.
Top-P Sampling (Nucleus Sampling)
Top-p sampling (nucleus sampling) restricts token generation to the smallest set of tokens whose cumulative probability exceeds p, dynamically adapting the candidate pool size based on the probability distribution.
Large Language Model (LLM)
A large language model is a neural network trained on vast amounts of text that learns to predict and generate human-like text, enabling tasks like answering questions, writing, translation, and code generation.
Ready to build your AI chatbot?
Put these concepts into practice with 99helpers — no code required.
Start free trial →