Stop Sequence
Definition
Stop sequences are tokens or strings that the LLM API monitors during generation; when the model produces one, generation halts immediately and the stop sequence is excluded from the response. Common stop sequences: '\n' (stop at the first newline for single-line outputs), '###' (a custom separator for structured responses), '<|end|>' (an end-of-turn marker), or domain-specific terminators. Stop sequences work in conjunction with max_tokens: generation ends at whichever condition is met first. Multiple stop sequences can be specified; generation stops when any of them is produced. Stop sequences are particularly useful for: extracting a single value from a prompted completion, preventing the model from continuing past the desired output, and managing turn boundaries in multi-turn conversation templates.
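The truncation behavior described above can be sketched client-side. This is an illustrative helper, not part of any provider SDK: it scans generated text for the earliest stop sequence and cuts there, excluding the stop string itself, mirroring typical API behavior.

```python
def apply_stop_sequences(generated: str, stops: list[str]) -> str:
    """Truncate generated text at the earliest stop sequence.

    The stop sequence itself is excluded from the result, matching
    how LLM APIs typically return the response.
    """
    cut = len(generated)
    for stop in stops:
        idx = generated.find(stop)
        if idx != -1 and idx < cut:
            cut = idx  # keep the earliest match across all stop sequences
    return generated[:cut]

# With a stop sequence, output ends at the semantic boundary:
print(apply_stop_sequences("The answer is 42.\n### Now let me explain...",
                           ["\n###", "END"]))  # -> The answer is 42.
```

If no stop sequence ever appears, the text is returned unchanged, which is why a max_tokens safety limit still matters.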
Why It Matters
Stop sequences provide deterministic control over response length and structure that max_tokens alone cannot achieve. A max_tokens limit stops at an arbitrary point; a stop sequence stops at a semantically meaningful boundary. For JSON extraction tasks, stopping at '\n' or '}' after the relevant value prevents the model from generating unwanted text after the answer. For templated generation where the model fills in blanks, a stop sequence at the template's end marker prevents over-generation. For 99helpers chatbots, stop sequences are less commonly needed (instruction tuning handles natural stopping), but they're invaluable for structured generation tasks like data extraction or template completion.
How It Works
Stop sequence usage in the OpenAI API: response = openai.chat.completions.create(model='gpt-4o', messages=[...], stop=['\n', '###', '---']). The response ends at the first occurrence of any of these strings. Example: extracting a number from a prompt—stop=['\n'] ensures only the number on the first line is returned. For multiple-choice extraction: prompt the model with 'Answer: A, B, C, or D: [question]' and stop=['\n']—the model outputs 'Answer: B' and stops. When using stop sequences with few-shot prompting, use the same delimiter as in your examples so the model learns to stop at the same point.
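The few-shot pattern above can be sketched as request construction. This sketch only builds the request arguments (the actual API call is omitted); the '###' delimiter, the example reviews, and the build_request helper are illustrative choices, while the model name 'gpt-4o' and the stop parameter come from the example above.

```python
# Few-shot prompt where the example delimiter doubles as the stop sequence.
EXAMPLES = [
    ("The food was great!", "positive"),
    ("Shipping took forever.", "negative"),
]

def build_request(text: str) -> dict:
    # Join examples with the same '###' delimiter we will stop on.
    shots = "\n###\n".join(f"Review: {r}\nSentiment: {s}" for r, s in EXAMPLES)
    prompt = f"{shots}\n###\nReview: {text}\nSentiment:"
    return {
        "model": "gpt-4o",
        "messages": [{"role": "user", "content": prompt}],
        "stop": ["\n###"],   # same delimiter as between the examples
        "max_tokens": 5,     # safety limit in case the stop never fires
    }

req = build_request("Battery died after a week.")
print(req["stop"])  # -> ['\n###']
```

Because the model has seen each example end at '\n###', it tends to emit that same delimiter after its answer, and the stop sequence truncates there.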
Stop Sequence — Token Generation & Truncation
Configuration: stop=["\n###", "END", "</answer>"]
With the stop sequence, the token generation stream halts and the returned output is: "The answer is 42." (the stop sequence is excluded from the result).
Without the stop sequence, generation continues until max_tokens: "The answer is 42. ### Now let me explain..."
Real-World Example
A 99helpers developer builds a feature that classifies ticket priority from message text. Zero-shot prompt: 'Priority (low/medium/high): [ticket text]\nPriority: '. Stop sequence: ['\n']. The model generates 'high' and stops at the newline—returning exactly the priority label without additional explanation. Without the stop sequence, the model might generate: 'high\nThis ticket mentions a system outage affecting multiple users, which warrants immediate attention...' The stop sequence ensures a predictable, parseable output format for this classification use case.
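A minimal sketch of the parsing step for this example, assuming stop=['\n'] has already truncated the completion to a single line. The ALLOWED set comes from the prompt's low/medium/high labels; the fallback to 'medium' on unexpected output is an assumption, not 99helpers' documented behavior.

```python
ALLOWED = {"low", "medium", "high"}

def parse_priority(completion: str) -> str:
    """Normalize a single-line priority completion to a known label.

    Falls back to 'medium' on unexpected output (assumed policy).
    """
    label = completion.strip().lower()
    return label if label in ALLOWED else "medium"

print(parse_priority("high"))    # -> high
print(parse_priority(" HIGH "))  # -> high
```

Because the stop sequence guarantees a one-line response, the parser never has to strip trailing explanations.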
Common Mistakes
- ✕Setting stop sequences that appear inside the expected response—if your stop sequence is a common word or character, it will terminate generation prematurely.
- ✕Using stop sequences as the only response length control—always combine with max_tokens as a safety limit in case the stop sequence is never generated.
- ✕Forgetting that stop sequences are exact string matches—'END' and 'end' are different; ensure case matches expected model output.
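The exact-match pitfall in the last point can be demonstrated directly. would_stop is a hypothetical helper standing in for the API's matching behavior, which is a case-sensitive substring check.

```python
def would_stop(generated: str, stop: str) -> bool:
    """Report whether a stop sequence would fire on this text.

    Mirrors the exact, case-sensitive string match that LLM APIs use.
    """
    return stop in generated

print(would_stop("Report finished. END", "END"))  # -> True: exact match fires
print(would_stop("Report finished. end", "END"))  # -> False: case differs, never fires
```

If the stop never fires, generation runs until max_tokens, which is why the two safeguards belong together.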
Related Terms
Max Tokens
Max tokens is an LLM API parameter that limits the maximum number of tokens the model can generate in a single response, controlling response length, cost, and latency.
Structured Output
Structured output constrains LLM responses to follow a specific format—typically JSON with defined fields—enabling reliable parsing and integration with downstream systems rather than free-form text generation.
LLM API
An LLM API is a cloud service interface that provides programmatic access to large language models, allowing developers to send prompts and receive completions without managing model infrastructure.
JSON Mode
JSON mode is an LLM API feature that guarantees the model's output is valid JSON, ensuring reliable programmatic parsing without worrying about prose text surrounding the JSON object.
Temperature
Temperature is an LLM parameter (0-2) that controls output randomness: low values produce focused, deterministic responses while high values produce more varied, creative outputs.
Ready to build your AI chatbot?
Put these concepts into practice with 99helpers — no code required.
Start free trial →