Step-Back Prompting
Definition
Step-back prompting, introduced by Google DeepMind, addresses cases where a specific query fails to retrieve relevant content because the answer is embedded in a more general document that doesn't directly use the specific terminology. For example, 'Why does my API return a 429 error after 3 PM?' is very specific; the answer about rate limiting and time-based quotas likely lives in a general 'API Rate Limits' document. Step-back prompting uses an LLM to abstract the query to 'What are the API rate limit rules?', retrieves against this general query, then provides both the original specific context and the retrieved general content to the LLM for final answer generation.
Why It Matters
Step-back prompting improves retrieval for highly specific queries that are rare or phrased in ways that don't match document language. Support queries often include specific product versions, error messages, or configuration values that don't appear in general documentation. By stepping back to the general concept, retrieval finds the authoritative reference document, which the LLM then applies to the specific situation. For 99helpers chatbots serving technical users who ask very specific questions, step-back prompting reduces the 'not found' rate for queries that fail standard retrieval but whose answers exist in general documentation.
How It Works
Implementation: given a specific user query, call an LLM with a prompt such as: 'Given this specific question: [query], what more general question would help retrieve the background information needed to answer it?' The LLM returns a more abstract query. Both the original and the abstract query are then used for retrieval, typically by fetching the top K results for each and merging the result sets. The combined context, along with both queries, is passed to the final generation LLM. The generation prompt distinguishes the specific question from the background documents retrieved via the abstract query, helping the LLM apply general principles to the specific situation.
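The flow above can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation: `call_llm` and `search_index` are hypothetical placeholders for your LLM client and vector search, and the prompt wording is one reasonable choice among many.

```python
# Sketch of step-back prompting: abstract the query, retrieve against
# both the original and abstract queries, and merge the result sets.

STEP_BACK_PROMPT = (
    "Given this specific question: {query}\n"
    "What more general question would help retrieve the background "
    "information needed to answer it? Reply with the question only."
)

def call_llm(prompt: str) -> str:
    # Placeholder: swap in a real LLM call (OpenAI, Gemini, etc.).
    return "What are the API rate limit rules?"

def search_index(query: str, k: int = 5) -> list[str]:
    # Placeholder: swap in a real vector or keyword search.
    return [f"doc matching '{query}' #{i}" for i in range(k)]

def step_back_retrieve(query: str, k: int = 5) -> dict:
    abstract = call_llm(STEP_BACK_PROMPT.format(query=query)).strip()
    # Retrieve against BOTH queries; de-duplicate while preserving order.
    docs, seen = [], set()
    for d in search_index(query, k) + search_index(abstract, k):
        if d not in seen:
            seen.add(d)
            docs.append(d)
    return {"abstract_query": abstract, "context": docs}

result = step_back_retrieve("Why does my API return a 429 error after 3 PM?")
```

Note that the original query's results are kept alongside the abstract query's results; the general documents complement, rather than replace, the specific matches.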
Step-Back Prompting — Abstract First, Specific Answer Second
Direct retrieval
- Query: "Why did my API call fail with error 429?"
- Retrieved: 429 error logs
- Answer: partial answer that misses the rate limiting principles

Step-back prompting
- Query: "Why did my API call fail with error 429?"
- Step-back query: "What are the general principles of API rate limiting?"
- Retrieved: rate limiting concepts, quota docs, and 429 documentation
- Answer: grounded, complete answer with full context
Step-back flow
1. Original query (specific): "Why did my API call fail with error 429?"
2. Step-back abstraction: "What are the general principles of API rate limiting?"
3. Retrieve abstract docs: fetch foundational concept pages, not just error logs
4. Combine context: abstract concept docs plus the original specific query are passed to the LLM
5. Generate answer: the LLM grounds the specific answer in abstract principles, producing a more complete response
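Steps 4 and 5 hinge on how the final prompt is assembled. The sketch below shows one way to combine the contexts; the prompt wording and the `build_generation_prompt` helper are illustrative assumptions, not a fixed API.

```python
# Assemble the final generation prompt: background docs retrieved via
# the abstract query, followed by the original specific question.

def build_generation_prompt(specific: str, abstract: str, docs: list[str]) -> str:
    context = "\n\n".join(docs)
    return (
        "Background documents (retrieved for the general question "
        f"'{abstract}'):\n{context}\n\n"
        "Using these general principles, answer the specific question: "
        f"{specific}"
    )

prompt = build_generation_prompt(
    "Why did my API call fail with error 429?",
    "What are the general principles of API rate limiting?",
    ["Rate limiting concepts ...", "Quota documentation ..."],
)
```

Labeling the background documents with the abstract question they answer helps the LLM treat them as general principles to apply, rather than as direct matches for the user's specific query.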
When step-back helps most
- Debugging: error causes are understood through underlying system principles
- How-to questions: task context is improved by knowing the general methodology
- Why-questions: root cause analysis requires a conceptual foundation, not just event logs
Real-World Example
A 99helpers user asks: 'My webhook notifications stopped working after I upgraded to plan tier 2.' This specific query retrieves webhook setup documentation but misses the relevant context about plan tier 2's webhook behavior differences. Step-back prompting transforms it to: 'What are the differences in webhook functionality between subscription plans?' Retrieval against this general query surfaces a plan comparison document that explains tier 2's webhook filtering rules. The combined context—original webhook troubleshooting + plan comparison—enables the LLM to correctly diagnose that tier 2's event filtering setting is blocking the notifications.
Common Mistakes
- ✕ Always applying step-back prompting even when original retrieval is sufficient—the extra LLM call adds 100-300ms latency for no benefit on clear, specific queries.
- ✕ Using step-back prompting without also retrieving against the original query—the specific query captures different relevant content that should complement, not replace, the general query results.
- ✕ Generating step-back queries that are too abstract—overly general queries (e.g., 'How does software work?') retrieve irrelevant content.
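One way to avoid the first mistake is to gate step-back on retrieval confidence, paying for the extra LLM call only when direct retrieval looks weak. The sketch below assumes a search function returning scored hits; the 0.75 threshold is an arbitrary assumed value you would tune per corpus.

```python
# Gate step-back prompting on retrieval confidence: if the top hit for
# the original query scores above a threshold, skip the extra LLM call.

def retrieve_with_fallback(query, search, step_back, threshold=0.75):
    hits = search(query)  # list of (doc, score), best first
    if hits and hits[0][1] >= threshold:
        return [d for d, _ in hits]  # direct retrieval is sufficient
    abstract = step_back(query)      # only now pay for the LLM call
    merged, seen = [], set()
    for d, _ in hits + search(abstract):
        if d not in seen:
            seen.add(d)
            merged.append(d)
    return merged

# Toy usage with stub search and step-back functions:
docs = retrieve_with_fallback(
    "Why 429 after 3 PM?",
    search=lambda q: [("rate-limit doc", 0.4)],
    step_back=lambda q: "What are the API rate limit rules?",
)
```

Because the stub's top score (0.4) falls below the threshold, the fallback path runs and the merged, de-duplicated results are returned.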
Related Terms
Query Rewriting
Query rewriting is a technique that transforms a user's original query into an improved version — clearer, more complete, or better suited for retrieval — using an LLM to improve recall and relevance before searching the knowledge base.
Query Expansion
Query expansion is a retrieval technique that augments the original user query with related terms, synonyms, or alternative phrasings before search, improving recall by retrieving relevant documents that would not match the original query vocabulary.
Query Decomposition
Query decomposition breaks a complex, multi-part user question into simpler sub-queries that can each be answered independently, improving RAG retrieval by matching each sub-query against relevant document segments.
Multi-Query Retrieval
Multi-query retrieval generates multiple alternative phrasings of the user's question and retrieves documents for each phrasing separately, then merges results to achieve higher recall than any single query formulation would provide.
Retrieval Pipeline
A retrieval pipeline is the online query-time workflow that transforms a user question into a ranked set of relevant document chunks, serving as the information retrieval stage of a RAG system.