Query Rewriting
Definition
Query rewriting uses an LLM or rule-based system to transform the user's raw input into a better search query before retrieval. User queries are often ambiguous, abbreviated, or rely on conversational context that the retrieval system cannot access. Query rewriting addresses this by: resolving pronouns with their referents from conversation history ('What about the other plan?' → 'What are the features of the Professional plan?'), expanding abbreviations and colloquial terms, adding relevant domain context, and breaking compound questions into component parts. The rewritten query more precisely represents the user's information need, improving retrieval quality.
Why It Matters
Query rewriting addresses the gap between how users naturally express questions in conversation and what retrieval systems need to find relevant documents. In multi-turn conversations, queries frequently reference earlier context ('Can you explain that more?' or 'What about the cost?') that the retrieval system cannot resolve without the full conversation history. Query rewriting with conversation context enables the retrieval system to find the right information even when the user's raw query is incomplete. This is particularly important for AI chatbot applications where conversational queries are the norm.
How It Works
Query rewriting is implemented as an LLM call that occurs before retrieval. The prompt includes the conversation history and the user's latest message, and asks the LLM to rewrite the query as a standalone search question that captures the full information need. Example prompt: 'Given the conversation history below, rewrite the user's last message as a clear, standalone search query. Resolve any references to previous messages. Output only the rewritten query.' The rewritten query is then used for both vector search and keyword search. Some implementations generate multiple query rewrites and retrieve separately for each.
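The rewrite step above can be sketched as a small prompt-assembly function. This is a minimal sketch, not a specific library's API: the message format follows the common chat-completion convention (a list of role/content dicts), and the instruction text is taken from the example prompt in this section. The actual LLM call is left abstract, since any chat API that accepts such messages will work.

```python
# Sketch of the query-rewriting step. The instruction wording follows
# the example prompt above; the chat-completion call itself is left
# abstract (any LLM API with a messages interface will do).

REWRITE_INSTRUCTION = (
    "Given the conversation history below, rewrite the user's last message "
    "as a clear, standalone search query. Resolve any references to "
    "previous messages. Output only the rewritten query."
)

def build_rewrite_messages(history: list[dict], latest: str) -> list[dict]:
    """Assemble the messages for the rewrite call.

    `history` is a list of {"role": ..., "content": ...} turns;
    `latest` is the user's newest message.
    """
    transcript = "\n".join(f"{t['role']}: {t['content']}" for t in history)
    user_block = (
        f"Conversation history:\n{transcript}\n\n"
        f"User's last message: {latest}"
    )
    return [
        {"role": "system", "content": REWRITE_INSTRUCTION},
        {"role": "user", "content": user_block},
    ]

# Example: the "other plan" follow-up from the Definition section.
# Pass these messages to your LLM client; the response text becomes
# the query used for both vector and keyword search.
messages = build_rewrite_messages(
    history=[
        {"role": "user", "content": "What does the Starter plan include?"},
        {"role": "assistant", "content": "The Starter plan includes..."},
    ],
    latest="What about the other plan?",
)
```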
Query Rewriting: Transforming Vague Queries for Better Retrieval

| Original Query | Rewritten Query | What Changed |
| --- | --- | --- |
| "it doesn't work" | troubleshooting chatbot not responding | Generic complaint → specific issue |
| "how much?" | subscription pricing and plan costs | Ambiguous price question → precise intent |
| "that feature" (ambiguous) | export conversation history feature | Context-aware resolution from prior turns |
Why Rewriting Improves Retrieval
- Specificity: vague complaints become concrete, searchable terms
- Disambiguation: ambiguous references are resolved into a precise intent
- Context: conversation history is folded into a standalone query
Real-World Example
A 99helpers customer analyzes multi-turn chatbot conversations and finds that retrieval accuracy drops significantly on the second and third turns of a conversation — the user is referencing earlier context but the retrieval system treats each message independently. After adding a query rewriting step that incorporates conversation history, multi-turn retrieval accuracy improves from 61% to 83%. The most impactful case: users who ask 'How do I cancel?' after a context-setting discussion about their account type now retrieve the correct cancellation procedure for their specific account type rather than generic cancellation content.
Common Mistakes
- ✕ Rewriting queries without preserving the user's original intent — overly aggressive rewriting that over-specifies the query can actually hurt retrieval by narrowing the search too much
- ✕ Adding query rewriting latency without measuring quality improvement — rewriting adds an LLM call (100-300ms); only add it if measured retrieval improvement justifies the cost
- ✕ Applying single-query rewriting when multi-query generation would capture more relevant documents for complex questions
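On the last point: when multiple rewrites are generated, each is retrieved for separately and the result lists are merged. A minimal sketch of one common merge strategy (round-robin interleaving with deduplication) follows; the function name and document IDs are illustrative, not any particular library's API.

```python
# Minimal sketch of merging per-rewrite result lists: interleave the
# ranked lists round-robin so each rewrite's top hits surface early,
# and drop duplicates the first time they appear.

def merge_results(per_query_results: list[list[str]], k: int = 5) -> list[str]:
    """Interleave ranked result lists round-robin, deduplicating."""
    merged: list[str] = []
    seen: set[str] = set()
    max_len = max((len(r) for r in per_query_results), default=0)
    for rank in range(max_len):
        for results in per_query_results:
            if rank < len(results) and results[rank] not in seen:
                seen.add(results[rank])
                merged.append(results[rank])
    return merged[:k]

# Example: result lists for two rewrites of the same question.
merged = merge_results([
    ["doc_pricing", "doc_plans", "doc_faq"],
    ["doc_plans", "doc_billing"],
], k=4)
# → ["doc_pricing", "doc_plans", "doc_billing", "doc_faq"]
```

Round-robin merging is the simplest option; score-based fusion such as reciprocal rank fusion is a common alternative when per-query scores are available.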
Related Terms
Query Expansion
Query expansion is a retrieval technique that augments the original user query with related terms, synonyms, or alternative phrasings before search, improving recall by retrieving relevant documents that would not match the original query vocabulary.
Multi-Query Retrieval
Multi-query retrieval generates multiple alternative phrasings of the user's question and retrieves documents for each phrasing separately, then merges results to achieve higher recall than any single query formulation would provide.
Hypothetical Document Embedding
Hypothetical Document Embedding (HyDE) is a RAG technique that improves retrieval by having an LLM generate a hypothetical document that would answer the user's query, then using that document's embedding rather than the query embedding for similarity search.
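The HyDE mechanic can be shown with a toy sketch: embed an LLM-generated hypothetical answer instead of the raw query, then run similarity search with that embedding. The bag-of-words "embedding" and the documents below are stand-ins purely to show the flow; a real system would use a neural embedding model and an actual LLM to write the hypothetical answer.

```python
# Toy sketch of the HyDE flow. The embed() function is a stand-in
# bag-of-words vector, NOT a real embedding model; it only exists to
# demonstrate searching with the hypothetical answer's vector.
from collections import Counter
import math

def embed(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

docs = [
    "To cancel a subscription open billing settings and choose cancel plan",
    "Our chatbot supports exporting conversation history as CSV",
]

query = "how do I stop paying?"
# A hypothetical answer an LLM might generate for the query (illustrative).
# Note the raw query shares no vocabulary with either document, but the
# hypothetical answer does.
hypothetical = "You can cancel your subscription plan in billing settings"

best_doc = max(docs, key=lambda d: cosine(embed(hypothetical), embed(d)))
# best_doc is the cancellation document
```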
Retrieval-Augmented Generation
Retrieval-Augmented Generation (RAG) is an AI architecture that enhances large language model responses by first retrieving relevant documents from an external knowledge base and then using that retrieved content as context when generating an answer.
Context Window
A context window is the maximum amount of text (measured in tokens) that a language model can process in a single inference call, determining how much retrieved content, conversation history, and instructions can be included in a RAG prompt.
Ready to build your AI chatbot?
Put these concepts into practice with 99helpers — no code required.
Start free trial →