Query Expansion
Definition
Query expansion is a pre-retrieval technique that enriches the original query with additional terms or alternative formulations to improve retrieval coverage. In traditional keyword search, query expansion adds synonyms, related terms, and common abbreviations to cast a wider net. In neural RAG systems, query expansion can take forms such as: generating hypothetical answers (HyDE), rephrasing the query in multiple ways (multi-query retrieval), extracting explicit sub-questions from a complex query, or adding domain-specific terminology the user may not know. Query expansion increases recall (finding more relevant documents) at the cost of slightly lower precision (the expanded results may include more irrelevant documents).
Why It Matters
Query expansion addresses the vocabulary mismatch problem: users describe their questions in their own language, which may differ from the language used in the knowledge base. A customer asking 'why was I charged twice?' may not know to search for 'duplicate invoice' or 'double billing'. Query expansion can automatically add these synonymous terms, ensuring retrieval covers documents written in the knowledge base's technical vocabulary. For complex multi-part questions, query expansion that decomposes the question into sub-queries and retrieves context for each sub-query enables the LLM to answer questions requiring information from multiple documents.
How It Works
Query expansion is implemented at the retrieval stage, before embedding and search. Common approaches:
- Term-based expansion: look up the query terms in a synonym dictionary or word association graph and add related terms.
- LLM-based expansion: prompt an LLM to generate 3-5 alternative phrasings of the user's query, retrieve separately for each phrasing, and merge the results (removing duplicates).
- HyDE: prompt the LLM to write a hypothetical document that would answer the query, then embed and search with that hypothetical document rather than the query itself (document embeddings tend to sit closer to other document embeddings than query embeddings do).
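The LLM-based expansion described above can be sketched as follows. This is a minimal illustration, not a production implementation: `generate_paraphrases` and `search` are hypothetical stand-ins for a real LLM call and a real vector-store query, with hard-coded outputs so the merging logic can be shown end to end.

```python
def generate_paraphrases(query: str) -> list[str]:
    # Placeholder: a real system would prompt an LLM for 3-5 rephrasings.
    return [
        query,
        f"how do I {query}",
        f"{query} instructions",
    ]

def search(query: str, k: int = 5) -> list[str]:
    # Placeholder: a real system would embed the query and search an index.
    corpus = {
        "reset password": ["doc_reset", "doc_login"],
        "how do I reset password": ["doc_reset", "doc_recovery"],
        "reset password instructions": ["doc_guide", "doc_reset"],
    }
    return corpus.get(query, [])[:k]

def expanded_retrieve(query: str, k: int = 5) -> list[str]:
    """Retrieve for each paraphrase and merge the result sets,
    de-duplicating while preserving first-seen order."""
    seen: dict[str, None] = {}
    for variant in generate_paraphrases(query):
        for doc_id in search(variant, k):
            seen.setdefault(doc_id, None)
    return list(seen)

print(expanded_retrieve("reset password"))
# ['doc_reset', 'doc_login', 'doc_recovery', 'doc_guide']
```

Note that the merged list contains documents ("doc_recovery", "doc_guide") that the original query alone would have missed, which is exactly the recall gain expansion aims for.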
[Diagram: Query Expansion — Broadening Retrieval Coverage. An original query ("reset password") is expanded via synonym expansion, related terms, and LLM variations into a set of query variants; retrieval runs for each variant, the result sets are unioned and de-duplicated (overlapping docs removed), yielding better recall and fewer missed relevant docs.]
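The HyDE variant can be sketched the same way. This is a toy illustration under stated assumptions: `llm_write_answer` stands in for an LLM generating a hypothetical answer, and `embed` is a bag-of-words counter standing in for a neural sentence encoder. The key point is only that similarity search runs on the hypothetical document's embedding, not the query's.

```python
import math

def llm_write_answer(query: str) -> str:
    # Placeholder for the hypothetical document an LLM might produce.
    return ("To reset your password, open account settings, "
            "choose security, and follow the password recovery link.")

def embed(text: str) -> dict[str, float]:
    # Toy bag-of-words "embedding"; real systems use a neural encoder.
    vec: dict[str, float] = {}
    for token in text.lower().split():
        vec[token] = vec.get(token, 0.0) + 1.0
    return vec

def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(a[t] * b.get(t, 0.0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def hyde_search(query: str, docs: dict[str, str]) -> str:
    # Embed the hypothetical answer, not the query itself.
    hypo_vec = embed(llm_write_answer(query))
    return max(docs, key=lambda d: cosine(hypo_vec, embed(docs[d])))

docs = {
    "doc_reset": "Password recovery: open account settings, then security, "
                 "then follow the password recovery link.",
    "doc_billing": "Billing FAQ: invoices, refunds, and duplicate charges.",
}
print(hyde_search("reset password", docs))  # doc_reset
```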
Real-World Example
A 99helpers customer finds that customer queries about account permissions often use imprecise language: 'who can see my stuff', 'sharing settings', 'privacy controls'. Their knowledge base uses the technical term 'access management'. After implementing LLM-based query expansion that rewrites each query into 3 alternative phrasings, retrieval for permission-related queries improves significantly because the expanded queries include both the customer's informal language and the technical terms from the knowledge base. Recall@5 for permission queries improves from 56% to 83%.
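Recall@5, the metric cited above, is straightforward to compute: the fraction of known-relevant documents that appear in the top 5 retrieved results, typically averaged over a query set. A minimal sketch, using made-up document IDs to mimic a before/after comparison for a single query:

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of relevant docs that appear in the top-k retrieved results."""
    if not relevant:
        return 0.0
    hits = len(set(retrieved[:k]) & relevant)
    return hits / len(relevant)

# Illustrative data: two relevant docs for one permission-related query.
relevant = {"doc_access_mgmt", "doc_sharing"}
before = ["doc_privacy", "doc_access_mgmt", "doc_faq", "doc_login", "doc_api"]
after  = ["doc_access_mgmt", "doc_sharing", "doc_privacy", "doc_faq", "doc_api"]

print(recall_at_k(before, relevant, 5))  # 0.5
print(recall_at_k(after, relevant, 5))   # 1.0
```

In practice you would average this over a labeled evaluation set of queries, and measure precision alongside it to catch the dilution effect described under Common Mistakes.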
Common Mistakes
- ✕Over-expanding queries with too many additional terms — excessive expansion retrieves irrelevant documents that dilute the LLM context with noise
- ✕Expanding queries without measuring the precision impact — while expansion improves recall, it may reduce precision; measure both before and after
- ✕Using generic synonym expansion without domain knowledge — domain-specific query expansion based on your actual customer language and knowledge base vocabulary outperforms generic expansion
Related Terms
Query Rewriting
Query rewriting is a technique that transforms a user's original query into an improved version — clearer, more complete, or better suited for retrieval — using an LLM to improve recall and relevance before searching the knowledge base.
Hypothetical Document Embedding
Hypothetical Document Embedding (HyDE) is a RAG technique that improves retrieval by having an LLM generate a hypothetical document that would answer the user's query, then using that document's embedding rather than the query embedding for similarity search.
Multi-Query Retrieval
Multi-query retrieval generates multiple alternative phrasings of the user's question and retrieves documents for each phrasing separately, then merges results to achieve higher recall than any single query formulation would provide.
Dense Retrieval
Dense retrieval is a retrieval approach that encodes both queries and documents into dense embedding vectors and finds relevant documents by computing vector similarity, enabling semantic matching beyond exact keyword overlap.
Retrieval-Augmented Generation
Retrieval-Augmented Generation (RAG) is an AI architecture that enhances large language model responses by first retrieving relevant documents from an external knowledge base and then using that retrieved content as context when generating an answer.