Multi-Query Retrieval
Definition
Multi-query retrieval is a query expansion strategy that uses an LLM to generate multiple reformulations of the user's original question, performs separate vector searches for each formulation, and merges the resulting document sets (typically using union with deduplication) to produce a richer candidate pool. The motivation is that any single query formulation captures only one way of expressing the information need — different phrasings retrieve different (but potentially equally relevant) documents. By retrieving across multiple phrasings, the system achieves higher recall at the cost of retrieving more candidates (some of which may be less relevant).
Why It Matters
Multi-query retrieval is particularly valuable for complex, ambiguous, or underspecified queries where a single formulation may miss relevant documents. Users asking about nuanced topics often benefit from having their question rephrased in both technical and lay language, in both general and specific form, and from both question and statement perspectives. The merged document pool provides the LLM with a broader information base for answering complex questions. Multi-query is especially effective when combined with reranking — the larger candidate pool from multiple queries is reranked to select the most relevant documents.
How It Works
Multi-query retrieval is implemented by prompting an LLM to generate N alternative formulations (typically 3-5) of the user's query. Example prompt: 'Generate 4 alternative versions of the following question that capture the same information need from different angles: {question}'. Each alternative is used for a separate vector search (and optionally keyword search). Results are merged with deduplication — if the same document chunk appears in multiple query results, it is included once. The merged set is either used directly (top-k from each query) or passed to a reranker to select the final context.
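The pipeline above can be sketched in a few lines. This is a minimal, runnable illustration, not a production implementation: `generate_reformulations` and `vector_search` are stand-in stubs (a real system would call an LLM with a prompt like the one above and query an actual vector index), and all names here are hypothetical.

```python
from collections import OrderedDict

def generate_reformulations(question, n=4):
    """Stub for the LLM call that would return n alternative phrasings.

    A real system would send the prompt shown above to an LLM; here we
    fabricate variants so the pipeline runs end to end.
    """
    return [f"{question} (variant {i})" for i in range(1, n + 1)]

def vector_search(query, k=3):
    """Stub vector search returning (doc_id, score) pairs.

    A real implementation would embed the query and search an index;
    here fake doc ids are derived from the query text.
    """
    base = abs(hash(query)) % 10
    return [((base + i) % 10, 1.0 - 0.1 * i) for i in range(k)]

def multi_query_retrieve(question, n_variants=4, k=3):
    """Run one search per formulation and merge with deduplication."""
    queries = [question] + generate_reformulations(question, n_variants)
    merged = OrderedDict()  # doc_id -> best score seen for that doc
    for q in queries:
        for doc_id, score in vector_search(q, k):
            if doc_id not in merged or score > merged[doc_id]:
                merged[doc_id] = score
    # Sort the deduplicated pool by best score; in practice a reranker
    # would take this pool and select the final context instead.
    return sorted(merged.items(), key=lambda kv: -kv[1])

pool = multi_query_retrieve("why is my chatbot slow?")
```

Each document chunk appears at most once in `pool` even if several reformulations retrieved it, which is the dedup-before-rerank behavior described above.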
[Diagram] Multi-Query Retrieval — Expanded Recall via Sub-queries: LLM query expansion generates three sub-queries ('chatbot response latency causes', 'slow AI inference optimization', 'vector search performance tuning'); their deduplicated union yields 6 unique docs versus 3 from a single query.
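The deduplicated union shown in the diagram can be reproduced with plain sets. The per-query result lists below are hypothetical and chosen to overlap, so the union is larger than any single result list but smaller than their sum:

```python
# Hypothetical hits for the three sub-queries from the diagram;
# d2, d3, and d5 are retrieved by more than one sub-query.
results = {
    "chatbot response latency causes":  ["d1", "d2", "d3"],
    "slow AI inference optimization":   ["d2", "d4", "d5"],
    "vector search performance tuning": ["d3", "d5", "d6"],
}

# Set union deduplicates automatically: 9 total hits collapse to 6.
unique_docs = set().union(*results.values())
print(len(unique_docs))  # → 6 unique docs vs 3 from any single query
```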
Real-World Example
A 99helpers customer implements multi-query retrieval for their complex B2B SaaS product. For the query 'Can my team use different account permissions?', multi-query generates: 'team member role-based access', 'user permission levels', 'admin and viewer account types', and 'sharing settings multiple users'. The four queries collectively retrieve 18 unique relevant chunks versus 4 for the original query. The reranker selects the 5 most relevant for the LLM context. Answer completeness for multi-part permission questions improves significantly, and customer queries requiring escalation on this topic decrease by 40%.
Common Mistakes
- ✕Generating too many query variations — 8+ queries create noise and latency without proportional recall improvement; 3-5 variations is typically optimal
- ✕Not deduplicating results before passing to the LLM — duplicate chunks waste context window space and may cause the LLM to over-weight repeated information
- ✕Applying multi-query to simple, specific questions — 'what is your refund policy?' does not benefit from multiple reformulations; apply multi-query selectively to complex queries
Related Terms
Query Expansion
Query expansion is a retrieval technique that augments the original user query with related terms, synonyms, or alternative phrasings before search, improving recall by retrieving relevant documents that would not match the original query vocabulary.
Query Rewriting
Query rewriting is a technique that transforms a user's original query into an improved version — clearer, more complete, or better suited for retrieval — using an LLM to improve recall and relevance before searching the knowledge base.
Hypothetical Document Embedding
Hypothetical Document Embedding (HyDE) is a RAG technique that improves retrieval by having an LLM generate a hypothetical document that would answer the user's query, then using that document's embedding rather than the query embedding for similarity search.
Reranking
Reranking is a second-stage retrieval step that takes an initial set of candidate documents returned by a fast retrieval method and reorders them using a more accurate but computationally expensive model to improve final result quality.
Retrieval-Augmented Generation
Retrieval-Augmented Generation (RAG) is an AI architecture that enhances large language model responses by first retrieving relevant documents from an external knowledge base and then using that retrieved content as context when generating an answer.