Query Decomposition
Definition
Query decomposition addresses a core limitation of embedding-based retrieval: a single embedding vector representing a complex question may not closely match any single document chunk, leading to poor retrieval. By decomposing 'What are the pricing plans for 99helpers and what integrations does each include?' into sub-queries—(1) 'What are the pricing plans?' (2) 'What integrations are available on each plan?'—each sub-query produces a focused embedding that closely matches specific chunks. The retrieved results for all sub-queries are combined and deduplicated before being passed to the LLM. Query decomposition is particularly effective for comparison questions, multi-step questions, and questions requiring synthesis across multiple topics.
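The combine-and-deduplicate step can be sketched in a few lines. This is a minimal illustration, not a library API: the hard-coded sub-query results stand in for what a real retriever would return, and the function simply keeps the first occurrence of each chunk.

```python
# Minimal sketch of combining results from several sub-query retrievals.
# The sub_results below are hard-coded stand-ins for retriever output.

def combine_results(results_per_subquery):
    """Merge retrieved chunks across sub-queries, keeping the first
    occurrence of each chunk so shared chunks appear only once."""
    seen = set()
    combined = []
    for chunks in results_per_subquery:
        for chunk in chunks:
            if chunk not in seen:
                seen.add(chunk)
                combined.append(chunk)
    return combined

sub_results = [
    ["Pro plan: $29/mo ...", "Plans overview ..."],      # sub-query 1
    ["Plans overview ...", "Integrations by plan ..."],  # sub-query 2
]
# "Plans overview ..." is relevant to both sub-queries but appears once.
context = combine_results(sub_results)
```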
Why It Matters
Complex queries are the hardest for RAG systems to answer correctly because they require retrieving multiple distinct pieces of information that may live in different document sections. Without decomposition, a single retrieval attempt for a complex query either retrieves a mix of partially relevant chunks or misses some required information entirely. For 99helpers customers whose users ask complex configuration or comparison questions, query decomposition significantly improves answer completeness and reduces the need for follow-up questions. It is one of the most impactful retrieval enhancement techniques for knowledge-intensive query types.
How It Works
Query decomposition is typically implemented using an LLM prompt before retrieval: 'Given the question: [query], identify the distinct sub-questions that need to be answered to fully address this question. Return a JSON list of sub-questions.' The LLM returns 2-5 sub-questions. Each sub-question is embedded and used to retrieve top-K chunks independently. All retrieved chunks are collected, deduplicated by content hash, and assembled into a combined context for the generation step. LlamaIndex's SubQuestionQueryEngine and LangChain's MultiQueryRetriever implement this pattern. The overhead is one additional LLM call for decomposition, which is acceptable for complex queries but wasteful for simple ones.
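The pipeline described above can be sketched end to end. This is a hedged illustration, assuming a callable LLM and retriever; `fake_llm` and `fake_retriever` are stubs standing in for a real LLM API call and vector-store search, and the prompt text follows the one quoted above.

```python
import hashlib
import json

def decompose(query, llm):
    """Ask an LLM to split a query into sub-questions (expects JSON back).
    `llm` is any callable prompt -> str; here a stub, in practice an API call."""
    prompt = (
        f"Given the question: {query}, identify the distinct sub-questions "
        "that need to be answered to fully address this question. "
        "Return a JSON list of sub-questions."
    )
    return json.loads(llm(prompt))

def retrieve_for_subqueries(sub_questions, retriever, k=3):
    """Retrieve top-K chunks per sub-question; dedupe by content hash."""
    seen, context = set(), []
    for sq in sub_questions:
        for chunk in retriever(sq, k):
            h = hashlib.sha256(chunk.encode()).hexdigest()
            if h not in seen:
                seen.add(h)
                context.append(chunk)
    return context

# Stubs standing in for a real LLM and a real vector-store retriever.
def fake_llm(prompt):
    return '["What are the pricing plans?", "What integrations are available on each plan?"]'

def fake_retriever(query, k):
    docs = {
        "What are the pricing plans?": ["Pricing: Pro $29/mo, Enterprise custom"],
        "What integrations are available on each plan?": ["Pro: Zapier; Enterprise: Zapier, Slack, API"],
    }
    return docs.get(query, [])[:k]

subs = decompose("What are the pricing plans and what integrations does each include?", fake_llm)
context = retrieve_for_subqueries(subs, fake_retriever)
```

The combined `context` is then assembled into the prompt for the generation step, exactly as a single-query pipeline would do with one retrieval's results.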
Query Decomposition — Breaking Complex Questions Apart
Original complex query: "What are the pricing differences between the Pro and Enterprise plans, and which integrations does each support?"
Sub-query 1: Pro plan pricing → retrieved docs A1, A2
Sub-query 2: Enterprise plan pricing → retrieved docs B1, B3
Sub-query 3: Integration support by plan → retrieved docs C2, C4
Each sub-query targets a focused retrieval; results are combined for a complete answer.
Real-World Example
A 99helpers user asks: 'Can I use the chatbot API with Zapier and what data does it send?' Embedding-based retrieval for this combined query retrieves general API documentation but misses the Zapier-specific integration guide. Query decomposition splits it: (1) 'What is the chatbot API?' (2) 'How do I integrate the chatbot with Zapier?' (3) 'What data does the chatbot API send in webhooks?' Three separate retrievals surface three focused document chunks. The combined context enables the LLM to provide a complete answer covering all three aspects, eliminating the need for three separate follow-up questions.
Common Mistakes
- ✕ Applying query decomposition to every query regardless of complexity—simple, single-topic queries don't benefit and the extra LLM call adds unnecessary latency.
- ✕ Generating too many sub-queries (5+)—more sub-queries means more retrieval calls and more context, increasing latency and cost without proportional quality improvement.
- ✕ Skipping deduplication when combining sub-query results—the same chunk may be highly relevant to multiple sub-queries and should appear only once in the context.
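The first mistake above can be avoided with a lightweight gate that only pays the decomposition LLM call for queries that look multi-part. The signals and threshold below are illustrative assumptions to tune per corpus, not a standard rule:

```python
def should_decompose(query):
    """Heuristic gate: decompose only queries that look multi-part.
    Signals (assumed, tune per corpus): clause-joining conjunctions,
    comparison phrasing, multiple question marks, unusual length."""
    q = query.lower()
    markers = [" and ", " as well as ", "; ", "compare", "difference between"]
    score = sum(marker in q for marker in markers)
    score += q.count("?") > 1          # several explicit questions
    score += len(q.split()) > 20       # long queries tend to be multi-part
    return score >= 1
```

A simple single-topic query like "How do I reset my password?" fails the gate and goes straight to retrieval, while a comparison question with a clause-joining "and" triggers decomposition.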
Related Terms
Multi-Query Retrieval
Multi-query retrieval generates multiple alternative phrasings of the user's question and retrieves documents for each phrasing separately, then merges results to achieve higher recall than any single query formulation would provide.
Query Rewriting
Query rewriting is a technique that transforms a user's original query into an improved version — clearer, more complete, or better suited for retrieval — using an LLM to improve recall and relevance before searching the knowledge base.
Query Expansion
Query expansion is a retrieval technique that augments the original user query with related terms, synonyms, or alternative phrasings before search, improving recall by retrieving relevant documents that would not match the original query vocabulary.
RAG Fusion
RAG Fusion is a retrieval technique that generates multiple query variations, retrieves documents for each, and uses Reciprocal Rank Fusion (RRF) to merge the ranked result lists, improving overall retrieval coverage and quality.
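The RRF merge step mentioned above can be sketched as follows. The function name is illustrative; the `k=60` constant follows common practice for RRF:

```python
def rrf_merge(ranked_lists, k=60):
    """Reciprocal Rank Fusion: score each doc as the sum of 1/(k + rank)
    over every ranked list it appears in (rank is 1-based), then sort
    by score descending."""
    scores = {}
    for ranked in ranked_lists:
        for rank, doc in enumerate(ranked, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# A doc ranked well in several lists (d2) outranks one ranked well in only one.
merged = rrf_merge([["d2", "d1", "d3"], ["d2", "d4", "d1"]])
```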
Retrieval Pipeline
A retrieval pipeline is the online query-time workflow that transforms a user question into a ranked set of relevant document chunks, serving as the information retrieval stage of a RAG system.
Ready to build your AI chatbot?
Put these concepts into practice with 99helpers — no code required.
Start free trial →