Query Decomposition
Definition
Query decomposition addresses a core limitation of embedding-based retrieval: a single embedding vector representing a complex question may not closely match any single document chunk, leading to poor retrieval. By decomposing 'What are the pricing plans for 99helpers and what integrations does each include?' into sub-queries—(1) 'What are the pricing plans?' (2) 'What integrations are available on each plan?'—each sub-query produces a focused embedding that closely matches specific chunks. The retrieved results for all sub-queries are combined and deduplicated before being passed to the LLM. Query decomposition is particularly effective for comparison questions, multi-step questions, and questions requiring synthesis across multiple topics.
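The combine-and-deduplicate step can be sketched in a few lines. This is a minimal illustration, not a library API: the hard-coded sub-query results stand in for what a real retriever would return, and the function simply keeps the first occurrence of each chunk.

```python
# Minimal sketch of combining results from several sub-query retrievals.
# The sub_results below are hard-coded stand-ins for retriever output.

def combine_results(results_per_subquery):
    """Merge retrieved chunks across sub-queries, keeping the first
    occurrence of each chunk so shared chunks appear only once."""
    seen = set()
    combined = []
    for chunks in results_per_subquery:
        for chunk in chunks:
            if chunk not in seen:
                seen.add(chunk)
                combined.append(chunk)
    return combined

sub_results = [
    ["Pro plan: $29/mo ...", "Plans overview ..."],      # sub-query 1
    ["Plans overview ...", "Integrations by plan ..."],  # sub-query 2
]
# "Plans overview ..." is relevant to both sub-queries but appears once.
context = combine_results(sub_results)
```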
Why It Matters
Complex queries are the hardest for RAG systems to answer correctly because they require retrieving multiple distinct pieces of information that may live in different document sections. Without decomposition, a single retrieval attempt for a complex query either retrieves a mix of partially relevant chunks or misses some required information entirely. For 99helpers customers whose users ask complex configuration or comparison questions, query decomposition significantly improves answer completeness and reduces the need for follow-up questions. It is one of the most impactful retrieval enhancement techniques for knowledge-intensive query types.
How It Works
Query decomposition is typically implemented using an LLM prompt before retrieval: 'Given the question: [query], identify the distinct sub-questions that need to be answered to fully address this question. Return a JSON list of sub-questions.' The LLM returns 2-5 sub-questions. Each sub-question is embedded and used to retrieve top-K chunks independently. All retrieved chunks are collected, deduplicated by content hash, and assembled into a combined context for the generation step. LlamaIndex's SubQuestionQueryEngine and LangChain's MultiQueryRetriever implement this pattern. The overhead is one additional LLM call for decomposition, which is acceptable for complex queries but wasteful for simple ones.
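The pipeline described above can be sketched end to end. This is a hedged illustration, assuming a callable LLM and retriever; `fake_llm` and `fake_retriever` are stubs standing in for a real LLM API call and vector-store search, and the prompt text follows the one quoted above.

```python
import hashlib
import json

def decompose(query, llm):
    """Ask an LLM to split a query into sub-questions (expects JSON back).
    `llm` is any callable prompt -> str; here a stub, in practice an API call."""
    prompt = (
        f"Given the question: {query}, identify the distinct sub-questions "
        "that need to be answered to fully address this question. "
        "Return a JSON list of sub-questions."
    )
    return json.loads(llm(prompt))

def retrieve_for_subqueries(sub_questions, retriever, k=3):
    """Retrieve top-K chunks per sub-question; dedupe by content hash."""
    seen, context = set(), []
    for sq in sub_questions:
        for chunk in retriever(sq, k):
            h = hashlib.sha256(chunk.encode()).hexdigest()
            if h not in seen:
                seen.add(h)
                context.append(chunk)
    return context

# Stubs standing in for a real LLM and a real vector-store retriever.
def fake_llm(prompt):
    return '["What are the pricing plans?", "What integrations are available on each plan?"]'

def fake_retriever(query, k):
    docs = {
        "What are the pricing plans?": ["Pricing: Pro $29/mo, Enterprise custom"],
        "What integrations are available on each plan?": ["Pro: Zapier; Enterprise: Zapier, Slack, API"],
    }
    return docs.get(query, [])[:k]

subs = decompose("What are the pricing plans and what integrations does each include?", fake_llm)
context = retrieve_for_subqueries(subs, fake_retriever)
```

The combined `context` is then assembled into the prompt for the generation step, exactly as a single-query pipeline would do with one retrieval's results.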
Query Decomposition — Breaking Complex Questions Apart
Original complex query: "What are the pricing differences between the Pro and Enterprise plans, and which integrations does each support?"
Sub-query 1: Pro plan pricing → retrieved docs A1, A2
Sub-query 2: Enterprise plan pricing → retrieved docs B1, B3
Sub-query 3: Integration support by plan → retrieved docs C2, C4
Each sub-query targets a focused retrieval; results are combined for a complete answer.
Real-World Example
A 99helpers user asks: 'Can I use the chatbot API with Zapier and what data does it send?' Embedding-based retrieval for this combined query retrieves general API documentation but misses the Zapier-specific integration guide. Query decomposition splits it: (1) 'What is the chatbot API?' (2) 'How do I integrate the chatbot with Zapier?' (3) 'What data does the chatbot API send in webhooks?' Three separate retrievals surface three focused document chunks. The combined context enables the LLM to provide a complete answer covering all three aspects, eliminating the need for three separate follow-up questions.
Common Mistakes
- ✕ Applying query decomposition to every query regardless of complexity—simple, single-topic queries don't benefit and the extra LLM call adds unnecessary latency.
- ✕ Generating too many sub-queries (5+)—more sub-queries means more retrieval calls and more context, increasing latency and cost without proportional quality improvement.
- ✕ Skipping deduplication when combining sub-query results—the same chunk may be highly relevant to multiple sub-queries and should appear only once in the context.
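The first mistake above can be avoided with a lightweight gate that only pays the decomposition LLM call for queries that look multi-part. The signals and threshold below are illustrative assumptions to tune per corpus, not a standard rule:

```python
def should_decompose(query):
    """Heuristic gate: decompose only queries that look multi-part.
    Signals (assumed, tune per corpus): clause-joining conjunctions,
    comparison phrasing, multiple question marks, unusual length."""
    q = query.lower()
    markers = [" and ", " as well as ", "; ", "compare", "difference between"]
    score = sum(marker in q for marker in markers)
    score += q.count("?") > 1          # several explicit questions
    score += len(q.split()) > 20       # long queries tend to be multi-part
    return score >= 1
```

A simple single-topic query like "How do I reset my password?" fails the gate and goes straight to retrieval, while a comparison question with a clause-joining "and" triggers decomposition.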
Related Terms
Multi-Query Retrieval
Multi-query retrieval generates multiple alternative phrasings of the user's question and retrieves documents for each phrasing separately, then merges results to achieve higher recall than any single query formulation would provide.
Query Rewriting
Query rewriting is a technique that transforms a user's original query into an improved version — clearer, more complete, or better suited for retrieval — using an LLM to improve recall and relevance before searching the knowledge base.
Query Expansion
Query expansion is a retrieval technique that augments the original user query with related terms, synonyms, or alternative phrasings before search, improving recall by retrieving relevant documents that would not match the original query vocabulary.
RAG Fusion
RAG Fusion is a retrieval technique that generates multiple query variations, retrieves documents for each, and uses Reciprocal Rank Fusion (RRF) to merge the ranked result lists, improving overall retrieval coverage and quality.
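The RRF merge step mentioned above can be sketched as follows. The function name is illustrative; the `k=60` constant follows common practice for RRF:

```python
def rrf_merge(ranked_lists, k=60):
    """Reciprocal Rank Fusion: score each doc as the sum of 1/(k + rank)
    over every ranked list it appears in (rank is 1-based), then sort
    by score descending."""
    scores = {}
    for ranked in ranked_lists:
        for rank, doc in enumerate(ranked, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# A doc ranked well in several lists (d2) outranks one ranked well in only one.
merged = rrf_merge([["d2", "d1", "d3"], ["d2", "d4", "d1"]])
```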
Retrieval Pipeline
A retrieval pipeline is the online query-time workflow that transforms a user question into a ranked set of relevant document chunks, serving as the information retrieval stage of a RAG system.
Ready to build your AI chatbot?
Put these concepts into practice with 99helpers — no code required.
Start free trial →