Adaptive RAG
Definition
Adaptive RAG is a meta-strategy that routes each query to the most appropriate retrieval approach rather than applying a fixed pipeline to every question. Simple factual queries that the LLM can answer from parametric memory (e.g., 'What does API stand for?') are answered directly without retrieval, cutting both latency and cost. Moderately complex queries use single-step RAG (one round of retrieval). Complex, multi-hop queries trigger iterative retrieval, in which the LLM reasons through multiple retrieval steps. Routing is performed by a classifier trained to predict query complexity, or by an LLM prompted to categorize the query. Adaptive RAG thus combines the efficiency of no-retrieval with the accuracy of multi-step RAG.
Why It Matters
A RAG system that retrieves for every query wastes resources on questions the LLM already knows and over-complicates simple lookups. Conversely, a system that only does single-step retrieval fails on complex queries requiring multi-hop reasoning. Adaptive RAG picks the right approach per query, optimizing cost (no-retrieval is cheapest), latency (single-step is faster than iterative), and quality (iterative handles complexity that single-step misses). For 99helpers chatbots handling diverse query types—from simple definitional questions to complex troubleshooting requiring multiple document lookups—Adaptive RAG can substantially reduce average response latency while maintaining quality on complex queries.
How It Works
Adaptive RAG requires a query complexity classifier. Options include: (1) a small LLM prompted to categorize queries as 'simple', 'moderate', or 'complex'; (2) a fine-tuned text classifier trained on labeled examples; (3) rule-based routing (e.g., comparison words, multi-part phrasing, or specific error codes → complex; short definitional questions → simple). Based on the routing decision, the query is dispatched to the no-retrieval path (the LLM is called directly), single-step RAG (standard vector retrieval), or multi-step RAG (an agentic loop with iterative retrieval). Monitoring the routing distribution in production shows whether the classifier is well calibrated: too many 'complex' routings mean the expensive path is over-triggering.
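The three-way dispatch described above can be sketched in a few lines of Python. This is a minimal illustration of the rule-based option: the signal words and the three path functions are hypothetical stand-ins, not a specific library's API; in a real system the stubs would call an LLM, a vector store, and an agentic retrieval loop.

```python
# Minimal sketch of Adaptive RAG routing (illustrative rules, stubbed paths).

def classify_complexity(query: str) -> str:
    """Predict query complexity: 'simple', 'moderate', or 'complex'."""
    q = query.lower()
    # Illustrative complexity signals; tune these for your own traffic.
    complex_signals = ("why", "compare", " vs ", "stopped working", "error")
    if any(signal in q for signal in complex_signals):
        return "complex"
    # Short definitional questions are answerable from parametric memory.
    if len(q.split()) <= 6 and q.startswith(("what is", "what does", "define")):
        return "simple"
    return "moderate"

# Stub paths so the sketch runs standalone; replace with real calls.
def direct_llm(query):         return f"[direct-llm] {query}"    # ~200 ms path
def single_step_rag(query):    return f"[single-step] {query}"   # ~400 ms path
def multi_step_agentic(query): return f"[multi-step] {query}"    # ~1200 ms path

ROUTES = {
    "simple": direct_llm,
    "moderate": single_step_rag,
    "complex": multi_step_agentic,
}

def answer(query: str) -> str:
    """Dispatch the query to the cheapest path its complexity allows."""
    return ROUTES[classify_complexity(query)](query)
```

In production the rule-based classifier is usually a starting point; once you have labeled routing data, a small fine-tuned classifier tends to generalize better than keyword rules.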
Adaptive RAG — Query Complexity Routing
- Complexity Classifier: routes each query to the best path.
- Direct LLM: no retrieval; answers from parametric memory (~200 ms).
- Single-Step RAG: one retrieval round; a single vector search (~400 ms).
- Multi-Step Agentic: iterative retrieval via a ReAct loop ×N (~1,200 ms).
Routing tradeoffs: each step up this ladder buys accuracy on harder queries at higher cost and latency.
Real-World Example
A 99helpers chatbot receives 1,000 queries per hour. Analysis shows 30% are simple definitional questions ('What is a webhook?'), 50% are moderate support questions ('How do I configure X?'), and 20% are complex troubleshooting ('My integration stopped working after I changed Y—why?'). Adaptive RAG routes simple queries directly to the LLM (no retrieval, 200 ms response time), moderate queries through single-step RAG (400 ms), and complex queries through agentic iterative retrieval (1,200 ms). Average response time: 0.3(200) + 0.5(400) + 0.2(1200) = 500 ms. Routing every query through the multi-step path would average 1,200 ms; routing everything through single-step RAG would average only 400 ms but would silently produce poor answers on the 20% of complex queries.
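The mix arithmetic in the example can be sanity-checked in a couple of lines, using the traffic shares and per-path latencies from the example above:

```python
# Expected average latency for the routed query mix in the example.
mix = [(0.30, 200), (0.50, 400), (0.20, 1200)]  # (traffic share, latency in ms)

adaptive_ms = sum(share * latency_ms for share, latency_ms in mix)
print(f"adaptive average: {adaptive_ms:.0f} ms")  # 500 ms

# Compared with forcing every query through the 1,200 ms multi-step path:
reduction = 1 - adaptive_ms / 1200
print(f"latency reduction vs all-multi-step: {reduction:.0%}")  # 58%
```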
Common Mistakes
- ✕Building an inaccurate query classifier that routes too many queries to the expensive iterative path, eliminating the cost and latency benefits.
- ✕Never routing to the no-retrieval path for domains where parametric knowledge is unreliable—always verify domain-specific answers with retrieval.
- ✕Ignoring routing errors in production monitoring—misclassified complex queries sent to single-step RAG produce poor answers silently.
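The monitoring mistake above is cheap to avoid: tracking the routing distribution catches classifier miscalibration early. A lightweight sketch, assuming a hypothetical `record_route` hook called after each routing decision and an alert threshold you choose yourself:

```python
# Track routing distribution to spot classifier miscalibration (sketch).
from collections import Counter

route_counts = Counter()

def record_route(route: str) -> None:
    """Call after each routing decision ('simple' | 'moderate' | 'complex')."""
    route_counts[route] += 1

def complex_share() -> float:
    """Fraction of traffic hitting the expensive multi-step path."""
    total = sum(route_counts.values())
    return route_counts["complex"] / total if total else 0.0

# Example: 3 simple, 5 moderate, 2 complex routings observed.
for r in ["simple"] * 3 + ["moderate"] * 5 + ["complex"] * 2:
    record_route(r)

# Alert if 'complex' routings exceed the expected ceiling (e.g., 30%).
if complex_share() > 0.30:
    print("warning: complexity classifier may be over-triggering")
```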
Related Terms
Agentic RAG
Agentic RAG extends basic RAG with autonomous planning and multi-step reasoning, where the AI agent decides which sources to query, in what order, and whether additional retrieval steps are needed before generating a final answer.
Corrective RAG (CRAG)
Corrective RAG (CRAG) adds a self-evaluation step that assesses retrieved document relevance and automatically triggers web search or knowledge base expansion when initial retrieval is deemed insufficient.
Self-RAG
Self-RAG is an advanced RAG framework where the language model learns to decide when to retrieve, evaluate the relevance of retrieved passages, and assess the quality and groundedness of its own generated responses.
Query Decomposition
Query decomposition breaks a complex, multi-part user question into simpler sub-queries that can each be answered independently, improving RAG retrieval by matching each sub-query against relevant document segments.
Retrieval Pipeline
A retrieval pipeline is the online query-time workflow that transforms a user question into a ranked set of relevant document chunks, serving as the information retrieval stage of a RAG system.
Ready to build your AI chatbot?
Put these concepts into practice with 99helpers — no code required.
Start free trial →