Corrective RAG (CRAG)
Definition
Corrective RAG is a RAG variant that introduces a retrieval evaluator—typically a lightweight classifier or LLM prompt—that scores the relevance of initially retrieved documents before passing them to the generator. If the evaluator judges retrieved documents as highly relevant, CRAG proceeds to generation normally. If documents are partially relevant, CRAG refines and supplements them. If documents are irrelevant or absent, CRAG triggers a fallback strategy such as web search, broader database queries, or query rewriting before retrying retrieval. This self-correcting loop improves robustness for out-of-distribution queries that the knowledge base doesn't cover well.
Why It Matters
RAG systems perform well when retrieval surfaces relevant documents but fail badly when it doesn't, generating hallucinated or confidently wrong answers. CRAG adds a safety layer by detecting retrieval failure before the generation step, enabling automated recovery rather than silent hallucination. For 99helpers chatbots deployed in fast-moving product environments where the knowledge base may lag behind recent releases, CRAG can automatically fall back to web search for questions about features not yet documented, preventing the chatbot from inventing answers.
How It Works
CRAG implementation requires three components: (1) a retrieval evaluator (a cross-encoder relevance scorer, an LLM prompt asking 'Is this document relevant to the query?', or a dedicated classifier); (2) a refinement module that cleans, deduplicates, or rewrites retrieved content; (3) a fallback retrieval source such as a web search API. At query time: retrieve → evaluate → branch: if relevant, generate; if ambiguous, refine and generate; if irrelevant, search fallback → retrieve again → generate. The relevance threshold that triggers fallback is a hyperparameter balancing latency against accuracy: set it too high and nearly every query pays for slow fallback retrieval; set it too low and bad retrievals slip through to the generator.
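The retrieve → evaluate → branch flow above can be sketched as a short function. This is a minimal illustration, not a production implementation: `retrieve`, `web_search`, `refine`, and `generate` are hypothetical stand-ins for your vector store, search API, cleanup step, and LLM call, and the keyword-overlap `evaluate` is a toy placeholder for a real cross-encoder or LLM-based scorer.

```python
def evaluate(query: str, doc: str) -> float:
    """Toy relevance scorer: 1.0 on any keyword overlap, else 0.0.
    In practice, replace with a cross-encoder or an LLM prompt such as
    'Is this document relevant to the query?'."""
    return 1.0 if any(w in doc.lower() for w in query.lower().split()) else 0.0

def crag_answer(query, retrieve, web_search, refine, generate,
                low=0.3, high=0.7):
    """Run one CRAG pass: retrieve, score, branch, then generate."""
    docs = retrieve(query)
    scores = [evaluate(query, d) for d in docs]
    best = max(scores, default=0.0)

    if best >= high:
        # Confident retrieval: generate from the original context.
        context = docs
    elif best >= low:
        # Ambiguous: refine the local docs and supplement with web results.
        context = refine(docs) + web_search(query)
    else:
        # Retrieval failed (or returned nothing): fall back entirely.
        context = web_search(query)

    return generate(query, context)
```

Passing the retrieval and generation steps as callables keeps the branching logic testable in isolation; in a real system they would wrap your vector database client, search API, and LLM SDK.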
Corrective RAG (CRAG) — Retrieval Correction Flow
1. Query — the user's question.
2. Retrieve Docs — vector search over the knowledge base.
3. Relevance Evaluator — scores each retrieved document, then branches:
   - High relevance (score >= 0.7): use the retrieved docs and proceed with the original context.
   - Ambiguous (score 0.3–0.7): use both sources, merging retrieved docs with web results.
   - Low relevance (score < 0.3): trigger a web search to fetch fresh external information.
4. Combine Best Context — merge the available information from whichever branch ran.
5. Generate Answer — with the best available context.
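The three-way branch in the flow above maps directly to code. The 0.3 and 0.7 cutoffs are the illustrative values from the diagram, not canonical constants; tune them against your own evaluator's score distribution.

```python
def route(score: float, low: float = 0.3, high: float = 0.7) -> str:
    """Map an evaluator's relevance score to a CRAG branch,
    using the illustrative thresholds from the flow above."""
    if score >= high:
        return "use_retrieved_docs"    # high relevance: original context
    if score >= low:
        return "merge_docs_and_web"    # ambiguous: use both sources
    return "web_search_fallback"       # low relevance: fetch fresh info
```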
Real-World Example
A 99helpers chatbot is asked: 'Does 99helpers support the new OpenAI GPT-5 model?' This model was announced after the last knowledge base update. Standard RAG retrieves outdated model list documentation and hallucinates a 'yes' or 'no' answer. CRAG's evaluator scores the retrieved docs as 'low relevance' for a query mentioning GPT-5 and triggers a web search fallback. The web search returns a current blog post confirming GPT-5 support, which CRAG injects as context. The chatbot accurately answers based on current information.
Common Mistakes
- ✕ Building CRAG without a fallback budget—unlimited web searches add unbounded latency and cost to every low-confidence retrieval.
- ✕ Using the same LLM for both evaluation and generation, creating a slow pipeline; a smaller, faster model for evaluation is more practical.
- ✕ Triggering fallback too aggressively—if the relevance threshold is set too high, nearly every query triggers fallback, negating the speed advantage of local retrieval.
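The first mistake above is easy to guard against. A minimal sketch of a fallback budget (the class name and cap are illustrative, not part of any library): once the budget is spent, the pipeline answers from local docs, or abstains, instead of paying unbounded search latency and API cost.

```python
class FallbackBudget:
    """Caps how many web-search fallbacks CRAG may trigger in a window.
    When allow() returns False, skip the fallback and degrade gracefully."""

    def __init__(self, max_calls: int):
        self.max_calls = max_calls
        self.used = 0

    def allow(self) -> bool:
        """Consume one unit of budget if any remains."""
        if self.used < self.max_calls:
            self.used += 1
            return True
        return False

budget = FallbackBudget(max_calls=2)
results = [budget.allow() for _ in range(4)]  # [True, True, False, False]
```

In production you would typically reset the counter per time window (e.g. per minute) or track spend per user rather than a single global cap.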
Related Terms
Self-RAG
Self-RAG is an advanced RAG framework where the language model learns to decide when to retrieve, evaluate the relevance of retrieved passages, and assess the quality and groundedness of its own generated responses.
Agentic RAG
Agentic RAG extends basic RAG with autonomous planning and multi-step reasoning, where the AI agent decides which sources to query, in what order, and whether additional retrieval steps are needed before generating a final answer.
RAG Evaluation
RAG evaluation is the systematic measurement of a RAG system's quality across multiple dimensions — including retrieval accuracy, answer faithfulness, relevance, and completeness — to identify weaknesses and guide improvement.
Query Rewriting
Query rewriting is a technique that transforms a user's original query into an improved version — clearer, more complete, or better suited for retrieval — using an LLM to improve recall and relevance before searching the knowledge base.
Retrieval-Augmented Generation
Retrieval-Augmented Generation (RAG) is an AI architecture that enhances large language model responses by first retrieving relevant documents from an external knowledge base and then using that retrieved content as context when generating an answer.