Corrective RAG (CRAG)
Definition
Corrective RAG is a RAG variant that introduces a retrieval evaluator—typically a lightweight classifier or LLM prompt—that scores the relevance of initially retrieved documents before passing them to the generator. If the evaluator judges retrieved documents as highly relevant, CRAG proceeds to generation normally. If documents are partially relevant, CRAG refines and supplements them. If documents are irrelevant or absent, CRAG triggers a fallback strategy such as web search, broader database queries, or query rewriting before retrying retrieval. This self-correcting loop improves robustness for out-of-distribution queries that the knowledge base doesn't cover well.
Why It Matters
RAG systems perform well when retrieval surfaces relevant documents but fail badly when it doesn't, generating hallucinated or confidently wrong answers. CRAG adds a safety layer by detecting retrieval failure before the generation step, enabling automated recovery rather than silent hallucination. For 99helpers chatbots deployed in fast-moving product environments where the knowledge base may lag behind recent releases, CRAG can automatically fall back to web search for questions about features not yet documented, preventing the chatbot from inventing answers.
How It Works
CRAG implementation requires three components: (1) a retrieval evaluator (a cross-encoder relevance scorer, an LLM prompt asking 'Is this document relevant to the query?', or a dedicated classifier); (2) a refinement module that cleans, deduplicates, or rewrites retrieved content; (3) a fallback retrieval source such as a web search API. At query time: retrieve → evaluate → branch: if relevant, generate; if ambiguous, refine and generate; if irrelevant, search fallback → retrieve again → generate. The relevance threshold that triggers fallback is a hyperparameter balancing latency against accuracy: set it too high and nearly every query pays for slow fallback retrieval; set it too low and bad retrievals slip through to the generator.
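The retrieve → evaluate → branch flow above can be sketched as a short function. This is a minimal illustration, not a production implementation: `retrieve`, `web_search`, `refine`, and `generate` are hypothetical stand-ins for your vector store, search API, cleanup step, and LLM call, and the keyword-overlap `evaluate` is a toy placeholder for a real cross-encoder or LLM-based scorer.

```python
def evaluate(query: str, doc: str) -> float:
    """Toy relevance scorer: 1.0 on any keyword overlap, else 0.0.
    In practice, replace with a cross-encoder or an LLM prompt such as
    'Is this document relevant to the query?'."""
    return 1.0 if any(w in doc.lower() for w in query.lower().split()) else 0.0

def crag_answer(query, retrieve, web_search, refine, generate,
                low=0.3, high=0.7):
    """Run one CRAG pass: retrieve, score, branch, then generate."""
    docs = retrieve(query)
    scores = [evaluate(query, d) for d in docs]
    best = max(scores, default=0.0)

    if best >= high:
        # Confident retrieval: generate from the original context.
        context = docs
    elif best >= low:
        # Ambiguous: refine the local docs and supplement with web results.
        context = refine(docs) + web_search(query)
    else:
        # Retrieval failed (or returned nothing): fall back entirely.
        context = web_search(query)

    return generate(query, context)
```

Passing the retrieval and generation steps as callables keeps the branching logic testable in isolation; in a real system they would wrap your vector database client, search API, and LLM SDK.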
Corrective RAG (CRAG) — Retrieval Correction Flow
1. Query — the user's question.
2. Retrieve Docs — vector search over the knowledge base.
3. Relevance Evaluator — scores each retrieved document, then branches:
   - High relevance (score >= 0.7): use the retrieved docs and proceed with the original context.
   - Ambiguous (score 0.3–0.7): use both sources, merging retrieved docs with web results.
   - Low relevance (score < 0.3): trigger a web search to fetch fresh external information.
4. Combine Best Context — merge the available information from whichever branch ran.
5. Generate Answer — with the best available context.
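The three-way branch in the flow above maps directly to code. The 0.3 and 0.7 cutoffs are the illustrative values from the diagram, not canonical constants; tune them against your own evaluator's score distribution.

```python
def route(score: float, low: float = 0.3, high: float = 0.7) -> str:
    """Map an evaluator's relevance score to a CRAG branch,
    using the illustrative thresholds from the flow above."""
    if score >= high:
        return "use_retrieved_docs"    # high relevance: original context
    if score >= low:
        return "merge_docs_and_web"    # ambiguous: use both sources
    return "web_search_fallback"       # low relevance: fetch fresh info
```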
Real-World Example
A 99helpers chatbot is asked: 'Does 99helpers support the new OpenAI GPT-5 model?' This model was announced after the last knowledge base update. Standard RAG retrieves outdated model list documentation and hallucinates a 'yes' or 'no' answer. CRAG's evaluator scores the retrieved docs as 'low relevance' for a query mentioning GPT-5 and triggers a web search fallback. The web search returns a current blog post confirming GPT-5 support, which CRAG injects as context. The chatbot accurately answers based on current information.
Common Mistakes
- ✕ Building CRAG without a fallback budget—unlimited web searches add unbounded latency and cost to every low-confidence retrieval.
- ✕ Using the same LLM for both evaluation and generation, creating a slow pipeline; a smaller, faster model for evaluation is more practical.
- ✕ Triggering fallback too aggressively—if the relevance threshold is set too high, nearly every query triggers fallback, negating the speed advantage of local retrieval.
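The first mistake above is easy to guard against. A minimal sketch of a fallback budget (the class name and cap are illustrative, not part of any library): once the budget is spent, the pipeline answers from local docs, or abstains, instead of paying unbounded search latency and API cost.

```python
class FallbackBudget:
    """Caps how many web-search fallbacks CRAG may trigger in a window.
    When allow() returns False, skip the fallback and degrade gracefully."""

    def __init__(self, max_calls: int):
        self.max_calls = max_calls
        self.used = 0

    def allow(self) -> bool:
        """Consume one unit of budget if any remains."""
        if self.used < self.max_calls:
            self.used += 1
            return True
        return False

budget = FallbackBudget(max_calls=2)
results = [budget.allow() for _ in range(4)]  # [True, True, False, False]
```

In production you would typically reset the counter per time window (e.g. per minute) or track spend per user rather than a single global cap.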
Related Terms
Self-RAG
Self-RAG is an advanced RAG framework where the language model learns to decide when to retrieve, evaluate the relevance of retrieved passages, and assess the quality and groundedness of its own generated responses.
Agentic RAG
Agentic RAG extends basic RAG with autonomous planning and multi-step reasoning, where the AI agent decides which sources to query, in what order, and whether additional retrieval steps are needed before generating a final answer.
RAG Evaluation
RAG evaluation is the systematic measurement of a RAG system's quality across multiple dimensions — including retrieval accuracy, answer faithfulness, relevance, and completeness — to identify weaknesses and guide improvement.
Query Rewriting
Query rewriting is a technique that transforms a user's original query into an improved version — clearer, more complete, or better suited for retrieval — using an LLM to improve recall and relevance before searching the knowledge base.
Retrieval-Augmented Generation
Retrieval-Augmented Generation (RAG) is an AI architecture that enhances large language model responses by first retrieving relevant documents from an external knowledge base and then using that retrieved content as context when generating an answer.