Retrieval Recall
Definition
Retrieval recall is a core evaluation metric for RAG systems that quantifies how completely a retriever captures relevant documents. Formally, recall is the number of relevant documents retrieved divided by the total number of documents in the corpus that are relevant to the query. A recall of 1.0 means every relevant document was found; 0.5 means half were missed. In RAG contexts, low recall directly causes LLM failures: if the answer-containing document is never retrieved, no amount of generation sophistication can compensate. Recall is typically measured alongside precision, and the two metrics are often in tension; increasing retrieval breadth improves recall but lowers precision.
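The definition above can be sketched as a few lines of Python. The document IDs are illustrative placeholders, not from any real corpus:

```python
def retrieval_recall(retrieved_ids, relevant_ids):
    """Fraction of ground-truth relevant documents that were retrieved."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = len(relevant & set(retrieved_ids))
    return hits / len(relevant)

# 2 of the 3 relevant documents appear in the retrieved set -> recall = 2/3
print(retrieval_recall(["d1", "d2", "d9"], ["d1", "d2", "d5"]))
```

Note that order does not matter for recall; a relevant document ranked last in the retrieved set counts the same as one ranked first.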
Why It Matters
Retrieval recall determines whether your RAG system even has a chance of answering questions correctly. Missing relevant documents at retrieval time is an unrecoverable error—the generation step cannot invent information it was never given. Teams building production RAG systems track recall to identify gaps in their retrieval strategy, whether that means better chunking, improved embeddings, or hybrid search combining dense and sparse methods. High recall is especially critical for compliance and customer support use cases where missing a single relevant policy document could lead to incorrect guidance.
How It Works
To measure recall, you need a ground-truth dataset pairing queries with their relevant documents. The retriever runs each query and returns its top-K results. For each query, you count how many ground-truth relevant documents appear in the retrieved set and divide by the total relevant count. Tools like RAGAS automate this evaluation. Improving recall typically involves expanding K (retrieve more candidates), switching to hybrid retrieval, refining chunking so relevant content isn't fragmented, or adding metadata filters that narrow the search space without excluding relevant content.
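The measurement loop described above can be sketched as follows. The `retriever(query, k)` interface and the toy index are assumptions for illustration, not a real API:

```python
def mean_recall_at_k(retriever, eval_set, k=5):
    """Average recall@K over a ground-truth evaluation set.

    retriever(query, k) -> list of document IDs (hypothetical interface)
    eval_set: list of (query, set_of_relevant_ids) pairs
    """
    scores = []
    for query, relevant in eval_set:
        retrieved = set(retriever(query, k))
        scores.append(len(retrieved & relevant) / len(relevant))
    return sum(scores) / len(scores)

# Toy retriever over a tiny hypothetical index (illustration only).
def toy_retriever(query, k):
    index = {
        "reset password": ["help_article", "faq", "billing_doc"],
        "cancel plan": ["billing_doc", "help_article"],
    }
    return index.get(query, [])[:k]

eval_set = [
    ("reset password", {"help_article", "faq", "troubleshooting"}),
    ("cancel plan", {"billing_doc"}),
]
print(mean_recall_at_k(toy_retriever, eval_set, k=3))  # (2/3 + 1) / 2
```

Averaging per-query recall (rather than pooling hits across queries) keeps rare-but-important queries from being drowned out by common ones, which connects to the sampling pitfall noted under Common Mistakes.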
[Figure: Retrieval Recall — Corpus Coverage. Recall calculation: 9 retrieved relevant ÷ 12 total relevant = 0.75 recall, against all 12 relevant documents in the corpus. Companion panel, Recall vs Precision — increasing K: higher K improves recall but lowers precision; retrieving more candidates also pulls in more irrelevant ones.]
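The tradeoff in the figure can be made concrete by computing precision@K and recall@K at several cutoffs over one ranked result list. The ranking and ground truth below are hypothetical:

```python
def precision_recall_at_k(ranked_ids, relevant_ids, k):
    """Precision and recall over the top-K of a ranked result list."""
    relevant = set(relevant_ids)
    hits = len(relevant & set(ranked_ids[:k]))
    return hits / k, hits / len(relevant)

ranked = ["d3", "d1", "d7", "d8", "d2", "d4"]  # hypothetical ranked results
relevant = ["d3", "d1", "d2"]                  # hypothetical ground truth
for k in (2, 4, 6):
    p, r = precision_recall_at_k(ranked, relevant, k)
    print(f"K={k}: precision={p:.2f} recall={r:.2f}")
```

Moving from K=2 to K=6 here lifts recall from 0.67 to 1.00 while precision falls from 1.00 to 0.50: exactly the pattern the figure describes.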
Real-World Example
A 99helpers customer asks 'How do I reset my account password?' The ground truth labels three documents as relevant: a help center article, a FAQ entry, and a troubleshooting guide. If the retriever returns the help center article and FAQ entry but misses the troubleshooting guide, retrieval recall is 2/3 = 0.67. By switching to hybrid retrieval combining BM25 and dense embeddings, the team retrieves all three documents, achieving recall of 1.0 and eliminating answer gaps.
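One common way to fuse BM25 and dense results, as in the example above, is reciprocal rank fusion (RRF). This is a sketch of that fusion step only, with hypothetical ranked lists standing in for real retriever output:

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists into one, scoring each doc by sum of 1/(k + rank)."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_results = ["faq", "help_article", "billing_doc"]        # keyword retriever
dense_results = ["help_article", "troubleshooting", "faq"]   # embedding retriever
print(reciprocal_rank_fusion([bm25_results, dense_results])[:3])
```

In this toy run the fused top 3 contains all three relevant documents, including the troubleshooting guide that only the dense retriever surfaced, which is the mechanism by which hybrid retrieval lifts recall to 1.0 in the example.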
Common Mistakes
- ✕ Optimizing only for recall without monitoring precision leads to noisy context that confuses the LLM with irrelevant content.
- ✕ Using K=5 as a fixed number without testing whether relevant documents fall outside the top 5 for difficult queries.
- ✕ Evaluating recall on an unrepresentative sample—common queries may have high recall while rare but important queries do not.
Related Terms
Retrieval Precision
Retrieval precision measures the fraction of retrieved documents that are actually relevant to the query. In RAG systems, high precision means the context passed to the LLM contains mostly useful information rather than noise.
RAG Evaluation
RAG evaluation is the systematic measurement of a RAG system's quality across multiple dimensions — including retrieval accuracy, answer faithfulness, relevance, and completeness — to identify weaknesses and guide improvement.
Hybrid Retrieval
Hybrid retrieval combines dense (semantic) and sparse (keyword) search methods to leverage the strengths of both, using a fusion step to merge their results into a single ranked list for better overall retrieval quality.
Mean Reciprocal Rank (MRR)
Mean Reciprocal Rank (MRR) is a retrieval evaluation metric that measures how highly the first relevant document is ranked, averaged across queries. It rewards systems that place the most relevant result near the top of the list.
Faithfulness
Faithfulness is a RAG evaluation metric that measures whether the information in a generated answer is fully supported by the retrieved context, quantifying how well the model avoids hallucination when given source documents.