Mean Reciprocal Rank (MRR)
Definition
Mean Reciprocal Rank evaluates the ranking quality of a retrieval system. For a single query, the reciprocal rank is 1/r, where r is the position of the first relevant document in the ranked result list. If the first relevant document is ranked first, the reciprocal rank is 1.0; if it is ranked second, 0.5; if ranked third, 0.33, and so on. MRR averages this score across all queries in an evaluation set. MRR is particularly useful for RAG systems where the top-retrieved document is given the most weight by the language model—if the most relevant document is buried at position 5, the LLM may produce a suboptimal response even if the document is present.
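Written as a formula, with Q the evaluation query set and rank_i the position of the first relevant document for query i (the term is taken as 0 when no relevant document is retrieved):

$$\mathrm{MRR} = \frac{1}{|Q|} \sum_{i=1}^{|Q|} \frac{1}{\mathrm{rank}_i}$$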
Why It Matters
MRR captures whether your retrieval system surfaces the most relevant content prominently. In RAG pipelines, LLMs pay more attention to content appearing early in the context, so a high MRR generally translates into better answer quality. Teams building 99helpers chatbots use MRR alongside recall and precision to tune their reranking strategies. A reranker that significantly boosts MRR (moving the first relevant result from position 4 to position 1, say) typically produces measurable improvements in response accuracy even with the same underlying retrieval corpus.
How It Works
To compute MRR, run an evaluation set of queries through the retriever. For each query, scan the ranked result list to find the first relevant document and record its reciprocal rank (1/position). Average these scores across all queries. An MRR of 1.0 means every query's first relevant document was ranked first; 0.5 means the first relevant document appears, on average, around position 2. Improving MRR typically involves adding a cross-encoder reranker, upgrading the embedding model, or rewriting queries so that query representations align better with document representations.
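The computation fits in a few lines. A minimal Python sketch, assuming each query's results arrive as a ranked list of document IDs plus a set of known-relevant IDs (these input shapes and names are illustrative, not a fixed API):

```python
def reciprocal_rank(ranked_ids, relevant_ids):
    """Return 1/position of the first relevant document, or 0.0 if none appears."""
    for position, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(results):
    """Average the reciprocal rank over (ranked_ids, relevant_ids) pairs."""
    if not results:
        raise ValueError("need at least one query to average over")
    return sum(reciprocal_rank(ranked, rel) for ranked, rel in results) / len(results)
```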
Mean Reciprocal Rank (MRR) — Worked Example
| Query | First relevant result at rank | Reciprocal rank |
| --- | --- | --- |
| Q1: "How to reset password?" | 1 | 1.00 |
| Q2: "Billing invoice download" | 3 | 0.33 |
| Q3: "Cancel my subscription" | 2 | 0.50 |
MRR Calculation
RR(Q1) = 1 / 1 = 1.00
RR(Q2) = 1 / 3 ≈ 0.33
RR(Q3) = 1 / 2 = 0.50
MRR = (1.00 + 0.33 + 0.50) / 3 = 1.83 / 3 = 0.61
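Feeding the three queries above into the sketch from How It Works reproduces the same figure (the document IDs are hypothetical placeholders):

```python
# Each pair mirrors one row of the worked example above.
results = [
    (["d1", "d2", "d3"], {"d1"}),  # Q1: first relevant at rank 1 -> RR 1.00
    (["d4", "d5", "d6"], {"d6"}),  # Q2: first relevant at rank 3 -> RR 0.33
    (["d7", "d8", "d9"], {"d8"}),  # Q3: first relevant at rank 2 -> RR 0.50
]

print(round(mean_reciprocal_rank(results), 2))  # 0.61
```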
As a rough guide to interpreting MRR scores:

| MRR | Interpretation |
| --- | --- |
| 0.8 – 1.0 | Excellent |
| 0.6 – 0.8 | Good |
| < 0.6 | Needs work |
Real-World Example
A 99helpers knowledge base retriever returns 10 results per query. For the query 'set up Zapier integration,' the first relevant document appears at rank 3 (reciprocal rank ≈ 0.33). For 'change chatbot color,' it appears at rank 1 (1.0). For 'export conversation logs,' at rank 2 (0.5). MRR = (0.33 + 1.0 + 0.5) / 3 = 0.61. After fine-tuning the embedding model on support data, the first relevant document ranks first for most queries, pushing MRR to 0.92.
Common Mistakes
- ✕ Confusing MRR with MAP (Mean Average Precision): MRR only considers the first relevant document, while MAP considers all relevant documents (see the sketch after this list).
- ✕ Using MRR as the only metric when there are multiple relevant documents; it ignores whether all relevant documents are retrieved.
- ✕ Evaluating on too small a query set, where a few outlier queries disproportionately skew the average.
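To make the first distinction concrete, here is a short sketch of average precision for binary relevance, reusing the reciprocal_rank helper from How It Works (document IDs and values are illustrative):

```python
def average_precision(ranked_ids, relevant_ids):
    """AP for one query: mean of precision@k over each rank k holding a relevant doc."""
    hits, precisions = 0, []
    for position, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            precisions.append(hits / position)
    return sum(precisions) / len(relevant_ids) if relevant_ids else 0.0


ranked = ["d2", "d5", "d1", "d3"]  # hypothetical ranked results
relevant = {"d1", "d3"}            # two relevant documents, at ranks 3 and 4

print(round(reciprocal_rank(ranked, relevant), 2))    # 0.33 -- only the first hit counts
print(round(average_precision(ranked, relevant), 2))  # 0.42 -- every hit counts
```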
Related Terms
Retrieval Recall
Retrieval recall measures the fraction of relevant documents that a retrieval system successfully returns from a corpus. In RAG systems, high recall ensures the LLM has access to all information needed to answer a query correctly.
Retrieval Precision
Retrieval precision measures the fraction of retrieved documents that are actually relevant to the query. In RAG systems, high precision means the context passed to the LLM contains mostly useful information rather than noise.
RAG Evaluation
RAG evaluation is the systematic measurement of a RAG system's quality across multiple dimensions — including retrieval accuracy, answer faithfulness, relevance, and completeness — to identify weaknesses and guide improvement.
Reranking
Reranking is a second-stage retrieval step that takes an initial set of candidate documents returned by a fast retrieval method and reorders them using a more accurate but computationally expensive model to improve final result quality.
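One common implementation of this pattern, sketched here with the sentence-transformers CrossEncoder class and a public MS MARCO reranking checkpoint (the query and candidate snippets are invented for illustration):

```python
from sentence_transformers import CrossEncoder

# Small public cross-encoder trained for passage reranking on MS MARCO.
model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

query = "set up Zapier integration"
candidates = [
    "Exporting conversation logs as CSV",
    "Step-by-step guide to connecting Zapier to your chatbot",
    "Changing your chatbot's color scheme",
]

# Score every (query, document) pair, then reorder by descending score.
scores = model.predict([(query, doc) for doc in candidates])
reranked = [doc for _, doc in sorted(zip(scores, candidates), reverse=True)]
print(reranked[0])  # the Zapier guide should now sit at rank 1
```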
Normalized Discounted Cumulative Gain (NDCG)
NDCG is a retrieval ranking metric that rewards placing highly relevant documents near the top of results, with a logarithmic penalty for lower positions. It captures both relevance grades and ranking quality in a single normalized score.
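A minimal sketch of the computation for graded relevance, assuming per-position relevance grades are already assigned (the grades below are illustrative):

```python
import math

def dcg(relevances):
    """Discounted cumulative gain: each grade is discounted by log2(position + 1)."""
    return sum(rel / math.log2(pos + 1) for pos, rel in enumerate(relevances, start=1))

def ndcg(relevances):
    """DCG normalized by the ideal (descending) ordering, giving a score in [0, 1]."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

print(round(ndcg([1, 3, 2]), 3))  # 0.817 -- the misplaced grade-3 doc costs ~0.18
```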
Ready to build your AI chatbot?
Put these concepts into practice with 99helpers — no code required.
Start free trial →