Normalized Discounted Cumulative Gain (NDCG)
Definition
Normalized Discounted Cumulative Gain extends simpler metrics like precision and MRR by accounting for graded relevance—documents can be labeled highly relevant, somewhat relevant, or not relevant, not just binary. DCG sums the relevance grades of retrieved documents, discounting each by the logarithm of its rank position to penalize relevant documents buried lower in the list. NDCG normalizes DCG by the ideal DCG (the best possible ranking), yielding a score between 0 and 1 regardless of query difficulty. NDCG@K evaluates quality in the top-K positions, making it ideal for RAG systems where K documents are passed to the LLM.
Why It Matters
NDCG is the gold standard for evaluating search and retrieval systems when relevance is not binary. In RAG pipelines, not all relevant documents are equally useful—the most authoritative, detailed document should rank first, while partially relevant documents should rank below it. Teams running large-scale 99helpers deployments use NDCG@10 when comparing retrieval strategies because it penalizes systems that retrieve relevant documents but rank them poorly. A high NDCG score indicates that when users (or LLMs) read the top results in order, they encounter the most relevant content first.
How It Works
Computing NDCG requires graded relevance labels (e.g., 0 = not relevant, 1 = somewhat relevant, 2 = highly relevant). For each query, compute DCG = sum(relevance_grade[i] / log2(i + 1)) over positions i = 1..K. Compute IDCG with the same formula after sorting the documents by relevance grade in descending order (the ideal ranking). Then NDCG = DCG / IDCG. Automated grading can use LLM judges with rubrics, reducing the human labeling burden. NDCG is also commonly used as the objective in reranker fine-tuning, where the reranker is trained to maximize NDCG on labeled training queries.
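The computation above can be sketched in a few lines of Python (function names are illustrative, and IDCG is computed from the retrieved documents' grades only):

```python
import math

def dcg_at_k(grades, k):
    """DCG@K: each relevance grade discounted by log2 of its rank + 1."""
    return sum(g / math.log2(i + 1) for i, g in enumerate(grades[:k], start=1))

def ndcg_at_k(grades, k):
    """NDCG@K: DCG of the actual ranking divided by DCG of the ideal ranking."""
    ideal = dcg_at_k(sorted(grades, reverse=True), k)
    return dcg_at_k(grades, k) / ideal if ideal > 0 else 0.0
```

For example, `ndcg_at_k([1, 2, 0], 3)` scores a ranking whose top three documents have grades 1, 2, and 0, returning roughly 0.86; a perfectly ordered list such as `[2, 1, 0]` returns exactly 1.0.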
NDCG — Ranking Quality Metric

Query: "How do I cancel my subscription?"

Ranked results:
1. How to cancel your plan
2. Billing & subscription FAQ
3. Account deletion guide
4. Pricing overview page

DCG: 6.32 · Ideal DCG: 7.14 · NDCG Score: 0.87
Real-World Example
A 99helpers evaluation dataset grades documents on a 0–2 scale. For the query "configure webhook notifications," the retriever returns: somewhat relevant API docs (rank 1), highly relevant guide (rank 2), irrelevant billing page (rank 3). DCG = 1/log2(2) + 2/log2(3) + 0 = 1 + 1.26 + 0 = 2.26. IDCG, with the guide first and the API docs second, = 2/log2(2) + 1/log2(3) = 2 + 0.63 = 2.63. NDCG = 2.26/2.63 = 0.86. After reranking, the highly relevant guide moves to rank 1 and the API docs drop to rank 3, giving DCG = 2 + 0 + 1/log2(4) = 2.5 and NDCG = 2.5/2.63 = 0.95.
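The arithmetic in this example can be checked with a short script (a sketch; the grade lists mirror the before/after rankings described above):

```python
import math

def ndcg(grades):
    """NDCG over a full ranked list of relevance grades."""
    dcg = sum(g / math.log2(i + 1) for i, g in enumerate(grades, start=1))
    idcg = sum(g / math.log2(i + 1)
               for i, g in enumerate(sorted(grades, reverse=True), start=1))
    return dcg / idcg

before = [1, 2, 0]  # API docs (1), guide (2), billing page (0)
after = [2, 0, 1]   # guide reranked to the top; API docs drop to rank 3
print(round(ndcg(before), 2), round(ndcg(after), 2))  # 0.86 0.95
```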
Common Mistakes
- ✕ Using NDCG with binary relevance labels defeats its main purpose—with only relevant/not-relevant grades, it loses its advantage over simpler rank-aware metrics.
- ✕ Computing NDCG@K with a very small K (e.g., K = 1) collapses the metric to the relevance of the single top document, losing the graded ranking signal across positions.
- ✕ Ignoring the cost of human labeling for graded relevance—synthetic LLM-judged labels can introduce bias if not calibrated against human judgments.
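The first mistake can be made concrete: with graded labels NDCG distinguishes two orderings of the same documents, but once the grades are collapsed to binary, both orderings score identically (a sketch with hypothetical two-document grade lists):

```python
import math

def ndcg(grades):
    """NDCG over a full ranked list of relevance grades."""
    dcg = sum(g / math.log2(i + 1) for i, g in enumerate(grades, start=1))
    idcg = sum(g / math.log2(i + 1)
               for i, g in enumerate(sorted(grades, reverse=True), start=1))
    return dcg / idcg

# Graded labels: ranking the grade-2 document first is visibly better.
print(round(ndcg([2, 1]), 2), round(ndcg([1, 2]), 2))  # 1.0 0.86
# Binary labels: both documents become grade 1, so both orderings score 1.0.
print(ndcg([1, 1]))  # 1.0
```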
Related Terms
Mean Reciprocal Rank (MRR)
Mean Reciprocal Rank (MRR) is a retrieval evaluation metric that measures how highly the first relevant document is ranked, averaged across queries. It rewards systems that place the most relevant result near the top of the list.
Retrieval Precision
Retrieval precision measures the fraction of retrieved documents that are actually relevant to the query. In RAG systems, high precision means the context passed to the LLM contains mostly useful information rather than noise.
RAG Evaluation
RAG evaluation is the systematic measurement of a RAG system's quality across multiple dimensions — including retrieval accuracy, answer faithfulness, relevance, and completeness — to identify weaknesses and guide improvement.
Reranking
Reranking is a second-stage retrieval step that takes an initial set of candidate documents returned by a fast retrieval method and reorders them using a more accurate but computationally expensive model to improve final result quality.
Retrieval Recall
Retrieval recall measures the fraction of relevant documents that a retrieval system successfully returns from a corpus. In RAG systems, high recall ensures the LLM has access to all information needed to answer a query correctly.