1. Definition

NDCG measures the quality of ranking by considering both:

  • Relevance of retrieved items (are they useful?)
  • Position of items in the ranking (are useful items near the top?).

$NDCG@k = \frac{DCG@k}{IDCG@k}$

where:

  • $DCG@k$ = Discounted Cumulative Gain at cutoff $k$
  • $IDCG@k$ = Ideal DCG (best possible ranking at cutoff $k$)

2. DCG (Discounted Cumulative Gain)

$DCG@k = \sum_{i=1}^k \frac{2^{rel_i} – 1}{\log_2(i+1)}$

  • $rel_i$ = relevance score of item at position $i$
  • The denominator $\log_2(i+1)$ discounts items lower in the ranking (less useful if a relevant item is far down).

This reflects diminishing utility: it’s better to have a relevant item at rank 1 than at rank 10.


3. IDCG (Ideal DCG)

  • Sort items by true relevance in descending order.
  • Compute DCG on this ideal ranking.
  • Ensures NDCG is always in $[0, 1]$.

4. Example

Suppose a search engine returns 5 docs with true relevance scores:

RankRelevance (rel)
13
22
33
40
51
  • DCG@5:

$DCG@5 = \frac{2^3-1}{\log_2(2)} + \frac{2^2-1}{\log_2(3)} + \frac{2^3-1}{\log_2(4)} + \frac{2^0-1}{\log_2(5)} + \frac{2^1-1}{\log_2(6)}$​

$= \frac{7}{1} + \frac{3}{1.585} + \frac{7}{2} + 0 + \frac{1}{2.585} \approx 7 + 1.89 + 3.5 + 0 + 0.39 = 12.78$

  • IDCG@5: Arrange ideal order (3,3,2,1,0) and compute → about 13.65.
  • NDCG@5:

$NDCG@5 = \frac{12.78}{13.65} \approx 0.94$


5. Interpretation

  • Range: 0 → 1
  • 1.0 = perfect ranking (all relevant items in ideal order).
  • Higher NDCG = more useful ranking.
  • Advantage over MAP: can handle graded relevance (e.g., 0 = not relevant, 1 = somewhat relevant, 2 = highly relevant).

6. Use Cases

  • Search Engines: Evaluating quality of ranked search results.
  • Recommender Systems: Ranking recommended products by user preference.
  • Question Answering / Dialogue: Ranking candidate answers by relevance.

7. Python Example

import numpy as np
from sklearn.metrics import ndcg_score

# True relevance scores for 1 query (higher = more relevant)
y_true = np.array([[3, 2, 3, 0, 1]])

# Predicted scores by model
y_scores = np.array([[0.9, 0.8, 0.3, 0.2, 0.1]])

# Compute NDCG@5
ndcg = ndcg_score(y_true, y_scores, k=5)
print("NDCG@5:", ndcg)

Output:

NDCG@5: 0.94

Summary

  • NDCG = DCG / IDCG → measures ranking quality.
  • Rewards placing highly relevant items near the top.
  • Range [0, 1], higher = better.
  • Handles graded relevance (beyond binary relevant/not relevant).