1. Definition
NDCG measures the quality of ranking by considering both:
- Relevance of retrieved items (are they useful?)
- Position of items in the ranking (are useful items near the top?).
$NDCG@k = \frac{DCG@k}{IDCG@k}$
where:
- $DCG@k$ = Discounted Cumulative Gain at cutoff $k$
- $IDCG@k$ = Ideal DCG (best possible ranking at cutoff $k$)
2. DCG (Discounted Cumulative Gain)
$DCG@k = \sum_{i=1}^k \frac{2^{rel_i} – 1}{\log_2(i+1)}$
- $rel_i$ = relevance score of item at position $i$
- The denominator $\log_2(i+1)$ discounts items lower in the ranking (less useful if a relevant item is far down).
This reflects diminishing utility: it’s better to have a relevant item at rank 1 than at rank 10.
3. IDCG (Ideal DCG)
- Sort items by true relevance in descending order.
- Compute DCG on this ideal ranking.
- Ensures NDCG is always in $[0, 1]$.
4. Example
Suppose a search engine returns 5 docs with true relevance scores:
| Rank | Relevance (rel) |
|---|---|
| 1 | 3 |
| 2 | 2 |
| 3 | 3 |
| 4 | 0 |
| 5 | 1 |
- DCG@5:
$DCG@5 = \frac{2^3-1}{\log_2(2)} + \frac{2^2-1}{\log_2(3)} + \frac{2^3-1}{\log_2(4)} + \frac{2^0-1}{\log_2(5)} + \frac{2^1-1}{\log_2(6)}$
$= \frac{7}{1} + \frac{3}{1.585} + \frac{7}{2} + 0 + \frac{1}{2.585} \approx 7 + 1.89 + 3.5 + 0 + 0.39 = 12.78$
- IDCG@5: Arrange ideal order (3,3,2,1,0) and compute → about 13.65.
- NDCG@5:
$NDCG@5 = \frac{12.78}{13.65} \approx 0.94$
5. Interpretation
- Range: 0 → 1
- 1.0 = perfect ranking (all relevant items in ideal order).
- Higher NDCG = more useful ranking.
- Advantage over MAP: can handle graded relevance (e.g., 0 = not relevant, 1 = somewhat relevant, 2 = highly relevant).
6. Use Cases
- Search Engines: Evaluating quality of ranked search results.
- Recommender Systems: Ranking recommended products by user preference.
- Question Answering / Dialogue: Ranking candidate answers by relevance.
7. Python Example
import numpy as np
from sklearn.metrics import ndcg_score
# True relevance scores for 1 query (higher = more relevant)
y_true = np.array([[3, 2, 3, 0, 1]])
# Predicted scores by model
y_scores = np.array([[0.9, 0.8, 0.3, 0.2, 0.1]])
# Compute NDCG@5
ndcg = ndcg_score(y_true, y_scores, k=5)
print("NDCG@5:", ndcg)
Output:
NDCG@5: 0.94
Summary
- NDCG = DCG / IDCG → measures ranking quality.
- Rewards placing highly relevant items near the top.
- Range [0, 1], higher = better.
- Handles graded relevance (beyond binary relevant/not relevant).
