1. Purpose
- DCG measures the usefulness (gain) of items in a ranked list, while applying a discount to lower-ranked items.
- Motivation: Highly relevant documents at the top of the search results are more valuable than if they appear later.
2. Formula
For a ranked list of documents up to position $k$:
$DCG@k = \sum_{i=1}^k \frac{rel_i}{\log_2(i+1)}$
Where:
- $rel_i$ = relevance score of the document at position $i$ (e.g., graded relevance: 0, 1, 2, 3).
- $\log_2(i+1)$ = discount factor; the deeper the item, the less weight it gets.
3. Variants of Gain
Some definitions use exponential gain to emphasize higher relevance:
$DCG@k = \sum_{i=1}^k \frac{2^{rel_i} – 1}{\log_2(i+1)}$
- $2^{rel_i} – 1$ ensures higher relevance levels contribute more.
- Example: relevance 3 ≠ just “three times” relevance 1 — it’s exponentially more valuable.
4. Normalization → NDCG
- Problem: DCG values are not bounded; they depend on dataset and number of items.
- Solution: Normalize with IDCG (Ideal DCG) = DCG of the perfectly sorted ranking.
$NDCG@k = \frac{DCG@k}{IDCG@k}$
- Ensures values are between 0 and 1.
- 1 = perfect ranking; closer to 0 = poor ranking.
5. Example
Suppose we have 5 documents, with true graded relevance scores:
| Rank (i) | Relevance (relirel_ireli) |
|---|---|
| 1 | 3 |
| 2 | 2 |
| 3 | 3 |
| 4 | 0 |
| 5 | 1 |
Compute DCG@5 (using exponential gain):
$DCG@5 = \frac{2^3 – 1}{\log_2(2)} + \frac{2^2 – 1}{\log_2(3)} + \frac{2^3 – 1}{\log_2(4)} + \frac{2^0 – 1}{\log_2(5)} + \frac{2^1 – 1}{\log_2(6)}$
$= \frac{7}{1} + \frac{3}{1.585} + \frac{7}{2} + 0 + \frac{1}{2.585}$
$= 7 + 1.89 + 3.5 + 0 + 0.39 = 12.78$
- Then compute IDCG@5 by sorting true relevances $[3,3,2,1,0]$.
- Divide to get NDCG@5.
6. Why Important
- Used in search engines (Google, Bing, etc.), recommendation systems, and ranking tasks in ML.
- Unlike accuracy/precision, DCG rewards systems for putting highly relevant items near the top.
7. Key Points
- DCG = raw score, can grow arbitrarily.
- NDCG = normalized, bounded between 0 and 1.
- Discount factor = logarithmic, modeling user behavior (people rarely look deep into results).
- Works for graded relevance, not just binary relevance.
Summary:
Discounted Cumulative Gain (DCG) evaluates ranked lists by summing relevance scores, discounted logarithmically by position. It rewards placing relevant documents higher. NDCG normalizes it for fair comparison across queries.
