1. Purpose

  • DCG measures the usefulness (gain) of items in a ranked list, while applying a discount to lower-ranked items.
  • Motivation: Highly relevant documents at the top of the search results are more valuable than if they appear later.

2. Formula

For a ranked list of documents up to position $k$:

$DCG@k = \sum_{i=1}^k \frac{rel_i}{\log_2(i+1)}$

Where:

  • $rel_i$​ = relevance score of the document at position $i$ (e.g., graded relevance: 0, 1, 2, 3).
  • $\log_2(i+1)$ = discount factor; the deeper the item, the less weight it gets.

3. Variants of Gain

Some definitions use exponential gain to emphasize higher relevance:

$DCG@k = \sum_{i=1}^k \frac{2^{rel_i} – 1}{\log_2(i+1)}$

  • $2^{rel_i} – 1$ ensures higher relevance levels contribute more.
  • Example: relevance 3 ≠ just “three times” relevance 1 — it’s exponentially more valuable.

4. Normalization → NDCG

  • Problem: DCG values are not bounded; they depend on dataset and number of items.
  • Solution: Normalize with IDCG (Ideal DCG) = DCG of the perfectly sorted ranking.

$NDCG@k = \frac{DCG@k}{IDCG@k}$

  • Ensures values are between 0 and 1.
  • 1 = perfect ranking; closer to 0 = poor ranking.

5. Example

Suppose we have 5 documents, with true graded relevance scores:

Rank (i)Relevance (relirel_ireli​)
13
22
33
40
51

Compute DCG@5 (using exponential gain):

$DCG@5 = \frac{2^3 – 1}{\log_2(2)} + \frac{2^2 – 1}{\log_2(3)} + \frac{2^3 – 1}{\log_2(4)} + \frac{2^0 – 1}{\log_2(5)} + \frac{2^1 – 1}{\log_2(6)}$

$= \frac{7}{1} + \frac{3}{1.585} + \frac{7}{2} + 0 + \frac{1}{2.585}$

$= 7 + 1.89 + 3.5 + 0 + 0.39 = 12.78$

  • Then compute IDCG@5 by sorting true relevances $[3,3,2,1,0]$.
  • Divide to get NDCG@5.

6. Why Important

  • Used in search engines (Google, Bing, etc.), recommendation systems, and ranking tasks in ML.
  • Unlike accuracy/precision, DCG rewards systems for putting highly relevant items near the top.

7. Key Points

  • DCG = raw score, can grow arbitrarily.
  • NDCG = normalized, bounded between 0 and 1.
  • Discount factor = logarithmic, modeling user behavior (people rarely look deep into results).
  • Works for graded relevance, not just binary relevance.

Summary:
Discounted Cumulative Gain (DCG) evaluates ranked lists by summing relevance scores, discounted logarithmically by position. It rewards placing relevant documents higher. NDCG normalizes it for fair comparison across queries.