DCG (Discounted Cumulative Gain)

1. Purpose

DCG measures the usefulness (gain) of items in a ranked list, while applying a discount to lower-ranked items.
Motivation: Highly relevant documents at the top of the search results are more valuable than if they appear later.

2. Formula

For a ranked list of documents up to position $k$:

$DCG@k = \sum_{i=1}^k \frac{rel_i}{\log_2(i+1)}$

Where:

$rel_i$ = relevance score of the document at position $i$ (e.g., graded relevance: 0, 1, 2, 3).
$\log_2(i+1)$ = discount factor; the deeper the item, the less weight it gets.

3. Variants of Gain

Some definitions use exponential gain to emphasize higher relevance:

$DCG@k = \sum_{i=1}^k \frac{2^{rel_i} – 1}{\log_2(i+1)}$

$2^{rel_i} – 1$ ensures higher relevance levels contribute more.
Example: relevance 3 ≠ just “three times” relevance 1 — it’s exponentially more valuable.

4. Normalization → NDCG

Problem: DCG values are not bounded; they depend on dataset and number of items.
Solution: Normalize with IDCG (Ideal DCG) = DCG of the perfectly sorted ranking.

$NDCG@k = \frac{DCG@k}{IDCG@k}$

Ensures values are between 0 and 1.
1 = perfect ranking; closer to 0 = poor ranking.

5. Example

Suppose we have 5 documents, with true graded relevance scores:

Rank (i)	Relevance (relirel_ireli)
1	3
2	2
3	3
4	0
5	1

Compute DCG@5 (using exponential gain):

$DCG@5 = \frac{2^3 – 1}{\log_2(2)} + \frac{2^2 – 1}{\log_2(3)} + \frac{2^3 – 1}{\log_2(4)} + \frac{2^0 – 1}{\log_2(5)} + \frac{2^1 – 1}{\log_2(6)}$

$= \frac{7}{1} + \frac{3}{1.585} + \frac{7}{2} + 0 + \frac{1}{2.585}$

$= 7 + 1.89 + 3.5 + 0 + 0.39 = 12.78$

Then compute IDCG@5 by sorting true relevances $[3,3,2,1,0]$.
Divide to get NDCG@5.

6. Why Important

Used in search engines (Google, Bing, etc.), recommendation systems, and ranking tasks in ML.
Unlike accuracy/precision, DCG rewards systems for putting highly relevant items near the top.

7. Key Points

DCG = raw score, can grow arbitrarily.
NDCG = normalized, bounded between 0 and 1.
Discount factor = logarithmic, modeling user behavior (people rarely look deep into results).
Works for graded relevance, not just binary relevance.

Summary:
Discounted Cumulative Gain (DCG) evaluates ranked lists by summing relevance scores, discounted logarithmically by position. It rewards placing relevant documents higher. NDCG normalizes it for fair comparison across queries.

Your Gateway to Data Mastery

Learn, explore, and innovate with data science.

DCG (Discounted Cumulative Gain)

1. Purpose

2. Formula

3. Variants of Gain

4. Normalization → NDCG

5. Example

6. Why Important

7. Key Points

Like this:

Related

Leave a ReplyCancel reply

1. Purpose

2. Formula

3. Variants of Gain

4. Normalization → NDCG

5. Example

6. Why Important

7. Key Points

Share this:

Like this:

Related

Leave a ReplyCancel reply

Discover more from Your Gateway to Data Mastery