1. Definition

MAP (Mean Average Precision) evaluates the quality of ranked retrieval results (e.g., in search engines, recommender systems, and other information-retrieval tasks).

  • Average Precision (AP): For a single query, compute the precision each time a relevant item is retrieved, then average across all relevant items.
  • MAP: The mean of AP values across multiple queries.

Formally:

$MAP = \frac{1}{Q} \sum_{q=1}^Q AP(q)$

where:

  • $Q$ = total number of queries
  • $AP(q)$ = average precision for query $q$
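The formula above can be sketched in a few lines of Python; `mean_average_precision` is an illustrative helper name (not a library function), and the AP values are assumed to have been computed per query already:

```python
def mean_average_precision(ap_values):
    """MAP is simply the mean of per-query AP values (each in [0, 1])."""
    return sum(ap_values) / len(ap_values)

# Three hypothetical queries with APs 0.5, 1.0, and 0.75
print(mean_average_precision([0.5, 1.0, 0.75]))  # → 0.75
```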

2. How AP is Calculated (for one query)

$AP = \frac{1}{R} \sum_{k=1}^N P(k) \cdot rel(k)$

  • $N$ = total retrieved items
  • $R$ = number of relevant items for this query
  • $P(k)$ = precision at cutoff $k$
  • $rel(k)$ = 1 if item at rank $k$ is relevant, else 0

In words: average of precisions at positions where relevant items appear.
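The AP formula translates directly into code. This is a minimal sketch (`average_precision` is an illustrative helper, not a library call); it assumes the ranked results are given as a list of 0/1 relevance flags:

```python
def average_precision(rel, R=None):
    """AP for one query.

    rel: 0/1 relevance flags in rank order (rel[0] is rank 1).
    R:   total number of relevant items for the query; defaults to sum(rel),
         i.e. assumes every relevant item appears in the retrieved list.
    """
    if R is None:
        R = sum(rel)
    if R == 0:
        return 0.0
    hits, total = 0, 0.0
    for k, r in enumerate(rel, start=1):
        if r:
            hits += 1
            total += hits / k  # precision at cutoff k, counted only at relevant ranks
    return total / R

# Relevant items at ranks 2 and 4: P@2 = 1/2, P@4 = 2/4
print(average_precision([0, 1, 0, 1, 0]))  # → 0.5
```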


3. Example

Query returns 5 documents; relevant = {d2, d4}.

| Rank | Doc | Relevant? | Precision@k |
|------|-----|-----------|-------------|
| 1    | d1  | 0         |             |
| 2    | d2  | 1         | 1/2 = 0.5   |
| 3    | d3  | 0         |             |
| 4    | d4  | 1         | 2/4 = 0.5   |
| 5    | d5  | 0         |             |
  • Relevant docs = 2 (R=2)
  • AP = (0.5 + 0.5) / 2 = 0.5
  • If we had multiple queries, MAP = mean of their AP values.
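The arithmetic in the example can be checked directly (plain Python, no libraries):

```python
# Precision is recorded only at the ranks where a relevant doc appears.
p_at_2 = 1 / 2        # one relevant doc among the first two results
p_at_4 = 2 / 4        # two relevant docs among the first four results

ap = (p_at_2 + p_at_4) / 2  # divide by R = 2 relevant docs
print(ap)  # → 0.5
```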

4. Interpretation

  • Range: 0 → 1
  • 1.0 = perfect ranking (all relevant items retrieved at the top).
  • Higher MAP = better retrieval/ranking model.

5. Use Cases

  • Search Engines: Ranking relevant documents for queries.
  • Recommender Systems: Ranking items a user is likely to engage with.
  • Information Retrieval Competitions (TREC, Kaggle): MAP is a standard leaderboard metric.

6. Python Example

```python
from sklearn.metrics import average_precision_score
import numpy as np

# True labels: 1 = relevant, 0 = not relevant
y_true = np.array([0, 1, 0, 1, 0])

# Predicted scores (higher = more relevant)
y_scores = np.array([0.2, 0.8, 0.3, 0.7, 0.1])

ap = average_precision_score(y_true, y_scores)
print("Average Precision (AP):", ap)
```

Output:

Average Precision (AP): 1.0

  • Here both relevant items receive the two highest scores, so every relevant rank has precision 1 and AP = 1.0 (a perfect ranking).
  • sklearn computes AP as a step-wise area under the precision-recall curve, which agrees with the ranked-list formula above when all relevant items are retrieved and there are no score ties.
  • If multiple queries exist, you average their AP values to get MAP.
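Extending the sklearn example to MAP is a one-line mean over per-query APs. A sketch, where the second query's labels and scores are made up purely for illustration:

```python
from sklearn.metrics import average_precision_score
import numpy as np

# One (y_true, y_scores) pair per query; the second pair is hypothetical.
queries = [
    (np.array([0, 1, 0, 1, 0]), np.array([0.2, 0.8, 0.3, 0.7, 0.1])),
    (np.array([1, 0, 0, 1]),    np.array([0.9, 0.5, 0.4, 0.2])),
]

# AP per query, then the mean across queries.
ap_values = [average_precision_score(y, s) for y, s in queries]
map_score = float(np.mean(ap_values))
print("MAP:", map_score)  # → MAP: 0.875
```

The first query ranks both relevant docs on top (AP = 1.0); the second places them at ranks 1 and 4 (AP = 0.75), giving MAP = 0.875.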

Summary

  • AP: Mean of precision values at ranks where relevant items appear.
  • MAP: Mean of AP across queries.
  • Range: 0–1, higher = better.
  • Used for ranking, retrieval, recommendation, and search evaluation.