1. Definition

  • Genre overlap measures how many genres two items (or two users’ preferences) have in common, relative to the total genres they cover.
  • It is basically a set similarity measure applied to genre tags.

Formally, if item $i$ has genre set $G_i$ and item $j$ has genre set $G_j$:

$\text{GenreOverlap}(i,j) = \frac{|G_i \cap G_j|}{|G_i \cup G_j|}$

This is the Jaccard index over genre sets.


2. Example

  • Movie A (Avengers): {Action, Sci-Fi, Adventure}
  • Movie B (Iron Man): {Action, Sci-Fi}

$\text{GenreOverlap}(A,B) = \frac{|\{\text{Action}, \text{Sci-Fi}\}|}{|\{\text{Action}, \text{Sci-Fi}, \text{Adventure}\}|} = \frac{2}{3} \approx 0.67$

So the two movies have a genre overlap of about 67%.
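
The definition and the example above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `genre_overlap` is hypothetical, and returning 0.0 when both genre sets are empty is an assumption, since the ratio is undefined in that case.

```python
def genre_overlap(genres_a: set[str], genres_b: set[str]) -> float:
    """Jaccard index of two genre sets: |A ∩ B| / |A ∪ B|."""
    union = genres_a | genres_b
    if not union:
        # Both sets are empty, so the ratio is undefined.
        # Returning 0.0 here is an assumption made for this sketch.
        return 0.0
    return len(genres_a & genres_b) / len(union)


movie_a = {"Action", "Sci-Fi", "Adventure"}  # Avengers
movie_b = {"Action", "Sci-Fi"}               # Iron Man

print(round(genre_overlap(movie_a, movie_b), 2))  # 0.67
```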


3. Usage in Recommender Systems

  • Item similarity: If two movies share many genres, they’re likely similar → used in content-based recommendations.
  • User profiling:
    • User’s preferred genres = union of genres of consumed items.
    • Genre overlap between a new item’s genres and the user’s preferred genres indicates suitability.
  • Diversity metrics: Genre overlap within a single recommendation list (for instance, averaged over all pairs of items in the list) can indicate redundancy (low diversity).
    • High genre overlap across recommended items = the list is narrow.
    • Low overlap = more varied recommendations.
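
The user-profiling and diversity uses above might look roughly like the following sketch. The helper names (`jaccard`, `user_profile`, `mean_pairwise_overlap`) and the sample data are hypothetical, and averaging the overlap over all item pairs is one assumed way to summarize a list, not the only one.

```python
from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard index; returns 0.0 for two empty sets (an assumption)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def user_profile(consumed: list[set[str]]) -> set[str]:
    """Preferred genres = union of the genres of all consumed items."""
    return set().union(*consumed)


def mean_pairwise_overlap(items: list[set[str]]) -> float:
    """Average Jaccard overlap over all item pairs in a list.

    Higher values suggest a narrower (less diverse) list.
    Lists with fewer than two items return 0.0 (an assumption).
    """
    pairs = list(combinations(items, 2))
    if not pairs:
        return 0.0
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# User profiling: compare a new item against the user's preferred genres.
profile = user_profile([{"Action", "Sci-Fi"}, {"Action", "Adventure"}])
new_item = {"Action", "Comedy"}
print(jaccard(new_item, profile))  # 0.25

# Diversity: a narrow list versus a varied one.
narrow = [{"Action", "Sci-Fi"}, {"Action", "Sci-Fi"}, {"Action", "Sci-Fi", "Adventure"}]
varied = [{"Action"}, {"Comedy"}, {"Drama", "Romance"}]
print(round(mean_pairwise_overlap(narrow), 2))  # 0.78
print(mean_pairwise_overlap(varied))            # 0.0
```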

4. Variants

  • Raw count overlap: Just count how many genres overlap (e.g., 2 genres in common).
  • Weighted overlap: Weight by importance (e.g., primary vs secondary genre, or genre popularity).
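  • Cosine similarity on genre vectors: Represent each item as a binary genre vector (e.g., [1,0,1,0,…]) and compute cosine similarity. For binary vectors this equals $\frac{|G_i \cap G_j|}{\sqrt{|G_i| \cdot |G_j|}}$, so the shared genres are normalized by the geometric mean of the two set sizes instead of the size of the union. For the Avengers / Iron Man example this gives $\frac{2}{\sqrt{6}} \approx 0.82$ rather than 0.67.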
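
The three variants can be compared side by side with a short sketch. All function names, the genre weights, and the vocabulary order are made up for illustration. In particular, the weighted version shown here (weight of shared genres divided by weight of all genres in the union) is only one assumed way to weight by importance.

```python
import math


def raw_overlap(a: set[str], b: set[str]) -> int:
    """Number of genres the two sets have in common."""
    return len(a & b)


def weighted_overlap(a: set[str], b: set[str],
                     weights: dict[str, float], default: float = 1.0) -> float:
    """Weighted Jaccard: weight of shared genres / weight of all genres.

    Genres missing from `weights` get `default` (an assumption).
    """
    total = sum(weights.get(g, default) for g in a | b)
    if total == 0:
        return 0.0
    return sum(weights.get(g, default) for g in a & b) / total


def cosine_overlap(a: set[str], b: set[str], vocabulary: list[str]) -> float:
    """Cosine similarity of binary genre vectors built over `vocabulary`."""
    va = [1 if g in a else 0 for g in vocabulary]
    vb = [1 if g in b else 0 for g in vocabulary]
    dot = sum(x * y for x, y in zip(va, vb))
    # For 0/1 vectors the squared norm is just the number of ones.
    norm = math.sqrt(sum(va)) * math.sqrt(sum(vb))
    return dot / norm if norm else 0.0


movie_a = {"Action", "Sci-Fi", "Adventure"}
movie_b = {"Action", "Sci-Fi"}
vocab = ["Action", "Adventure", "Comedy", "Sci-Fi"]
weights = {"Action": 0.5, "Sci-Fi": 1.0, "Adventure": 1.5}  # hypothetical weights

print(raw_overlap(movie_a, movie_b))                       # 2
print(weighted_overlap(movie_a, movie_b, weights))         # 0.5
print(round(cosine_overlap(movie_a, movie_b, vocab), 2))   # 0.82
```

With these made-up weights, the rarer "Adventure" genre counts more, so the weighted score (0.5) drops below the unweighted Jaccard value of about 0.67.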

5. Relation to Other Metrics

  • Genre overlap = Jaccard similarity on genre sets.
  • Can be combined with other features (cast, keywords, embeddings) for hybrid similarity.
  • In evaluation, used as a proxy for diversity: high overlap means the list is less diverse.

Summary:
Genre overlap is a set-based similarity measure showing how much two items (or a user and an item) share in terms of genres. It is typically measured as intersection / union (the Jaccard index). It’s useful for content-based recommendation, profiling user interests, and measuring recommendation diversity.