1. Definition
- Genre overlap measures how many genres two items (or two users' preferences) have in common, relative to the total number of distinct genres they cover between them.
- In effect, it is a set-similarity measure applied to genre tags.
Formally, if item $i$ has genre set $G_i$ and item $j$ has genre set $G_j$:
$\text{GenreOverlap}(i,j) = \frac{|G_i \cap G_j|}{|G_i \cup G_j|}$
This is the Jaccard index over genre sets.
2. Example
- Movie A (Avengers): {Action, Sci-Fi, Adventure}
- Movie B (Iron Man): {Action, Sci-Fi}
$\text{GenreOverlap}(A,B) = \frac{|\{\text{Action}, \text{Sci-Fi}\}|}{|\{\text{Action}, \text{Sci-Fi}, \text{Adventure}\}|} = \frac{2}{3} \approx 0.67$
So they have 67% genre overlap.
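The definition and the Avengers/Iron Man example can be checked with a minimal Python sketch. It assumes genres are stored as Python sets of strings; the function name `genre_overlap` is hypothetical, and returning 0.0 when both sets are empty (where the formula is 0/0) is an assumed convention, not part of the definition.

```python
def genre_overlap(genres_i: set[str], genres_j: set[str]) -> float:
    """Jaccard index |G_i ∩ G_j| / |G_i ∪ G_j| over two genre sets."""
    union = genres_i | genres_j
    if not union:
        # Both items have no genres, so the ratio is undefined (0/0).
        # Returning 0.0 is an assumed convention.
        return 0.0
    return len(genres_i & genres_j) / len(union)


avengers = {"Action", "Sci-Fi", "Adventure"}
iron_man = {"Action", "Sci-Fi"}
print(round(genre_overlap(avengers, iron_man), 2))  # 0.67
```

The result lies in $[0, 1]$: 1 when the genre sets are identical, 0 when they share nothing.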
3. Usage in Recommender Systems
- Item similarity: two movies that share a large fraction of their genres are likely to be similar, so genre overlap can serve as an item-to-item similarity score in content-based recommendation.
- User profiling:
  - A user's preferred genres = the union of the genres of the items they have consumed.
  - The genre overlap between a new item's genres and the user's preferred genres indicates how well the item suits that user: higher overlap suggests a better fit.
- Diversity metrics: the average pairwise genre overlap among the items of a single recommendation list indicates redundancy (low diversity).
  - High genre overlap across recommended items = a narrow list.
  - Low overlap = more varied recommendations.
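The profiling and diversity uses above can be sketched in Python. This is a minimal sketch, assuming genres are Python sets of strings and that "overlap within a list" means the average Jaccard index over all item pairs; the function names and toy data are hypothetical.

```python
from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard genre overlap; 0.0 for two empty sets (an assumed convention)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def user_genre_profile(consumed: list[set[str]]) -> set[str]:
    """Preferred genres = union of the genre sets of all consumed items."""
    profile: set[str] = set()
    for genres in consumed:
        profile |= genres
    return profile


def intra_list_overlap(recommendations: list[set[str]]) -> float:
    """Average pairwise genre overlap of a recommendation list.

    Higher values mean a narrower (more redundant) list.
    """
    pairs = list(combinations(recommendations, 2))
    if not pairs:
        return 0.0  # assumption: fewer than two items means no redundancy
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical toy data.
history = [{"Action", "Sci-Fi"}, {"Drama"}]
profile = user_genre_profile(history)  # {"Action", "Sci-Fi", "Drama"}
print(jaccard({"Action", "Adventure"}, profile))  # 0.25

narrow = [{"Action", "Sci-Fi"}, {"Action", "Sci-Fi"}, {"Action"}]
varied = [{"Action"}, {"Drama"}, {"Comedy"}]
print(round(intra_list_overlap(narrow), 2))  # 0.67
print(intra_list_overlap(varied))  # 0.0
```

One design consequence: because the profile is a union, every profile genre that a candidate lacks enlarges the denominator without enlarging the intersection, so users with broad histories get systematically lower Jaccard scores. Normalizing by the candidate's own genre count instead is one possible alternative, offered here as an assumption rather than as part of the definition above.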
4. Variants
- Raw count overlap: simply count the shared genres, $|G_i \cap G_j|$ (e.g., 2 genres in common). Unlike the Jaccard index, this is not normalized by how many genres each item has.
- Weighted overlap: Weight by importance (e.g., primary vs secondary genre, or genre popularity).
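- Cosine similarity on genre vectors: represent each item as a binary genre vector (e.g., [1,0,1,0,…]) and compute the cosine similarity. For binary vectors this equals $\frac{|G_i \cap G_j|}{\sqrt{|G_i|\,|G_j|}}$, which normalizes by the geometric mean of the set sizes rather than by the size of the union, so it is never lower than the Jaccard value; for Avengers and Iron Man it gives $2/\sqrt{6} \approx 0.82$ versus $0.67$.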
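The three variants can be compared side by side in a short Python sketch. It assumes genres are Python sets of strings; the function names, the genre vocabulary, and the weights are hypothetical. The weighted variant here uses one global weight per genre (for example, derived from genre popularity); weighting primary vs secondary genres would instead require per-item weights.

```python
import math


def raw_overlap(a: set[str], b: set[str]) -> int:
    """Unnormalized count of shared genres."""
    return len(a & b)


def weighted_overlap(a: set[str], b: set[str], weight: dict[str, float]) -> float:
    """Weighted Jaccard: weight of shared genres / weight of all covered genres.

    `weight` maps a genre to its importance; genres missing from it get 1.0.
    The weighting scheme itself is an assumption.
    """
    union = a | b
    total = sum(weight.get(g, 1.0) for g in union)
    if total == 0:
        return 0.0
    return sum(weight.get(g, 1.0) for g in a & b) / total


def cosine_binary(a: set[str], b: set[str], vocabulary: list[str]) -> float:
    """Cosine similarity of binary genre vectors over a fixed vocabulary."""
    va = [1 if g in a else 0 for g in vocabulary]
    vb = [1 if g in b else 0 for g in vocabulary]
    dot = sum(x * y for x, y in zip(va, vb))
    norm = math.sqrt(sum(x * x for x in va)) * math.sqrt(sum(y * y for y in vb))
    return dot / norm if norm else 0.0


vocab = ["Action", "Adventure", "Comedy", "Drama", "Sci-Fi"]
avengers = {"Action", "Sci-Fi", "Adventure"}
iron_man = {"Action", "Sci-Fi"}
weights = {"Action": 2.0, "Sci-Fi": 1.0, "Adventure": 0.5}  # hypothetical

print(raw_overlap(avengers, iron_man))  # 2
print(round(weighted_overlap(avengers, iron_man, weights), 2))  # 0.86
print(round(cosine_binary(avengers, iron_man, vocab), 2))  # 0.82
```

With all weights equal to 1.0, `weighted_overlap` reduces to the plain Jaccard index.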
5. Relation to Other Metrics
- Genre overlap = Jaccard similarity on genre sets.
- Can be combined with other features (cast, keywords, embeddings) for hybrid similarity.
- In evaluation, it serves as an inverse proxy for diversity: high overlap within a recommendation list means the list is less diverse.
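A hybrid similarity can be sketched as a weighted blend of genre overlap and another feature. This minimal Python sketch assumes the other feature is a dense item embedding compared by cosine similarity; the embeddings, the function names, and the mixing weight `alpha` are all hypothetical, and a linear blend is just one simple way to combine the signals.

```python
import math


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard genre overlap; 0.0 for two empty sets (an assumed convention)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two dense vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0


def hybrid_similarity(
    genres_i: set[str],
    genres_j: set[str],
    emb_i: list[float],
    emb_j: list[float],
    alpha: float = 0.5,
) -> float:
    """alpha * genre Jaccard + (1 - alpha) * embedding cosine.

    alpha is a hypothetical mixing weight in [0, 1]; 1.0 uses genres only.
    """
    return alpha * jaccard(genres_i, genres_j) + (1 - alpha) * cosine(emb_i, emb_j)


# Hypothetical embeddings that happen to be orthogonal.
score = hybrid_similarity(
    {"Action", "Sci-Fi", "Adventure"}, {"Action", "Sci-Fi"},
    [1.0, 0.0], [0.0, 1.0], alpha=0.5,
)
print(round(score, 2))  # 0.33
```

Note that embedding cosine can be negative while Jaccard lies in $[0, 1]$, so in practice the two terms may need rescaling before they are blended.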
Summary:
Genre overlap is a set-based similarity measure of how many genres two items (or a user and an item) share. It is typically computed as intersection over union (the Jaccard index) and is useful for content-based recommendation, profiling user interests, and measuring recommendation diversity.
