Maximum Mean Discrepancy (MMD)

Definition

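MMD is a kernel-based statistic that measures the difference between two probability distributions $P$ and $Q$ using only samples drawn from each. It is commonly used as the test statistic in a nonparametric two-sample test.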

If MMD is 0, the two distributions are indistinguishable under the chosen kernel; for a characteristic kernel such as the Gaussian RBF, this means $P = Q$.
A larger MMD means the distributions are more different.

It’s widely used in machine learning to detect distribution shift (drift), compare training vs. test data, or evaluate generative models (GANs, VAEs, diffusion models).

Mathematical Idea

MMD compares the mean embeddings of two distributions in a Reproducing Kernel Hilbert Space (RKHS).

$\text{MMD}^2(P, Q) = \left\| \, \mu_P - \mu_Q \, \right\|_{\mathcal{H}}^2$

Where:

$\mu_P = \mathbb{E}_{x \sim P}[ \phi(x) ]$ → mean embedding of $P$ in RKHS
$\mu_Q = \mathbb{E}_{y \sim Q}[ \phi(y) ]$ → mean embedding of $Q$
$\phi(\cdot)$ = feature mapping defined by the kernel $k(x,y)$
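
Since the feature map satisfies $\langle \phi(x), \phi(y) \rangle_{\mathcal{H}} = k(x, y)$, expanding the squared norm rewrites MMD purely in terms of kernel evaluations. Here $x, x'$ are independent draws from $P$ and $y, y'$ are independent draws from $Q$ (the primed symbols are notation introduced for this step):

$\text{MMD}^2(P, Q) = \mathbb{E}_{x, x' \sim P}[ k(x, x') ] + \mathbb{E}_{y, y' \sim Q}[ k(y, y') ] - 2 \, \mathbb{E}_{x \sim P, \, y \sim Q}[ k(x, y) ]$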

Empirical Estimation (with kernel trick)

Given samples $\{x_i\}_{i=1}^m \sim P$ and $\{y_j\}_{j=1}^n \sim Q$:

$\text{MMD}^2(P, Q) \approx \frac{1}{m^2} \sum_{i=1}^m \sum_{i'=1}^m k(x_i, x_{i'}) + \frac{1}{n^2} \sum_{j=1}^n \sum_{j'=1}^n k(y_j, y_{j'}) - \frac{2}{mn} \sum_{i=1}^m \sum_{j=1}^n k(x_i, y_j)$

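$k(x,y)$ is a kernel (commonly the Gaussian RBF, $k(x,y) = \exp(-\|x - y\|^2 / (2\sigma^2))$).
This formula needs only pairwise kernel evaluations, so it avoids working explicitly in the (possibly infinite-dimensional) feature space. Because the sums include the diagonal terms $k(x_i, x_i)$ and $k(y_j, y_j)$, this is the biased estimator.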
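
The three double sums in the empirical formula translate almost line for line into code. The following is a minimal sketch in plain Python, written to mirror the formula rather than to be fast (a vectorized version would be preferable for large samples). The function names `rbf_kernel` and `mmd2_biased`, the bandwidth `sigma=1.0`, and the synthetic Gaussian samples are all illustrative assumptions, not part of any particular library.

```python
import math
import random


def rbf_kernel(x, y, sigma=1.0):
    """Gaussian RBF kernel: exp(-||x - y||^2 / (2 * sigma^2)).

    sigma=1.0 is an arbitrary illustrative bandwidth.
    """
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def mmd2_biased(xs, ys, kernel=rbf_kernel):
    """Biased estimate of MMD^2 from two samples (lists of equal-length tuples)."""
    m, n = len(xs), len(ys)
    # Mean kernel value within the first sample (diagonal terms included).
    k_xx = sum(kernel(a, b) for a in xs for b in xs) / (m * m)
    # Mean kernel value within the second sample.
    k_yy = sum(kernel(a, b) for a in ys for b in ys) / (n * n)
    # Mean kernel value across the two samples.
    k_xy = sum(kernel(a, b) for a in xs for b in ys) / (m * n)
    return k_xx + k_yy - 2.0 * k_xy


# Synthetic 2-D data, for illustration only.
rng = random.Random(0)
same_a = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
same_b = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
shifted = [(rng.gauss(2, 1), rng.gauss(2, 1)) for _ in range(200)]

print("same distribution:   ", mmd2_biased(same_a, same_b))
print("shifted distribution:", mmd2_biased(same_a, shifted))
```

With these settings the second value should come out clearly larger than the first. The exact numbers depend on the seed and on the bandwidth, which in practice has to be chosen to suit the scale of the data.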

Interpretation

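Small MMD → two samples are from similar distributions.
Large MMD → two samples are from different distributions.
What counts as small or large depends on the kernel, its bandwidth, and the sample sizes, so the raw value is best compared against a reference rather than read in absolute terms.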
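
One common way to calibrate what counts as a large MMD is a permutation test: pool the two samples, reshuffle the labels many times, and see how often a random split produces an MMD at least as large as the observed one. The sketch below assumes a Gaussian RBF kernel with an illustrative bandwidth of 1.0. The helper names (`mmd2_from_gram`, `mmd_permutation_test`), the number of permutations, and the synthetic data are hypothetical choices made for this example.

```python
import math
import random


def rbf_kernel(x, y, sigma=1.0):
    """Gaussian RBF kernel with an illustrative bandwidth."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def mmd2_from_gram(gram, idx_x, idx_y):
    """Biased MMD^2 from a precomputed Gram matrix of the pooled sample."""
    m, n = len(idx_x), len(idx_y)
    k_xx = sum(gram[i][j] for i in idx_x for j in idx_x) / (m * m)
    k_yy = sum(gram[i][j] for i in idx_y for j in idx_y) / (n * n)
    k_xy = sum(gram[i][j] for i in idx_x for j in idx_y) / (m * n)
    return k_xx + k_yy - 2.0 * k_xy


def mmd_permutation_test(xs, ys, n_permutations=200, seed=0):
    """Return (observed MMD^2, permutation p-value)."""
    rng = random.Random(seed)
    pooled = list(xs) + list(ys)
    # Compute all pairwise kernel values once; permutations only reindex them.
    gram = [[rbf_kernel(a, b) for b in pooled] for a in pooled]
    m = len(xs)
    indices = list(range(len(pooled)))
    observed = mmd2_from_gram(gram, indices[:m], indices[m:])
    exceed = 0
    for _ in range(n_permutations):
        rng.shuffle(indices)  # random relabelling of the pooled points
        if mmd2_from_gram(gram, indices[:m], indices[m:]) >= observed:
            exceed += 1
    # The +1 terms count the observed split itself, so the p-value is never 0.
    p_value = (exceed + 1) / (n_permutations + 1)
    return observed, p_value


# Synthetic 2-D data, for illustration only.
rng = random.Random(1)
reference = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(50)]
no_drift = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(50)]
drifted = [(rng.gauss(2, 1), rng.gauss(2, 1)) for _ in range(50)]

obs_same, p_same = mmd_permutation_test(reference, no_drift)
obs_drift, p_drift = mmd_permutation_test(reference, drifted)
print("no drift: MMD^2 =", obs_same, "p =", p_same)
print("drifted:  MMD^2 =", obs_drift, "p =", p_drift)
```

A small p-value (for example below 0.05) suggests the observed MMD is larger than random relabelling would explain. Precomputing the Gram matrix keeps each permutation cheap, since only the index sets change.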

Applications

Drift Detection in ML
- Compare training vs. production data distributions.
- Detect covariate drift (feature shift).
Generative Model Evaluation
- Compare generated data vs. real data.
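- MMD is often used to evaluate GAN quality; for example, the Kernel Inception Distance (KID) is a squared MMD computed on Inception network features.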
Domain Adaptation
- Used in deep domain adaptation (e.g., DDC, DAN, and other MMD-regularized networks) to align feature distributions between source and target domains.

Example

Suppose you train a fraud detection model on last year’s data (distribution $P$).
In 2025, transaction patterns shift (distribution $Q$).
If the MMD between samples from $P$ and $Q$ is clearly higher than what you would see between two samples of last year’s data, that points to covariate drift → the model may need retraining.

Summary:

MMD = kernel-based distance between two distributions.
Works directly on multivariate data (unlike univariate tests such as the KS test), though the result depends on the choice of kernel and bandwidth.
Heavily used for drift detection and GAN evaluation.

