Definition
IID stands for Independent and Identically Distributed, a fundamental assumption in statistics and machine learning.
- Independent → each data point does not depend on any other.
- Identically Distributed → all data points come from the same probability distribution.
In short:
$X_1, X_2, \dots, X_n \overset{\text{i.i.d.}}{\sim} P(X)$
Breakdown
- Independence
- Knowing one sample gives no information about another.
- Example: flipping a fair coin 10 times → each flip is independent.
- Identical Distribution
- All samples follow the same distribution (same mean, variance, etc.).
- Example: every coin flip has the same probability $P(H) = 0.5, P(T) = 0.5$.
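The coin-flip example above can be simulated directly. This is a minimal sketch using only the standard library: each call to `random.random()` is independent of the others and draws from the same uniform distribution, so the flips are IID by construction.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# Each flip is independent and has the same distribution: P(H) = P(T) = 0.5.
flips = ["H" if random.random() < 0.5 else "T" for _ in range(10)]
print(flips)

# With many IID flips, the empirical frequency of heads approaches 0.5.
n = 100_000
heads = sum(random.random() < 0.5 for _ in range(n))
print(heads / n)  # close to 0.5
```

Note that independence here comes from the generator: no flip's outcome is computed from any previous flip.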
Examples
- IID data (good case)
- Random sampling from a population (e.g., survey respondents chosen randomly).
- Non-IID data (bad case)
- Time series: today’s stock price depends on yesterday’s → not independent.
- Changing population: early customers vs. late customers may have different distributions → not identically distributed.
- Grouped data: multiple entries from the same patient → correlated, not independent.
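The time-series case can be made concrete by comparing lag-1 correlation. Below is a small sketch (standard library only, hypothetical helper `lag1_corr`) contrasting IID Gaussian noise with an AR(1) series, where each value is built from the previous one, mimicking the "today depends on yesterday" behavior of prices.

```python
import random

random.seed(1)

def lag1_corr(xs):
    """Sample correlation between consecutive observations x[t] and x[t+1]."""
    n = len(xs) - 1
    a, b = xs[:-1], xs[1:]
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b)) / n
    va = sum((x - ma) ** 2 for x in a) / n
    vb = sum((y - mb) ** 2 for y in b) / n
    return cov / (va * vb) ** 0.5

# IID samples: consecutive values are uncorrelated.
iid = [random.gauss(0, 1) for _ in range(5000)]

# AR(1) series: each value depends on the previous one -> not independent.
ar = [0.0]
for _ in range(4999):
    ar.append(0.9 * ar[-1] + random.gauss(0, 1))

print(round(lag1_corr(iid), 2))  # near 0
print(round(lag1_corr(ar), 2))   # near 0.9
```

A strong lag-1 correlation is direct evidence that the independence assumption fails for the series.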
Why It Matters
- Many algorithms and procedures (linear regression, logistic regression, hypothesis tests, standard neural-network training) assume IID samples.
- IID assumption makes probability theory simpler (law of large numbers, central limit theorem).
- Violations (non-IID data) → biased estimates, overconfident predictions.
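The law of large numbers mentioned above is easy to see empirically: under IID sampling, the sample mean converges to the true mean as the sample grows. A minimal sketch with a fair six-sided die (true mean $3.5$):

```python
import random

random.seed(2)

# Law of large numbers: the sample mean of IID draws converges to the true mean.
true_mean = 3.5  # mean of a fair six-sided die: (1 + 2 + ... + 6) / 6
for n in (10, 1_000, 100_000):
    rolls = [random.randint(1, 6) for _ in range(n)]
    print(n, sum(rolls) / n)  # approaches 3.5 as n grows
```

This convergence guarantee is exactly what breaks down when samples are correlated or drawn from shifting distributions.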
In Machine Learning
- Training Data: Often assumed IID (each sample independent, same distribution).
- Reality: Often violated (autocorrelation in time series, drift in production, data leakage).
- Handling non-IID requires special methods (time-series CV, grouped CV, domain adaptation).
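Time-series CV from the list above can be sketched in a few lines. This is a simplified forward-chaining splitter (analogous in spirit to scikit-learn's `TimeSeriesSplit`; the function name and fold sizing here are illustrative assumptions): training indices always precede test indices in time, so the model never peeks at the future.

```python
def time_series_splits(n_samples, n_splits):
    """Yield (train_indices, test_indices) with each test fold strictly
    after its training fold, preserving temporal order."""
    fold = n_samples // (n_splits + 1)
    for k in range(1, n_splits + 1):
        train = list(range(0, k * fold))            # everything up to the cutoff
        test = list(range(k * fold, (k + 1) * fold))  # the next block in time
        yield train, test

for train, test in time_series_splits(10, 3):
    print("train:", train, "test:", test)
# train: [0, 1] test: [2, 3]
# train: [0, 1, 2, 3] test: [4, 5]
# train: [0, 1, 2, 3, 4, 5] test: [6, 7]
```

Contrast this with ordinary random K-fold CV, which shuffles indices and is only justified when the samples are IID.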
Summary
IID = Independent and Identically Distributed.
- Independent = samples carry no information about one another (a stronger condition than mere zero correlation).
- Identically distributed = same probability distribution.
- Assumption makes math/statistics tractable, but often violated in real-world data (time series, grouped data, distribution shifts).
