1) What it is
- A statistical hypothesis test used to determine whether there’s a significant difference between observed frequencies and expected frequencies in categorical data.
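- Under H₀, the test statistic approximately follows a χ² distribution. The approximation improves as expected counts grow.
In plain terms: “Do the counts we see differ from the expected counts by more than chance alone would explain?”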
2) Formula
$\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$
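- $O_i$: observed frequency in category (or table cell) $i$
- $E_i$: expected frequency in that category under the null hypothesis
- A large χ² value, compared with the χ² distribution at the appropriate degrees of freedom → observed data don't match the expected distribution → evidence against H₀. The degrees of freedom are $k-1$ for a goodness-of-fit test with $k$ categories (fewer if parameters are estimated from the data) and $(r-1)(c-1)$ for an $r \times c$ contingency table.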
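The formula translates directly into a few lines of NumPy. This is a minimal sketch that assumes the observed and expected counts are already aligned arrays. The helper name `chi_square_statistic` and the counts are hypothetical. The p-value is the upper tail of `scipy.stats.chi2` at $k-1$ degrees of freedom.

```python
import numpy as np
from scipy.stats import chi2


def chi_square_statistic(observed, expected):
    """Pearson chi-square statistic: sum over categories of (O - E)^2 / E."""
    observed = np.asarray(observed, dtype=float)
    expected = np.asarray(expected, dtype=float)
    return float(np.sum((observed - expected) ** 2 / expected))


# Hypothetical counts for three categories, compared against an even split.
observed = [18, 22, 20]
expected = [20, 20, 20]

stat = chi_square_statistic(observed, expected)
# Upper-tail probability of a chi-square distribution with k - 1 = 2 degrees of freedom.
p_value = chi2.sf(stat, df=len(observed) - 1)
print(f"chi2 = {stat:.3f}, p = {p_value:.3f}")
```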
3) Types of Chi-square Tests
- Goodness-of-fit test
  - Tests whether a sample's distribution matches a theoretical distribution.
  - Example: “Do dice rolls follow a uniform distribution?”
- Test of independence
  - Tests whether two categorical variables measured on the same sample are independent.
  - Example: “Are gender and loan approval independent?”
- Homogeneity test
  - Tests whether different populations share the same distribution of one categorical variable.
  - Example: “Is customer preference for product color the same across regions?”
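For the goodness-of-fit case, SciPy provides `scipy.stats.chisquare`. By default it compares observed counts against a uniform expectation. The dice counts below are made up for illustration and represent 60 hypothetical rolls.

```python
import numpy as np
from scipy.stats import chisquare

# Hypothetical face counts from 60 rolls of a six-sided die (faces 1..6).
rolls = np.array([8, 9, 12, 11, 6, 14])

# With f_exp omitted, chisquare assumes every category is equally likely,
# i.e. an expected count of 60 / 6 = 10 per face. Degrees of freedom = 6 - 1 = 5.
result = chisquare(rolls)
print("Chi-square:", result.statistic)
print("p-value:", result.pvalue)
```

Here the statistic works out to 4.2. That is well below the 5% critical value for 5 degrees of freedom (about 11.07), so the test does not reject uniformity for these counts.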
4) Assumptions
- Data are counts/frequencies (not percentages or continuous).
- Categories are mutually exclusive.
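- Expected frequency ≥ 5 in most cells. A common rule of thumb requires at least 80% of cells to have expected counts ≥ 5 and no cell below 1; otherwise the χ² approximation is unreliable, and Fisher's exact test is the usual alternative for small 2×2 tables.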
- Observations are independent.
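The expected-count rule can be checked before running the test. This minimal sketch assumes the Cochran-style guideline above: at least 80% of expected counts ≥ 5 and none below 1. The helper names are hypothetical. Expected counts are computed as row total × column total / grand total.

```python
import numpy as np


def expected_counts(table):
    """Expected cell counts under independence: row_total * col_total / grand_total."""
    table = np.asarray(table, dtype=float)
    row_totals = table.sum(axis=1, keepdims=True)  # shape (r, 1)
    col_totals = table.sum(axis=0, keepdims=True)  # shape (1, c)
    return row_totals @ col_totals / table.sum()   # outer product, shape (r, c)


def meets_expected_count_rule(table, min_count=5.0, min_fraction=0.8):
    """True if >= min_fraction of expected counts reach min_count and none fall below 1."""
    expected = expected_counts(table)
    fraction_ok = np.mean(expected >= min_count)
    return bool(fraction_ok >= min_fraction and expected.min() >= 1.0)


# Loan table from the example below: rows = gender, cols = (approved, denied).
print(expected_counts([[40, 60], [30, 70]]))
print(meets_expected_count_rule([[40, 60], [30, 70]]))
# A hypothetical sparse table for contrast.
print(meets_expected_count_rule([[3, 1], [2, 4]]))
```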
5) Example (Independence Test)
| | Loan Approved | Loan Denied | Total |
|---|---|---|---|
| Male | 40 | 60 | 100 |
| Female | 30 | 70 | 100 |
| Total | 70 | 130 | 200 |
- Null hypothesis $H_0$: loan approval independent of gender.
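- Compute the expected frequencies ($E_{ij} = \text{row total}_i \times \text{column total}_j / N$), then the χ² statistic and its p-value with $(r-1)(c-1)$ degrees of freedom.
- If p < 0.05 → reject $H_0$ → evidence that approval is associated with gender (a potential fairness issue worth investigating).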
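Working the table through by hand shows how the pieces fit together. This uses the uncorrected Pearson statistic from the formula in section 2:

$$
\begin{aligned}
E_{\text{M,App}} &= E_{\text{F,App}} = \frac{100 \times 70}{200} = 35, \qquad
E_{\text{M,Den}} = E_{\text{F,Den}} = \frac{100 \times 130}{200} = 65 \\
\chi^2 &= \frac{(40-35)^2}{35} + \frac{(60-65)^2}{65} + \frac{(30-35)^2}{35} + \frac{(70-65)^2}{65}
= 2\left(\frac{25}{35} + \frac{25}{65}\right) \approx 2.198 \\
\text{df} &= (2-1)(2-1) = 1, \qquad p \approx 0.14
\end{aligned}
$$

Since p > 0.05, this particular table does not provide enough evidence to reject $H_0$ at the 5% level.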
6) Applications in ML
- Feature selection: test dependence between a categorical (non-negative count) feature and the target (e.g., `sklearn.feature_selection.chi2`, often combined with `SelectKBest`).
- Fairness checks: test whether outcome rates differ across protected groups.
- Drift detection: compare categorical feature distributions between training and production data.
- Survey analysis / A/B testing: compare categorical responses (e.g., converted vs. not converted) across groups.
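For drift detection, one common approach treats training and production as two populations and runs a homogeneity test on their category counts. This minimal sketch uses `chi2_contingency`. The feature, its counts, and the 0.05 threshold are hypothetical choices, not recommendations.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical counts of one categorical feature, categories: card, cash, transfer.
train_counts = np.array([500, 300, 200])
prod_counts = np.array([420, 260, 320])

# Rows = dataset (training, production), columns = category.
table = np.vstack([train_counts, prod_counts])
chi2, p, dof, expected = chi2_contingency(table)

alpha = 0.05  # hypothetical significance threshold
drift_detected = p < alpha
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p:.4g}, drift detected: {drift_detected}")
```

With large production samples, even tiny shifts become statistically significant. In practice, the p-value is therefore often paired with an effect-size measure such as Cramér's V.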
7) Python Example
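```python
import numpy as np
from scipy.stats import chi2_contingency

# Contingency table: rows = gender (Male, Female), cols = loan outcome (Approved, Denied)
data = np.array([[40, 60],
                 [30, 70]])

# Returns the test statistic, p-value, degrees of freedom, and expected counts.
chi2, p, dof, expected = chi2_contingency(data)

print("Chi-square:", chi2)
print("p-value:", p)
print("Degrees of freedom:", dof)
print("Expected counts:\n", expected)
```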
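For a 2×2 table, `chi2_contingency` applies Yates' continuity correction by default. Its statistic is therefore smaller than the Pearson value computed directly from the formula in section 2. This short sketch compares both settings via the `correction` parameter:

```python
import numpy as np
from scipy.stats import chi2_contingency

data = np.array([[40, 60],
                 [30, 70]])

# correction=True (the default) applies Yates' continuity correction when dof == 1.
chi2_yates, p_yates, _, _ = chi2_contingency(data, correction=True)
# correction=False gives the plain Pearson statistic, matching sum((O - E)^2 / E).
chi2_plain, p_plain, _, _ = chi2_contingency(data, correction=False)

print(f"Yates-corrected: chi2 = {chi2_yates:.3f}, p = {p_yates:.3f}")
print(f"Uncorrected:     chi2 = {chi2_plain:.3f}, p = {p_plain:.3f}")
```

Both p-values exceed 0.05 here, so neither version rejects independence for this table.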
Summary
- Chi-square test = checks if observed categorical frequencies differ from expected.
- Types: goodness-of-fit, independence, homogeneity.
- Widely used in feature selection, fairness testing, drift monitoring.
- Pros: simple, interpretable. Cons: only works with categorical count data, needs sufficiently large expected counts, and shows whether an association exists but not how strong it is.
