1) What it is

  • A statistical hypothesis test used to determine whether there’s a significant difference between observed frequencies and expected frequencies in categorical data.
  • Under the null hypothesis, the test statistic approximately follows a χ² distribution, which is where the p-value comes from.

In plain terms: “Do the counts we see differ from what we’d expect by chance?”


2) Formula

$\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$

  • $O_i$: observed frequency in category $i$
  • $E_i$: expected frequency in category $i$ under the null hypothesis
  • Large χ² value → observed data doesn’t match expected distribution → evidence against H₀.
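
To make the formula concrete, here is a minimal sketch that computes the statistic directly from observed and expected counts. The function name `chi_square_statistic` and the three-category counts are hypothetical, chosen only for illustration.

```python
import numpy as np

def chi_square_statistic(observed, expected):
    """Return sum((O - E)^2 / E) over all categories."""
    observed = np.asarray(observed, dtype=float)
    expected = np.asarray(expected, dtype=float)
    return float(np.sum((observed - expected) ** 2 / expected))

# Hypothetical counts for a three-category variable (illustrative only).
observed = [18, 22, 20]
expected = [20, 20, 20]

stat = chi_square_statistic(observed, expected)
print("Chi-square statistic:", stat)
```

Each term is a squared deviation scaled by the expected count, so the same absolute gap matters more in a category where few observations were expected.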

3) Types of Chi-square Tests

  1. Goodness-of-fit test
    • Tests if sample distribution matches a theoretical distribution.
    • Example: “Do dice rolls follow a uniform distribution?”
  2. Test of independence
    • Tests if two categorical variables are independent.
    • Example: “Are gender and loan approval independent?”
  3. Homogeneity test
    • Tests if different populations have the same distribution.
    • Example: “Is customer preference for product color the same across regions?”
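
As a rough sketch of the goodness-of-fit case, the dice question can be checked with `scipy.stats.chisquare`, which assumes equal expected frequencies when none are given. The roll counts below are made up for illustration.

```python
from scipy.stats import chisquare

# Hypothetical counts of faces 1..6 from 60 rolls (made-up data).
rolls = [8, 12, 9, 11, 10, 10]

# With no f_exp argument, chisquare tests against a uniform distribution.
result = chisquare(rolls)
gof_stat, gof_p = result.statistic, result.pvalue

print("Chi-square:", gof_stat)
print("p-value:", gof_p)
```

The independence and homogeneity tests share one computation on a contingency table (for example via `scipy.stats.chi2_contingency`). They differ in how the data were sampled and how the result is interpreted.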

4) Assumptions

  • Data are counts/frequencies (not percentages or continuous).
  • Categories are mutually exclusive.
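  • Expected (not observed) frequency ≥ 5 in most cells; a common rule of thumb is at least 80% of cells ≥ 5 and none below 1. With smaller expected counts the χ² approximation becomes unreliable.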
  • Observations are independent.
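
The expected-count assumption can be checked before running the test. The sketch below uses `scipy.stats.contingency.expected_freq`; the helper name `check_expected_counts`, the threshold argument, and the small table are hypothetical.

```python
import numpy as np
from scipy.stats.contingency import expected_freq

def check_expected_counts(table, minimum=5):
    """Return expected counts and the share of cells with expected >= minimum."""
    expected = expected_freq(np.asarray(table))
    share_ok = float(np.mean(expected >= minimum))
    return expected, share_ok

# Hypothetical small-sample table (illustrative only).
table = [[12, 3],
         [9, 2]]

expected, share_ok = check_expected_counts(table)
print("Expected counts:\n", expected)
print("Share of cells with expected >= 5:", share_ok)
```

When many cells fall below the threshold, one common option for a 2×2 table is an exact test such as `scipy.stats.fisher_exact`; merging sparse categories is another.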

5) Example (Independence Test)

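| Gender | Loan Approved | Loan Denied | Total |
|--------|---------------|-------------|-------|
| Male   | 40            | 60          | 100   |
| Female | 30            | 70          | 100   |
| Total  | 70            | 130         | 200   |
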
  • Null hypothesis $H_0$: loan approval is independent of gender.
  • Compute the expected frequencies, the χ² statistic, and the p-value.
  • If p < 0.05 → reject $H_0$ → evidence that approval is associated with gender (a potential fairness concern, not proof of a causal effect).
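
A worked sketch of this computation for the table above, using the formula from section 2 without any continuity correction (software defaults may differ):

$E_{ij} = \frac{(\text{row } i \text{ total}) \times (\text{column } j \text{ total})}{\text{grand total}}$

$E_{\text{Approved}} = \frac{100 \times 70}{200} = 35, \quad E_{\text{Denied}} = \frac{100 \times 130}{200} = 65$ (the same for both rows, since each row total is 100)

$\chi^2 = \frac{(40 - 35)^2}{35} + \frac{(60 - 65)^2}{65} + \frac{(30 - 35)^2}{35} + \frac{(70 - 65)^2}{65} \approx 2.20$

$df = (2 - 1)(2 - 1) = 1, \quad p \approx 0.14$

Since p > 0.05, these counts do not give enough evidence to reject $H_0$ at the 5% level.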

6) Applications in ML

  • Feature selection: test dependence between a categorical feature and the target (e.g., chi2 in sklearn.feature_selection, which expects non-negative feature values such as counts).
  • Fairness checks: test if outcomes differ by protected group.
  • Drift detection: compare categorical feature distributions (training vs production).
  • Survey analysis / A/B testing: categorical response comparisons.
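
As one illustration of the drift-detection use, the sketch below compares category counts for a single feature in training and production data by treating the two samples as rows of a contingency table. The counts, variable names, and the 0.05 threshold are assumptions for the example, not recommended settings.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical category counts for one feature (made-up numbers).
train_counts = np.array([500, 300, 200])  # categories A, B, C in training data
prod_counts = np.array([450, 320, 230])   # same categories in production

# One row per data source, one column per category.
table = np.vstack([train_counts, prod_counts])
drift_stat, drift_p, drift_dof, _ = chi2_contingency(table)

# Flag drift if the two distributions differ at the chosen significance level.
drift_flag = drift_p < 0.05
print("Chi-square:", drift_stat, "dof:", drift_dof, "p-value:", drift_p)
print("Drift flagged:", drift_flag)
```

With very large samples even tiny shifts become statistically significant, so in practice the flag is usually read alongside the size of the shift.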

7) Python Example

import numpy as np
from scipy.stats import chi2_contingency

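# 2x2 contingency table of observed counts
data = np.array([[40, 60],   # Male:   approved, denied
                 [30, 70]])  # Female: approved, denied

# Returns the test statistic, p-value, degrees of freedom,
# and the expected counts under independence
chi2, p, dof, expected = chi2_contingency(data)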
print("Chi-square:", chi2)
print("p-value:", p)
print("Expected counts:\n", expected)
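
One detail worth knowing when comparing this output with a hand calculation: for 2×2 tables, `chi2_contingency` applies Yates' continuity correction by default, so the printed statistic is smaller than the plain formula gives. A short sketch of both variants on the same loan table:

```python
import numpy as np
from scipy.stats import chi2_contingency

data = np.array([[40, 60],
                 [30, 70]])

# Default behaviour: Yates' continuity correction is applied for a 2x2 table.
chi2_corr, p_corr, _, _ = chi2_contingency(data)

# correction=False gives the plain sum((O - E)^2 / E) statistic.
chi2_raw, p_raw, _, _ = chi2_contingency(data, correction=False)

print("With correction:   ", chi2_corr, p_corr)
print("Without correction:", chi2_raw, p_raw)
```

Either way the p-value stays above 0.05 for these counts, so the conclusion does not depend on the correction here.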

8) Summary

  • Chi-square test = checks if observed categorical frequencies differ from expected.
  • Types: goodness-of-fit, independence, homogeneity.
  • Widely used in feature selection, fairness testing, drift monitoring.
  • Pros: simple, interpretable. Cons: applies only to categorical count data and needs samples large enough for the expected-count assumption to hold.