1) Definition
- The median is the middle value of a dataset when values are ordered.
- If the dataset has:
  - an odd number of values → the middle one is the median.
  - an even number of values → the median is the average of the two middle ones.
It represents a “typical” value and is less affected by extreme outliers than the mean.
2) Formula / Procedure
Given $n$ sorted values: $x_{(1)}, x_{(2)}, \dots, x_{(n)}$
$\text{Median} = \begin{cases} x_{(\frac{n+1}{2})}, & n \text{ odd} \\ \frac{x_{(\frac{n}{2})} + x_{(\frac{n}{2}+1)}}{2}, & n \text{ even} \end{cases}$
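As a minimal sketch of the case formula above, the function below sorts the data and picks the middle order statistic(s). The name `median_of` is illustrative only and not from any library. The formula uses 1-based positions, while Python lists are 0-based, so $x_{(\frac{n+1}{2})}$ becomes index `n // 2` for odd $n$.

```python
def median_of(values):
    """Median via the order-statistic formula; input need not be pre-sorted."""
    xs = sorted(values)          # x_(1) <= x_(2) <= ... <= x_(n)
    n = len(xs)
    if n == 0:
        raise ValueError("median of an empty dataset is undefined")
    mid = n // 2
    if n % 2 == 1:
        # n odd: 1-based position (n+1)/2 is 0-based index n // 2
        return xs[mid]
    # n even: average of 1-based positions n/2 and n/2 + 1
    return (xs[mid - 1] + xs[mid]) / 2


odd_result = median_of([10, 2, 8, 4, 6])   # sorted: [2, 4, 6, 8, 10]
even_result = median_of([2, 4, 6, 8])
```

In practice the standard library's `statistics.median` does the same job; the hand-written version is only meant to mirror the formula.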
3) Examples
- Data: [2, 4, 6, 8, 10] (odd count = 5)
  Median = 6
- Data: [2, 4, 6, 8] (even count = 4)
  Median = (4 + 6) / 2 = 5
4) Properties
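- Robust to outliers (unlike the mean).
- Good measure of central tendency for skewed distributions.
- Depends only on the rank order of the values: changing how far an extreme value lies from the center does not move the median, as long as that value stays on the same side of it.
- Less statistically efficient than the mean when data is symmetric and well-behaved (e.g., Gaussian): the sample median varies more from sample to sample than the sample mean.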
5) Applications
- Income data: median is better than mean because income distributions are skewed.
- House prices: the median avoids the distortion that a few luxury mansions cause in the mean.
- Feature engineering:
  - Median imputation for missing values (more robust than mean).
- Model evaluation: median absolute error (robust alternative to MSE).
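The last two applications can be sketched with the standard library alone. The helper names `impute_median` and `median_absolute_error`, the use of `None` to mark missing values, and the sample numbers are all assumptions made for illustration.

```python
from statistics import median


def impute_median(column):
    """Replace missing entries (None) with the median of the observed ones."""
    observed = [v for v in column if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in column]


def median_absolute_error(y_true, y_pred):
    """Median of |y_true - y_pred|; one huge error barely affects it."""
    return median(abs(t - p) for t, p in zip(y_true, y_pred))


# Made-up column with one missing value and one outlier.
filled = impute_median([1.0, None, 3.0, 100.0])

# Made-up predictions: three are close, the last is far off.
error = median_absolute_error([3, 5, 7, 9], [2.5, 5.0, 8.0, 40.0])
```

Here the imputed value is 3.0 (the median of 1.0, 3.0 and 100.0), whereas mean imputation would fill in roughly 34.7. The absolute errors are 0.5, 0, 1 and 31, so the median absolute error is 0.75 even though one prediction is badly wrong.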
6) Mean vs Median
- Symmetric distributions (e.g., Gaussian) → mean ≈ median.
- Skewed distributions → median is more representative.
Example: [2, 3, 4, 100]
- Mean = 27.25 (pulled by outlier).
- Median = (3 + 4)/2 = 3.5 (better representation of typical values).
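A short check of this example using Python's `statistics` module. The second dataset, with a far more extreme outlier, is an added illustration rather than part of the notes above: it shows the mean moving while the median stays put.

```python
from statistics import mean, median

data = [2, 3, 4, 100]
data_mean = mean(data)        # 27.25, pulled up by the outlier
data_median = median(data)    # 3.5, average of the two middle values

# Hypothetical variation: push the outlier much further out.
extreme = [2, 3, 4, 1_000_000]
extreme_mean = mean(extreme)      # grows with the outlier
extreme_median = median(extreme)  # unchanged

print(data_mean, data_median)
print(extreme_mean, extreme_median)
```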
Summary
- Median = middle value of ordered data.
- Robust against outliers, great for skewed distributions.
- Common in income, housing, imputation, robust error metrics.
