1. What is it?
- Ordinary Least Squares (OLS) regression models the conditional mean of $y$ given $x$:
- $E[y|x] = x^\top \beta$
- Quantile Regression models the conditional quantile of $y$ given $x$:
- $Q_\alpha(y|x) = x^\top \beta_\alpha$
- where $\alpha \in (0,1)$ is the quantile level.
Instead of predicting only the average outcome, quantile regression can predict the median, lower bound, or upper bound.
2. Why Use Quantile Regression?
- Data is often skewed or has heteroskedasticity (variance depends on $x$).
- A single mean estimate may not describe the full relationship.
- Quantile regression gives a fuller picture of how predictors affect different parts of the distribution.
Examples:
- Predicting median house price (0.5 quantile) vs 90th percentile price (0.9 quantile).
- Forecasting uncertainty bounds for demand or energy load.
3. Loss Function (Pinball Loss)
OLS minimizes squared errors.
Quantile regression minimizes asymmetric absolute errors, called pinball loss (or check function):
$L_\alpha(y, \hat{y}) = \begin{cases} \alpha \cdot (y – \hat{y}) & \text{if } y \geq \hat{y} \\ (1-\alpha) \cdot (\hat{y} – y) & \text{if } y < \hat{y} \end{cases}$
- For $\alpha = 0.5$, the loss is symmetric → median regression.
- For $\alpha = 0.9$, the loss penalizes underestimates more than overestimates → predicts upper quantile.
4. Example
Suppose we want to predict house price based on size.
- OLS: predicts mean price = $300,000.
- Quantile regression (α=0.1): predicts lower bound = $250,000.
- Quantile regression (α=0.9): predicts upper bound = $380,000.
Gives an interval forecast instead of a single number.
5. Interpretation
- Each quantile regression line shows how predictors influence different points of the conditional distribution.
- Example: Education might strongly increase top salaries (0.9 quantile) but only slightly increase lower salaries (0.1 quantile).
6. Applications
- Economics: wage distribution analysis, inequality studies.
- Finance: risk estimation (e.g., Value-at-Risk = 0.05 quantile).
- Forecasting: demand forecasting with prediction intervals.
- Medicine: predicting survival times at different quantile levels.
7. In Python (statsmodels / sklearn)
import statsmodels.api as sm
import pandas as pd
# Example data
df = pd.DataFrame({
"x": [1,2,3,4,5],
"y": [2,3,2,5,7]
})
X = sm.add_constant(df["x"])
y = df["y"]
# Quantile regression at median (0.5)
model = sm.QuantReg(y, X)
res = model.fit(q=0.5)
print(res.summary())
This estimates the median regression line.
Changing q=0.1 or q=0.9 estimates lower/upper quantiles.
Summary:
Quantile regression estimates conditional quantiles (median, percentiles) instead of just the mean.
- Uses pinball loss.
- Captures heterogeneity in how predictors affect the distribution of outcomes.
- Useful for risk analysis, uncertainty quantification, and interval forecasting.
