1. What is it?

  • Ordinary Least Squares (OLS) regression models the conditional mean of $y$ given $x$:
    • $E[y|x] = x^\top \beta$
  • Quantile Regression models the conditional quantile of $y$ given $x$:
    • $Q_\alpha(y|x) = x^\top \beta_\alpha$
    • where $\alpha \in (0,1)$ is the quantile level.

Instead of predicting only the average outcome, quantile regression can predict the median, lower bound, or upper bound.


2. Why Use Quantile Regression?

  • Data is often skewed or has heteroskedasticity (variance depends on $x$).
  • A single mean estimate may not describe the full relationship.
  • Quantile regression gives a fuller picture of how predictors affect different parts of the distribution.

Examples:

  • Predicting median house price (0.5 quantile) vs 90th percentile price (0.9 quantile).
  • Forecasting uncertainty bounds for demand or energy load.

3. Loss Function (Pinball Loss)

OLS minimizes squared errors.
Quantile regression minimizes asymmetric absolute errors, called pinball loss (or check function):

$L_\alpha(y, \hat{y}) = \begin{cases} \alpha \cdot (y – \hat{y}) & \text{if } y \geq \hat{y} \\ (1-\alpha) \cdot (\hat{y} – y) & \text{if } y < \hat{y} \end{cases}$

  • For $\alpha = 0.5$, the loss is symmetric → median regression.
  • For $\alpha = 0.9$, the loss penalizes underestimates more than overestimates → predicts upper quantile.

4. Example

Suppose we want to predict house price based on size.

  • OLS: predicts mean price = $300,000.
  • Quantile regression (α=0.1): predicts lower bound = $250,000.
  • Quantile regression (α=0.9): predicts upper bound = $380,000.

Gives an interval forecast instead of a single number.


5. Interpretation

  • Each quantile regression line shows how predictors influence different points of the conditional distribution.
  • Example: Education might strongly increase top salaries (0.9 quantile) but only slightly increase lower salaries (0.1 quantile).

6. Applications

  • Economics: wage distribution analysis, inequality studies.
  • Finance: risk estimation (e.g., Value-at-Risk = 0.05 quantile).
  • Forecasting: demand forecasting with prediction intervals.
  • Medicine: predicting survival times at different quantile levels.

7. In Python (statsmodels / sklearn)

import statsmodels.api as sm
import pandas as pd

# Example data
df = pd.DataFrame({
    "x": [1,2,3,4,5],
    "y": [2,3,2,5,7]
})

X = sm.add_constant(df["x"])
y = df["y"]

# Quantile regression at median (0.5)
model = sm.QuantReg(y, X)
res = model.fit(q=0.5)
print(res.summary())

This estimates the median regression line.
Changing q=0.1 or q=0.9 estimates lower/upper quantiles.


Summary:
Quantile regression estimates conditional quantiles (median, percentiles) instead of just the mean.

  • Uses pinball loss.
  • Captures heterogeneity in how predictors affect the distribution of outcomes.
  • Useful for risk analysis, uncertainty quantification, and interval forecasting.