1. What are Percentiles?
- A percentile is a quantile expressed in percentage terms.
- Example:
- 25th percentile = value below which 25% of data fall.
- 50th percentile = median.
- 90th percentile = value below which 90% of data fall.
Predicting percentiles therefore means estimating these cut-points of the outcome distribution, possibly conditional on predictors.
2. Why Predict Percentiles?
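- Beyond the mean: standard (least-squares) regression predicts only the conditional mean.
- Percentiles describe the rest of the distribution:
- Lower percentiles = low-outcome scenarios (pessimistic when large values are good, e.g., demand or returns).
- Higher percentiles = high-outcome scenarios (optimistic for demand, but the bad case for quantities such as wait times).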
- Useful in risk assessment, demand forecasting, and fairness analysis.
Examples:
- Retail: predict 90th percentile demand (to set stock buffers).
- Finance: predict 5th percentile return (Value-at-Risk).
- Medicine: predict 95th percentile of wait times (worst-case planning).
3. How to Predict Percentiles
A. Quantile Regression
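- Directly models conditional quantiles; in the linear case, $Q_\alpha(y \mid x) = x^\top \beta_\alpha$, with a separate coefficient vector $\beta_\alpha$ for each level $\alpha$.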
- Choose different $\alpha$ levels (e.g., 0.1, 0.5, 0.9) to predict multiple percentiles.
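Quantile regression is usually fit by minimising the pinball (quantile) loss rather than squared error. The sketch below is not a full regression: it only shows, on a hypothetical list of demand values, that the constant prediction minimising the pinball loss at level $\alpha$ is the corresponding sample quantile. The names `pinball_loss`, `best_constant_quantile` and `demand` are illustrative assumptions, not part of any library.

```python
def pinball_loss(y_true, y_pred, alpha):
    """Average pinball (quantile) loss at level alpha."""
    total = 0.0
    for y, q in zip(y_true, y_pred):
        diff = y - q
        # Under-prediction (diff >= 0) is weighted by alpha,
        # over-prediction (diff < 0) by (1 - alpha).
        total += alpha * diff if diff >= 0 else (alpha - 1.0) * diff
    return total / len(y_true)


def best_constant_quantile(y, alpha):
    """Observed value that minimises the pinball loss as a constant prediction."""
    candidates = sorted(set(y))
    return min(candidates, key=lambda c: pinball_loss(y, [c] * len(y), alpha))


# Hypothetical demand observations (illustrative only).
demand = [80, 85, 90, 95, 100, 105, 110, 115, 120, 130, 150]

q10 = best_constant_quantile(demand, 0.1)
q50 = best_constant_quantile(demand, 0.5)
q90 = best_constant_quantile(demand, 0.9)
print(q10, q50, q90)  # 85 105 130
```

Replacing the constant with $x^\top \beta_\alpha$ and minimising the same loss over $\beta_\alpha$ gives linear quantile regression; each $\alpha$ is fit separately.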
B. Distributional Forecasting
- Fit a parametric predictive distribution (e.g., Normal, Lognormal) for the outcome, with parameters that may depend on the predictors.
- Then compute percentiles from the fitted CDF.
- Example: If $y \sim N(\mu,\sigma^2)$, the 95th percentile is $\mu + 1.645\sigma$.
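The Normal example can be checked with Python's standard library. This is a minimal sketch that assumes the model has already produced a mean and standard deviation; the values `mu = 100` and `sigma = 15` are hypothetical.

```python
from statistics import NormalDist

# Hypothetical forecast parameters (illustrative only).
mu, sigma = 100.0, 15.0
forecast = NormalDist(mu=mu, sigma=sigma)

# Percentiles are read off the inverse CDF of the fitted distribution.
p05 = forecast.inv_cdf(0.05)
p50 = forecast.inv_cdf(0.50)
p95 = forecast.inv_cdf(0.95)

# The standard Normal 95th percentile is the 1.645 multiplier used above.
z95 = NormalDist().inv_cdf(0.95)

print(round(p05, 1), round(p50, 1), round(p95, 1), round(z95, 3))
```

For a different family such as the Lognormal, the same idea applies with that family's inverse CDF.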
C. Empirical / Simulation-Based
- Use bootstrapping, Bayesian posterior samples, or ensembles.
- Collect predictive samples and compute empirical percentiles.
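The sample-based route can be sketched as follows. The draws here are simulated from an arbitrary Lognormal purely as a stand-in; in practice they would come from a bootstrap, a Bayesian posterior predictive, or an ensemble. The helper `empirical_percentile` is a hypothetical name and uses linear interpolation between order statistics, which is one of several common conventions.

```python
import random


def empirical_percentile(samples, pct):
    """Percentile (0-100) by linear interpolation between order statistics."""
    if not 0 <= pct <= 100:
        raise ValueError("pct must be in [0, 100]")
    ordered = sorted(samples)
    pos = (len(ordered) - 1) * pct / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    return ordered[lower] + frac * (ordered[upper] - ordered[lower])


# Stand-in predictive draws (assumption: any sampler could replace this line).
rng = random.Random(0)
draws = [rng.lognormvariate(4.6, 0.15) for _ in range(5000)]

p10 = empirical_percentile(draws, 10)
p50 = empirical_percentile(draws, 50)
p90 = empirical_percentile(draws, 90)
print(round(p10, 1), round(p50, 1), round(p90, 1))
```

Extreme percentiles (e.g., the 99th) need many more draws than central ones to be estimated stably.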
4. Example
Suppose we want to forecast daily demand.
- Model outputs:
- 10th percentile = 85 units
- 50th percentile = 100 units (median forecast)
- 90th percentile = 120 units
Interpretation:
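- Median demand ≈ 100: demand is as likely to fall below this value as above it.
- In a pessimistic (low-demand) scenario, ≈ 85: there is only a 10% chance that demand falls below this.
- In a high-demand scenario, ≈ 120: there is only a 10% chance that demand exceeds this.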
5. Relation to Prediction Intervals
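- A central prediction interval is defined by two percentiles placed symmetrically in the tails.
- Example: A 90% prediction interval = [5th percentile, 95th percentile]; likewise, [10th percentile, 90th percentile] is an 80% interval.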
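The mapping between a coverage level and its pair of percentiles, and a simple check of how often outcomes land inside an interval, can be sketched as below. Both function names and the list of observed outcomes are hypothetical; the interval [85, 120] reuses the 10th and 90th percentiles from the demand example in section 4.

```python
def central_interval_levels(coverage):
    """Lower and upper percentile levels (0-100) of a central interval."""
    if not 0 < coverage < 1:
        raise ValueError("coverage must be in (0, 1)")
    tail = (1.0 - coverage) / 2.0
    # Rounding only hides floating-point noise such as 4.999999999999999.
    return round(100.0 * tail, 10), round(100.0 * (1.0 - tail), 10)


def empirical_coverage(observations, lower, upper):
    """Fraction of observations that fall inside [lower, upper]."""
    inside = sum(1 for y in observations if lower <= y <= upper)
    return inside / len(observations)


print(central_interval_levels(0.90))  # (5.0, 95.0)
print(central_interval_levels(0.80))  # (10.0, 90.0)

# Hypothetical realised demands checked against the interval [85, 120].
observed = [80, 90, 100, 110, 125]
print(empirical_coverage(observed, 85, 120))  # 0.6
```

On a large hold-out set, the empirical coverage of a well-calibrated 80% interval should be close to 0.8; the five-point list here is only to show the calculation.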
6. Applications
- Finance: Value-at-Risk = a lower percentile of return distribution.
- Forecasting: supply chain, electricity load, weather extremes.
- Medicine: survival analysis (percentiles of survival time, e.g., median survival).
- Recommender Systems: estimate percentiles of the rating or engagement distribution.
Summary:
Predicting percentiles = estimating conditional quantiles of the outcome (e.g., 10th, 50th, 90th) instead of just the mean. Methods include quantile regression, distributional modeling, and simulation/bootstrapping. Percentile predictions provide richer information, enabling risk management, uncertainty quantification, and scenario planning.
