1) Meaning
In feature engineering, sensitivity refers to how much a machine learning model’s predictions depend on the way features are represented (scales, encodings, transformations).
A model is said to be sensitive if small differences in feature representation cause large changes in model behavior.
2) Examples
a) Numerical Features (scaling issues)
- Income = [10,000 … 200,000]
- Age = [18 … 90]
- If not normalized, a distance-based model (like k-NN, SVM, clustering) will give more weight to income simply because it has a larger range.
- → The model is sensitive to feature scale.
b) Categorical Features (encoding issues)
- Color = {Red, Green, Blue}
- If encoded as Red=1, Green=2, Blue=3, a linear model might interpret “Blue > Green > Red,” even though categories have no inherent order.
- → The model is sensitive to encoding choice.
c) Sparse Features / High Cardinality
- City = {New York, London, Tokyo, … 10,000+ cities}.
- One-hot encoding leads to huge, sparse vectors.
- Some models become sensitive (unstable) if rare categories dominate training.
3) Why It Matters
- Sensitive features can lead to:
- Unstable models (predictions change a lot under small changes).
- Bias (features dominate just due to representation).
- Poor generalization (fails on new data distributions).
4) How to Reduce Sensitivity
- Normalize / standardize numerical features.
- Use robust encodings:
- One-hot encoding for nominal categories.
- Ordinal encoding only for truly ordered categories.
- Target or frequency encoding for high-cardinality features.
- Regularization: penalize over-reliance on any single feature.
- Feature selection: drop irrelevant or unstable features.
Summary
- In feature engineering, sensitivity = how much model predictions depend on arbitrary choices of scale or encoding.
- Example: unscaled income overshadowing age; wrong encoding making “blue > red.”
- Solution: normalize numeric features, encode categorical features properly, use regularization.
