Definition
- Logistic Regression is a classification model, not a regression model (despite the name).
- It predicts the probability that an observation belongs to a particular class (usually the positive class).
- Most common use: binary classification (e.g., spam vs not spam, churn vs retain).
Core Idea
- Instead of predicting $y$ directly, logistic regression predicts the log-odds of the positive class as a linear function of features:
$\ln\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n$
- Solving for $p$:
$p = P(y=1 \mid X) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \dots + \beta_n x_n)}}$
This is the sigmoid function.
Why Logistic Regression?
- Linear regression can predict values outside [0,1] (invalid as probabilities).
- Logistic regression “squashes” predictions into [0,1].
- Decision rule: predict class 1 if p≥0.5p \geq 0.5p≥0.5, else class 0.
Training (How Parameters are Estimated)
- Uses Maximum Likelihood Estimation (MLE), not least squares.
- Likelihood: maximize the probability of observing the actual labels given the model’s parameters.
- Optimization: usually solved with gradient descent or variants.
Example
Suppose we want to predict if a student passes an exam (yes/no) based on hours studied.
- Model:
$p = \frac{1}{1+e^{-(\beta_0 + \beta_1 \cdot \text{hours})}}$
- If $\beta_1 > 0$, more study hours → higher probability of passing.
- If predicted $p = 0.8$, we interpret: “This student has an 80% chance of passing.”
Extensions
- Multinomial Logistic Regression – for more than two classes.
- Regularized Logistic Regression – Ridge (L2), Lasso (L1) to prevent overfitting.
- Ordinal Logistic Regression – when classes have an order (e.g., low, medium, high).
Applications
- Healthcare: Disease diagnosis (disease vs no disease).
- Finance: Credit scoring (default vs no default).
- Marketing: Churn prediction, ad click prediction.
- NLP: Sentiment classification (positive vs negative).
Advantages & Limitations
Advantages:
- Simple and interpretable.
- Outputs probabilities, not just class labels.
- Works well with small/medium datasets.
Limitations:
- Assumes a linear relationship in log-odds.
- Not good for complex nonlinear patterns (trees or neural nets are better).
- Sensitive to multicollinearity in features.
In short:
Logistic Regression = a simple, interpretable classification model that predicts probabilities using the sigmoid function applied to a linear model of features.
