Definition

  • Logistic Regression is used as a classification model, despite the name: technically it models the probability of a class and becomes a classifier once that probability is thresholded.
  • It predicts the probability that an observation belongs to a particular class (usually the positive class).
  • Most common use: binary classification (e.g., spam vs not spam, churn vs retain).

Core Idea

  • Instead of predicting $y$ directly, logistic regression predicts the log-odds of the positive class as a linear function of features:

$\ln\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n$

  • Solving for $p$:

$p = P(y=1 \mid X) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \dots + \beta_n x_n)}}$

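This is the sigmoid (logistic) function $\sigma(z) = \frac{1}{1 + e^{-z}}$ applied to the linear predictor $z = \beta_0 + \beta_1 x_1 + \dots + \beta_n x_n$.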
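A minimal sketch of the two formulas above in plain Python. The function names `sigmoid`, `logit`, and `linear_predictor` and the coefficient values are illustrative, not taken from any library. It checks that the sigmoid and the log-odds (logit) are inverses of each other.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic (sigmoid) function: maps any real z to a probability in (0, 1)."""
    # Branch on the sign of z so math.exp never overflows for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def logit(p: float) -> float:
    """Log-odds ln(p / (1 - p)); the inverse of sigmoid for 0 < p < 1."""
    return math.log(p / (1.0 - p))

def linear_predictor(beta0: float, betas: list[float], x: list[float]) -> float:
    """beta_0 + beta_1*x_1 + ... + beta_n*x_n."""
    return beta0 + sum(b * xi for b, xi in zip(betas, x))

# Hypothetical coefficients and features: -1 + 0.5*3 + 2*0.25 = 1.0
z = linear_predictor(-1.0, [0.5, 2.0], [3.0, 0.25])
p = sigmoid(z)
print(round(p, 4))         # 0.7311
print(round(logit(p), 4))  # 1.0  (applying logit to p recovers the log-odds z)
```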


Why Logistic Regression?

  • Linear regression can predict values outside [0,1] (invalid as probabilities).
  • Logistic regression “squashes” predictions into the open interval (0, 1) via the sigmoid.
  • Decision rule: predict class 1 if $p \geq 0.5$, else class 0.
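A minimal sketch of the decision rule, assuming plain Python and made-up linear scores. Mathematically, whatever the score $z = \beta_0 + \beta_1 x_1 + \dots + \beta_n x_n$, the sigmoid output stays strictly between 0 and 1, and with a 0.5 threshold the rule is equivalent to predicting class 1 exactly when $z \geq 0$. The `threshold` parameter is an illustrative addition; in practice it is often tuned, for example when classes are imbalanced.

```python
import math

def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_class(z: float, threshold: float = 0.5) -> int:
    """Predict 1 if the probability sigmoid(z) reaches the threshold, else 0."""
    return 1 if sigmoid(z) >= threshold else 0

# Linear scores for three hypothetical observations; a linear model would
# report these raw values, two of which fall outside [0, 1].
scores = [-3.0, 0.0, 2.5]
probs = [sigmoid(z) for z in scores]
print([round(p, 3) for p in probs])        # [0.047, 0.5, 0.924]
print([predict_class(z) for z in scores])  # [0, 1, 1]
```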

Training (How Parameters are Estimated)

  • Uses Maximum Likelihood Estimation (MLE), not least squares.
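  • Likelihood: choose the parameters that maximize the probability of observing the actual labels (in practice, the log-likelihood is maximized).
  • Optimization: there is no closed-form solution, so the parameters are found iteratively, e.g., with gradient descent, Newton's method (iteratively reweighted least squares), or quasi-Newton methods such as L-BFGS.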
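A minimal sketch of MLE by batch gradient descent in plain Python, assuming a tiny hypothetical hours-studied dataset. For $m$ observations the log-likelihood is

$\ell(\beta) = \sum_{i=1}^{m} \left[ y_i \ln p_i + (1 - y_i) \ln(1 - p_i) \right]$

and its gradient is $\frac{\partial \ell}{\partial \beta_j} = \sum_{i=1}^{m} (y_i - p_i)\, x_{ij}$, with $x_{i0} = 1$ for the intercept. The code minimizes the average negative log-likelihood (log loss), which is equivalent to maximizing $\ell$. The names `fit` and `neg_log_likelihood`, the learning rate, and the iteration count are illustrative choices; libraries typically use Newton-type or quasi-Newton solvers instead.

```python
import math

def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(beta: list[float], x: list[float]) -> float:
    # beta[0] is the intercept beta_0; beta[1:] pair up with the features in x.
    return sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], x)))

def neg_log_likelihood(beta, X, y):
    """Average negative log-likelihood (log loss) of labels y under parameters beta."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for xi, yi in zip(X, y):
        p = predict_proba(beta, xi)
        total -= yi * math.log(p + eps) + (1 - yi) * math.log(1 - p + eps)
    return total / len(y)

def fit(X, y, lr=0.1, n_iter=5000):
    """Batch gradient descent on the average negative log-likelihood."""
    beta = [0.0] * (len(X[0]) + 1)
    m = len(y)
    for _ in range(n_iter):
        grad = [0.0] * len(beta)
        for xi, yi in zip(X, y):
            err = predict_proba(beta, xi) - yi  # derivative of the loss w.r.t. the linear score
            grad[0] += err
            for j, v in enumerate(xi):
                grad[j + 1] += err * v
        beta = [b - lr * g / m for b, g in zip(beta, grad)]
    return beta

# Hypothetical data: hours studied -> passed (1) or failed (0). Classes overlap on purpose.
X = [[0.5], [1.0], [1.5], [2.0], [2.5], [3.0], [3.5], [4.0], [4.5], [5.0]]
y = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

print(f"loss at beta = 0: {neg_log_likelihood([0.0, 0.0], X, y):.4f}")  # 0.6931 (= ln 2)
beta = fit(X, y)
print(f"fitted beta: {[round(b, 3) for b in beta]}")
print(f"loss after fitting: {neg_log_likelihood(beta, X, y):.4f}")
```

The toy classes overlap deliberately: if the data were perfectly separable, the likelihood would keep increasing as the coefficients grow, and plain MLE would have no finite optimum (one reason regularization is common).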

Example

Suppose we want to predict if a student passes an exam (yes/no) based on hours studied.

  • Model:

$p = \frac{1}{1+e^{-(\beta_0 + \beta_1 \cdot \text{hours})}}$

  • If $\beta_1 > 0$, more study hours → higher probability of passing.
  • If predicted $p = 0.8$, we interpret: “According to the model, this student has an 80% chance of passing.”
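A worked version of this example, assuming hypothetical coefficients $\beta_0 = -4$ and $\beta_1 = 1.5$ (invented for illustration, not estimated from data). It also shows the usual way to read $\beta_1$: each extra hour multiplies the odds of passing, $p/(1-p)$, by $e^{\beta_1}$. Finally, it solves for the number of hours at which the model predicts $p = 0.8$.

```python
import math

# Hypothetical coefficients, chosen for illustration only (not fitted to real data).
BETA0 = -4.0
BETA1 = 1.5

def pass_probability(hours: float) -> float:
    """P(pass | hours) = 1 / (1 + exp(-(beta_0 + beta_1 * hours)))."""
    return 1.0 / (1.0 + math.exp(-(BETA0 + BETA1 * hours)))

def odds(p: float) -> float:
    return p / (1.0 - p)

for h in (1, 2, 3, 4, 5):
    print(h, round(pass_probability(h), 3))

# Each extra hour multiplies the odds of passing by exp(beta_1).
ratio = odds(pass_probability(3)) / odds(pass_probability(2))
print(round(ratio, 4), round(math.exp(BETA1), 4))  # 4.4817 4.4817

# Hours at which the model predicts p = 0.8: solve ln(0.8 / 0.2) = beta_0 + beta_1 * hours.
hours_for_80 = (math.log(0.8 / 0.2) - BETA0) / BETA1
print(round(hours_for_80, 2))  # 3.59
```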

Extensions

  1. Multinomial Logistic Regression – for more than two classes.
  2. Regularized Logistic Regression – Ridge (L2) or Lasso (L1) penalties to reduce overfitting; L1 can also shrink some coefficients to exactly zero.
  3. Ordinal Logistic Regression – when classes have an order (e.g., low, medium, high).
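A short sketch of the first two extensions, assuming plain Python and hypothetical helper names (`softmax`, `penalized_loss`). Multinomial logistic regression replaces the sigmoid with the softmax over one linear score per class; with two classes and one score fixed at 0, it reduces to the binary model. Regularization adds a penalty on the coefficients to the negative log-likelihood; leaving the intercept unpenalized and scaling the penalty by a single $\lambda$ are common conventions, but libraries differ (some use $\lambda/2$, or parameterize by an inverse strength $C = 1/\lambda$).

```python
import math

def softmax(scores: list[float]) -> list[float]:
    """Turn one linear score per class into class probabilities that sum to 1."""
    top = max(scores)  # subtracting the max avoids overflow in exp
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

# Two classes, reference class score fixed at 0: softmax matches the sigmoid.
z = 1.3
print(round(softmax([0.0, z])[1], 6), round(sigmoid(z), 6))

# Three classes: probabilities for hypothetical linear scores.
probs = softmax([2.0, 1.0, 0.1])
print([round(p, 3) for p in probs], round(sum(probs), 6))

def penalized_loss(nll: float, beta: list[float], lam: float, kind: str = "l2") -> float:
    """Negative log-likelihood plus an L2 (ridge) or L1 (lasso) penalty.

    beta[0] is treated as the intercept and left unpenalized (a common convention).
    """
    coefs = beta[1:]
    if kind == "l2":
        penalty = sum(b * b for b in coefs)
    elif kind == "l1":
        penalty = sum(abs(b) for b in coefs)
    else:
        raise ValueError("kind must be 'l2' or 'l1'")
    return nll + lam * penalty

beta = [1.0, 2.0, -1.0]
print(round(penalized_loss(0.5, beta, 0.1, "l2"), 6))  # 1.0
print(round(penalized_loss(0.5, beta, 0.1, "l1"), 6))  # 0.8
```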

Applications

  • Healthcare: Disease diagnosis (disease vs no disease).
  • Finance: Credit scoring (default vs no default).
  • Marketing: Churn prediction, ad click prediction.
  • NLP: Sentiment classification (positive vs negative).

Advantages & Limitations

Advantages:

  • Simple and interpretable.
  • Outputs probabilities, not just class labels.
  • Works well with small/medium datasets.

Limitations:

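  • Assumes the log-odds are a linear function of the features.
  • Cannot capture nonlinear patterns or interactions unless they are added as engineered features; tree-based models or neural networks often handle these better.
  • Sensitive to multicollinearity: strongly correlated features make the coefficient estimates unstable and hard to interpret.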
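A small illustration of the linearity limitation, assuming hand-picked (not fitted) weights. On XOR-style data, no linear log-odds in the raw features $x_1, x_2$ can classify all four points: for any coefficients, the two positive points' scores sum to the same value ($2\beta_0$) as the two negative points' scores, so both positives cannot score $\geq 0$ while both negatives score $< 0$. Adding the interaction feature $x_1 x_2$ removes the problem. Names such as `score_raw` and `W_INTERACTION` are hypothetical.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

# XOR-style data on {-1, +1}^2: label 1 when the two inputs disagree.
points = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
labels = [0, 1, 1, 0]

def score_raw(b, x):
    """Linear log-odds using only the raw features: b0 + b1*x1 + b2*x2."""
    return b[0] + b[1] * x[0] + b[2] * x[1]

# For any raw weights, positives' scores and negatives' scores have the same sum (2*b0).
for b in [(0.3, 1.0, -2.0), (-1.0, 0.5, 0.5), (2.0, -3.0, 4.0)]:
    pos = score_raw(b, (-1, 1)) + score_raw(b, (1, -1))
    neg = score_raw(b, (-1, -1)) + score_raw(b, (1, 1))
    assert abs(pos - neg) < 1e-12

# With the engineered interaction feature x1*x2, a linear log-odds separates the classes.
W_INTERACTION = -4.0  # hand-picked illustrative weight, not a fitted value
preds = [1 if sigmoid(W_INTERACTION * x1 * x2) >= 0.5 else 0 for x1, x2 in points]
print(preds)  # [0, 1, 1, 0]
```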

In short:
Logistic Regression = a simple, interpretable classification model that predicts probabilities using the sigmoid function applied to a linear model of features.