1) Definition
- In machine learning, especially in classification models, a logit is the raw output score of a model before applying a squashing function like sigmoid (for binary) or softmax (for multi-class).
- Formally, the logit function is the log-odds of a probability $p$:
$\text{logit}(p) = \ln \left(\frac{p}{1-p}\right)$
Where:
- $p$ = predicted probability of the positive class
- logit(p) maps $p \in (0,1)$ to $(-\infty, +\infty)$.
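The mapping can be checked numerically. The following is a minimal sketch using only Python's standard library; the function names `logit` and `inverse_logit` are illustrative choices, not taken from any particular library.

```python
import math

def logit(p: float) -> float:
    """Log-odds of a probability p in the open interval (0, 1)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    return math.log(p / (1.0 - p))

def inverse_logit(z: float) -> float:
    """Sigmoid in its textbook form; fine for moderate z (can overflow for very negative z)."""
    return 1.0 / (1.0 + math.exp(-z))

# Probabilities below 0.5 give negative logits, 0.5 gives 0, above 0.5 gives positive logits.
for p in (0.1, 0.5, 0.9):
    z = logit(p)
    print(f"p={p:.1f}  logit={z:+.4f}  back to p={inverse_logit(z):.4f}")
```

Applying `inverse_logit` to the result of `logit` recovers the original probability, up to floating-point rounding.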
2) Intuition
- A linear model such as logistic regression, or the final layer of a neural network, computes a linear combination of its inputs:
- $z = w^\top x + b$
- This $z$ is the logit.
- Then it converts the logit into a probability using:
- Sigmoid: $\sigma(z) = \frac{1}{1+e^{-z}}$
- Softmax (multiclass): $P(y=i) = \frac{e^{z_i}}{\sum_j e^{z_j}}$
So logits are the raw “scores” that get transformed into probabilities.
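As a rough illustration of the steps above, here is a small standard-library sketch. The weights, inputs, and helper names (`linear_logit`, `sigmoid`, `softmax`) are made up for the example; the max-subtraction in `softmax` and the two-branch `sigmoid` are common stability tricks, assumed here rather than required by the formulas.

```python
import math
from typing import List, Sequence

def linear_logit(w: Sequence[float], x: Sequence[float], b: float) -> float:
    """z = w^T x + b for a single example."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def sigmoid(z: float) -> float:
    """Sigmoid that only ever exponentiates a non-positive number."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def softmax(logits: Sequence[float]) -> List[float]:
    """Softmax; subtracting the max logit leaves the result unchanged but avoids overflow."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Binary case: one logit, one probability (hypothetical weights and input).
z = linear_logit(w=[0.5, -1.0], x=[2.0, 1.0], b=0.3)
print("logit:", z, "probability:", sigmoid(z))

# Multiclass case: one logit per class, probabilities sum to 1 (hypothetical logits).
probs = softmax([2.0, 1.0, 0.1])
print("class probabilities:", [round(p, 3) for p in probs])
```

With two classes, `softmax([z, 0.0])[0]` equals `sigmoid(z)`, which is one way to see that the sigmoid is the two-class special case of softmax.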
3) Example
Suppose logistic regression gives $z = 2.2$.
- This is the logit.
- Convert to probability:
- $p = \sigma(2.2) = \frac{1}{1+e^{-2.2}} \approx 0.90$
Interpretation: The model is about 90% confident the instance is positive.
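The arithmetic in this example can be reproduced in a few lines. This is only a sketch with the standard library; it also prints the odds implied by the logit, since $e^{z}$ equals $p/(1-p)$.

```python
import math

z = 2.2                           # logit from the example above
p = 1.0 / (1.0 + math.exp(-z))    # sigmoid(z)
odds = math.exp(z)                # odds p / (1 - p) implied by the logit

print(f"p    = {p:.4f}")
print(f"odds = {odds:.2f} to 1")
print(f"logit recovered from p = {math.log(p / (1.0 - p)):.4f}")
```

A logit of 2.2 therefore corresponds to odds of roughly 9 to 1 in favor of the positive class.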
4) Why use logits instead of probabilities?
- Numerical stability: Probabilities very close to 0 or 1 round to exactly 0 or 1 in floating point, so taking their logarithm loses precision or fails. Working in logit space avoids this.
- Better for loss functions:
- Binary cross-entropy is more stable if you pass logits instead of probabilities.
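- TensorFlow and PyTorch provide logit-based loss functions for this reason, such as `tf.nn.sigmoid_cross_entropy_with_logits` and `torch.nn.functional.binary_cross_entropy_with_logits`. scikit-learn has no `*_with_logits` functions, though its classifiers expose raw scores through `decision_function`.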
- Linear modeling convenience: In a linear model the logit $w^\top x + b$ is linear in the weights, whereas the probability is a nonlinear function of them.
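The stability point can be demonstrated with a small sketch. It compares binary cross-entropy computed from a probability with the algebraically equivalent form $\max(z, 0) - z\,y + \ln(1 + e^{-|z|})$ computed from the logit. The function names are illustrative, and this is a standard-library approximation of the idea, not the code of any particular library.

```python
import math

def bce_from_probability(p: float, y: float) -> float:
    """Naive binary cross-entropy computed from a probability."""
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))

def bce_from_logit(z: float, y: float) -> float:
    """Binary cross-entropy computed directly from the logit z.

    Uses max(z, 0) - z*y + log(1 + exp(-|z|)), which never takes log(0)
    and never exponentiates a positive number.
    """
    return max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))

# A confidently wrong prediction: large positive logit, true label 0.
z, y = 40.0, 0.0
p = 1.0 / (1.0 + math.exp(-z))   # rounds to exactly 1.0 in float64

try:
    naive = bce_from_probability(p, y)
except ValueError:
    naive = float("inf")         # math.log(0.0) raises ValueError in Python

print("naive :", naive)
print("stable:", bce_from_logit(z, y))
```

For moderate logits the two functions agree to within rounding error; they differ only where the probability has saturated to 0 or 1.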
5) Applications
- Logistic Regression: the logit is the link function that maps the predicted probability to the linear predictor $w^\top x + b$.
- Neural Networks: the last dense layer often outputs logits; then activation (sigmoid/softmax) converts them to probabilities.
- Interpretability: logit scale shows how much the model “leans” toward a class (positive logits → more likely positive, negative logits → more likely negative).
Summary:
- Logits = raw model scores before probability transformation.
- For binary classification, the logit is the log-odds of the positive class.
- Probabilities are derived by applying sigmoid (binary) or softmax (multiclass).
- Computing losses from logits rather than probabilities is more numerically stable, which is why training code usually works with logits directly.
