Logits

Date: August 17, 2025
Author: Ju Yeon Eum

1) Definition

In machine learning, especially in classification models, a logit is the raw output score of a model before a squashing function such as sigmoid (binary) or softmax (multi-class) is applied.
Formally, in the binary case, the logit is the log-odds of the predicted probability; softmax logits are unnormalized log-probabilities, defined only up to an additive constant.

For binary classification:

$\text{logit}(p) = \ln \left(\frac{p}{1-p}\right)$

Where:

$p$ = predicted probability of the positive class
$\text{logit}(p)$ maps $p \in (0,1)$ to $(-\infty, +\infty)$.
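
As a rough illustration of the definition, the log-odds can be computed directly. This is a minimal sketch using only the Python standard library; the function name `logit` and the sample probabilities are chosen here for illustration.

```python
import math

def logit(p):
    """Log-odds of a probability p in the open interval (0, 1)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return math.log(p / (1.0 - p))

# Probabilities below 0.5 give negative logits, above 0.5 positive ones,
# and p = 0.5 sits exactly at zero.
for p in (0.01, 0.5, 0.99):
    print(f"p = {p:<4}  logit(p) = {logit(p):+.3f}")
```

The mapping is symmetric around $p = 0.5$: swapping $p$ for $1 - p$ only flips the sign of the logit.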

2) Intuition

The model computes a linear combination of inputs:
- $z = w^\top x + b$
- This $z$ is the logit.
Then it converts the logit into a probability using:
- Sigmoid: $\sigma(z) = \frac{1}{1+e^{-z}}$
- Softmax (multiclass): $P(y=i) = \frac{e^{z_i}}{\sum_j e^{z_j}}$

So logits are the raw “scores” that get transformed into probabilities.
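
The two conversions above can be sketched in a few lines of plain Python. The weights, input, and bias below are made-up values, and the helper names `sigmoid` and `softmax` are defined here rather than taken from any library.

```python
import math

def sigmoid(z):
    """Map a single logit to a probability in (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)              # for z < 0 this form avoids overflow in exp(-z)
    return e / (1.0 + e)

def softmax(logits):
    """Map a list of logits to probabilities that sum to 1."""
    m = max(logits)              # subtracting the max does not change the result
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical two-feature model: z = w^T x + b
w, x, b = [0.8, -0.4], [2.0, 1.0], 1.0
z = sum(wi * xi for wi, xi in zip(w, x)) + b

print("logit:", z, "probability:", sigmoid(z))
print("softmax:", softmax([2.0, 1.0, 0.1]))
```

Subtracting the largest logit inside `softmax` is a common precaution against overflow; it works because softmax is unchanged when the same constant is added to every logit.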

3) Example

Suppose logistic regression gives $z = 2.2$.

This is the logit.
Convert to probability:
- $p = \sigma(2.2) = \frac{1}{1+e^{-2.2}} \approx 0.90$

Interpretation: The model is about 90% confident the instance is positive.
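
The same logit can also be read as odds, which is sometimes easier to interpret. A short worked version, with values rounded:

$\text{odds} = \frac{p}{1-p} = e^{z} = e^{2.2} \approx 9.03$

$p = \frac{\text{odds}}{1+\text{odds}} \approx \frac{9.03}{10.03} \approx 0.90$

So a logit of 2.2 corresponds to odds of roughly 9 to 1 in favor of the positive class.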

4) Why use logits instead of probabilities?

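Numerical stability: in floating point, a probability very close to 0 underflows to 0 and one very close to 1 rounds to exactly 1, so $\log p$ or $\log(1-p)$ becomes infinite. Working in logit space avoids this.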
Better for loss functions:
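- Binary cross-entropy is more stable if you pass logits instead of probabilities, because the sigmoid and the logarithm can be combined into a single well-behaved expression.
- Deep learning libraries provide loss functions that take logits directly for this reason, e.g. tf.nn.sigmoid_cross_entropy_with_logits in TensorFlow and BCEWithLogitsLoss in PyTorch.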
Linear modeling convenience: the logit is a linear function of the inputs, $z = w^\top x + b$; the probability $\sigma(z)$ is not.
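
To make the stability point concrete, here is a minimal sketch comparing binary cross-entropy computed from a probability with the same loss computed from the logit. The function names are made up for this example, and the rearranged formula is one common way to write the loss, not necessarily what any particular library does internally.

```python
import math

def bce_from_probability(p, y):
    """Naive binary cross-entropy computed from a probability p."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def bce_from_logit(z, y):
    """Binary cross-entropy computed directly from a logit z.

    Algebraically equal to bce_from_probability(sigmoid(z), y), but
    rearranged so that no logarithm of a value rounded to 0 is taken.
    """
    return max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))

z, y = 40.0, 0                       # a confident but wrong prediction
p = 1.0 / (1.0 + math.exp(-z))       # rounds to exactly 1.0 in double precision

try:
    naive = bce_from_probability(p, y)
except ValueError:
    naive = None                     # math.log(0.0) raises in Python

stable = bce_from_logit(z, y)
print("p:", p, "naive loss:", naive, "loss from logit:", stable)
```

The probability route loses the information as soon as `p` rounds to 1, while the logit route still returns a finite loss close to 40. For moderate logits the two functions agree.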

5) Applications

Logistic Regression: the logit is the link function that connects the probability to the linear predictor, $\text{logit}(p) = w^\top x + b$.
Neural Networks: the last dense layer often outputs logits; then activation (sigmoid/softmax) converts them to probabilities.
Interpretability: logit scale shows how much the model “leans” toward a class (positive logits → more likely positive, negative logits → more likely negative).
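
One practical consequence of the sign interpretation is that a class prediction can be read off the logits without converting to probabilities. A small sketch, with hypothetical helper names and a 0.5 decision threshold assumed for the binary case:

```python
def predict_binary(z):
    """Positive class exactly when the logit is positive, i.e. sigmoid(z) > 0.5."""
    return z > 0

def predict_multiclass(logits):
    """Index of the largest logit.

    Softmax preserves the ordering of its inputs, so the largest logit
    is also the most probable class.
    """
    return max(range(len(logits)), key=lambda i: logits[i])

print(predict_binary(2.2), predict_binary(-0.3))
print(predict_multiclass([2.0, 1.0, 0.1]))
```

A threshold other than 0.5 corresponds to comparing the logit against a non-zero constant instead.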

Summary:

Logits = raw model scores before probability transformation.
For binary classification, a logit is the log-odds of the positive class.
Probabilities are derived by applying sigmoid (binary) or softmax (multiclass).
Computing losses from logits is more numerically stable than computing them from probabilities, which is why training code usually works with logits.
