Definition
A loss function measures how well (or poorly) a machine learning model’s predictions match the true target values.
- Input: predicted value ($\hat{y}$) and true value ($y$)
- Output: a single number (loss)
- Goal: minimize this number during training
Think of it as the “error signal” that guides model learning.
Properties of a Good Loss Function
- Differentiable, at least almost everywhere (so we can use gradient descent or a subgradient variant; MAE and hinge loss have a kink at one point)
- Sensitive to errors (larger errors → larger loss)
- Aligned with task (classification vs regression)
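To make the differentiability point concrete, here is a minimal sketch of gradient descent on MSE for a one-parameter model $\hat{y} = w x$. The model, the toy data, the learning rate, and the function names are illustrative assumptions, not something the text above prescribes.

```python
def mse(w, xs, ys):
    """Mean squared error of the model y_hat = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def mse_grad(w, xs, ys):
    """Derivative of the MSE above with respect to w."""
    return 2.0 * sum((w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]  # toy data generated by y = 2x

w = 0.0               # assumed starting point
learning_rate = 0.1   # assumed step size
losses = []
for _ in range(20):
    losses.append(mse(w, xs, ys))
    w -= learning_rate * mse_grad(w, xs, ys)  # step against the gradient

print(w)                      # close to 2.0
print(losses[0], losses[-1])  # the loss shrinks toward 0
```

Because the loss is differentiable in $w$, each step can follow the gradient, and the loss falls as $w$ approaches the value that generated the data.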
Common Loss Functions
1. Regression (continuous targets)
- Mean Squared Error (MSE):
- $L = \frac{1}{n}\sum_{i=1}^n (y_i - \hat{y}_i)^2$
- Penalizes large errors more (squared).
- Mean Absolute Error (MAE):
- $L = \frac{1}{n}\sum_{i=1}^n |y_i - \hat{y}_i|$
- More robust to outliers.
- Huber Loss:
- Combines MSE and MAE (quadratic for errors below a threshold $\delta$, linear for larger ones).
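The three regression losses can be sketched in plain Python as follows. The function names, the `delta` threshold parameter, and the toy data are assumptions for illustration. The Huber form used here is the common one: $\frac{1}{2}r^2$ for residuals $|r| \le \delta$ and $\delta(|r| - \frac{1}{2}\delta)$ otherwise.

```python
def mse(y_true, y_pred):
    """Mean squared error: average of squared residuals."""
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error: average of absolute residuals."""
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)


def huber(y_true, y_pred, delta=1.0):
    """Huber loss: quadratic for |residual| <= delta, linear beyond it."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        r = abs(y - p)
        if r <= delta:
            total += 0.5 * r ** 2
        else:
            total += delta * (r - 0.5 * delta)
    return total / len(y_true)


y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 14.0]  # last prediction is an outlier (residual 10)

print(mse(y_true, y_pred))    # 25.0
print(mae(y_true, y_pred))    # 2.5
print(huber(y_true, y_pred))  # 2.375
```

A single outlier dominates MSE (25.0), while MAE and Huber grow only linearly with it, which is what "robust to outliers" means in practice.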
2. Classification (categorical targets)
- Binary Cross-Entropy (Log Loss):
- $L = -\frac{1}{n}\sum_{i=1}^n \big[ y_i \log(\hat{p}_i) + (1-y_i)\log(1-\hat{p}_i) \big]$
- Used in binary classification (logistic regression, neural nets).
- Categorical Cross-Entropy:
- $L = -\sum_{k=1}^K y_k \log(\hat{p}_k)$ for a single sample, where $y$ is the one-hot label over $K$ classes
- Used with softmax for multi-class classification.
- Hinge Loss:
- $L = \max(0, 1 - y \cdot \hat{y})$, with labels $y \in \{-1, +1\}$ and $\hat{y}$ the raw model score
- Used in Support Vector Machines (SVMs).
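A minimal sketch of the three classification losses, using only the standard library. The function names and the `eps` clipping constant (added to avoid `log(0)`) are assumptions for illustration. The hinge loss assumes labels in {-1, +1} and a raw score rather than a probability.

```python
import math


def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Average log loss; labels in {0, 1}, p_pred = predicted P(class 1)."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(y_true)


def categorical_cross_entropy(y_onehot, p_pred, eps=1e-12):
    """Loss for one sample: one-hot label vs. predicted class probabilities."""
    return -sum(y * math.log(max(p, eps)) for y, p in zip(y_onehot, p_pred))


def hinge(y, score):
    """Hinge loss for one sample; y in {-1, +1}, score is the raw output."""
    return max(0.0, 1.0 - y * score)


print(binary_cross_entropy([1, 0], [0.9, 0.1]))                # about 0.105
print(categorical_cross_entropy([0, 1, 0], [0.2, 0.7, 0.1]))   # about 0.357
print(hinge(+1, 2.5))   # 0.0: correct side, outside the margin
print(hinge(-1, 0.3))   # about 1.3: wrong side of the decision boundary
```

Only the probability assigned to the true class contributes to categorical cross-entropy, since every other term is multiplied by a zero in the one-hot label.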
3. Ranking / Structured Tasks
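- Contrastive Loss: Used in Siamese networks; pulls embeddings of similar pairs together and pushes dissimilar pairs apart.
- Triplet Loss: Encourages an anchor to be closer to a positive sample than to a negative sample, typically by at least a margin.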
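Both embedding losses can be sketched with Euclidean distance. This is one common formulation, not the only one: the `margin` parameter, the `similar` flag, and the choice of squared distance in the contrastive loss are assumptions here, and other references scale or label the terms differently.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def contrastive_loss(u, v, similar, margin=1.0):
    """Similar pairs pay d^2; dissimilar pairs pay only if closer than margin."""
    d = euclidean(u, v)
    if similar:
        return d ** 2
    return max(0.0, margin - d) ** 2


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Zero once the negative is at least `margin` farther than the positive."""
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


anchor, positive = [0.0, 0.0], [0.0, 1.0]

print(triplet_loss(anchor, positive, [3.0, 4.0]))   # 0.0: negative is far away
print(triplet_loss(anchor, positive, [0.0, 1.5]))   # 0.5: negative is too close
print(contrastive_loss(anchor, [0.0, 0.5], similar=False))  # 0.25
```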
Loss vs Cost vs Objective
- Loss function: Error for one sample.
- Cost function: Average loss over all samples.
- Objective function: The function being optimized (usually cost + regularization).
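The three terms can be separated in code. The sketch below assumes squared error as the per-sample loss and an L2 penalty with a hypothetical strength `lam` as the regularizer; both choices are illustrative only.

```python
def squared_loss(y, y_hat):
    """Loss: error for a single sample."""
    return (y - y_hat) ** 2


def cost(y_true, y_pred):
    """Cost: average loss over all samples."""
    return sum(squared_loss(y, p) for y, p in zip(y_true, y_pred)) / len(y_true)


def objective(weights, y_true, y_pred, lam=0.1):
    """Objective: cost plus an L2 regularization penalty on the weights."""
    penalty = lam * sum(w ** 2 for w in weights)
    return cost(y_true, y_pred) + penalty


y_true = [1.0, 2.0]
y_pred = [1.5, 2.0]
weights = [1.0, -2.0]  # hypothetical model parameters

print(squared_loss(y_true[0], y_pred[0]))   # 0.25  (one sample)
print(cost(y_true, y_pred))                 # 0.125 (average over samples)
print(objective(weights, y_true, y_pred))   # about 0.625 (cost + 0.1 * 5.0)
```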
Example
Suppose true label $y = 1$, model predicts $\hat{p} = 0.9$.
- Binary cross-entropy loss (using the natural logarithm):
$L = -[1 \cdot \log(0.9) + (0) \cdot \log(0.1)] = -\log(0.9) \approx 0.105$
Good prediction → small loss.
If $\hat{p} = 0.1$, $L = -\log(0.1) \approx 2.30$
Bad prediction → large loss.
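The two numbers above can be checked with a few lines of Python. The helper name `bce_single` is an assumption for illustration, and `math.log` is the natural logarithm.

```python
import math


def bce_single(y, p):
    """Binary cross-entropy for one sample with label y and predicted p."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


print(round(bce_single(1, 0.9), 3))  # 0.105  (confident and correct)
print(round(bce_single(1, 0.1), 3))  # 2.303  (confident and wrong)
```

The loss grows without bound as $\hat{p}$ approaches 0 for a true label of 1, so confident mistakes are penalized far more heavily than mild ones.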
Applications
- Regression tasks: MSE, MAE, Huber
- Classification tasks: Cross-entropy, Hinge
- Neural embeddings: Contrastive, Triplet
- Generative models: Adversarial losses, KL divergence
In short:
A loss function measures prediction error.
- Regression → MSE, MAE
- Classification → Cross-entropy, Hinge
- Representation learning → Contrastive, Triplet
