Definition

A loss function measures how well (or poorly) a machine learning model’s predictions match the true target values.

  • Input: predicted value ($\hat{y}$) and true value ($y$)
  • Output: a single number (loss)
  • Goal: minimize this number during training

Think of it as the “error signal” that guides model learning.


Properties of a Good Loss Function

  • Differentiable, or at least subdifferentiable (so gradient-based optimization such as gradient descent can be used)
  • Sensitive to errors (larger errors → larger loss)
  • Aligned with task (classification vs regression)

Common Loss Functions

1. Regression (continuous targets)

  • Mean Squared Error (MSE):
    • $L = \frac{1}{n}\sum_{i=1}^n (y_i - \hat{y}_i)^2$
    • Squaring penalizes large errors disproportionately, so a few outliers can dominate the loss.
  • Mean Absolute Error (MAE):
    • $L = \frac{1}{n}\sum_{i=1}^n |y_i - \hat{y}_i|$
    • More robust to outliers than MSE, because each error contributes linearly.
  • Huber Loss:
    • Combines MSE and MAE: quadratic for errors below a threshold $\delta$, linear for errors above it.
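
The three regression losses above can be sketched in plain Python. This is a minimal illustration rather than a reference implementation: the function names, the default threshold `delta=1.0`, and the sample data are assumptions chosen for the example, and the Huber form used is the common $\tfrac{1}{2}r^2$ / $\delta(|r| - \tfrac{1}{2}\delta)$ variant.

```python
def mse(y_true, y_pred):
    """Mean squared error: average of squared residuals."""
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error: average of absolute residuals."""
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)


def huber(y_true, y_pred, delta=1.0):
    """Huber loss: quadratic when |residual| <= delta, linear otherwise."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        r = abs(y - p)
        if r <= delta:
            total += 0.5 * r ** 2
        else:
            total += delta * (r - 0.5 * delta)
    return total / len(y_true)


# Hypothetical data: three perfect predictions and one outlier (residual 10).
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 14.0]

print(mse(y_true, y_pred))    # 25.0
print(mae(y_true, y_pred))    # 2.5
print(huber(y_true, y_pred))  # 2.375
```

With a single outlier, MSE is ten times larger than MAE, while Huber stays close to MAE, which is the robustness trade-off described above.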

2. Classification (categorical targets)

  • Binary Cross-Entropy (Log Loss):
    • $L = -\frac{1}{n}\sum_{i=1}^n \big[ y_i \log(\hat{p}_i) + (1-y_i)\log(1-\hat{p}_i) \big]$
    • Used in binary classification (logistic regression, neural nets).
  • Categorical Cross-Entropy:
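    • $L = -\sum_{k=1}^K y_k \log(\hat{p}_k)$ for a single sample, where $y$ is the one-hot true label over $K$ classes and $\hat{p}_k$ is the predicted probability of class $k$.
    • Used with softmax for multi-class classification.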
  • Hinge Loss:
    • $L = \max(0, 1 - y \cdot \hat{y})$, with labels $y \in \{-1, +1\}$ and $\hat{y}$ the raw (unthresholded) model score.
    • Used in Support Vector Machines (SVMs).
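
As a rough sketch of the multi-class and margin-based losses above, the following computes softmax cross-entropy and hinge loss for single samples. The helper names, the `eps` clamp that guards against `log(0)`, and the example logits are assumptions made for illustration.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities; subtract the max for stability."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_cross_entropy(y_onehot, probs, eps=1e-12):
    """Per-sample cross-entropy between a one-hot label and probabilities."""
    return -sum(y * math.log(max(p, eps)) for y, p in zip(y_onehot, probs))


def hinge(y, score):
    """Hinge loss for a label y in {-1, +1} and a raw model score."""
    return max(0.0, 1.0 - y * score)


# Hypothetical logits for a 3-class problem; the true class is index 0.
probs = softmax([2.0, 1.0, 0.1])
loss = categorical_cross_entropy([1, 0, 0], probs)

print(hinge(+1, 2.0))  # 0.0  (correct side, outside the margin)
print(hinge(+1, 0.5))  # 0.5  (correct side, inside the margin)
print(hinge(-1, 0.5))  # 1.5  (wrong side)
```

Because the label is one-hot, the cross-entropy reduces to the negative log of the probability assigned to the true class. The hinge loss is zero only once the score clears the margin of 1 on the correct side.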

3. Ranking / Structured Tasks

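  • Contrastive Loss: Used in Siamese networks; pulls the embeddings of similar pairs together and pushes dissimilar pairs apart.
  • Triplet Loss: Encourages an anchor to be closer to a positive sample than to a negative sample, typically by at least a margin.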
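
The text above gives no formulas for these two losses, so the sketch below uses commonly cited forms as an assumption: contrastive loss as $d^2$ for similar pairs and $\max(0, m - d)^2$ for dissimilar pairs, and triplet loss as $\max(0, d(a, p) - d(a, n) + m)$ with Euclidean distance $d$ and margin $m$. The function names, the `same` flag, and the default margin are hypothetical.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def contrastive_loss(u, v, same, margin=1.0):
    """Similar pairs are penalized by distance; dissimilar pairs only
    while they are closer than the margin."""
    d = euclidean(u, v)
    if same:
        return d ** 2
    return max(0.0, margin - d) ** 2


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Zero once the negative is farther from the anchor than the
    positive by at least the margin."""
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# Hypothetical 2-D embeddings.
print(triplet_loss([0, 0], [0, 1], [3, 4]))  # 0.0 (positive near, negative far)
print(triplet_loss([0, 0], [3, 4], [0, 1]))  # 5.0 (positive far, negative near)
```

Other variants exist (squared distances, cosine similarity, different label conventions), so the exact form should be checked against the library or paper in use.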

Loss vs Cost vs Objective

  • Loss function: Error for one sample.
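  • Cost function: Average loss over all samples (the formulas above that average over $n$ samples are costs in this strict sense; in practice the two terms are often used interchangeably).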
  • Objective function: The function being optimized (usually cost + regularization).
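
The distinction can be made concrete with a small sketch. Squared error as the per-sample loss, an L2 penalty as the regularizer, and the names `weights` and `lam` are all assumptions picked for illustration.

```python
def squared_loss(y, y_hat):
    """Loss: error for a single sample."""
    return (y - y_hat) ** 2


def cost(y_true, y_pred):
    """Cost: average per-sample loss over the dataset."""
    losses = [squared_loss(y, p) for y, p in zip(y_true, y_pred)]
    return sum(losses) / len(losses)


def objective(y_true, y_pred, weights, lam=0.1):
    """Objective: cost plus an L2 regularization term on the weights."""
    l2_penalty = sum(w ** 2 for w in weights)
    return cost(y_true, y_pred) + lam * l2_penalty


# Hypothetical predictions and model weights.
print(squared_loss(2.0, 4.0))                              # 4.0
print(cost([1.0, 2.0], [1.0, 4.0]))                        # 2.0
print(objective([1.0, 2.0], [1.0, 4.0], [1.0, 2.0], 0.1))  # 2.5
```

Training minimizes the objective, so a model can accept a slightly higher cost in exchange for smaller weights.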

Example

Suppose the true label is $y = 1$ and the model predicts $\hat{p} = 0.9$ (using the natural logarithm throughout).

  • Binary cross-entropy loss:

$L = -[1 \cdot \log(0.9) + (0) \cdot \log(0.1)] = -\log(0.9) \approx 0.105$

Good prediction → small loss.

If instead $\hat{p} = 0.1$, then $L = -\log(0.1) \approx 2.303$.

Bad prediction → large loss.
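
The two numbers above can be reproduced with a few lines of Python. This is a minimal per-sample sketch; the function name and the `eps` clamp, which keeps the logarithm finite when a prediction is exactly 0 or 1, are assumptions added for the example.

```python
import math


def binary_cross_entropy(y, p_hat, eps=1e-12):
    """Per-sample binary cross-entropy with the natural logarithm."""
    p = min(max(p_hat, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


print(round(binary_cross_entropy(1, 0.9), 3))  # 0.105
print(round(binary_cross_entropy(1, 0.1), 3))  # 2.303
```

The loss grows without bound as the predicted probability of the true class approaches 0, which is why confident wrong predictions are penalized so heavily.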


Applications

  • Regression tasks: MSE, MAE, Huber
  • Classification tasks: Cross-entropy, Hinge
  • Neural embeddings: Contrastive, Triplet
  • Generative models: Adversarial losses, KL divergence

In short:
A loss function measures prediction error.

  • Regression → MSE, MAE
  • Classification → Cross-entropy, Hinge
  • Representation learning → Contrastive, Triplet