1. Model Definition

Logistic regression is used to model the probability of a binary outcome.
The prediction is defined as:

  • \hat{y}^{(i)} = \sigma(z^{(i)}) = \sigma(W^T x^{(i)} + b)

Where:

  • \sigma(z) = \frac{1}{1 + e^{-z}} is the sigmoid function
  • W (weights) and b (bias) are the parameters
  • x^{(i)} is the input of the i-th training example

The sigmoid function ensures that the output \hat{y} is between 0 and 1.
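The model above can be sketched in a few lines of NumPy. This is a minimal illustration of the forward pass; the function names and parameter values are chosen for the example, not taken from any particular library.

```python
import numpy as np

def sigmoid(z):
    """Sigmoid activation: maps any real z into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict(W, b, x):
    """Prediction for one example: y_hat = sigmoid(W^T x + b)."""
    z = np.dot(W, x) + b
    return sigmoid(z)

# Illustrative values: 3 input features
W = np.array([0.5, -0.25, 0.1])
b = 0.0
x = np.array([1.0, 2.0, 3.0])

y_hat = predict(W, b, x)  # a probability strictly between 0 and 1
```

Because the sigmoid squashes any real-valued z into (0, 1), y_hat can be read directly as an estimate of P(y = 1 | x).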


2. Training Objective

Given a training set of m labeled examples:

  • (x^{(i)}, y^{(i)}), where i = 1, 2, \ldots, m

The objective is to find parameters W and b such that:

  • \hat{y}^{(i)} \approx y^{(i)}

The superscript (i) denotes the index of the training example.


3. Loss Function

The loss function measures the error for a single training example.

Logistic regression uses the following loss function:

L(\hat{y}, y) = -\left[ y \log(\hat{y}) + (1 - y) \log(1 - \hat{y}) \right]

This function is chosen instead of squared error because squared error, combined with the sigmoid, produces a non-convex cost surface with many local minima. The cross-entropy loss instead leads to a convex optimization problem, which gradient descent can reliably minimize.
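The loss formula translates directly into code. A short NumPy sketch, with two illustrative evaluations showing how the loss rewards confident correct predictions and penalizes confident wrong ones:

```python
import numpy as np

def loss(y_hat, y):
    """Cross-entropy loss for a single example.
    y is the true label (0 or 1); y_hat is the predicted probability."""
    return -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

# A confident correct prediction gives a small loss:
small = loss(0.9, 1)   # -log(0.9), roughly 0.105

# A confident wrong prediction is penalized heavily:
large = loss(0.1, 1)   # -log(0.1), roughly 2.303
```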


4. Behavior of the Loss Function

  • If y = 1: L = -\log(\hat{y}); minimizing the loss pushes \hat{y} \to 1
  • If y = 0: L = -\log(1 - \hat{y}); minimizing the loss pushes \hat{y} \to 0
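A quick numerical check of both branches, using illustrative probability values, confirms that each branch of the loss shrinks toward 0 as the prediction approaches the true label:

```python
import numpy as np

# y = 1 branch: L = -log(y_hat) falls toward 0 as y_hat -> 1
losses_y1 = [-np.log(p) for p in (0.5, 0.9, 0.99)]

# y = 0 branch: L = -log(1 - y_hat) falls toward 0 as y_hat -> 0
losses_y0 = [-np.log(1 - p) for p in (0.5, 0.1, 0.01)]
```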

5. Cost Function

The cost function evaluates the model over the entire training set.

J(W, b) = \frac{1}{m} \sum_{i=1}^{m} L(\hat{y}^{(i)}, y^{(i)})

Expanded form:

J(W, b) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log(\hat{y}^{(i)}) + (1 - y^{(i)}) \log(1 - \hat{y}^{(i)}) \right]
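The sum over examples vectorizes naturally. A minimal NumPy sketch of the cost, assuming X stores one example per column (shape n_features × m) and Y is the vector of labels; the toy data is purely illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(W, b, X, Y):
    """Average cross-entropy loss over m examples.
    X has shape (n_features, m); Y has shape (m,)."""
    Y_hat = sigmoid(W @ X + b)  # predictions for all m examples at once
    return -np.mean(Y * np.log(Y_hat) + (1 - Y) * np.log(1 - Y_hat))

# Toy data: 2 features, 4 examples
X = np.array([[0.0, 1.0, 2.0, 3.0],
              [1.0, 0.0, 1.0, 0.0]])
Y = np.array([0.0, 0.0, 1.0, 1.0])

J = cost(np.array([0.5, -0.5]), 0.0, X, Y)  # a single nonnegative scalar
```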


6. Optimization Goal

Training logistic regression involves:

  • Minimizing the cost function J(W, b)
  • Using optimization methods such as gradient descent
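The steps above can be sketched as a batch gradient descent loop. The gradient expressions used here (dJ/dW = (1/m) X(\hat{Y} - Y), dJ/db = mean(\hat{Y} - Y)) follow from differentiating the cost with respect to W and b; the learning rate, step count, and toy data are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_descent(X, Y, lr=0.1, steps=1000):
    """Minimize J(W, b) by batch gradient descent.
    X: (n_features, m), Y: (m,). Returns the learned W and b."""
    n, m = X.shape
    W, b = np.zeros(n), 0.0
    for _ in range(steps):
        Y_hat = sigmoid(W @ X + b)   # forward pass over all examples
        dZ = Y_hat - Y               # error term from the cross-entropy loss
        W -= lr * (X @ dZ) / m       # dJ/dW
        b -= lr * np.mean(dZ)        # dJ/db
    return W, b

# Toy separable data: label is 1 exactly when the single feature is positive
X = np.array([[-2.0, -1.0, 1.0, 2.0]])
Y = np.array([0.0, 0.0, 1.0, 1.0])

W, b = gradient_descent(X, Y)
preds = sigmoid(W @ X + b) > 0.5  # thresholded predictions
```

On this separable toy set the loop drives the cost down until the thresholded predictions match the labels; in practice one would also monitor J(W, b) per iteration to tune the learning rate.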

7. Key Distinction

  • Loss function: applied to a single training example
  • Cost function: average of losses over all training examples