1. Introduction

Logistic regression is a fundamental algorithm used for binary classification problems, where the output label $y$ can take only two values: 0 or 1.

The goal of logistic regression is not just to predict a class, but to estimate:

The probability that $y = 1$ given input $x$

For example:

  • Input: an image
  • Output: probability that the image is a cat

2. Problem Setup

We are given:

  • Input vector: $x \in \mathbb{R}^{n_x}$
  • Parameters:
    • $w \in \mathbb{R}^{n_x}$ (weights)
    • $b \in \mathbb{R}$ (bias)

We want to compute:

$\hat{y} = P(y = 1 \mid x)$


3. Why a Linear Model Is Not Enough

A natural first idea is:

$\hat{y} = w^T x + b$

This is linear regression, but it has a serious problem:

  • Output can be < 0 or > 1
  • Not valid as a probability

Example:

  • $\hat{y} = 3.2$ → impossible
  • $\hat{y} = -1.5$ → meaningless (see the sketch below)
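A minimal NumPy sketch of the problem, with made-up weights and inputs chosen only to reproduce the numbers above:

```python
import numpy as np

# Hypothetical weights, bias, and inputs, chosen only to illustrate the problem.
w = np.array([2.0, -1.0])
b = 0.5

x_big = np.array([1.6, 0.5])    # pushes the score above 1
x_neg = np.array([-1.0, 0.0])   # pushes the score below 0

print(w @ x_big + b)   # 3.2  -> not a valid probability
print(w @ x_neg + b)   # -1.5 -> not a valid probability
```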

4. Solution: Sigmoid Function

To fix this, logistic regression uses the sigmoid function.

4.1 Definition

$\sigma(z) = \frac{1}{1 + e^{-z}}$

Where:

$z = w^T x + b$

So the final model becomes:

$\hat{y} = \sigma(w^T x + b)$
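As a quick sketch, the sigmoid is one line in NumPy (the function name here is my own):

```python
import numpy as np

def sigmoid(z):
    """sigma(z) = 1 / (1 + exp(-z)); maps any real z (or array) into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))
```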


5. Intuition of Sigmoid

The sigmoid function converts any real number into a value between 0 and 1.

Case 1: $z \gg 0$

  • $e^{-z} \approx 0$
  • $\hat{y} \approx 1$

Model is confident that $y = 1$

Case 2: $z \ll 0$

  • $e^{-z}$ is very large
  • $\hat{y} \approx 0$

Model is confident that $y = 0$

Case 3: $z = 0$

  • $\hat{y} = \sigma(0) = \frac{1}{1 + 1} = 0.5$

Model is uncertain

Core Intuition:

z = “score” → sigmoid → “probability”
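The three cases are easy to check numerically (the values of z here are picked arbitrarily):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(10.0))   # ~0.99995 -> confident that y = 1
print(sigmoid(-10.0))  # ~0.00005 -> confident that y = 0
print(sigmoid(0.0))    # 0.5      -> uncertain
```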


6. Interpretation of z

$z = w^T x + b$

This is:

a linear combination of input features

You can think of it as:

  • a weighted sum of features
  • a decision score (see the numeric sketch below)
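A tiny sketch of that weighted sum, with hypothetical features and weights:

```python
import numpy as np

x = np.array([0.5, 1.0, -2.0])   # input features (made up)
w = np.array([1.0, 2.0, 0.5])    # weights (in practice, learned)
b = 0.1                          # bias

z = w @ x + b                    # same as w[0]*x[0] + w[1]*x[1] + w[2]*x[2] + b
print(z)                         # 1.6
```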

7. Model Summary

The full logistic regression pipeline:

Step 1: Compute score

$z = w^T x + b$

Step 2: Convert to probability

$\hat{y} = \sigma(z)$

Final meaning:

$\hat{y}$ = probability that the input belongs to class 1
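The two steps translate directly into code. This sketch (the function name is my own) wraps them into one forward pass:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(x, w, b):
    """Step 1: z = w^T x + b. Step 2: y_hat = sigma(z)."""
    z = w @ x + b
    return sigmoid(z)
```

To turn the probability into a hard class label, a common convention (not required by anything above) is to predict class 1 when $\hat{y} \ge 0.5$.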


8. Alternative Notation (Important but Optional)

Some courses combine $w$ and $b$ into one vector:

  • Add $x_0 = 1$
  • Define a parameter vector $\theta$

Then:

$\hat{y} = \sigma(\theta^T x)$

But in deep learning:

  • we usually keep $w$ and $b$ separate
  • this makes implementation easier (see the sketch below)
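A small sketch, with made-up numbers, confirming that the two notations give the same score:

```python
import numpy as np

w = np.array([0.3, -0.2])
b = 0.1
x = np.array([1.5, 2.0])

# Combined notation: prepend x_0 = 1 and fold b into theta.
theta = np.concatenate(([b], w))     # theta = [b, w_1, w_2]
x_aug = np.concatenate(([1.0], x))   # x_aug = [1, x_1, x_2]

print(w @ x + b)      # 0.15
print(theta @ x_aug)  # 0.15 -- identical score
```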

9. Learning Objective

The goal of training is:

Find $w$ and $b$ such that $\hat{y}$ is close to the true label $y$

This will be done using:

  • cost function (next step)
  • gradient descent

10. Big Picture Connection

Now connect everything (an end-to-end sketch follows this outline):

Input

  • image → vector $x$

Model

  • logistic regression

Computation

  • $z = w^T x + b$
  • $\hat{y} = \sigma(z)$

Output

  • probability of class (cat vs not cat)
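An end-to-end sketch under stated assumptions: a 64×64 RGB image (the size is my choice, not fixed by this section) and randomly initialized parameters standing in for learned ones:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

image = rng.random((64, 64, 3))      # stand-in for a real image
x = image.reshape(-1)                # flatten: n_x = 64 * 64 * 3 = 12288

w = rng.normal(scale=0.01, size=x.shape[0])  # would be learned in practice
b = 0.0

z = w @ x + b
y_hat = sigmoid(z)
print(y_hat)  # estimated probability of "cat"
```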

11. Key Insights

  • Logistic regression outputs a probability, not just a class
  • A linear function alone is not enough
  • Sigmoid converts the score → probability
  • $z$ is the “confidence score”
  • $w, b$ are learned from data


Final One-Line Summary

Logistic regression computes a linear score from input features and transforms it into a probability using the sigmoid function to perform binary classification.