1. Objective

Previously, logistic regression required:

  • A loop over training examples
  • A loop over features

The goal of vectorization is:

  • Process all training examples simultaneously
  • Eliminate all explicit for-loops


2. Data Representation

Instead of processing one example at a time, we stack all inputs into a matrix:

X = \begin{bmatrix} | & | & & | \\ x^{(1)} & x^{(2)} & \cdots & x^{(m)} \\ | & | & & | \end{bmatrix}

  • Shape: (n_x, m)
  • Each column represents one training example
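As a sketch, assuming the m training examples are available as 1-D NumPy arrays (the toy data below is hypothetical), they can be stacked column-wise into X:

```python
import numpy as np

# Hypothetical toy data: three training examples with n_x = 2 features each
x1 = np.array([1.0, 2.0])
x2 = np.array([3.0, 4.0])
x3 = np.array([5.0, 6.0])

# Stack the examples as columns of X, giving shape (n_x, m)
X = np.stack([x1, x2, x3], axis=1)
print(X.shape)  # (2, 3)
```

Each column `X[:, i]` is then the i-th training example.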

3. Vectorized Computation of Z

For a single example:

z^{(i)} = W^T x^{(i)} + b

For all examples at once:

Z = W^T X + b

  • Z is a row vector of shape (1, m)
  • It contains z^{(1)}, z^{(2)}, \dots, z^{(m)}

Implementation:

Z = np.dot(W.T, X) + b

NumPy automatically broadcasts b across all columns
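To check the vectorized formula against the per-example one, a minimal comparison (with randomly generated W, X, and b for illustration) might look like:

```python
import numpy as np

rng = np.random.default_rng(0)
n_x, m = 4, 5
W = rng.standard_normal((n_x, 1))  # weight column vector, shape (n_x, 1)
X = rng.standard_normal((n_x, m))  # m examples as columns
b = 0.5

# Looped version: one z per example
Z_loop = np.array([[(W.T @ X[:, i]).item() + b for i in range(m)]])

# Vectorized version: all examples at once
Z_vec = np.dot(W.T, X) + b  # shape (1, m)

print(np.allclose(Z_loop, Z_vec))  # True
```

Both versions compute the same numbers; only the vectorized one avoids the explicit loop.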


4. Vectorized Computation of A

Apply sigmoid function to all elements:

A = \sigma(Z)

  • A contains all predictions: A = [a^{(1)}, a^{(2)}, \dots, a^{(m)}]

Implementation:

A = sigmoid(Z)
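The `sigmoid` helper is not defined in these notes; a standard elementwise implementation, applied to all of Z at once, would be:

```python
import numpy as np

def sigmoid(z):
    """Elementwise logistic sigmoid; works on scalars and arrays."""
    return 1.0 / (1.0 + np.exp(-z))

Z = np.array([[-1.0, 0.0, 1.0]])
A = sigmoid(Z)    # shape (1, 3), every entry strictly between 0 and 1
print(A[0, 1])    # 0.5, since sigmoid(0) = 0.5
```

Because `np.exp` is elementwise, no loop over the m entries of Z is needed.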

5. Forward Propagation (Vectorized)

Instead of computing each example separately:

  • Compute Z for all examples in one step
  • Compute A for all examples in one step

The entire dataset is processed simultaneously
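Putting sections 3 and 4 together, a sketch of one fully vectorized forward pass (with a hypothetical `sigmoid` helper and random data) is:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(W, b, X):
    """Vectorized forward pass: predictions for all m examples at once."""
    Z = np.dot(W.T, X) + b   # shape (1, m)
    A = sigmoid(Z)           # shape (1, m)
    return A

rng = np.random.default_rng(1)
W = rng.standard_normal((3, 1))
X = rng.standard_normal((3, 8))   # 8 examples with 3 features each
A = forward(W, 0.0, X)
print(A.shape)  # (1, 8)
```

There is no loop over examples and no loop over features: both are handled inside the single matrix product.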


6. Broadcasting Concept

In:

Z = W^T X + b

  • b is a scalar (or 1 × 1)
  • It is automatically expanded to match Z's shape

This is called:

broadcasting
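A small demonstration of NumPy broadcasting a scalar across a (1, m) array:

```python
import numpy as np

Z_partial = np.array([[1.0, 2.0, 3.0]])  # shape (1, 3)
b = 10.0                                  # scalar

# b is broadcast (virtually copied) to every column; no explicit tiling
Z = Z_partial + b
print(Z)  # [[11. 12. 13.]]
```

The same rule is what makes `np.dot(W.T, X) + b` valid even though `b` is not a (1, m) array.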


7. Key Insight

Vectorization transforms:

  • m separate computations
    → into one matrix operation

8. Computational Advantage

Benefits:

  • Eliminates all loops
  • Enables parallel computation
  • Significantly faster execution

Especially important for:

  • Large datasets
  • Deep learning models
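A rough timing sketch (absolute numbers vary by machine) comparing an explicit Python loop with the single vectorized matrix operation:

```python
import time
import numpy as np

rng = np.random.default_rng(2)
n_x, m = 100, 50_000
W = rng.standard_normal((n_x, 1))
X = rng.standard_normal((n_x, m))
b = 0.0

t0 = time.perf_counter()
Z_loop = np.empty((1, m))
for i in range(m):                      # explicit loop over examples
    Z_loop[0, i] = (W.T @ X[:, i]).item() + b
t_loop = time.perf_counter() - t0

t0 = time.perf_counter()
Z_vec = np.dot(W.T, X) + b              # single matrix operation
t_vec = time.perf_counter() - t0

print(np.allclose(Z_loop, Z_vec))       # True
# t_loop is typically far larger than t_vec on any realistic m
```

The speedup comes from pushing the loop into optimized, parallelized BLAS code instead of the Python interpreter.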

9. Extension to Backpropagation

Vectorization is not limited to forward propagation:

Backward propagation (gradient computation)
can also be fully vectorized

This allows:

  • Efficient gradient computation
  • Scalable training
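As a sketch of the vectorized gradients for logistic regression with the cross-entropy cost (assuming the m labels are stored in a (1, m) row vector Y, and using a hypothetical `sigmoid` helper):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backward(W, b, X, Y):
    """Vectorized gradients of the logistic-regression cost."""
    m = X.shape[1]
    A = sigmoid(np.dot(W.T, X) + b)   # predictions, shape (1, m)
    dZ = A - Y                        # shape (1, m)
    dW = np.dot(X, dZ.T) / m          # shape (n_x, 1)
    db = np.sum(dZ) / m               # scalar
    return dW, db

rng = np.random.default_rng(3)
W = rng.standard_normal((2, 1))
X = rng.standard_normal((2, 4))
Y = np.array([[0.0, 1.0, 1.0, 0.0]])
dW, db = backward(W, 0.0, X, Y)
print(dW.shape)  # (2, 1)
```

As in the forward pass, the per-example loop disappears into matrix products over the full dataset.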

10. Key Takeaways

  1. Stack training examples into matrix X
  2. Compute Z = W^T X + b in one step
  3. Apply activation: A = \sigma(Z)
  4. Use broadcasting for bias addition
  5. Remove all explicit loops