1. Objective
In logistic regression, the goal is to update the parameters $w$ and $b$ to minimize the loss function $\mathcal{L}$.
To achieve this, we need to:
- Compute predictions (forward pass)
- Compute derivatives (backward pass)
- Update parameters using gradient descent
2. Forward Propagation
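For a single training example with two features $x_1, x_2$ and label $y \in \{0, 1\}$:
Step 1: Linear combination
$$z = w_1 x_1 + w_2 x_2 + b$$
Step 2: Activation (sigmoid)
$$a = \hat{y} = \sigma(z) = \frac{1}{1 + e^{-z}}$$
Step 3: Loss function (binary cross-entropy)
$$\mathcal{L}(a, y) = -\left[\, y \log a + (1 - y) \log(1 - a) \,\right]$$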
This completes the forward pass.
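The three forward steps can be sketched in plain Python. The function names `sigmoid` and `forward` and the example numbers are hypothetical, chosen only for illustration. With $x_1 = 1, x_2 = 2, w_1 = 0.5, w_2 = -0.25, b = 0, y = 1$, the sketch gives $z = 0$, $a = 0.5$ and $\mathcal{L} = \ln 2 \approx 0.693$.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic function a = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def forward(x1, x2, w1, w2, b, y):
    """Forward pass for one example with two features; returns (z, a, loss)."""
    z = w1 * x1 + w2 * x2 + b                                # Step 1: linear combination
    a = sigmoid(z)                                           # Step 2: sigmoid activation
    loss = -(y * math.log(a) + (1 - y) * math.log(1 - a))    # Step 3: cross-entropy loss
    return z, a, loss

# Illustrative values only
z, a, loss = forward(x1=1.0, x2=2.0, w1=0.5, w2=-0.25, b=0.0, y=1)
print(z, a, loss)
```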
3. Backward Propagation (Derivatives)
We now compute derivatives starting from the loss and moving backward through the computation graph.
Step 1: Derivative with respect to $a$
$$\frac{\partial \mathcal{L}}{\partial a} = -\frac{y}{a} + \frac{1 - y}{1 - a}$$
(In implementation, this is stored as dA)
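One way to gain confidence in the formula for $\partial \mathcal{L} / \partial a$ is to compare it with a central finite difference. The sketch below does that at one arbitrary point ($a = 0.3$, $y = 1$). The helper names `loss` and `d_loss_d_a` and the step size `h` are illustrative assumptions, not part of the original notes.

```python
import math

def loss(a, y):
    """Binary cross-entropy loss L(a, y) for one example."""
    return -(y * math.log(a) + (1 - y) * math.log(1 - a))

def d_loss_d_a(a, y):
    """Analytic derivative dL/da = -y/a + (1 - y)/(1 - a), stored as dA."""
    return -y / a + (1 - y) / (1 - a)

# Central finite difference at an illustrative point
a, y, h = 0.3, 1, 1e-6
numeric = (loss(a + h, y) - loss(a - h, y)) / (2 * h)
dA = d_loss_d_a(a, y)
print(dA, numeric)  # both close to -1/0.3
```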
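Step 2: Derivative with respect to $z$
Using the chain rule:
$$\frac{\partial \mathcal{L}}{\partial z} = \frac{\partial \mathcal{L}}{\partial a} \cdot \frac{da}{dz} = a - y$$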
(This is a key simplification result)
This comes from combining:
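- $\frac{\partial \mathcal{L}}{\partial a} = -\frac{y}{a} + \frac{1 - y}{1 - a}$ (from Step 1)
- $\frac{da}{dz} = \sigma(z)\,(1 - \sigma(z)) = a(1 - a)$
Multiplying the two gives $-y(1 - a) + (1 - y)\,a = a - y$.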
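The simplification can be checked numerically. The sketch below computes $\partial \mathcal{L} / \partial z$ in two ways at a few sample points and compares them. The first way is the long chain-rule product $\frac{\partial \mathcal{L}}{\partial a} \cdot a(1 - a)$. The second is the simplified result $a - y$. The function names and the sample values of $z$ are illustrative choices.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dz_chain_rule(z, y):
    """dL/dz computed the long way: dL/da * da/dz."""
    a = sigmoid(z)
    dA = -y / a + (1 - y) / (1 - a)   # dL/da
    da_dz = a * (1 - a)               # derivative of the sigmoid
    return dA * da_dz

def dz_simplified(z, y):
    """dL/dz using the simplified result a - y."""
    return sigmoid(z) - y

results = []
for z in (-2.0, 0.0, 1.5):
    for y in (0, 1):
        results.append((dz_chain_rule(z, y), dz_simplified(z, y)))
        print(z, y, *results[-1])
```

The two columns agree up to floating-point rounding.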
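Step 3: Derivatives with respect to parameters
Since $\frac{\partial z}{\partial w_1} = x_1$, $\frac{\partial z}{\partial w_2} = x_2$ and $\frac{\partial z}{\partial b} = 1$:
$$\frac{\partial \mathcal{L}}{\partial w_1} = x_1 (a - y), \qquad \frac{\partial \mathcal{L}}{\partial w_2} = x_2 (a - y), \qquad \frac{\partial \mathcal{L}}{\partial b} = a - y$$
In implementation (with dZ storing $a - y$):
- dw1 = x1 * dZ
- dw2 = x2 * dZ
- db = dZ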
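The parameter gradients of Step 3 fit in a small function. This is a minimal sketch that assumes the forward pass has already produced $a$. The name `backward` and the variable names `dZ`, `dw1`, `dw2`, `db` are illustrative, following the naming style of dA above.

```python
def backward(x1, x2, a, y):
    """Gradients of the single-example loss w.r.t. w1, w2 and b."""
    dZ = a - y          # dL/dz, the simplified chain-rule result
    dw1 = x1 * dZ       # dL/dw1
    dw2 = x2 * dZ       # dL/dw2
    db = dZ             # dL/db
    return dw1, dw2, db

dw1, dw2, db = backward(x1=1.0, x2=2.0, a=0.5, y=1)
print(dw1, dw2, db)
```

With the illustrative forward-pass values ($a = 0.5$, $y = 1$), this gives $dZ = -0.5$ and gradients $(-0.5, -1.0, -0.5)$.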
4. Gradient Descent Update
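After computing gradients, update each parameter:
$$w_1 := w_1 - \alpha \frac{\partial \mathcal{L}}{\partial w_1}, \qquad w_2 := w_2 - \alpha \frac{\partial \mathcal{L}}{\partial w_2}, \qquad b := b - \alpha \frac{\partial \mathcal{L}}{\partial b}$$
Where:
- $\alpha$ is the learning rate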
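A single gradient-descent step is one subtraction per parameter. The sketch below applies one step with illustrative gradients and an assumed learning rate $\alpha = 0.1$. The function name `update` is hypothetical.

```python
def update(w1, w2, b, dw1, dw2, db, alpha):
    """One gradient-descent step: theta := theta - alpha * dtheta."""
    return w1 - alpha * dw1, w2 - alpha * dw2, b - alpha * db

# Illustrative starting point and gradients
w1, w2, b = update(0.5, -0.25, 0.0, dw1=-0.5, dw2=-1.0, db=-0.5, alpha=0.1)
print(w1, w2, b)  # approximately 0.55, -0.15, 0.05
```

Each parameter moves against its gradient. Here all gradients are negative, so all parameters increase.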
5. Summary of Computation Flow
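Forward pass: $(x_1, x_2, w_1, w_2, b) \rightarrow z \rightarrow a \rightarrow \mathcal{L}(a, y)$
Backward pass: $\frac{\partial \mathcal{L}}{\partial a} \rightarrow \frac{\partial \mathcal{L}}{\partial z} \rightarrow \left(\frac{\partial \mathcal{L}}{\partial w_1}, \frac{\partial \mathcal{L}}{\partial w_2}, \frac{\partial \mathcal{L}}{\partial b}\right)$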
This follows the computation graph:
- Forward: left → right
- Backward: right → left
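Putting the flow together, training on one example repeats three stages: the forward pass, the backward pass, then the update. The sketch below assumes zero initial parameters, $\alpha = 0.1$ and 100 iterations. These are illustrative settings, not values from the notes. The recorded loss starts at $\ln 2$ and decreases.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_single(x1, x2, y, alpha=0.1, steps=100):
    """Repeat forward -> backward -> update on a single example."""
    w1 = w2 = b = 0.0
    losses = []
    for _ in range(steps):
        # Forward: left to right through the graph
        z = w1 * x1 + w2 * x2 + b
        a = sigmoid(z)
        losses.append(-(y * math.log(a) + (1 - y) * math.log(1 - a)))
        # Backward: right to left
        dZ = a - y
        dw1, dw2, db = x1 * dZ, x2 * dZ, dZ
        # Gradient-descent update
        w1 -= alpha * dw1
        w2 -= alpha * dw2
        b -= alpha * db
    return (w1, w2, b), losses

params, losses = train_single(x1=1.0, x2=2.0, y=1)
print(losses[0], losses[-1])
```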
6. Key Insight
- The backward pass applies the chain rule
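- The derivative with respect to $z$ simplifies to: $\frac{\partial \mathcal{L}}{\partial z} = a - y$
This simplification matters for implementation. $\frac{\partial \mathcal{L}}{\partial z}$ can be computed with a single subtraction, without evaluating $\frac{\partial \mathcal{L}}{\partial a}$. That intermediate derivative contains the terms $\frac{y}{a}$ and $\frac{1 - y}{1 - a}$, which divide by zero when the sigmoid saturates at $a = 0$ or $a = 1$.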
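The numerical benefit shows up with double-precision floats. For a large $z$ (here $z = 40$, an illustrative choice), $e^{-z}$ is far below machine epsilon, so $\sigma(z)$ rounds to exactly $1.0$. The simplified form $a - y$ stays finite. The long chain-rule form, in contrast, hits $1 / (1 - a) = 1 / 0$.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

z, y = 40.0, 0            # large z: a rounds to exactly 1.0 in float64
a = sigmoid(z)
print(a == 1.0)           # True on IEEE-754 doubles

# Simplified form: well-defined
dZ = a - y
print(dZ)                 # 1.0

# Long form: dL/da contains (1 - y) / (1 - a), i.e. 1 / 0 here
long_form_failed = False
try:
    dA = -y / a + (1 - y) / (1 - a)
except ZeroDivisionError as exc:
    long_form_failed = True
    print("chain-rule form failed:", exc)
```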
7. Limitation of This Version
This derivation applies to a single training example.
In practice:
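- We use a dataset with $m$ examples
- Gradients are averaged over all examples. For example, with cost $J = \frac{1}{m}\sum_{i=1}^{m} \mathcal{L}(a^{(i)}, y^{(i)})$, the gradient is $\frac{\partial J}{\partial w_1} = \frac{1}{m}\sum_{i=1}^{m} x_1^{(i)} \left(a^{(i)} - y^{(i)}\right)$, and likewise for $w_2$ and $b$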
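Extending to $m$ examples means repeating the single-example computation and averaging. The sketch below uses a plain loop for clarity, not a vectorized implementation. The function name `averaged_gradients` and the three-example toy dataset are made up for illustration. At zero parameters every $a^{(i)} = 0.5$, so the cost is $\ln 2$.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def averaged_gradients(X, Y, w1, w2, b):
    """Average the single-example loss and gradients over m examples.

    X is a list of (x1, x2) pairs and Y the matching list of 0/1 labels.
    Returns (J, dw1, dw2, db), each averaged over m.
    """
    m = len(X)
    J = dw1 = dw2 = db = 0.0
    for (x1, x2), y in zip(X, Y):
        a = sigmoid(w1 * x1 + w2 * x2 + b)
        J += -(y * math.log(a) + (1 - y) * math.log(1 - a))
        dZ = a - y
        dw1 += x1 * dZ
        dw2 += x2 * dZ
        db += dZ
    return J / m, dw1 / m, dw2 / m, db / m

# Toy data, made up for illustration
X = [(1.0, 2.0), (-1.0, 0.5), (0.0, -1.0)]
Y = [1, 0, 0]
J, dw1, dw2, db = averaged_gradients(X, Y, w1=0.0, w2=0.0, b=0.0)
print(J, dw1, dw2, db)
```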
