Definition
The sigmoid function is a smooth, S-shaped (logistic) function that maps any real number into the range (0, 1).
- Formula:
$\sigma(x) = \frac{1}{1 + e^{-x}}$
- Input ($x$): can be any real value, $x \in (-\infty, +\infty)$.
- Output: always between 0 and 1.
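A minimal Python sketch of the formula above, using only the standard library. The function name `sigmoid` and the two-branch layout are illustrative choices, not something these notes prescribe. The branch on the sign of $x$ is one common way to keep `math.exp` from overflowing for large negative inputs.

```python
import math

def sigmoid(x: float) -> float:
    """Logistic sigmoid 1 / (1 + e^(-x)), arranged to avoid overflow."""
    if x >= 0:
        # Here e^(-x) <= 1, so the exponential cannot overflow.
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x, use the algebraically equivalent form e^x / (1 + e^x).
    z = math.exp(x)
    return z / (1.0 + z)

print(sigmoid(0.0))      # 0.5
print(sigmoid(-1000.0))  # 0.0 (the naive formula would raise OverflowError here)
```

Mathematically the output never reaches 0 or 1, but in floating point it can round to exactly 0.0 or 1.0 for extreme inputs, as the second call shows.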
Key Properties
- Range: (0, 1)
- Shape: S-shaped curve (hence “sigmoid”).
- Monotonic: As $x$ increases, $\sigma(x)$ increases.
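- Symmetry: $\sigma(-x) = 1 - \sigma(x)$, so the curve is point-symmetric about $(0, 0.5)$.
- Derivative:
$\sigma'(x) = \sigma(x)(1 - \sigma(x))$
The derivative can be computed from the function's own output, which makes it cheap to evaluate in gradient-based optimization.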
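The symmetry and derivative identities can be sanity-checked numerically. The sketch below compares the closed-form derivative with a central finite difference. The helper names, the step size `h`, and the sample points are arbitrary choices made for illustration.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x: float) -> float:
    """Closed-form derivative: sigma(x) * (1 - sigma(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)

def numeric_grad(x: float, h: float = 1e-5) -> float:
    """Central finite-difference approximation of the derivative."""
    return (sigmoid(x + h) - sigmoid(x - h)) / (2.0 * h)

for x in (-3.0, 0.0, 1.5):
    # Gap between the closed form and the numerical estimate.
    grad_gap = abs(sigmoid_grad(x) - numeric_grad(x))
    # Gap in the symmetry identity sigma(-x) = 1 - sigma(x).
    sym_gap = abs(sigmoid(-x) - (1.0 - sigmoid(x)))
    print(x, grad_gap, sym_gap)
```

Both gaps should be tiny, limited only by floating-point rounding. The derivative peaks at $x = 0$, where it equals $0.5 \cdot 0.5 = 0.25$.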
Why It’s Useful
- Converts raw values (logits) into probabilities.
- Smooth and differentiable everywhere, so it works well with gradient-based optimization.
- Historically popular as an activation function in neural networks.
Applications
- Logistic Regression:
  - Models the probability of a binary outcome.
  - Example: Probability of customer churn.
- Neural Networks:
  - Used as an activation function in hidden/output layers.
  - Helps compress neuron outputs into a bounded range.
- Probability Modeling:
  - Converts unbounded scores into interpretable probabilities.
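To make the logistic regression use concrete, here is a small sketch of how a churn probability could be computed from a linear score. The feature names, weights, and bias are hypothetical values picked by hand for illustration. They are not fitted to any data.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients, chosen by hand (not learned from data).
weights = {"months_inactive": 0.8, "support_tickets": 0.5}
bias = -2.0

def churn_probability(customer: dict) -> float:
    """Linear score (logit) passed through the sigmoid."""
    logit = bias + sum(weights[name] * customer[name] for name in weights)
    return sigmoid(logit)

p = churn_probability({"months_inactive": 2, "support_tickets": 1})
print(f"estimated churn probability: {p:.3f}")
```

In a real model the weights and bias would be estimated from labeled data. Only the final step, mapping the unbounded score to a value in (0, 1), is the sigmoid's job.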
Example
Suppose $x = 2$:
$\sigma(2) = \frac{1}{1+e^{-2}} \approx \frac{1}{1 + 0.1353} \approx 0.88$
If 2 is the raw score (logit) a model produces, the sigmoid converts it into a predicted probability of about 88% for the positive class.
Limitations
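- Vanishing gradients: For large positive or negative $x$, the derivative approaches zero, and it never exceeds 0.25. Gradients passed back through many sigmoid layers can therefore shrink rapidly, slowing learning in deep networks.
- Not zero-centered: Outputs are always positive, so the inputs to the next layer all share the same sign. This can make gradient updates less efficient than with tanh, whose outputs are centered on zero.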
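A rough numerical illustration of the vanishing-gradient point, as a sketch only. It looks at the sigmoid's derivative in isolation and ignores the weight factors that also appear in a real backpropagated gradient. The sample inputs and layer counts are arbitrary.

```python
import math

def sigmoid_grad(x: float) -> float:
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)

# The derivative shrinks quickly as |x| grows (saturation).
for x in (0.0, 5.0, 10.0):
    print(f"x = {x:4.1f}  derivative = {sigmoid_grad(x):.2e}")

# The derivative is at most 0.25, so the product of the activation
# derivatives across n stacked sigmoid layers is at most 0.25 ** n.
for n in (1, 5, 10):
    print(f"layers = {n:2d}  upper bound = {0.25 ** n:.2e}")
```

Even in the best case, with every unit at $x = 0$, ten sigmoid layers scale the activation part of the gradient by less than one millionth.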
In short:
The sigmoid function is a classic squashing function mapping real numbers to (0, 1). It’s key for logistic regression and probability interpretation, but less common as a hidden-layer activation in modern deep learning, where ReLU and its variants are often preferred.
