Definition

The sigmoid function is a smooth, S-shaped (logistic) function that maps any real number into the range (0, 1).

  • Formula:

$\sigma(x) = \frac{1}{1 + e^{-x}}$

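  • Input ($x$): any real value, $x \in (-\infty, +\infty)$.
  • Output: always strictly between 0 and 1; the values 0 and 1 are approached as $x \to -\infty$ and $x \to +\infty$ but never reached.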
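
As a minimal sketch of the formula above, the function can be written in plain Python. The name `sigmoid` and the two-branch structure are illustrative choices, not something the definition prescribes; the branching is there only because the textbook form calls `math.exp(-x)`, which overflows for very negative `x`.

```python
import math

def sigmoid(x: float) -> float:
    """Logistic sigmoid 1 / (1 + e^{-x}), arranged to avoid overflow."""
    if x >= 0:
        # exp(-x) <= 1 here, so it cannot overflow.
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x use the algebraically equivalent form e^x / (1 + e^x),
    # where exp(x) <= 1.
    z = math.exp(x)
    return z / (1.0 + z)

for x in (-1000.0, -2.0, 0.0, 2.0, 1000.0):
    print(x, sigmoid(x))
```

Mathematically the output never reaches 0 or 1, but in floating point it rounds to exactly `0.0` or `1.0` for inputs of very large magnitude, as the first and last values in the loop show.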

Key Properties

  1. Range: (0, 1)
  2. Shape: S-shaped curve (hence “sigmoid”).
  3. Monotonic: $\sigma(x)$ is strictly increasing in $x$.
  4. Symmetry: $\sigma(-x) = 1 - \sigma(x)$, i.e. the curve is point-symmetric about $(0, 0.5)$.
  5. Derivative:

$\sigma'(x) = \sigma(x)(1 - \sigma(x))$

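This matters for gradient-based optimization: the derivative can be computed from the output $\sigma(x)$ alone, it peaks at $\sigma'(0) = 0.25$, and it approaches 0 as $|x|$ grows.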
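
The symmetry and derivative identities listed above can be checked numerically. The sketch below compares the closed-form derivative against a central finite difference; the helper names (`sigmoid`, `sigmoid_grad`), the step size `h`, and the test points are arbitrary choices made for illustration.

```python
import math

def sigmoid(x: float) -> float:
    # Textbook form; fine for the moderate inputs used here.
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x: float) -> float:
    # Closed-form derivative: sigma(x) * (1 - sigma(x)).
    s = sigmoid(x)
    return s * (1.0 - s)

h = 1e-6  # finite-difference step (illustrative choice)
for x in (-3.0, 0.0, 1.5):
    numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)
    # Closed form and numerical estimate should agree closely.
    assert abs(numeric - sigmoid_grad(x)) < 1e-8
    # Symmetry: sigma(-x) = 1 - sigma(x).
    assert abs(sigmoid(-x) - (1.0 - sigmoid(x))) < 1e-12

print(sigmoid_grad(0.0))  # 0.25, the maximum of the derivative
```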


Why It’s Useful

  • Converts raw values (logits) into probabilities.
  • Smooth and differentiable → good for optimization.
  • Historically popular as an activation function in neural networks.

Applications

  1. Logistic Regression:
    • Models the probability of a binary outcome.
    • Example: Probability of customer churn.
  2. Neural Networks:
    • Used as an activation function in hidden/output layers.
    • Helps compress neuron outputs into a bounded range.
  3. Probability Modeling:
    • Converts unbounded scores into interpretable probabilities.
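
To make the logistic-regression use concrete, here is a small sketch of the prediction step: a linear score (the logit) is computed from the features and then passed through the sigmoid. The function name `predict_proba`, the feature meanings, and every weight value are hypothetical and hand-picked for illustration; they are not fitted to any data.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(features, weights, bias):
    # Linear score (logit), then squash into (0, 1).
    z = sum(w * f for w, f in zip(weights, features)) + bias
    return sigmoid(z)

# Hypothetical churn model: [support tickets, tenure in years].
weights = [0.8, -0.5]   # made-up coefficients, not learned
bias = -1.0             # made-up intercept
customer = [3.0, 2.0]

p_churn = predict_proba(customer, weights, bias)
print(f"logit = 0.4, P(churn) = {p_churn:.3f}")
```

With these made-up numbers the logit is 0.8·3 − 0.5·2 − 1 = 0.4, which the sigmoid maps to a probability of roughly 0.6.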

Example

Suppose $x = 2$:

$\sigma(2) = \frac{1}{1+e^{-2}} \approx 0.88$

If $x = 2$ is a logit produced by a model, the model is assigning a probability of about 88% to the positive class.
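
The arithmetic above can be reproduced in a few lines. As an extra sanity check, the sketch also inverts the sigmoid with the log-odds $\log\frac{p}{1-p}$ to recover the original input; the variable names are illustrative.

```python
import math

x = 2.0
p = 1.0 / (1.0 + math.exp(-x))
print(round(p, 4))  # 0.8808

# Inverse of the sigmoid: the log-odds (logit) of p gives back x.
recovered = math.log(p / (1.0 - p))
print(abs(recovered - x) < 1e-9)  # True
```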


Limitations

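  • Vanishing gradients: For large $|x|$ the derivative approaches zero, and it never exceeds 0.25, so gradients shrink as they are backpropagated through many sigmoid layers, which slows learning in deep networks.
  • Not zero-centered: Outputs are always positive, so the inputs to the next layer all share the same sign, which can make gradient updates less efficient than with tanh, whose outputs lie in (-1, 1).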
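
A short sketch can make the vanishing-gradient point tangible. It prints the derivative at a few inputs and then the best-case product of sigmoid-derivative factors across several layers. This deliberately ignores the weight matrices that also enter the real backpropagated gradient, so it is a simplification for illustration; the layer counts are arbitrary.

```python
import math

def sigmoid_grad(x: float) -> float:
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)

# The derivative collapses quickly once |x| is more than a few units.
for x in (0.0, 2.0, 5.0, 10.0):
    print(f"x = {x:5.1f}  sigma'(x) = {sigmoid_grad(x):.2e}")

# Best case: every layer sits at x = 0, where the derivative is 0.25.
# Even then the chained derivative factors shrink geometrically with depth.
for depth in (1, 5, 10):
    print(f"{depth:2d} layers: at most {0.25 ** depth:.2e}")
```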

In short:
The sigmoid function is a classic squashing function mapping real numbers to (0, 1). It’s key for logistic regression and probability interpretation, but it is less used as a hidden-layer activation in modern deep learning, where ReLU and its variants are often preferred.