1) General Meaning

  • Downsampling means reducing the number of observations in a dataset.
  • The idea is to make the dataset smaller or more balanced, depending on context.

It appears in two main areas:

  1. Imbalanced Classification (resampling strategy)
  2. Time Series or Signal Processing (reducing frequency)

2) Downsampling in Imbalanced Classification

  • In classification, one class (usually “negative”) can dominate the dataset.
  • Example: Fraud detection dataset → 99% non-fraud, 1% fraud.
  • If we train on this data directly, the model may learn to ignore the minority class.
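
To see why this matters, here is a minimal sketch of the arithmetic, assuming a hypothetical dataset of 10,000 rows with the 99% / 1% split from the example. The "model" is just a stand-in that always predicts non-fraud; it is not a real classifier.

```python
# Hypothetical data: 9,900 non-fraud rows (0) and 100 fraud rows (1).
labels = [0] * 9_900 + [1] * 100

# A degenerate "model" that ignores the minority class entirely.
predictions = [0] * len(labels)

# Accuracy: share of rows predicted correctly.
accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)

# Recall on fraud: share of actual fraud rows that were flagged.
fraud_caught = sum(p == 1 and y == 1 for p, y in zip(predictions, labels))
recall = fraud_caught / labels.count(1)

print(accuracy, recall)  # 0.99 0.0
```

Accuracy looks excellent, yet not a single fraud case is caught.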

Downsampling strategy:

  • Randomly remove samples from the majority class until the dataset is more balanced.
  • Example: 10,000 negatives + 100 positives → downsample negatives to 200 → now 200 vs 100 (a 2:1 ratio instead of 100:1).
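
The strategy above can be sketched in a few lines of plain Python. This is only an illustration: the function name `downsample_majority`, its parameters, and the fixed seed are assumptions made for this example, not part of any library.

```python
import random

def downsample_majority(labels, majority_label, target_count, seed=0):
    """Return sorted row indices to keep after randomly dropping majority rows."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    majority_idx = [i for i, y in enumerate(labels) if y == majority_label]
    other_idx = [i for i, y in enumerate(labels) if y != majority_label]
    # Sample without replacement; never ask for more rows than exist.
    kept_majority = rng.sample(majority_idx, min(target_count, len(majority_idx)))
    return sorted(kept_majority + other_idx)

# Mirror the example: 10,000 negatives (0) and 100 positives (1).
labels = [0] * 10_000 + [1] * 100
keep = downsample_majority(labels, majority_label=0, target_count=200)
kept_labels = [labels[i] for i in keep]

print(kept_labels.count(0), kept_labels.count(1))  # 200 100
```

The function returns indices rather than rows, so the same selection can be applied to the feature table and the label column together.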

Benefit:

  • The minority class makes up a larger share of the training data, so the model can no longer get a low error just by ignoring it.

Risks:

  • Throwing away majority samples → loss of information.
  • If the dataset is already small, this can hurt performance.

Variants:


3) Downsampling in Time Series / Signals

  • Here, downsampling means reducing the sampling rate.
  • Example: you have sensor readings every 1 ms (1 kHz) → downsample to every 10 ms (100 Hz).

Why?

  • To reduce storage or computational cost.
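  • To smooth out high-frequency noise (this works when samples are averaged or filtered, not when they are simply skipped).
  • To match another dataset’s frequency (e.g., aggregate weather readings to hourly values so they align with hourly energy usage).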

How?

  • Pick every k-th sample (simple subsampling).
  • Aggregate within bins (e.g., average temperature per hour).
  • Plain subsampling often requires low-pass filtering first (anti-aliasing); otherwise, frequencies above half the new sampling rate fold back into the signal as distortion.
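
The first two approaches can be compared with a small sketch in plain Python. The helper names `take_every_kth` and `average_bins` and the synthetic readings (a slow ramp plus alternating ±1 noise) are assumptions made up for this illustration.

```python
from statistics import mean

def take_every_kth(samples, k):
    """Plain subsampling: keep samples 0, k, 2k, ... with no filtering."""
    return samples[::k]

def average_bins(samples, k):
    """Aggregate: mean of each consecutive block of k samples.

    A trailing partial block is dropped.
    """
    usable = len(samples) - len(samples) % k
    return [mean(samples[i:i + k]) for i in range(0, usable, k)]

# Hypothetical 1 ms readings: a slow ramp (i / 10) plus alternating -1 / +1 noise.
readings = [i / 10 + (1 if i % 2 else -1) for i in range(40)]

subsampled = take_every_kth(readings, 10)  # one value per 10 ms
averaged = average_bins(readings, 10)      # one value per 10 ms

print(subsampled)  # [-1.0, 0.0, 1.0, 2.0]
print(averaged)    # roughly 0.45, 1.45, 2.45, 3.45
```

In this toy signal, every kept sample in `subsampled` happens to carry the same -1 noise value, so the noise survives as a constant offset. Averaging cancels it and acts as a crude low-pass filter.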

4) Pros and Cons

Pros:

  • In classification: a more balanced training set → the model is less biased toward the majority class.
  • In time series: smaller files, faster processing, and less noise when the downsampling averages or filters.

Cons:

  • Classification: may discard informative majority-class examples.
  • Time series: may lose details or introduce aliasing.
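
Aliasing can be shown with a short sketch in plain Python. The frequencies and the helper `sample_sine` are assumptions picked for illustration: a 9 Hz tone sampled at 1000 Hz is reduced to 10 Hz by keeping every 100th sample, with no low-pass filter applied.

```python
import math

def sample_sine(freq_hz, rate_hz, n_samples):
    """Samples of sin(2*pi*f*t) taken at t = n / rate_hz."""
    return [math.sin(2 * math.pi * freq_hz * n / rate_hz) for n in range(n_samples)]

# One second of a 9 Hz tone at 1000 Hz, then plain subsampling to 10 Hz.
fine = sample_sine(9, 1000, 1000)
coarse = fine[::100]

# A sign-flipped 1 Hz tone sampled directly at 10 Hz.
alias = [-x for x in sample_sine(1, 10, 10)]

# The two sequences agree up to floating-point rounding.
max_diff = max(abs(a - b) for a, b in zip(coarse, alias))
print(max_diff < 1e-9)  # True
```

After subsampling, the 9 Hz tone cannot be told apart from a 1 Hz tone, because 9 Hz lies above half the new 10 Hz sampling rate.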

5) Related Concepts


Summary:

  • Downsampling (classification) = cut down majority class samples to fix imbalance.
  • Downsampling (time series) = reduce data frequency to save resources or smooth noise.
  • Always a trade-off: a more balanced or smaller dataset vs. information loss.