1. General Definition
- Subsampling = selecting a subset of the original dataset (or signal) for analysis or training.
- It can be done for efficiency, class balancing, or signal processing (reducing sample rate).
2. In Machine Learning / Data Science
- Often used when datasets are too large or imbalanced.
- Random subsampling: randomly pick a subset of the data without replacement (unlike bootstrapping, which samples with replacement).
- Undersampling (subsampling majority class): reduce the size of the majority class to balance with the minority class.
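- Cross-validation subsampling: hold out a subset of the data for validation in each fold; in repeated random subsampling (Monte Carlo cross-validation), each round draws a new random train/validation split.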
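The validation flavour of subsampling can be sketched with scikit-learn's `ShuffleSplit`, which draws an independent random train/validation split on every iteration. The toy data, the 5 splits and the 20% validation size are illustrative assumptions. Unlike k-fold, validation sets from different rounds may overlap.

```python
import numpy as np
from sklearn.model_selection import ShuffleSplit

# Toy data: 1,000 samples with 5 features (made up for illustration)
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))

# Repeated random subsampling: 5 independent random splits,
# each holding out 20% of the samples for validation
splitter = ShuffleSplit(n_splits=5, test_size=0.2, random_state=42)

split_sizes = []
for train_idx, val_idx in splitter.split(X):
    # Within a single split, training and validation rows never overlap
    assert len(np.intersect1d(train_idx, val_idx)) == 0
    split_sizes.append((len(train_idx), len(val_idx)))

print(split_sizes)  # five (800, 200) pairs
```

In practice you would fit and score a model inside the loop and average the validation scores over the rounds.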
Example:
- Dataset: 1,000,000 samples
- You take a subsample of 100,000 to train faster.
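A minimal sketch of that example with NumPy, assuming the data fits in memory as one array (the 4 random features are hypothetical stand-ins for a real dataset). `Generator.choice(..., replace=False)` returns distinct indices, so no row is picked twice.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical full dataset: 1,000,000 samples with 4 features each
n_total, n_sub = 1_000_000, 100_000
X = rng.standard_normal((n_total, 4), dtype=np.float32)

# Draw 100,000 distinct row indices (sampling WITHOUT replacement)
idx = rng.choice(n_total, size=n_sub, replace=False)
X_sub = X[idx]

print(X_sub.shape)  # (100000, 4)
```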
3. In Signal Processing / Time Series
- Subsampling = reducing the sampling rate of a signal (a form of downsampling).
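- Example:
  - Original: audio sampled at 44.1 kHz
  - Subsampled: reduced to 22.05 kHz (a factor of 2)
- A low-pass (anti-aliasing) filter must be applied first to avoid aliasing, i.e. distortion caused by frequencies above the new Nyquist limit (half the new sampling rate, here 11.025 kHz) folding into lower ones.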
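To see aliasing concretely, the sketch below (an illustrative experiment, not a production resampler) subsamples a 15 kHz test tone from 44.1 kHz to 22.05 kHz in two ways. Because 15 kHz lies above the new Nyquist limit of 11.025 kHz, naively keeping every 2nd sample folds it to 22,050 − 15,000 = 7,050 Hz. `scipy.signal.decimate` low-pass filters first, so the tone is almost entirely removed instead. The tone frequency and 1-second length are assumptions chosen so the alias lands on an exact FFT bin.

```python
import numpy as np
import scipy.signal as sps

fs = 44_100        # original sampling rate (Hz)
fs_new = fs // 2   # 22,050 Hz after subsampling by 2; new Nyquist limit = 11,025 Hz
f_tone = 15_000    # test tone above the new Nyquist limit

t = np.arange(fs) / fs              # 1 second of sample times
x = np.sin(2 * np.pi * f_tone * t)


def strongest_bin(sig, rate):
    """Return (frequency in Hz, magnitude) of the largest FFT bin of a real signal."""
    spectrum = np.abs(np.fft.rfft(sig))
    freqs = np.fft.rfftfreq(len(sig), d=1 / rate)
    k = int(np.argmax(spectrum))
    return freqs[k], spectrum[k]


naive = x[::2]                 # keep every 2nd sample, no filtering
filtered = sps.decimate(x, 2)  # anti-aliasing low-pass filter, then keep every 2nd sample

f_naive, mag_naive = strongest_bin(naive, fs_new)
f_filt, mag_filt = strongest_bin(filtered, fs_new)

print(f"naive: tone reappears at {f_naive:.0f} Hz")  # 7050 Hz = 22,050 - 15,000
print(f"decimate: strongest residual / naive peak = {mag_filt / mag_naive:.4f}")
```

The naive version produces a clean but false 7,050 Hz tone that was never in the original audio; the filtered version leaves only a small residual.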
4. Advantages
- Faster training and inference (less data).
- Reduces storage and computation cost.
- In imbalanced datasets, helps balance class proportions (if applied to the majority class).
5. Disadvantages
- Information loss: discards data, which may reduce accuracy.
- If subsampling isn't stratified, it may unintentionally change the class distribution.
- In signals, subsampling without a prior low-pass filter introduces aliasing.
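The stratification point can be checked with scikit-learn's `resample`, which accepts a `stratify` argument in current releases. In this sketch the 90/10 toy labels and the 500-row subsample size are assumptions. The plain random draw's minority share depends on the seed, while the stratified draw keeps the original 10% exactly (450 + 50 rows).

```python
import numpy as np
from sklearn.utils import resample

rng = np.random.default_rng(0)
# Imbalanced toy labels: 9,000 of class 0 and 1,000 of class 1 (10% minority)
y = np.array([0] * 9000 + [1] * 1000)
X = rng.normal(size=(len(y), 3))

# Plain random subsample of 500 rows: the minority share drifts with the draw
X_rand, y_rand = resample(X, y, n_samples=500, replace=False, random_state=1)

# Stratified subsample of 500 rows: class proportions follow y
X_strat, y_strat = resample(
    X, y, n_samples=500, replace=False, random_state=1, stratify=y
)

print("random minority share:    ", y_rand.mean())
print("stratified minority share:", y_strat.mean())  # 0.1
```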
6. Examples
In ML (Python, scikit-learn):
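```python
from sklearn.utils import resample

# Subsample 10,000 rows without replacement (X, y: your full dataset).
# resample() defaults to replace=True (bootstrapping), so replace=False is needed here;
# with replace=False, n_samples must not exceed len(X).
X_sub, y_sub = resample(X, y, n_samples=10000, replace=False, random_state=42)
```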
In Signal Processing (Python, scipy):
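```python
import scipy.signal as sps

# Downsample `signal` (a 1-D array) by a factor of 2 via FFT resampling;
# content above the new Nyquist limit is dropped, but the signal is treated as periodic.
signal_sub = sps.resample(signal, len(signal) // 2)
# Alternative: sps.decimate(signal, 2) applies an IIR low-pass filter, then keeps every 2nd sample.
```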
7. Comparison
| Term | Context | Meaning |
|---|---|---|
| Undersampling | Imbalanced classification | Reduce majority class samples |
| Oversampling | Imbalanced classification | Increase minority class samples |
| Subsampling | General ML | Take subset of data for efficiency or balance |
| Subsampling (DSP) | Signals | Reduce sampling rate (downsampling) |
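The difference between the first two rows can be sketched with `sklearn.utils.resample` on a hypothetical 950/50 dataset. Undersampling draws the majority class down to the minority size without replacement; oversampling draws the minority class up to the majority size with replacement, so some minority rows are duplicated.

```python
import numpy as np
from sklearn.utils import resample

# Hypothetical imbalanced dataset: 950 majority rows (label 0), 50 minority rows (label 1)
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))
y = np.array([0] * 950 + [1] * 50)

X_maj, y_maj = X[y == 0], y[y == 0]
X_min, y_min = X[y == 1], y[y == 1]

# Undersampling: shrink the majority class to the minority size, WITHOUT replacement
X_maj_down, y_maj_down = resample(
    X_maj, y_maj, n_samples=len(y_min), replace=False, random_state=42
)
X_under = np.vstack([X_maj_down, X_min])
y_under = np.concatenate([y_maj_down, y_min])

# Oversampling: grow the minority class to the majority size, WITH replacement
X_min_up, y_min_up = resample(
    X_min, y_min, n_samples=len(y_maj), replace=True, random_state=42
)
X_over = np.vstack([X_maj, X_min_up])
y_over = np.concatenate([y_maj, y_min_up])

print(np.bincount(y_under))  # [50 50]
print(np.bincount(y_over))   # [950 950]
```

Both results are balanced, but the undersampled set has 100 rows while the oversampled set has 1,900, which reflects the efficiency vs. information-loss trade-off listed above.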
Summary
- Subsampling = selecting a smaller subset of data or reducing signal sampling rate.
- In ML: improves efficiency or balances classes.
- In DSP: reduces sample rate → must use low-pass filtering to avoid aliasing.
- Pros: faster, cheaper. Cons: possible loss of information.
