Outlier

Definition

An outlier is a data point that lies far away from the majority of other observations.
It is an unusual or extreme value that doesn’t follow the general trend of the data.

Example: If most people’s height is between 150–190 cm, but one value is 250 cm, that’s an outlier.

Causes of Outliers

Measurement Error – faulty sensors, data entry mistakes.
Natural Variation – genuine extreme values (e.g., very tall people).
Sampling Issues – mixing populations or incorrect sampling method.

Why Outliers Matter

Skew Results: Can heavily influence the mean, standard deviation, regression coefficients, and MSE.
Model Performance: In regression, one extreme point can pull the regression line toward it.
Detection of Rare Events: Outliers can sometimes represent fraud, defects, or important rare cases.

Methods to Detect Outliers

Statistical Rules
- Z-score method: If ∣z∣>3|z| > 3∣z∣>3, treat as outlier.
- IQR rule: If a value is below $Q1 – 1.5 \times IQR$ or above $Q3 + 1.5 \times IQR$.
Visualization
- Boxplot (points outside whiskers).
- Scatterplot (isolated points).
Machine Learning Approaches
- Isolation Forest, DBSCAN, One-Class SVM.

Handling Outliers

Keep them: If they are real and important (e.g., rare but valid cases).
Remove them: If caused by error or not relevant.
Transform them: Apply log or square root transformations to reduce their influence.
Use robust models: Median-based methods, MAE instead of MSE, or robust regression.

Example

Suppose exam scores are: $[70, 75, 80, 85, 90, 92, 95, 100, 12]$

Most scores are between 70–100, but 12 is extremely low.
Mean with outlier = 77.7, without outlier = 86.4 → The outlier drags the average down.

In short:
An outlier is a data point far from the rest. It can signal errors, rare events, or meaningful anomalies. Handling depends on context—sometimes remove, sometimes keep.

Your Gateway to Data Mastery

Learn, explore, and innovate with data science.

Outlier

Definition

Causes of Outliers

Why Outliers Matter

Methods to Detect Outliers

Handling Outliers

Example

Like this:

Related

Leave a ReplyCancel reply

Definition

Causes of Outliers

Why Outliers Matter

Methods to Detect Outliers

Handling Outliers

Example

Share this:

Like this:

Related

Leave a ReplyCancel reply

Discover more from Your Gateway to Data Mastery