Two commonly used measures for identifying outliers in regression analysis are:

  1. Ordinary residuals
  2. Studentized residuals (also called internally studentized residuals, or standardized residuals in Minitab)

Both are reviewed below, with additional detail to clarify their roles and limitations.


1. Ordinary Residuals

Definition

For each observation $i = 1, \dots, n$, the ordinary residual is defined as:

$$e_i = y_i - \hat{y}_i$$

where:

  • $y_i$ is the observed response
  • $\hat{y}_i$ is the predicted (fitted) response

Example

Consider the following small data set with four observations:

x    y    FITS    RESI
1    2    2.2     -0.2
2    5    4.4      0.6
3    6    6.6     -0.6
4    9    8.8      0.2

Each residual is computed by subtracting the fitted value from the observed value. For example:

  • First residual: $2 - 2.2 = -0.2$
  • Second residual: $5 - 4.4 = 0.6$
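
The fitted values and residuals for the four-point example can be reproduced with a short script. This is a minimal sketch, assuming the fit is an ordinary least-squares straight line of y on x; the variable names `fits` and `resi` are chosen here only to mirror the column headers and are not taken from any particular software.

```python
import numpy as np

# Four-point data set from the example.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 5.0, 6.0, 9.0])

# Least-squares straight line; np.polyfit returns the highest-degree
# coefficient first, so the order is (slope, intercept).
slope, intercept = np.polyfit(x, y, deg=1)

fits = intercept + slope * x  # fitted values (FITS column)
resi = y - fits               # ordinary residuals (RESI column)

print(np.round(fits, 1))
print(np.round(resi, 1))
```

Up to floating-point rounding, the printed values should agree with the FITS and RESI columns.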

Limitation of Ordinary Residuals

The major drawback of ordinary residuals is that their magnitude depends on the units of measurement of the response variable. As a result:

  • A residual of size 10 may be large in one context but small in another.
  • This makes it difficult to use ordinary residuals directly to detect outliers.
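
The unit dependence can be seen by refitting the same four-point data after a change of units. The sketch below assumes a hypothetical rescaling of the response by a factor of 1000 (for example, kilograms to grams); the helper `ordinary_residuals` is an illustrative name, not a library function.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 5.0, 6.0, 9.0])

def ordinary_residuals(x, y):
    """Residuals from a least-squares straight-line fit of y on x."""
    slope, intercept = np.polyfit(x, y, deg=1)
    return y - (intercept + slope * x)

resi_original = ordinary_residuals(x, y)
# Hypothetical unit change: the same responses expressed in units
# that are 1000 times smaller, so every y value is 1000 times larger.
resi_rescaled = ordinary_residuals(x, 1000.0 * y)

# Each residual grows by the same factor as the response.
print(resi_rescaled / resi_original)
```

The data and the quality of the fit are unchanged, yet every residual is 1000 times larger, so no fixed cutoff on the raw residual can work across units.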

2. Studentized (Internally Studentized) Residuals

Motivation

To remove the effect of measurement units, residuals are scaled by an estimate of their standard deviation. This produces studentized residuals, which are unit-free and directly comparable across observations.

Definition

The internally studentized residual for observation $i$ is:

$$r_i = \frac{e_i}{\sqrt{\text{MSE} \, (1 - h_i)}}$$

where:

  • $e_i$ is the ordinary residual
  • MSE is the mean square error
  • $h_i$ is the leverage of observation $i$

This shows that studentized residuals depend on:

  • the size of the residual,
  • the overall variability of the model (MSE),
  • how extreme the predictor value is (leverage).

Example with Leverage

Using the same four-point data set:

x    y    FITS    RESI    HI     SRES
1    2    2.2     -0.2    0.7    -0.57735
2    5    4.4      0.6    0.3     1.13389
3    6    6.6     -0.6    0.3    -1.13389
4    9    8.8      0.2    0.7     0.57735

Given:

  • $\text{MSE} = 0.40$

The first studentized residual is:

$$r_1 = \frac{-0.2}{\sqrt{0.40 \, (1 - 0.7)}} = -0.57735$$

Each studentized residual measures how many estimated standard deviations the residual lies from zero, with the leverage of the observation taken into account.
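
The leverages, MSE, and studentized residuals for the four-point example can be computed directly from the definition. This is a minimal sketch, assuming a straight-line model with an intercept and taking the leverages as the diagonal of the hat matrix; the variable names are illustrative.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 5.0, 6.0, 9.0])

# Design matrix with an intercept column.
X = np.column_stack([np.ones_like(x), x])
n, p = X.shape  # n observations, p estimated coefficients

# Hat matrix H = X (X'X)^{-1} X'; its diagonal holds the leverages h_i.
H = X @ np.linalg.inv(X.T @ X) @ X.T
h = np.diag(H)

e = y - H @ y                     # ordinary residuals
mse = np.sum(e**2) / (n - p)      # mean square error
r = e / np.sqrt(mse * (1.0 - h))  # internally studentized residuals

print(np.round(h, 1), round(float(mse), 2))
print(np.round(r, 5))
```

Explicitly inverting X'X is acceptable for a four-point illustration; for larger or ill-conditioned problems a QR-based computation would be the more usual choice.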


3. Using Studentized Residuals to Detect Outliers

Interpretation Guidelines

  • A studentized residual with absolute value greater than 3 is generally considered evidence of an outlier.
  • Some software (e.g., Minitab) uses a lower cutoff of 2, which flags more observations for review.
  • These thresholds should not be treated rigidly; instead, they serve as warning signals prompting further investigation.
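
The guidelines above can be sketched as a simple screening step. The function name `flag_large_residuals` and the input values are hypothetical, made up only for illustration; the cutoffs 3 and 2 are the ones mentioned in the text.

```python
import numpy as np

def flag_large_residuals(r, cutoff=3.0):
    """Return the positions of residuals with |r_i| greater than cutoff."""
    r = np.asarray(r, dtype=float)
    return np.flatnonzero(np.abs(r) > cutoff)

# Hypothetical studentized residuals, for illustration only.
r = np.array([0.4, -2.3, 3.68, 1.1])

flag_at_3 = flag_large_residuals(r, cutoff=3.0)
flag_at_2 = flag_large_residuals(r, cutoff=2.0)

print(flag_at_3)
print(flag_at_2)
```

The returned positions are zero-based and mark candidates for a closer look, not observations to delete automatically.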

4. Example: Influence2 Data Set Revisited

In the Influence2 data set, one observation visually appears to deviate strongly from the general trend.

Minitab’s diagnostic output for that observation is:

Obs    y        Fit      Resid    Std Resid
21     40.00    23.11    16.89    3.68

Because the internally studentized residual is 3.68, Minitab flags this observation as having a large residual, confirming its outlier status.


5. Why Do Outliers Matter?

Outliers matter because they can substantially affect certain aspects of a regression analysis. One way to see this is to compare results with and without the outlier.


Regression Without the Outlier

  • Mean Square Error (MSE): 6.72
  • $R^2$: 97.32%
  • Standard error $S$: 2.59

Regression With the Outlier Included

  • Mean Square Error (MSE): 22.19
  • $R^2$: 91.01%
  • Standard error $S$: 4.71
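
The two summaries are internally consistent: in each case the reported standard error matches the square root of the MSE, which is how the regression standard error is usually defined. A quick check, using only the figures quoted above:

```python
import math

mse_without = 6.72  # MSE with the outlier removed
mse_with = 22.19    # MSE with the outlier included

# Regression standard error, taken here as the square root of MSE.
s_without = math.sqrt(mse_without)
s_with = math.sqrt(mse_with)

print(round(s_without, 2), round(s_with, 2))  # prints: 2.59 4.71
```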

6. Key Insight: What Changes and What Does Not

The most substantial change caused by the outlier is the inflation of MSE, from 6.72 to 22.19.

This is important because:

  • MSE appears in all confidence interval and prediction interval formulas.
  • A larger MSE leads to wider intervals, reducing precision.
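
To make the role of MSE concrete, the standard interval formulas for a model with a single predictor are shown below. This is a sketch under the assumption of simple linear regression; the symbols $x_h$ (the predictor value of interest), $\bar{x}$ (the sample mean of the predictor), and $t_{(1-\alpha/2,\, n-2)}$ (the t critical value) are introduced here and do not appear in the text above.

```latex
% Confidence interval for the mean response at x_h
\hat{y}_h \pm t_{(1-\alpha/2,\, n-2)}
  \sqrt{\text{MSE}\left(\frac{1}{n}
  + \frac{(x_h - \bar{x})^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}\right)}

% Prediction interval for a new response at x_h
\hat{y}_h \pm t_{(1-\alpha/2,\, n-2)}
  \sqrt{\text{MSE}\left(1 + \frac{1}{n}
  + \frac{(x_h - \bar{x})^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}\right)}
```

In both formulas the half-width is proportional to $\sqrt{\text{MSE}}$, so with everything else held fixed, a larger MSE widens the interval.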

However:

  • The estimated regression coefficients,
  • the predicted values,
  • and hypothesis test conclusions

remain largely unchanged.

Therefore, in this case, the outlier is not influential in terms of coefficient estimates, but it is influential with respect to model uncertainty, as reflected by MSE.


Final Takeaway

  • Ordinary residuals are simple but scale-dependent.
  • Studentized residuals standardize residuals using MSE and leverage, making them effective for outlier detection.
  • Outliers can dramatically inflate MSE, harming interval estimation even when coefficient estimates remain stable.
  • Identifying and understanding outliers is essential for reliable regression inference, not merely for improving model fit.