R² (R-squared)

R² (Coefficient of Determination)

1. Definition

R² measures the proportion of the variance in the dependent variable (y) that is explained by the independent variables (X) in a regression model.
It is a goodness-of-fit measure: it indicates how well the model captures the variability in the data.

2. Formula

Let:

$y_i$ = actual values
$\hat{y}_i$ = predicted values
$\bar{y}$ = mean of actual values

Total Sum of Squares (TSS):

$TSS = \sum_{i}(y_i - \bar{y})^2$

= total variation in the data (the variance of y multiplied by the number of observations)

Residual Sum of Squares (RSS):

$RSS = \sum_{i}(y_i - \hat{y}_i)^2$

= variation not explained by the model

Explained Sum of Squares (ESS):

$ESS = \sum_{i}(\hat{y}_i - \bar{y})^2$

= variation explained by the model

R² definition:

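$R^2 = 1 - \frac{RSS}{TSS}$

For ordinary least squares with an intercept, $TSS = RSS + ESS$, so this is equivalent to $R^2 = \frac{ESS}{TSS}$. For other models or for held-out data the two forms can differ, and $1 - \frac{RSS}{TSS}$ is the one to use.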
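The formulas above can be sketched in a few lines of plain Python. The data and the helper names (`sums_of_squares`, `fit_line`) are hypothetical and chosen only for illustration; the line is fitted by least squares with an intercept so that the two forms of R² can be compared.

```python
def sums_of_squares(y, y_hat):
    """Return (TSS, RSS, ESS) for actual values y and predictions y_hat."""
    y_bar = sum(y) / len(y)
    tss = sum((yi - y_bar) ** 2 for yi in y)
    rss = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
    ess = sum((yh - y_bar) ** 2 for yh in y_hat)
    return tss, rss, ess


def fit_line(x, y):
    """Least-squares slope and intercept for a single predictor."""
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Hypothetical data, used only to illustrate the formulas.
x = [1, 2, 3, 4]
y = [2, 4, 5, 8]

slope, intercept = fit_line(x, y)
y_hat = [intercept + slope * xi for xi in x]

tss, rss, ess = sums_of_squares(y, y_hat)
r2_from_rss = 1 - rss / tss
r2_from_ess = ess / tss

print(f"TSS={tss:.4f}  RSS={rss:.4f}  ESS={ess:.4f}")
print(f"1 - RSS/TSS = {r2_from_rss:.4f}")
print(f"ESS/TSS     = {r2_from_ess:.4f}")
```

Because the predictions come from a least-squares fit with an intercept, TSS equals RSS + ESS (up to floating-point rounding) and both expressions give the same R².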

3. Range & Interpretation

R² = 1 → perfect fit (model explains all variance).
R² = 0 → model explains no variance (same as predicting the mean).
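R² < 0 → model fits worse than simply predicting the mean. This can happen when the model is evaluated on data it was not fitted to, or when it is fitted without an intercept.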
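The three cases can be reproduced with a small sketch. The function name `r_squared` and the prediction vectors are hypothetical; the predictions are hand-picked rather than produced by a fitted model, which is why a negative value is possible.

```python
def r_squared(y, y_hat):
    """R^2 computed as 1 - RSS/TSS."""
    y_bar = sum(y) / len(y)
    tss = sum((yi - y_bar) ** 2 for yi in y)
    rss = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
    return 1 - rss / tss


y = [3, 4, 5]

perfect = r_squared(y, [3, 4, 5])    # predictions equal the actual values
mean_only = r_squared(y, [4, 4, 4])  # always predict the mean of y
reversed_ = r_squared(y, [5, 4, 3])  # predictions trend the wrong way

print(perfect)    # 1.0
print(mean_only)  # 0.0
print(reversed_)  # -3.0
```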

4. Example

Suppose:

Actual values: $y = [3, 4, 5]$
Predictions: $\hat{y} = [2.8, 4.2, 5.0]$
Mean: $\bar{y} = 4$

$TSS = (3-4)^2 + (4-4)^2 + (5-4)^2 = 1 + 0 + 1 = 2$

$RSS = (3-2.8)^2 + (4-4.2)^2 + (5-5.0)^2 = 0.04 + 0.04 + 0 = 0.08$

$R^2 = 1 - \frac{0.08}{2} = 0.96$

The model explains 96% of the variance in y, which indicates a very good fit on these three points.
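The arithmetic in this example can be checked with a short script. It uses only the values given above; the variable names are arbitrary. One caveat worth noting: these predictions were not produced by a least-squares fit, so ESS/TSS does not equal 1 - RSS/TSS here.

```python
y = [3, 4, 5]
y_hat = [2.8, 4.2, 5.0]

y_bar = sum(y) / len(y)
tss = sum((yi - y_bar) ** 2 for yi in y)
rss = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
ess = sum((yh - y_bar) ** 2 for yh in y_hat)

r2 = 1 - rss / tss

print(f"TSS = {tss:.2f}")            # TSS = 2.00
print(f"RSS = {rss:.2f}")            # RSS = 0.08
print(f"R^2 = {r2:.2f}")             # R^2 = 0.96
print(f"ESS/TSS = {ess / tss:.2f}")  # ESS/TSS = 1.24, not a valid R^2 here
```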

5. Limitations

High R² ≠ good model: A model can overfit (memorize data) and get high R² but perform poorly on new data.
Not for all tasks: R² is useful for regression, not classification.
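Doesn’t show bias: Two models can have the same R² but very different error patterns, for example one with errors scattered around zero and one that over-predicts every time. R² alone cannot tell them apart, so inspect the residuals as well.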
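The last point can be illustrated with a small sketch. The data and the two sets of predictions (`model_a`, `model_b`) are hypothetical and constructed by hand so that both have the same RSS.

```python
def r_squared(y, y_hat):
    """R^2 computed as 1 - RSS/TSS."""
    y_bar = sum(y) / len(y)
    tss = sum((yi - y_bar) ** 2 for yi in y)
    rss = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
    return 1 - rss / tss


def mean_residual(y, y_hat):
    """Average of (actual - predicted); non-zero means systematic bias."""
    return sum(yi - yh for yi, yh in zip(y, y_hat)) / len(y)


y = [1.0, 2.0, 3.0, 4.0]
model_a = [1.5, 1.5, 3.5, 3.5]  # errors alternate in sign
model_b = [1.5, 2.5, 3.5, 4.5]  # every prediction is 0.5 too high

r2_a, r2_b = r_squared(y, model_a), r_squared(y, model_b)
bias_a, bias_b = mean_residual(y, model_a), mean_residual(y, model_b)

print(r2_a, r2_b)      # identical R^2 values
print(bias_a, bias_b)  # 0.0 -0.5
```

Both sets of predictions score the same R², but only the mean residual reveals that the second one is consistently too high.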

6. Variants

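Adjusted R²: Penalizes adding irrelevant predictors. In a least-squares fit, plain R² on the training data never decreases when a predictor is added, so it can be artificially inflated in multiple regression; adjusted R² corrects for this.

$R^2_{adj} = 1 - \frac{(1-R^2)(n-1)}{n-p-1}$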

where $n$ = number of observations, $p$ = number of predictors.
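A minimal sketch of the adjusted R² formula follows. The function name `adjusted_r2` and the input values are hypothetical; the check on `n - p - 1` is an added safeguard, since the formula is undefined when that quantity is not positive.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 for n observations and p predictors."""
    if n - p - 1 <= 0:
        raise ValueError("need n > p + 1 for adjusted R^2 to be defined")
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Same plain R^2 of 0.80 on 50 observations, different numbers of predictors.
few_predictors = adjusted_r2(0.80, n=50, p=5)
many_predictors = adjusted_r2(0.80, n=50, p=20)

print(f"{few_predictors:.4f}")   # 0.7773
print(f"{many_predictors:.4f}")  # 0.6621
```

With R² held fixed, the model that uses more predictors receives the lower adjusted score.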

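Pseudo-R²: Used in logistic regression, where the sum-of-squares definition of R² does not carry over. Pseudo-R² measures, such as McFadden's, are based on the model's likelihood instead.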
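As one illustration, McFadden's pseudo-R² compares the log-likelihood of the model with that of an intercept-only (null) model: 1 - LL_model / LL_null. The sketch below is an assumption-laden example, not a fitted logistic regression: the labels and predicted probabilities are made up, and the function names are hypothetical.

```python
import math


def log_likelihood(labels, probs):
    """Bernoulli log-likelihood of 0/1 labels given predicted P(label = 1)."""
    return sum(
        math.log(p) if label == 1 else math.log(1 - p)
        for label, p in zip(labels, probs)
    )


def mcfadden_r2(labels, probs):
    """McFadden's pseudo-R^2: 1 - LL_model / LL_null."""
    base_rate = sum(labels) / len(labels)  # null model predicts the base rate
    ll_model = log_likelihood(labels, probs)
    ll_null = log_likelihood(labels, [base_rate] * len(labels))
    return 1 - ll_model / ll_null


# Made-up labels and probabilities, for illustration only.
labels = [1, 0, 1, 1]
probs = [0.8, 0.3, 0.6, 0.9]

pseudo_r2 = mcfadden_r2(labels, probs)
print(f"{pseudo_r2:.3f}")
```

The null model must assign a probability strictly between 0 and 1, so this sketch assumes the labels contain both classes.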

Summary:
R² (coefficient of determination) = proportion of variance in the dependent variable explained by the regression model.

$R^2 = 1$: perfect fit
$R^2 = 0$: no variance explained (equivalent to predicting the mean)
$R^2 < 0$: worse than the mean baseline

