When both variables are quantitative (continuous), the most common goal is to determine whether they move together in a systematic way. Some relationships are clearly linear, while others may be curved or more complex. Two major tools discussed here—Pearson correlation and distance correlation—address different aspects of dependence.
1. Pearson Correlation: Measuring Linear Association
What problem Pearson correlation solves
Pearson correlation is designed to quantify how strongly two continuous variables are linearly associated. In practical terms, it answers questions like:
- When $x$ is above its average, does $y$ also tend to be above its average?
- Do larger values of one variable tend to correspond to larger (or smaller) values of the other?
- Is the relationship reasonably well approximated by a straight line?
Data setup
Assume we observe $n$ paired observations $(x_i, y_i)$ for $i = 1, \dots, n$.
We compute the sample means:

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$$
Definition and intuition
Pearson correlation is computed as:

$$r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2 \,\sum_{i=1}^{n}(y_i - \bar{y})^2}}$$

The numerator measures whether deviations from the mean move together:
- If $(x_i - \bar{x})$ and $(y_i - \bar{y})$ tend to have the same sign, the numerator tends to be positive.
- If one tends to be positive when the other is negative, the numerator tends to be negative.
The denominator rescales the value so that the result becomes a standardized measure.
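As a sanity check, the formula above can be implemented directly. This is a minimal NumPy sketch; the function name `pearson` is just for illustration, and the sample data are made up:

```python
import numpy as np

def pearson(x, y):
    """Sample Pearson correlation, computed directly from the definition."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    dx, dy = x - x.mean(), y - y.mean()           # deviations from the means
    num = np.sum(dx * dy)                         # co-movement of deviations
    den = np.sqrt(np.sum(dx**2) * np.sum(dy**2))  # standardizing denominator
    return num / den

# Larger x tends to pair with larger y -> strong positive correlation.
print(pearson([1, 2, 3, 4], [2, 4, 5, 9]))
```

The result agrees with `np.corrcoef`, which computes the same quantity.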
Range and interpretation
By the Cauchy–Schwarz inequality:

$$-1 \le r \le 1$$
Interpretation uses both magnitude and sign:
- $|r|$ measures strength of linear association.
- $r = 1$: perfect positive linear relationship (all points fall exactly on an increasing straight line).
- $r = -1$: perfect negative linear relationship (all points fall exactly on a decreasing straight line).
- $r = 0$: no linear association (a straight-line pattern is not visible; the variables may still have a nonlinear relationship).
- The sign indicates direction:
  - $r > 0$: larger-than-average $x$ values tend to correspond to larger-than-average $y$ values.
  - $r < 0$: larger-than-average $x$ values tend to correspond to smaller-than-average $y$ values.
When Pearson correlation becomes undefined
Pearson correlation requires variation in both variables. If either variable has no variability, the denominator is zero, making the correlation undefined (numerical software typically reports it as NaN).
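A quick illustration of this edge case on hypothetical toy data; NumPy reports the undefined value as NaN:

```python
import numpy as np

x = np.full(4, 5.0)                  # no variability in x: denominator is zero
y = np.array([1.0, 2.0, 3.0, 4.0])
with np.errstate(divide="ignore", invalid="ignore"):  # silence the 0/0 warning
    r = np.corrcoef(x, y)[0, 1]
print(r)  # nan: the correlation is undefined
```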
2. When Pearson Correlation Becomes Exactly +1 or -1
A key result is that Pearson correlation is exactly $\pm 1$ when the points lie exactly on a straight line:

$$y_i = a + b x_i \quad \text{for all } i$$
In this setting:
- If $b > 0$, the correlation is $+1$.
- If $b < 0$, the correlation is $-1$.
- If $b = 0$, then all $y_i$ are equal to $a$, so $y$ has zero variance and the correlation is undefined.
The reasoning is based on the fact that if $y_i = a + b x_i$, then $\bar{y} = a + b\bar{x}$, so:

$$y_i - \bar{y} = b\,(x_i - \bar{x})$$

All deviations of $y$ are exactly a constant multiple of the deviations of $x$, which produces perfect linear dependence: substituting into the formula for $r$, the numerator becomes $b\sum_i (x_i - \bar{x})^2$ and the denominator $|b|\sum_i (x_i - \bar{x})^2$, so $r = b/|b| = \pm 1$.
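This claim is easy to verify numerically; the sketch below uses arbitrary values $a = 3$ and $b = \pm 2$:

```python
import numpy as np

# Points exactly on y = a + b*x give r = +1 or -1, depending on the sign of b.
x = np.array([0.0, 1.0, 2.0, 3.0])
r_up = np.corrcoef(x, 3.0 + 2.0 * x)[0, 1]    # b > 0 -> r = +1
r_down = np.corrcoef(x, 3.0 - 2.0 * x)[0, 1]  # b < 0 -> r = -1
print(r_up, r_down)
```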
3. Testing Whether the Pearson Correlation Could Be “Due to Chance”
A correlation computed from a sample might appear non-zero just because of random variation, especially in smaller samples. To address this, a classical hypothesis test is used:
- Null hypothesis: the true correlation is zero (no linear association in the population).
- Alternative hypothesis: the true correlation is not zero.
A test statistic is computed:

$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$$

Under the null hypothesis, this statistic follows a Student's $t$-distribution with $n - 2$ degrees of freedom.
The p-value is computed as:

$$p = 2\,P\!\left(T_{n-2} \ge |t|\right)$$
If $p < 0.05$, the conclusion is that a correlation this far from zero would be unlikely to arise by chance alone, meaning there is statistically significant evidence of linear association.
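Assuming SciPy is available, the same test can be run with `scipy.stats.pearsonr` and checked against the manual $t$-statistic; the data below are simulated purely for illustration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = 0.5 * x + rng.normal(size=50)   # true linear dependence plus noise

r, p = stats.pearsonr(x, y)         # r and the two-sided p-value

# Equivalent manual computation of the test statistic:
n = len(x)
t = r * np.sqrt(n - 2) / np.sqrt(1 - r**2)
p_manual = 2 * stats.t.sf(abs(t), df=n - 2)
print(r, p, p_manual)
```

The two p-values agree, since `pearsonr` uses the same null distribution.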
4. Distance Correlation: Measuring General Dependence (Not Just Linear)
Why distance correlation is needed
Pearson correlation is powerful for linear relationships but can completely miss nonlinear dependence.
For example, if $y$ follows a curved pattern in $x$, such as the parabola $y = x^2$, Pearson correlation may be close to zero even though $x$ and $y$ are strongly related.
Distance correlation addresses this gap by measuring statistical dependence more broadly, not restricted to straight-line relationships.
What distance correlation measures
Distance correlation captures the magnitude of dependence between variables. Importantly:
- It does not indicate the direction (no positive/negative notion like Pearson).
- It is designed so that zero indicates independence under broad conditions.
Core construction idea (high-level intuition)
Distance correlation is built from pairwise distances among observations.
For $i, j = 1, \dots, n$:
- Compute pairwise distances $a_{ij} = |x_i - x_j|$ and $b_{ij} = |y_i - y_j|$.
Then “center” these distances by subtracting row means and column means and adding back the grand mean, producing adjusted distances

$$A_{ij} = a_{ij} - \bar{a}_{i\cdot} - \bar{a}_{\cdot j} + \bar{a}_{\cdot\cdot}.$$

This adjustment ensures the distance matrix behaves like a mean-centered representation.
The same procedure is applied to the $b_{ij}$, producing adjusted distances $B_{ij}$.
Distance covariance is then formed by averaging the products $A_{ij} B_{ij}$, and distance variances are formed similarly from $A_{ij}^2$ and $B_{ij}^2$. The empirical (squared) distance correlation is essentially:

$$\mathrm{dCor}^2(x, y) = \frac{\mathrm{dCov}^2(x, y)}{\sqrt{\mathrm{dVar}(x)\,\mathrm{dVar}(y)}},$$

and the reported distance correlation is the positive square root of this quantity.
If the product of distance variances is zero, the distance correlation is undefined (analogous to Pearson needing variability).
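The whole construction can be sketched in a few lines of NumPy. This is a from-scratch illustration for one-dimensional variables, not an optimized implementation; the helper names are arbitrary:

```python
import numpy as np

def dist_corr(x, y):
    """Empirical distance correlation via double-centered distance matrices."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    a = np.abs(x[:, None] - x[None, :])   # pairwise distances a_ij
    b = np.abs(y[:, None] - y[None, :])   # pairwise distances b_ij

    def double_center(d):
        # Subtract row means and column means, add back the grand mean.
        return d - d.mean(axis=0) - d.mean(axis=1)[:, None] + d.mean()

    A, B = double_center(a), double_center(b)
    dcov2 = (A * B).mean()     # squared distance covariance
    dvar_x = (A * A).mean()    # squared distance variance of x
    dvar_y = (B * B).mean()
    denom = np.sqrt(dvar_x * dvar_y)
    return np.sqrt(dcov2 / denom) if denom > 0 else float("nan")

# Parabola: Pearson is ~0 by symmetry, but distance correlation
# still reveals the dependence between x and y = x^2.
x = np.linspace(-1, 1, 13)
print(np.corrcoef(x, x**2)[0, 1], dist_corr(x, x**2))
```

For production use, optimized implementations exist (e.g. the `dcor` package), but the double-centering logic is the same.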
5. Why Pearson and Distance Correlation Can Differ Dramatically
The examples illustrate the strengths and limitations:
Four points on a straight line
- Pearson is near $-1$, showing an almost perfectly linear negative relationship.
- Distance correlation is near 1, confirming very strong dependence.
Thirteen points on a parabola
- Pearson is approximately 0, because the curved shape cancels linear association.
- Distance correlation is moderately positive, revealing non-linear dependence.
Twenty-five points on a lattice grid
- Pearson is 0 and distance correlation is 0, indicating no dependence pattern in either metric.
Five hundred random points
- Both correlations are close to zero, consistent with weak or absent dependence.
This set of examples reinforces a key lesson:
Pearson correlation detects linear patterns, while distance correlation detects broader dependence patterns.
6. Applying These Measures to Chicago Taxi Data: Distance vs. Time
Practical constraint: very large sample size
The taxi dataset has 217,631 observations, and distance correlation requires all pairwise distances, which scales poorly because it involves an $n \times n$ distance matrix. For $n$ in the hundreds of thousands, that quickly becomes computationally infeasible.
To address this, the analysis uses a 5% random sample without replacement.
The reasoning is that statistical theory supports that estimates computed from a properly drawn random sample can still be reliable and consistent indicators of the full dataset’s behavior, especially when the sample is still large.
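A sketch of the subsampling idea on synthetic data; the gamma/normal model below is purely illustrative (it is not the real taxi data), and it shows that a 5% sample without replacement recovers the full-data Pearson correlation closely:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200_000
dist = rng.gamma(shape=2.0, scale=3.0, size=n)     # synthetic trip miles
mins = 2.5 * dist + rng.normal(scale=4.0, size=n)  # synthetic trip minutes

idx = rng.choice(n, size=n // 20, replace=False)   # 5% sample, no replacement
r_full = np.corrcoef(dist, mins)[0, 1]
r_samp = np.corrcoef(dist[idx], mins[idx])[0, 1]
print(r_full, r_samp)
```

With 10,000 sampled points, the standard error of $r$ is tiny, so the sampled estimate lands very close to the full-data value.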
Results on the sample
- Pearson correlation between trip distance and trip minutes: 0.8145, with a p-value effectively zero.
- Distance correlation: 0.8458.
Interpretation
Both numbers are high, meaning the association is strong. Pearson being high indicates that the relationship is strongly linear: longer distances tend to correspond to longer trip times. Distance correlation being slightly higher suggests that, beyond linearity, there may also be additional structure (for example, different speed regimes such as city vs. freeway driving) that still reflects strong dependence.
The key takeaway is that trip distance is highly informative about trip duration, even though the relationship is not perfectly linear due to variation in traffic, routes, and speeds.
Overall Takeaway
Pearson correlation is an excellent first tool when you care about straight-line association and direction. Distance correlation provides a broader view of dependence and can detect relationships Pearson may miss. In practical analysis, using both helps distinguish whether a relationship is primarily linear or whether meaningful nonlinear dependence exists as well.
