Core Idea
The central idea is to explain how CART (Classification and Regression Trees) represents interactions between variables, and why these interactions are local rather than global.
This becomes clear once a decision tree is expressed as a mathematical equation using indicator functions.
Why Express a Decision Tree as an Equation
A decision tree is often described procedurally: start at the root and follow branches until reaching a terminal node. While intuitive, this view hides how interactions are formed.
Writing a decision tree as an equation reveals that:
- Each terminal node corresponds to a distinct mathematical term.
- Indicator functions determine whether a term is active.
- Products of indicator functions naturally create interaction terms.
- These interaction terms apply only within specific regions of the predictor space.
This formulation makes the interaction structure explicit.
Example Tree and Predictions
Consider a regression tree with two predictors, $HD$ and $cement$, and three terminal nodes with predicted values:
- $HD \le 21 \Rightarrow \hat{y} = 23.9437$
- $HD > 21$ and $cement \le 355.95 \Rightarrow \hat{y} = 37.036$
- $HD > 21$ and $cement > 355.95 \Rightarrow \hat{y} = 57.026$
For an observation with $HD = 22$ and $cement = 450$, the conditions lead to the third terminal node, producing a prediction of $57.026$.
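The procedural view of this tree can be sketched as nested if/else rules (terminal-node values taken from the list above):

```python
# The example tree as nested decision rules.
def predict(HD, cement):
    if HD <= 21:
        return 23.9437       # first terminal node
    if cement <= 355.95:
        return 37.036        # second terminal node
    return 57.026            # third terminal node

print(predict(HD=22, cement=450))  # -> 57.026
```

Following the rules for $HD = 22$ and $cement = 450$ reaches the third terminal node, matching the prediction stated above.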
Indicator Function Representation
Define an indicator function as:
- $I(b) = 1$ if condition $b$ is true,
- $I(b) = 0$ otherwise.
The tree can then be written as:

$$\hat{y} = 23.9437 \cdot I(HD \le 21) + 37.036 \cdot I(HD > 21)\cdot I(cement \le 355.95) + 57.026 \cdot I(HD > 21)\cdot I(cement > 355.95)$$
Only one term is active for any given observation, depending on which conditions are satisfied.
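The indicator-function form can be written out directly in code; for any observation, exactly one product of indicators equals 1 and selects its terminal-node value:

```python
# The example tree as a sum of indicator-weighted terms.
def I(condition):
    return 1 if condition else 0

def predict(HD, cement):
    return (23.9437 * I(HD <= 21)
            + 37.036 * I(HD > 21) * I(cement <= 355.95)
            + 57.026 * I(HD > 21) * I(cement > 355.95))

print(predict(HD=22, cement=450))  # -> 57.026
```

This reproduces the same prediction as tracing the tree, since the indicator products partition the predictor space into the three terminal regions.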
Where Interactions Appear in CART
The interaction terms are the products of indicator functions:
- $I(HD > 21)\cdot I(cement \le 355.95)$
- $I(HD > 21)\cdot I(cement > 355.95)$
These terms represent interactions because the effect depends on both variables simultaneously.
Crucially, each interaction is active only within a specific region of the predictor space.
Why CART Interactions Are Local
An interaction is called local because:
- It applies only when certain split conditions are met.
- It is confined to a particular region defined by the tree.
- Outside that region, the interaction does not exist.
For example, the interaction between $HD$ and $cement$ differs depending on whether $cement \le 355.95$ or $cement > 355.95$, even though $HD > 21$ in both cases. Each region has its own interaction structure.
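This locality can be checked numerically with the example tree (values assumed from the section above): $cement$ changes the prediction only inside the $HD > 21$ region.

```python
# The example tree again, for a quick locality check.
def predict(HD, cement):
    if HD <= 21:
        return 23.9437
    return 37.036 if cement <= 355.95 else 57.026

# cement has no effect when HD <= 21 -- the interaction does not exist there:
print(predict(20, 300), predict(20, 450))  # -> 23.9437 23.9437
# When HD > 21, crossing cement = 355.95 changes the prediction:
print(predict(22, 300), predict(22, 450))  # -> 37.036 57.026
```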
Contrast with Interactions in Regression Models
In a typical linear regression, an interaction is written as:

$$y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 (X_1 \times X_2) + \varepsilon$$
Here:
- The interaction term $X_1 \times X_2$ applies to all observations.
- The same coefficient $\beta_3$ governs the interaction everywhere.
- The interaction is global.
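A tiny numeric sketch of such a global interaction (all coefficient values here are made up for illustration): the same $\beta_3$ multiplies $X_1 X_2$ for every observation, wherever it lies in the predictor space.

```python
# Hypothetical coefficients for a global interaction model.
beta0, beta1, beta2, beta3 = 1.0, 0.5, 0.2, 0.3  # illustrative values

def predict(x1, x2):
    # beta3 * x1 * x2 contributes everywhere, in every region.
    return beta0 + beta1 * x1 + beta2 * x2 + beta3 * x1 * x2

print(round(predict(2, 3), 4))   # -> 4.4
print(round(predict(-2, 3), 4))  # -> -1.2
```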
In contrast, CART:
- Uses different interaction terms in different regions.
- Activates interactions conditionally.
- Allows interactions to vary across the predictor space.
Automatic Discovery of Interactions
CART automatically determines:
- Which variables interact,
- Where interactions begin and end,
- The thresholds that define interaction regions,
- The predicted value within each region.
In classical regression, defining such local interactions requires manual selection of variables, thresholds, and functional forms, which becomes impractical in high-dimensional settings.
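The threshold-finding step can be sketched as a brute-force search, which is essentially what CART does at each node: try every candidate cutoff and keep the one minimizing the squared error of the two region means (the data below are invented for illustration).

```python
# Minimal sketch of CART's split search on one variable.
def best_split(x, y):
    best_cut, best_sse = None, float("inf")
    for cut in sorted(set(x))[:-1]:          # every candidate threshold
        left = [yi for xi, yi in zip(x, y) if xi <= cut]
        right = [yi for xi, yi in zip(x, y) if xi > cut]
        # Squared error around each region's mean prediction.
        sse = (sum((yi - sum(left) / len(left)) ** 2 for yi in left)
               + sum((yi - sum(right) / len(right)) ** 2 for yi in right))
        if sse < best_sse:
            best_cut, best_sse = cut, sse
    return best_cut

x = [10, 15, 20, 25, 30, 35]
y = [24.0, 23.8, 24.1, 37.0, 37.2, 36.9]
print(best_split(x, y))  # -> 20 (the cutoff separating the two levels)
```

Repeating this search over all variables at every node is how the tree decides which variables interact and where each interaction region begins and ends.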
Local Interactions in Regression Are Possible but Manual
Regression models can include local interactions, for example:

$$y = \beta_0 + \beta_1 X_1 + \beta_2 X_1 \cdot I(X_2 > c) + \varepsilon$$
However, this requires prior knowledge of:
- Which variable interacts,
- The cutoff value,
- The relevant region.
CART discovers these elements directly from the data.
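Such a hand-built local interaction might be sketched as follows (the cutoff and coefficients are illustrative values chosen by the analyst, exactly the kind of prior knowledge CART does not need):

```python
c = 355.95  # cutoff chosen manually; CART would learn it from the data

def I(condition):
    return 1 if condition else 0

def predict(x1, x2, b0=5.0, b1=0.4, b2=0.8):
    # b2 contributes only in the hand-picked region x2 > c.
    return b0 + b1 * x1 + b2 * x1 * I(x2 > c)

print(predict(10, 300))  # -> 9.0  (interaction inactive)
print(predict(10, 400))  # -> 17.0 (interaction active)
```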
Key Takeaways
- CART represents interactions through tree structure, not explicit interaction terms.
- Interactions in CART are local and region-specific.
- Indicator function expressions make these interactions explicit.
- CART automatically identifies variables, thresholds, and interaction regions.
- This makes CART particularly effective for complex, high-dimensional data where interactions are difficult to predefine.
