Core Idea
The central idea is to explain how CART (Classification and Regression Trees) represents interactions between variables, and why these interactions are local rather than global.
This becomes clear once a decision tree is expressed as a mathematical equation using indicator functions.
Why Express a Decision Tree as an Equation
A decision tree is often described procedurally: start at the root and follow branches until reaching a terminal node. While intuitive, this view hides how interactions are formed.
Writing a decision tree as an equation reveals that:
- Each terminal node corresponds to a distinct mathematical term.
- Indicator functions determine whether a term is active.
- Products of indicator functions naturally create interaction terms.
- These interaction terms apply only within specific regions of the predictor space.
This formulation makes the interaction structure explicit.
Example Tree and Predictions
Consider a regression tree with two predictors, $HD$ and $cement$, and three terminal nodes with predicted values:
- $HD \le 21 \Rightarrow \hat{y} = 23.9437$
- $HD > 21$ and $cement \le 355.95 \Rightarrow \hat{y} = 37.036$
- $HD > 21$ and $cement > 355.95 \Rightarrow \hat{y} = 57.026$
For an observation with $HD = 22$ and $cement = 450$, the conditions lead to the third terminal node, producing a prediction of $57.026$.
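The procedural view of this tree can be sketched as nested if/else rules (terminal-node values taken from the list above):

```python
# The example tree as nested decision rules.
def predict(HD, cement):
    if HD <= 21:
        return 23.9437       # first terminal node
    if cement <= 355.95:
        return 37.036        # second terminal node
    return 57.026            # third terminal node

print(predict(HD=22, cement=450))  # -> 57.026
```

Following the rules for $HD = 22$ and $cement = 450$ reaches the third terminal node, matching the prediction stated above.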
Indicator Function Representation
Define an indicator function as:
- $I(b) = 1$ if condition $b$ is true,
- $I(b) = 0$ otherwise.
The tree can then be written as:

$$\hat{y} = 23.9437 \cdot I(HD \le 21) + 37.036 \cdot I(HD > 21)\cdot I(cement \le 355.95) + 57.026 \cdot I(HD > 21)\cdot I(cement > 355.95)$$
Only one term is active for any given observation, depending on which conditions are satisfied.
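The indicator-function form can be written out directly in code; for any observation, exactly one product of indicators equals 1 and selects its terminal-node value:

```python
# The example tree as a sum of indicator-weighted terms.
def I(condition):
    return 1 if condition else 0

def predict(HD, cement):
    return (23.9437 * I(HD <= 21)
            + 37.036 * I(HD > 21) * I(cement <= 355.95)
            + 57.026 * I(HD > 21) * I(cement > 355.95))

print(predict(HD=22, cement=450))  # -> 57.026
```

This reproduces the same prediction as tracing the tree, since the indicator products partition the predictor space into the three terminal regions.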
Where Interactions Appear in CART
The interaction terms are the products of indicator functions:
- $I(HD > 21)\cdot I(cement \le 355.95)$
- $I(HD > 21)\cdot I(cement > 355.95)$
These terms represent interactions because the effect depends on both variables simultaneously.
Crucially, each interaction is active only within a specific region of the predictor space.
Why CART Interactions Are Local
An interaction is called local because:
- It applies only when certain split conditions are met.
- It is confined to a particular region defined by the tree.
- Outside that region, the interaction does not exist.
For example, the interaction between $HD$ and $cement$ differs depending on whether $cement \le 355.95$ or $cement > 355.95$, even though $HD > 21$ in both cases. Each region has its own interaction structure.
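This locality can be checked numerically with the example tree (values assumed from the section above): $cement$ changes the prediction only inside the $HD > 21$ region.

```python
# The example tree again, for a quick locality check.
def predict(HD, cement):
    if HD <= 21:
        return 23.9437
    return 37.036 if cement <= 355.95 else 57.026

# cement has no effect when HD <= 21 -- the interaction does not exist there:
print(predict(20, 300), predict(20, 450))  # -> 23.9437 23.9437
# When HD > 21, crossing cement = 355.95 changes the prediction:
print(predict(22, 300), predict(22, 450))  # -> 37.036 57.026
```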
Contrast with Interactions in Regression Models
In a typical linear regression, an interaction is written as:

$$y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 (X_1 \times X_2) + \varepsilon$$
Here:
- The interaction term $X_1 \times X_2$ applies to all observations.
- The same coefficient $\beta_3$ governs the interaction everywhere.
- The interaction is global.
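A tiny numeric sketch of such a global interaction (all coefficient values here are made up for illustration): the same $\beta_3$ multiplies $X_1 X_2$ for every observation, wherever it lies in the predictor space.

```python
# Hypothetical coefficients for a global interaction model.
beta0, beta1, beta2, beta3 = 1.0, 0.5, 0.2, 0.3  # illustrative values

def predict(x1, x2):
    # beta3 * x1 * x2 contributes everywhere, in every region.
    return beta0 + beta1 * x1 + beta2 * x2 + beta3 * x1 * x2

print(round(predict(2, 3), 4))   # -> 4.4
print(round(predict(-2, 3), 4))  # -> -1.2
```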
In contrast, CART:
- Uses different interaction terms in different regions.
- Activates interactions conditionally.
- Allows interactions to vary across the predictor space.
Automatic Discovery of Interactions
CART automatically determines:
- Which variables interact,
- Where interactions begin and end,
- The thresholds that define interaction regions,
- The predicted value within each region.
In classical regression, defining such local interactions requires manual selection of variables, thresholds, and functional forms, which becomes impractical in high-dimensional settings.
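The threshold-finding step can be sketched as a brute-force search, which is essentially what CART does at each node: try every candidate cutoff and keep the one minimizing the squared error of the two region means (the data below are invented for illustration).

```python
# Minimal sketch of CART's split search on one variable.
def best_split(x, y):
    best_cut, best_sse = None, float("inf")
    for cut in sorted(set(x))[:-1]:          # every candidate threshold
        left = [yi for xi, yi in zip(x, y) if xi <= cut]
        right = [yi for xi, yi in zip(x, y) if xi > cut]
        # Squared error around each region's mean prediction.
        sse = (sum((yi - sum(left) / len(left)) ** 2 for yi in left)
               + sum((yi - sum(right) / len(right)) ** 2 for yi in right))
        if sse < best_sse:
            best_cut, best_sse = cut, sse
    return best_cut

x = [10, 15, 20, 25, 30, 35]
y = [24.0, 23.8, 24.1, 37.0, 37.2, 36.9]
print(best_split(x, y))  # -> 20 (the cutoff separating the two levels)
```

Repeating this search over all variables at every node is how the tree decides which variables interact and where each interaction region begins and ends.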
Local Interactions in Regression Are Possible but Manual
Regression models can include local interactions, for example:

$$y = \beta_0 + \beta_1 X_1 + \beta_2 X_1 \cdot I(X_2 > c) + \varepsilon$$
However, this requires prior knowledge of:
- Which variable interacts,
- The cutoff value,
- The relevant region.
CART discovers these elements directly from the data.
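Such a hand-built local interaction might be sketched as follows (the cutoff and coefficients are illustrative values chosen by the analyst, exactly the kind of prior knowledge CART does not need):

```python
c = 355.95  # cutoff chosen manually; CART would learn it from the data

def I(condition):
    return 1 if condition else 0

def predict(x1, x2, b0=5.0, b1=0.4, b2=0.8):
    # b2 contributes only in the hand-picked region x2 > c.
    return b0 + b1 * x1 + b2 * x1 * I(x2 > c)

print(predict(10, 300))  # -> 9.0  (interaction inactive)
print(predict(10, 400))  # -> 17.0 (interaction active)
```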
Key Takeaways
- CART represents interactions through tree structure, not explicit interaction terms.
- Interactions in CART are local and region-specific.
- Indicator function expressions make these interactions explicit.
- CART automatically identifies variables, thresholds, and interaction regions.
- This makes CART particularly effective for complex, high-dimensional data where interactions are difficult to predefine.
