Core Perspective

A decision tree should be understood not merely as a flowchart of decisions, but as a piecewise predictive model. Each terminal (leaf) node corresponds to a specific region of the feature space and assigns a constant prediction within that region. The overall model is therefore a collection of local rules, each valid only for a subset of the data.


Decision Trees as Rule-Based Models

A decision tree partitions the input space by applying a sequence of binary rules of the form:

  • $x \le a$ versus $x > a$ (for numeric features),
  • $x \in A$ versus $x \notin A$ (for categorical features).

Each path from the root to a leaf defines a conjunction of conditions.
These conditions jointly determine which prediction applies.

Thus, a tree implicitly represents a set of rules such as:

  • If condition set 1 holds, predict $c_1$
  • If condition set 2 holds, predict $c_2$

Each rule corresponds to one leaf node.
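As a minimal sketch of this rule-based view, a small tree can be written out directly as nested if/else conditions; the features, thresholds, and leaf predictions below are hypothetical, chosen only for illustration:

```python
# A tiny decision tree written out as its equivalent rule set.
# Each return statement is one leaf; the path to it is a conjunction
# of the conditions along the way.

def predict(income, age):
    """Return the leaf prediction for one observation."""
    if income <= 50_000:            # root split
        if age <= 30:               # second split, left branch only
            return "low risk"       # leaf 1: income <= 50k AND age <= 30
        else:
            return "medium risk"    # leaf 2: income <= 50k AND age > 30
    else:
        return "high value"         # leaf 3: income > 50k

print(predict(income=40_000, age=25))  # satisfies rule 1 -> "low risk"
print(predict(income=80_000, age=45))  # satisfies rule 3 -> "high value"
```

Note that every observation satisfies exactly one of the three rules, so the rule set partitions the input space.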


Mathematical Representation Using Indicator Functions

Each leaf can be expressed mathematically using indicator functions:

  • $I(b) = 1$ if condition $b$ is true,
  • $I(b) = 0$ otherwise.

A regression tree prediction can be written as:

$\hat{y} = c_1 \cdot I(\text{rule}_1) + c_2 \cdot I(\text{rule}_2) + \cdots + c_L \cdot I(\text{rule}_L)$

where:

  • $c_\ell$ is the prediction at leaf $\ell$,
  • exactly one indicator equals 1 for any observation.

This formulation shows that decision trees are additive models over disjoint regions, not smooth global functions.
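The additive formulation can be evaluated directly. Below is a minimal sketch for a regression tree with three leaves; all thresholds and leaf constants $c_\ell$ are hypothetical:

```python
# Regression tree prediction as a sum of leaf constants times indicators.
# The three regions are disjoint and cover the plane, so exactly one
# indicator is 1 for any input and the sum selects one leaf value.

def I(condition):
    """Indicator function: 1 if the condition holds, 0 otherwise."""
    return 1 if condition else 0

def tree_predict(x1, x2):
    return (10.0 * I(x1 <= 2.0)                  # leaf 1
          + 25.0 * I(x1 > 2.0 and x2 <= 5.0)     # leaf 2
          + 40.0 * I(x1 > 2.0 and x2 > 5.0))     # leaf 3

print(tree_predict(1.0, 9.0))  # region 1 -> 10.0
print(tree_predict(3.0, 4.0))  # region 2 -> 25.0
```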


Piecewise-Constant Nature of Predictions

For regression trees:

  • The prediction within each region is constant.
  • Changes in prediction occur only when crossing a split boundary.

As a result:

  • The model is nonlinear globally,
  • but simple and stable locally.

This explains why decision trees can capture complex patterns without explicit parametric assumptions.
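The step-function behavior is easy to see by evaluating a one-split tree on a few inputs; the split point and leaf values below are hypothetical:

```python
# A one-split regression tree ("stump") as a step function of one feature.
# The prediction is flat on each side of the split at x = 3.0 and
# jumps only when crossing that boundary.

def stump(x):
    return 5.0 if x <= 3.0 else 12.0  # hypothetical leaf means

for x in [1.0, 2.0, 2.9, 3.1, 4.0, 10.0]:
    print(x, stump(x))
# Every x <= 3.0 maps to 5.0; every x > 3.0 maps to 12.0.
# The prediction never varies within a region, only across the boundary.
```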


Implicit Modeling of Interactions

Decision trees naturally encode interactions between variables.

An interaction arises when:

  • A split on one variable occurs downstream of a split on another variable,
  • So the effect of one variable depends on the value range of the other.

Mathematically, this appears as products of indicator functions, for example:

$I(x_1 > a) \cdot I(x_2 \le b)$

These interaction terms are:

  • Local, applying only within a specific region,
  • Different across different branches of the tree.
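A sketch of how such a product of indicators encodes an interaction: the split on $x_2$ is only reached on the $x_1 > a$ branch, so $x_2$'s effect is local to that region (all thresholds and constants hypothetical):

```python
# Interaction via products of indicators: x2 only influences the
# prediction inside the x1 > a region.

def I(condition):
    """Indicator function: 1 if the condition holds, 0 otherwise."""
    return 1 if condition else 0

def interaction_predict(x1, x2, a=2.0, b=5.0):
    # On the x1 <= a side, x2 has no effect at all.
    # On the x1 > a side, the prediction depends on x2 via I(x2 <= b).
    return (3.0 * I(x1 <= a)
          + 8.0 * I(x1 > a) * I(x2 <= b)
          + 1.0 * I(x1 > a) * I(x2 > b))

print(interaction_predict(1.0, 0.0), interaction_predict(1.0, 99.0))  # x2 irrelevant
print(interaction_predict(3.0, 4.0), interaction_predict(3.0, 6.0))   # x2 matters
```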

Local vs. Global Effects

Decision trees differ fundamentally from linear or generalized linear models:

  • Linear models impose global effects: coefficients apply everywhere.
  • Decision trees impose local effects: rules apply only in certain regions.

Because of this:

  • Different parts of the feature space can exhibit different relationships.
  • Interactions do not need to be pre-specified.
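The contrast can be made concrete: in a linear model a coefficient shifts the prediction everywhere, while in a tree the same feature may matter in one region and be ignored in another (all coefficients, thresholds, and leaf values hypothetical):

```python
# Global vs. local effects of the same feature x2.

def linear(x1, x2):
    # The coefficient 2.0 on x2 applies to every observation.
    return 1.0 + 0.5 * x1 + 2.0 * x2

def tree(x1, x2):
    # x2 influences the prediction only when x1 > 4.0.
    if x1 <= 4.0:
        return 3.0                      # x2 ignored in this region
    return 7.0 if x2 <= 1.0 else 9.0    # x2 matters only here

# Increasing x2 always changes the linear prediction...
print(linear(0.0, 0.0), linear(0.0, 1.0))   # differ by 2.0
# ...but changes the tree prediction only in the x1 > 4.0 region.
print(tree(0.0, 0.0), tree(0.0, 1.0))       # identical
print(tree(5.0, 0.0), tree(5.0, 2.0))       # differ
```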

Interpretability Through Regions

Each leaf node can be interpreted as:

  • A subpopulation defined by explicit conditions,
  • With a clearly stated predicted outcome.

This makes decision trees especially useful for:

  • Policy rules,
  • Risk stratification,
  • Business decision logic,
  • Exploratory data analysis.

Strengths of This Representation

Viewing decision trees as piecewise models clarifies why they:

  • Automatically discover nonlinearities and interactions,
  • Adapt to heterogeneous data patterns,
  • Remain interpretable despite modeling complexity.

It also explains why trees often serve as base learners for ensemble methods such as Random Forests and Gradient Boosting.


Key Takeaways

  • A decision tree is a piecewise-constant predictive model.
  • Each leaf corresponds to a rule-defined region of the input space.
  • Indicator functions make the model structure explicit.
  • Interactions are local and region-specific.
  • Complexity arises from partitioning, not from parametric assumptions.