Purpose of Cluster Profiling

Clustering algorithms such as k-means or hierarchical clustering are effective at discovering latent group structures in data. However, their outputs—cluster IDs—are often difficult to interpret directly. Cluster profiling addresses this limitation by explaining what defines each cluster in terms of the original features.

A common and effective approach to cluster profiling is to train a classification tree that predicts cluster membership. The goal is not to rediscover the clusters, but to translate them into interpretable decision rules that characterize typical observations within each cluster.


Role of Decision Trees in Cluster Profiling

A decision tree is particularly well suited for cluster profiling because:

  • It produces explicit, rule-based descriptions
  • Each leaf node corresponds to a homogeneous subgroup
  • Rules are easy to interpret and communicate
  • Feature thresholds reveal dominant drivers of cluster separation

In this context, the decision tree is used purely as an interpretability tool, not as a clustering method.


Scikit-Learn Decision Tree Module

Scikit-learn provides decision tree implementations in the tree module:

  • tree.DecisionTreeClassifier() for categorical targets
  • tree.DecisionTreeRegressor() for continuous targets
  • tree.plot_tree() for visualizing the learned tree structure
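A minimal sketch of the classifier API on toy data (the feature values and labels below are invented for illustration, with class labels standing in for cluster IDs):

```python
from sklearn.tree import DecisionTreeClassifier

# Toy numeric data: two features, two labels standing in for cluster IDs.
X = [[0.0, 1.0], [0.5, 1.5], [3.0, 0.2], [3.5, 0.1]]
y = [0, 0, 1, 1]

clf = DecisionTreeClassifier(criterion="entropy", max_depth=2)
clf.fit(X, y)

# tree.plot_tree(clf, feature_names=[...]) would render the fitted structure.
print(clf.predict([[0.2, 1.2], [3.2, 0.3]]))
```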

Important Usage Assumptions and Constraints

The Scikit-learn decision tree implementation has several practical constraints that must be respected:

  1. All input features must be numeric
    • Nominal and ordinal variables encoded as strings are not supported
    • Categorical features must be encoded numerically in advance
  2. No missing values are allowed
    • All observations must be complete across all input features
    • Missing-value handling must be done prior to modeling
  3. Variable names are not automatically retained
    • Feature names must be manually supplied when visualizing the tree
    • Otherwise, generic feature indices are displayed

These constraints influence data preprocessing decisions and must be considered when designing the workflow.
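A sketch of the preprocessing these three constraints imply, using a small hypothetical data frame (the column names and values are invented for illustration):

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Hypothetical raw data: a string-encoded categorical and a missing value.
df = pd.DataFrame({
    "temp": [12.0, 25.0, None, 18.0, 22.0],
    "season": ["winter", "summer", "summer", "spring", "summer"],
    "cluster": [0, 1, 1, 0, 1],
})

# Constraint 2: handle missing values before modeling (dropped here).
df = df.dropna()

# Constraint 1: encode the categorical feature numerically (one-hot here).
X = pd.get_dummies(df[["temp", "season"]], columns=["season"])

clf = DecisionTreeClassifier().fit(X, df["cluster"])

# Constraint 3: keep the column names to supply when visualizing, e.g.
# tree.plot_tree(clf, feature_names=list(X.columns))
print(list(X.columns))
```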


Methodology: Characterizing Clusters with a Classification Tree

The process of cluster profiling using a decision tree follows a clear sequence of steps:

  1. Train a clustering model
    • The clustering algorithm discovers latent structure
    • Cluster IDs are assigned to all observations
  2. Train a classification tree
    • The cluster ID is treated as the nominal target variable
    • The same features used for clustering are used as predictors
    • The objective is high classification accuracy and pure leaf nodes
  3. Extract decision rules
    • Focus on leaf nodes with zero impurity
    • These nodes describe dominant patterns within clusters
  4. Interpret clusters
    • Translate decision rules into real-world descriptions
    • Do not expect the tree to recreate the clustering process

A key principle is that clustering has already completed its task. The decision tree does not compete with the clustering algorithm; it explains its results.
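The four steps can be sketched end to end on synthetic data (the two blobs below are invented stand-ins for real features, not the bike-share data):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
# Two synthetic blobs standing in for (humidity, windspeed)-like features.
X = np.vstack([
    rng.normal([70, 10], 3, size=(100, 2)),
    rng.normal([45, 20], 3, size=(100, 2)),
])

# Step 1: clustering assigns the labels we want to explain.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Step 2: a shallow tree learns to predict the cluster IDs from the same
# features; it explains the clustering, it does not redo it.
tree = DecisionTreeClassifier(criterion="entropy", max_depth=4, random_state=0)
tree.fit(X, labels)

# Steps 3-4 read rules off the pure leaves; accuracy near 1.0 signals
# that the clusters are cleanly describable.
print(f"profiling accuracy: {tree.score(X, labels):.3f}")
```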


Capital Bike Share Dataset

The dataset used for illustration comes from the Capital Bikeshare program in Washington, DC. It captures bicycle rental activity under varying weather conditions between 2011 and 2012.

Features Used

Three continuous (interval-scale) features are selected:

  • temp: hourly temperature in Celsius
  • humidity: relative humidity in percent
  • windspeed: wind speed in km/h

A total of 10,886 observations are used, all of which are free of missing values across these features.


Determining the Number of Clusters

Both the Elbow Method and the Silhouette Method indicate that a two-cluster solution is optimal. This suggests that the data naturally separates into two dominant weather-related usage patterns.
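A sketch of both diagnostics on synthetic two-blob data (not the bike-share data itself):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (80, 3)), rng.normal(5, 1, (80, 3))])

for k in range(2, 6):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    # Elbow method: look for the bend in inertia (within-cluster SSE);
    # silhouette method: pick the k with the highest average score.
    print(k, round(km.inertia_, 1), round(silhouette_score(X, km.labels_), 3))
```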


Two-Cluster Solution: Summary Statistics

The resulting clusters have nearly equal sizes:

  • Cluster 0: 5,466 observations
  • Cluster 1: 5,420 observations

Feature-wise comparison shows:

  • Mean temperature is nearly identical across clusters
  • Cluster 0 has higher humidity and lower wind speed
  • Cluster 1 has lower humidity and higher wind speed

Temperature does not appear to be a primary discriminating factor between clusters.


Classification Tree Specification

To profile the clusters:

  • Target variable: Cluster ID (nominal)
  • Predictors: temperature, humidity, wind speed
  • Maximum depth: 4 (up to four levels of binary splits)
  • Splitting criterion: Entropy

The limited depth enforces interpretability while allowing sufficient flexibility.
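In scikit-learn terms, this specification is a single estimator; the fit call is commented out because the actual feature matrix and cluster labels are not shown here, and the column names in it are hypothetical:

```python
from sklearn.tree import DecisionTreeClassifier

# Entropy splits, depth capped at 4 for readability.
profiler = DecisionTreeClassifier(criterion="entropy", max_depth=4)
# profiler.fit(X[["temp", "humidity", "windspeed"]], cluster_id)
print(profiler.get_params()["criterion"], profiler.get_params()["max_depth"])
```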


Classification Performance

The classification tree correctly predicts 99.63% of cluster memberships. This indicates that:

  • Cluster boundaries are well defined
  • Weather variables strongly explain cluster membership
  • Clusters are internally consistent

Importantly, the decision tree does not use temperature at any split, reinforcing the earlier observation that temperature plays a minimal role in distinguishing clusters.
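The "temperature is never split on" observation can be checked via the fitted tree's feature_importances_ attribute; below is a synthetic reconstruction (not the real data) in which temperature is drawn identically for both clusters, so its importance comes out as zero:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(3)
n = 100
temp = rng.normal(20, 5, 2 * n)                          # same for both clusters
hum = np.r_[rng.normal(75, 3, n), rng.normal(45, 3, n)]  # separates clusters
wind = np.r_[rng.normal(10, 2, n), rng.normal(20, 2, n)]
X = np.column_stack([temp, hum, wind])
y = np.array([0] * n + [1] * n)

tree = DecisionTreeClassifier(criterion="entropy", max_depth=4,
                              random_state=0).fit(X, y)

# An importance of 0.0 means the feature was never used in any split.
print(dict(zip(["temp", "humidity", "windspeed"],
               tree.feature_importances_.round(3))))
```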


Interpreting Node-Level Decision Rules

Each leaf node corresponds to a rule composed of inequalities involving humidity and wind speed. Leaf nodes with zero entropy represent perfectly pure segments, meaning all observations in that node belong to the same cluster.
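sklearn.tree.export_text prints such rules directly; a sketch on synthetic two-feature data (the feature names and cluster structure are illustrative, not the real dataset):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(2)
# Synthetic stand-ins: cluster 0 is humid, cluster 1 is dry.
X = np.vstack([rng.normal([20, 75], 2, (60, 2)),
               rng.normal([20, 45], 2, (60, 2))])
y = np.array([0] * 60 + [1] * 60)

tree = DecisionTreeClassifier(criterion="entropy", max_depth=3).fit(X, y)

# Each printed root-to-leaf path is one rule; leaves with entropy 0 are pure.
print(export_text(tree, feature_names=["temp", "humidity"]))
```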

Dominant Rules for Cluster 0

  • $humidity > 63.5$ and $windspeed \le 27.001$
    Covers 93% of Cluster 0 observations

Additional smaller segments include:

  • $61.5 < humidity \le 63.5$ and $windspeed \le 14$
  • $humidity > 66.5$ and $windspeed > 27.001$

These rules indicate humid conditions with moderate winds.


Dominant Rules for Cluster 1

  • $humidity \le 59.5$
    Covers 93% of Cluster 1 observations

Other minor segments involve:

  • $59.5 < humidity \le 61.5$ and $windspeed > 8$
  • $61.5 < humidity \le 62.5$ and $windspeed > 14$

These rules indicate dry conditions, often accompanied by stronger winds.


Final Cluster Interpretation

Cluster 0: Tourist Season Conditions

  • Represents warm, humid days
  • Typically includes a gentle breeze
  • Weather is consistent with summer tourism activity
  • High humidity is the dominant defining feature

Cluster 1: Typical School and Working Days

  • Represents dry, comfortable weather
  • Common during spring and autumn
  • Lower humidity is the defining characteristic
  • Weather conditions are conducive to routine commuting

Key Conceptual Takeaways

  • Decision trees provide transparent explanations of clusters
  • High classification accuracy validates cluster separability
  • Pure leaf nodes reveal core cluster characteristics
  • Not all original features are necessarily informative
  • Cluster profiling complements clustering rather than replacing it

By combining unsupervised clustering with supervised decision trees, cluster profiling bridges the gap between statistical discovery and human interpretability, transforming abstract cluster labels into actionable and understandable insights.