Businesses seek to understand what customers place in their baskets in order to identify their most valuable customers and improve decision-making around sales, marketing, and customer experience.
Identifying Ideal Customers
Customers can be categorized based on their purchasing behavior along three key dimensions.
Recency refers to how recently a customer has made a purchase. What counts as recent depends on the purchasing cycle of the business: a purchase could have occurred as recently as yesterday or as long ago as last year.
Frequency captures how often a customer makes purchases. Some customers purchase daily, while others may purchase only once per year, yet still be considered frequent within their specific business context.
Monetary value reflects how much a customer spends. High-monetary customers tend to purchase many items, premium items, or high-end products and contribute disproportionately to revenue.
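These three dimensions are often summarized as RFM scores. The sketch below is a minimal example built on three assumptions:

- Purchase records are hypothetical tuples of the form (customer_id, purchase_date, amount_spent).
- Recency is measured as days since the last purchase, frequency as the number of purchases, and monetary value as total spend.
- The `rfm` helper is an illustrative name.

These definitions are one common choice, not the only one. What counts as "recent" or "frequent" still depends on the business context.

```python
from datetime import date

# Hypothetical purchase records: (customer_id, purchase_date, amount_spent)
purchases = [
    ("C1", date(2024, 1, 5), 120.0),
    ("C1", date(2024, 3, 1), 80.0),
    ("C2", date(2023, 11, 20), 40.0),
    ("C3", date(2024, 2, 28), 300.0),
    ("C3", date(2024, 3, 2), 250.0),
    ("C3", date(2024, 3, 3), 50.0),
]

def rfm(purchases, as_of):
    """Return {customer_id: (recency_days, frequency, monetary)} as of a given date."""
    last_seen, counts, totals = {}, {}, {}
    for cust, day, amount in purchases:
        # Keep the most recent purchase date per customer
        if cust not in last_seen or day > last_seen[cust]:
            last_seen[cust] = day
        counts[cust] = counts.get(cust, 0) + 1            # frequency
        totals[cust] = totals.get(cust, 0.0) + amount     # monetary value
    return {
        cust: ((as_of - last_seen[cust]).days, counts[cust], totals[cust])
        for cust in last_seen
    }

scores = rfm(purchases, as_of=date(2024, 3, 31))
for cust, (r, f, m) in sorted(scores.items()):
    print(cust, r, f, m)
```

Ranking or binning these raw values into segments is a separate step. The thresholds for that step are business decisions.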
Why Retaining Ideal Customers Matters
Retaining existing customers is generally more cost-effective than acquiring new ones, since existing customers have already generated revenue for the business.
Even a small increase in customer retention can produce a substantial increase in profit, because retention-related costs tend to be relatively fixed. In addition, the probability of selling additional products to existing customers is typically higher than the probability of selling to new customers.
Long-term customers also play a critical role in organic marketing. Word-of-mouth recommendations from loyal customers are often more persuasive and effective than untargeted advertising.
Retaining Customers While Generating New Revenue
Businesses aim to improve customer experience while simultaneously increasing revenue. One effective approach is to encourage existing customers to promote the business indirectly by offering them unique benefits through tailored deals.
Three common strategies are cross-selling, up-selling, and bundling.
Cross-Selling, Up-Selling, and Bundling Strategies
Cross-selling involves recommending items that a customer is likely to need but may have overlooked. These items are typically complementary to products already in the customer’s basket.
Up-selling focuses on recommending higher-value or downstream items that the customer may not need immediately but may be curious to try. These items supplement the products already selected.
Bundling provides incentives when customers purchase multiple items together. A business may offer free items or significant discounts when a predefined set of products is purchased as a group.
If items A and B are bundled and the bundle is denoted as $\{A, B\}$, the objective is for the perceived value of the bundle to exceed the combined value of purchasing A and B separately. That value may be measured in terms of customer satisfaction or business revenue.
Business Questions Behind Cross-Selling and Bundling
A central business question is how to determine which items should be cross-sold, up-sold, or bundled together. Association rules provide a data-driven framework for answering these questions.
Understanding associations helps businesses decide which coupons to distribute, when to discount premium products, which items should be placed together on shelves, and how to design promotions that align with customer purchasing behavior.
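A natural first step toward finding which items go together is to count how often each pair of items appears in the same basket. The sketch below does only that raw counting, using hypothetical baskets such as bread with butter and ice cream with cookies. It is not a full association-rule algorithm, which would add further measures and thresholds on top of such counts. The `pair_counts` helper is an illustrative name.

```python
from collections import Counter
from itertools import combinations

# Hypothetical baskets: each is the set of distinct items in one transaction
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"ice cream", "cookies", "syrup"},
    {"bread", "milk"},
    {"ice cream", "cookies"},
]

def pair_counts(baskets):
    """Count how many baskets contain each unordered pair of items."""
    counts = Counter()
    for basket in baskets:
        # sorted() gives each pair a canonical order, so (a, b) and (b, a) match
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts

counts = pair_counts(baskets)
for pair, n in counts.most_common(3):
    print(pair, n)
```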
Market Baskets in Everyday Life
Market baskets appear naturally in daily activities. Grocery shopping often involves purchasing multiple items rather than a single product. Certain items are commonly consumed together, such as bread with butter or ice cream with cookies and syrup.
Customers may also purchase multiple items online to qualify for free shipping, illustrating how incentives influence basket composition.
Market baskets are also observed in activities such as preparing a special breakfast, painting a room, or building a personal investment portfolio.
Sources of Market Basket Data
Market basket data are typically generated by point-of-sale transaction processing systems. These systems collect data from physical retail stores, online e-commerce platforms, call-in phone orders, and mail-order purchases.
Structure of Transaction Data
Let the universal set of items be denoted as $\mathcal{I} = \{i_1, i_2, \ldots, i_m\}$, where $m$ is the number of unique items observed in the transaction data. This set represents all distinct items sold during the observation period and does not necessarily include every item offered by the merchant.
A transaction $t$ is defined as a subset of the item set, $t \subseteq \mathcal{I}$. Although each transaction is timestamped, the order in which items appear within a transaction is not relevant for association discovery.
Transactions are usually initiated by different customers, though a single customer may generate multiple transactions within a short time interval, such as when making purchases on behalf of others or returning shortly after a transaction to buy additional items.
The full transaction dataset is denoted as $T = \{t_1, t_2, \ldots, t_n\}$, where $n$ is the number of transactions. Each transaction is identified by a unique marker, such as a customer ID, invoice number, or transaction timestamp.
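This structure maps naturally onto Python sets. The minimal sketch below uses hypothetical invoice IDs and items:

- The item set is built as the union of all observed transactions, so it can omit items the merchant offers but never sold.
- Each transaction is stored as a `frozenset`, which records membership but not item order.

The variable names `I`, `T`, `m`, and `n` are illustrative labels.

```python
# Hypothetical transaction dataset: transaction ID -> set of items in that transaction
T = {
    "INV-001": frozenset({"bread", "butter"}),
    "INV-002": frozenset({"bread", "milk", "eggs"}),
    "INV-003": frozenset({"butter"}),
}

# The item set is the union of all observed items, not the merchant's full catalog
I = frozenset().union(*T.values())

m = len(I)  # number of unique observed items
n = len(T)  # number of transactions

# Every transaction is a subset of the item set
assert all(t <= I for t in T.values())
print(sorted(I), m, n)
```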
Treatment of Duplicate Items
Although customers may purchase multiple quantities of the same item in a single transaction, quantities are not used in association rule discovery.
The analysis only requires knowledge of whether an item appears in a transaction. Purchasing ten boxes of the same cereal is treated identically to purchasing one box of that cereal for the purpose of identifying associations.
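Ignoring quantities amounts to turning each transaction into a set of items. The minimal sketch below assumes hypothetical receipt lines of the form (transaction_id, item, quantity). The quantity is read but discarded, so ten boxes of cereal and one box yield the same basket. The `to_presence` helper is an illustrative name.

```python
# Hypothetical receipt lines: (transaction_id, item, quantity)
lines = [
    ("T1", "cereal", 10),
    ("T1", "milk", 2),
    ("T2", "cereal", 1),
    ("T2", "cereal", 1),  # same item scanned twice in one transaction
]

def to_presence(lines):
    """Map each transaction ID to the set of items it contains, ignoring quantity."""
    baskets = {}
    for txn, item, _quantity in lines:
        baskets.setdefault(txn, set()).add(item)
    return baskets

baskets = to_presence(lines)
print(baskets)
```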
Minimum Information Required for Association Discovery
Only two pieces of information are required:
- A feature that uniquely identifies each transaction, such as a customer identifier, invoice number, or transaction timestamp.
- A feature that uniquely identifies each item, such as an internal SKU, UPC, or item description.
These two features are treated as categorical variables during the discovery process.
Common Transaction Data Formats
Sale Receipt Format
In the sale receipt format, each row represents one item within a transaction. A transaction appears as multiple rows, equal to the number of items purchased. One field identifies the transaction, and the other identifies the item.
This format is widely used because it has a fixed number of fields and stores values as strings, making it convenient for storage and processing.
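The sale receipt format can be stored as a two-column CSV and read with Python's standard `csv` module, as in the minimal sketch below. The column names `transaction_id` and `item_id` and the data are hypothetical. The point is that every row has the same two string fields, and a transaction with k items spans k rows.

```python
import csv
import io

# Hypothetical sale-receipt data: one row per item, two string fields per row
RECEIPT_CSV = """transaction_id,item_id
1001,bread
1001,butter
1001,milk
1002,bread
1003,ice cream
1003,cookies
"""

rows = list(csv.DictReader(io.StringIO(RECEIPT_CSV)))

# Count how many rows each transaction occupies (one row per item)
rows_per_txn = {}
for row in rows:
    txn = row["transaction_id"]
    rows_per_txn[txn] = rows_per_txn.get(txn, 0) + 1

print(len(rows), rows_per_txn)
```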
Item List Format
In the item list format, each transaction appears as a single record. One field identifies the transaction, and the second field contains a list or array of item identifiers.
This format is also common, but the second field requires a special data type to store a variable-length list of strings.
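Converting sale receipt rows into the item list format is a group-by on the transaction field. The minimal sketch below uses hypothetical data. The helper `to_item_list` is an illustrative name, and the Python list in the second field stands in for the variable-length list type the format requires.

```python
# Hypothetical sale-receipt rows: (transaction_id, item_id)
receipt_rows = [
    ("1001", "bread"),
    ("1001", "butter"),
    ("1002", "bread"),
    ("1003", "ice cream"),
    ("1003", "cookies"),
]

def to_item_list(rows):
    """Group receipt rows into one record per transaction: (transaction_id, [items])."""
    grouped = {}
    for txn, item in rows:
        grouped.setdefault(txn, []).append(item)
    # dicts preserve insertion order, so records follow first appearance
    return [(txn, items) for txn, items in grouped.items()]

records = to_item_list(receipt_rows)
for record in records:
    print(record)
```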
Item Indicator Format
In the item indicator format, each transaction occupies one record. The first field identifies the transaction, and each subsequent field corresponds to a specific item.
Each item field takes a Boolean value: TRUE if the item appears in the transaction and FALSE otherwise.
This format is convenient for programming and algorithm implementation, but it is inefficient for storage because most transactions contain only a small number of items, resulting in many FALSE values.
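The item indicator format can be derived from item lists by creating one Boolean column per distinct item. The minimal sketch below uses hypothetical data and an illustrative helper `to_indicator`. It also reports the share of FALSE cells. That share illustrates the storage cost, which grows as the number of item columns increasingly exceeds the typical basket size.

```python
# Hypothetical item-list records: (transaction_id, [items])
item_lists = [
    ("1001", ["bread", "butter"]),
    ("1002", ["bread"]),
    ("1003", ["ice cream", "cookies"]),
]

def to_indicator(records):
    """Return (columns, rows) where each row is [transaction_id, bool, bool, ...]."""
    columns = sorted({item for _, items in records for item in items})
    rows = []
    for txn, items in records:
        present = set(items)
        rows.append([txn] + [col in present for col in columns])
    return columns, rows

columns, rows = to_indicator(item_lists)
print(["transaction_id"] + columns)
for row in rows:
    print(row)

# Share of FALSE cells among all item columns
cells = [v for row in rows for v in row[1:]]
print(f"{cells.count(False) / len(cells):.0%} of indicator cells are FALSE")
```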
Key Takeaway
Understanding market baskets and item associations allows businesses to design effective cross-selling, up-selling, and bundling strategies. By structuring transaction data appropriately and focusing on item presence rather than quantity, association rules can reveal actionable patterns that improve customer experience and drive revenue growth.
