RMBI1020 - Data Analytics For Business - Association Rule Mining
Figure: data analytics techniques covered in the course (Regression, Market Basket Analysis, Clustering Analysis, Optimization, Classification, Association Rule Mining, Collaborative Filtering)
RMBI1020@JeanWang, UST
Introduction
Outline
✓Introduction to Market Basket Analysis
✓ Data Source
✓ Different Types of Analytics
✓ Business Applications
RMBI1020 - Market Basket Analysis
• Source of Market Basket Data
• Different Types of Analytics
• Business Applications
Market Basket Data
For retail giants such as Walmart and Amazon, the number of products can run into the millions and the number of transactions into the billions.
Market Basket Analysis (II)
› Market basket data is not just about the contents of shopping carts. It can also tell us:
– How characteristics of customers affect their purchases
– Whether a marketing intervention is effective or not
– Which products are often purchased together
– The interests and preferences of individual customers
Related Data Analytics Techniques
› Regression, Classification, Association Rule Mining
Applications of Association Rule Mining (I)
› Understanding basket-level dynamics allows retailers to generate new revenue by
Applications of Association Rule Mining (II)
› Association Rule Mining is applicable wherever a one-to-many relationship exists
Healthcare
RMBI1020 Market Basket Analysis – Association Rules
• Association Rule and Mining
• Interestingness Measure of the Rules
• Apriori Algorithm
What is an Association Rule?
Association Rule Mining is the process of generating association rules.
How Good is an Association Rule?
› Support: $\mathrm{supp}(X \Rightarrow Y) = \dfrac{\mathrm{count}(X \cup Y)}{\text{number of transactions}}$, where count(X ∪ Y) is the co-occurrence count of the union of items in X and Y
– The occurring frequency of the association (i.e., the number of transactions containing both X and Y over the total number of transactions)
› supp(Milk => Cereal) = 3 / 10 = 0.3
› supp(Milk, Bread => Eggs) = 2 / 10 = 0.2
› Confidence: $\mathrm{conf}(X \Rightarrow Y) = \dfrac{\mathrm{count}(X \cup Y)}{\mathrm{count}(X)}$
– The strength of the association (i.e., how often items in Y appear in transactions that contain X)
› conf(Milk => Cereal) = 3 / 6 = 0.5
› conf(Milk, Bread => Eggs) = 2 / 4 = 0.5

TID   Milk   Bread   Cereal   (4th item)   Eggs
1     1      1       0        0            1
2     0      1       0        1            0
3     0      1       1        1            0
4     1      1       0        1            0
5     1      0       1        0            0
6     0      1       1        0            0
7     1      0       0        0            0
8     1      1       1        0            1
9     1      1       1        0            0
10    0      1       1        1            0
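The two formulas can be checked directly against the 10-transaction table. This is a minimal sketch that assumes the column order Milk, Bread, Cereal, (4th item), Eggs, inferred from the worked examples; "Item4" is a hypothetical placeholder for the unnamed fourth column.

```python
# Column names inferred from the worked examples; "Item4" is a hypothetical
# placeholder for the fourth column, which the text does not name.
COLUMNS = ["Milk", "Bread", "Cereal", "Item4", "Eggs"]
ROWS = ["11001", "01010", "01110", "11010", "10100",
        "01100", "10000", "11101", "11100", "01110"]  # TID 1..10

# Turn each 0/1 row into the set of items bought in that transaction.
TRANSACTIONS = [{c for c, bit in zip(COLUMNS, row) if bit == "1"} for row in ROWS]


def count(itemset, transactions):
    """Number of transactions containing every item in itemset."""
    return sum(1 for t in transactions if itemset <= t)


def support(x, y, transactions):
    """supp(X => Y) = count(X ∪ Y) / number of transactions."""
    return count(x | y, transactions) / len(transactions)


def confidence(x, y, transactions):
    """conf(X => Y) = count(X ∪ Y) / count(X)."""
    return count(x | y, transactions) / count(x, transactions)


print(support({"Milk"}, {"Cereal"}, TRANSACTIONS))            # 0.3
print(confidence({"Milk"}, {"Cereal"}, TRANSACTIONS))         # 0.5
print(support({"Milk", "Bread"}, {"Eggs"}, TRANSACTIONS))     # 0.2
print(confidence({"Milk", "Bread"}, {"Eggs"}, TRANSACTIONS))  # 0.5
```

Note that support is symmetric in X and Y while confidence is not: conf(Cereal => Milk) divides by count(Cereal) instead of count(Milk).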
How Good is an Association Rule? (II)
Support of an itemset is the occurring frequency of the set.
› Lift: $\mathrm{lift}(X \Rightarrow Y) = \dfrac{\mathrm{supp}(X \cup Y)}{\mathrm{supp}(X) \cdot \mathrm{supp}(Y)}$
– The ratio of the observed support to the support expected if X and Y were independent
› lift(Milk => Cereal) = 0.3 / (0.6 * 0.6) = 0.83
› lift(Milk, Bread => Eggs) = 0.2 / (0.4 * 0.2) = 2.5
– Can be considered as the “lift” that X provides to the probability of having Y in the transaction
– High lift (lift > 1) suggests the presence of X increases the chances that Y occurs, which might be worth investigating; lift < 1 (as for Milk => Cereal) suggests X and Y appear together less often than expected under independence
– All values are computed on the same 10 transactions as the support and confidence examples
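The lift formula needs only three supports, so the two worked examples can be reproduced from the support values alone. A minimal sketch; the function names and the wording of the readings are illustrative choices, not part of the course material.

```python
def lift(supp_xy, supp_x, supp_y):
    """lift(X => Y) = supp(X ∪ Y) / (supp(X) * supp(Y))."""
    return supp_xy / (supp_x * supp_y)


def describe(value, tol=1e-9):
    """Read a lift value against the independence baseline of 1."""
    if value > 1 + tol:
        return "X raises the chance of Y"
    if value < 1 - tol:
        return "X lowers the chance of Y"
    return "consistent with independence"


# (supp(X ∪ Y), supp(X), supp(Y)) taken from the worked examples.
for name, args in [("Milk => Cereal", (0.3, 0.6, 0.6)),
                   ("Milk, Bread => Eggs", (0.2, 0.4, 0.2))]:
    value = lift(*args)
    print(name, round(value, 2), describe(value))
```

Because the formula is symmetric in X and Y, lift(X => Y) always equals lift(Y => X); it tells you how strongly two itemsets are associated but not the direction of the rule.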
› Association rule mining is the task of finding all association rules whose support and confidence meet or exceed the user-specified thresholds suppmin and confmin
› Brute-force approach: list every possible rule, compute its support and confidence, and keep the rules that pass both thresholds. Example with suppmin = 0.6 and confmin = 0.7:

Rule ID   Rule Description                           Supp   Conf
1         {X1} => {X3}                               0.65   0.8
2         {X1, X2} => {X3}                           0.3    0.5
3         {X2, X3} => {X1}                           0.7    0.3
4         {X2, X3} => {X1, X4}                       0.61   0.8
5         {X2, X3, X4} => {X1, X5}                   0.2    0.3
6         {X1, X2, X3} => {X4, X5}                   0.67   0.9
…         …                                          …      …
1000000   {X1, X2, …, X10} => {X11, X12, …, X20}     0.2    0.1

Of the rules listed, only rules 1, 4 and 6 pass both thresholds.
› But the brute-force approach is computationally expensive
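The final filtering step of the brute-force approach is a simple threshold check on every enumerated rule. This sketch applies it to the six fully listed rules from the table; the elided rows and rule 1000000 are left out, and the helper name `passes` is hypothetical.

```python
# (rule id, antecedent X, consequent Y, support, confidence) from the table.
RULES = [
    (1, {"X1"}, {"X3"}, 0.65, 0.8),
    (2, {"X1", "X2"}, {"X3"}, 0.3, 0.5),
    (3, {"X2", "X3"}, {"X1"}, 0.7, 0.3),
    (4, {"X2", "X3"}, {"X1", "X4"}, 0.61, 0.8),
    (5, {"X2", "X3", "X4"}, {"X1", "X5"}, 0.2, 0.3),
    (6, {"X1", "X2", "X3"}, {"X4", "X5"}, 0.67, 0.9),
]

SUPP_MIN, CONF_MIN = 0.6, 0.7


def passes(supp, conf, supp_min, conf_min):
    """A rule is kept only if it clears both thresholds."""
    return supp >= supp_min and conf >= conf_min


kept = [rid for rid, x, y, s, c in RULES if passes(s, c, SUPP_MIN, CONF_MIN)]
print(kept)  # [1, 4, 6]
```

The check itself is cheap. The cost lies in having to enumerate and count every candidate rule before any of them can be filtered.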
Example: Itemset Lattice for 5 Items
› Given d items, there are $2^d$ possible itemsets
› In a rule X => Y, X and Y could be a single item or a set of multiple items
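To make the growth concrete, this sketch enumerates the itemset lattice for 5 items and counts every possible rule X => Y (X and Y non-empty and disjoint) by brute force. The labels A to E are hypothetical. The brute-force count matches the standard identity $3^d - 2^{d+1} + 1$, since each item goes to X, to Y, or to neither.

```python
from itertools import combinations, product

ITEMS = ["A", "B", "C", "D", "E"]  # hypothetical labels for the 5 items


def lattice(items):
    """Level k of the lattice holds every k-itemset (level 0 is the empty set)."""
    return [list(combinations(items, k)) for k in range(len(items) + 1)]


def count_rules(d):
    """Brute-force count of rules X => Y over d items, X and Y non-empty and
    disjoint: each item is assigned to X (1), to Y (2), or to neither (0)."""
    return sum(
        1
        for assignment in product((0, 1, 2), repeat=d)
        if 1 in assignment and 2 in assignment
    )


levels = lattice(ITEMS)
print([len(level) for level in levels])       # [1, 5, 10, 10, 5, 1]
print(sum(len(level) for level in levels))    # 32, i.e. 2 ** 5
print([count_rules(d) for d in range(1, 7)])  # [0, 2, 12, 50, 180, 602]
```

Already at 5 items there are 180 candidate rules, and the count roughly triples with every extra item.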
Apriori Algorithm to Discover Frequent Itemsets
› The major challenge of Association Rule Mining is to find the high-support itemsets, also referred to as frequent itemsets
– A frequent itemset is a set of items whose support is greater than or equal to the given support threshold (suppmin)
› Apriori Principle
– All subsets of a frequent itemset must be frequent too, since supp(subset) ≥ supp(superset)
– Hence the supersets of an infrequent itemset cannot be frequent either
› The algorithm grows itemsets level by level and exits if no frequent itemset is found at the current level
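The level-wise search with Apriori pruning can be sketched in a few dozen lines. This is a minimal, unoptimised version (it rescans all transactions for every candidate) run on the 10-transaction toy table with suppmin = 0.4; "Item4" is a hypothetical placeholder for the unnamed fourth column.

```python
from itertools import combinations

COLUMNS = ["Milk", "Bread", "Cereal", "Item4", "Eggs"]  # "Item4" is a placeholder
ROWS = ["11001", "01010", "01110", "11010", "10100",
        "01100", "10000", "11101", "11100", "01110"]
TRANSACTIONS = [{c for c, bit in zip(COLUMNS, row) if bit == "1"} for row in ROWS]


def apriori(transactions, supp_min):
    """Return {frozenset(itemset): support} for every frequent itemset."""
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: frequent single items.
    level = {}
    for item in sorted({i for t in transactions for i in t}):
        s = support(frozenset([item]))
        if s >= supp_min:
            level[frozenset([item])] = s
    frequent = dict(level)

    k = 1
    while level:  # exit as soon as a level yields no frequent itemset
        candidates = set()
        for a, b in combinations(level, 2):
            union = a | b  # join step: two k-itemsets -> one (k+1)-itemset
            # Prune step (Apriori principle): every k-subset must be frequent.
            if len(union) == k + 1 and all(
                frozenset(sub) in level for sub in combinations(union, k)
            ):
                candidates.add(union)
        # Count support only for the candidates that survived pruning.
        level = {}
        for c in candidates:
            s = support(c)
            if s >= supp_min:
                level[c] = s
        frequent.update(level)
        k += 1
    return frequent


frequent = apriori(TRANSACTIONS, supp_min=0.4)
for itemset, s in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

On this data, Eggs (support 0.2) is dropped at level 1. Three pairs survive: {Milk, Bread}, {Bread, Cereal} and {Bread, Item4}. Every 3-item candidate has an infrequent 2-item subset, so it is pruned without ever being counted, and the loop exits.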
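Worked example (figure): let suppmin = 0.4; annotated rule values include Support = 0.4 and Confidence = 0.83 (another value shown is 0.67), and one rule is marked “high freq, lift but low support”.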
Introduction
Data Specification
› lec09_wine.xlsx – “WineInfo” worksheet and lec09_wine.xlsx – “Transactions” worksheet
› One customer -> one basket
› Data pre-processing is needed to transform transactional data into basket data
Transactional Data to Basket Data in Excel (lec09_wine.xlsx – “Baskets” Worksheet)
› Create a Pivot Table based on the transaction data
› In the Pivot Table Field Settings:
– Set customers as the ROWS
– Set wine as the COLUMNS
– Set count of wine as the VALUES
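The same pivot can be reproduced in code for readers working outside Excel. This is a minimal sketch using only the standard library. The (customer, wine) records and wine names are hypothetical, and the real column names in the “Transactions” worksheet may differ.

```python
from collections import Counter

# Hypothetical transactional records: one (customer, wine) row per purchase.
RECORDS = [
    ("C1", "Merlot"), ("C1", "Chardonnay"),
    ("C2", "Merlot"),
    ("C3", "Riesling"), ("C3", "Merlot"), ("C3", "Merlot"),
]


def to_baskets(records):
    """Pivot (customer, wine) rows into one row per customer: customers as ROWS,
    wines as COLUMNS, and the count of each wine as VALUES."""
    counts = Counter(records)
    customers = sorted({c for c, _ in records})
    wines = sorted({w for _, w in records})
    return wines, {c: [counts[(c, w)] for w in wines] for c in customers}


wines, baskets = to_baskets(RECORDS)
print(wines)  # ['Chardonnay', 'Merlot', 'Riesling']
for customer, row in baskets.items():
    print(customer, row)
# C1 [1, 1, 0]
# C2 [0, 1, 0]
# C3 [0, 2, 1]
```

For association rules only the presence of an item in a basket matters, so a count such as C3's 2 Merlots is usually read as "bought" (count > 0).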
lec09_wine.xlsx – “AssociationRules” Worksheet
4. Cancel the highlighting on the diagonal cells of the matrix
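Steps 1 to 3 of this worksheet are not in the text. Assuming the matrix is an item-by-item co-occurrence count matrix built from the baskets, the sketch below shows why its diagonal is special: cell [a][a] holds the number of baskets containing a alone, not a pair, so it does not correspond to any rule a => b. The baskets and wine names are hypothetical.

```python
from itertools import combinations

# Hypothetical baskets: the set of wines bought by each customer.
BASKETS = [
    {"Merlot", "Chardonnay"},
    {"Merlot"},
    {"Merlot", "Riesling"},
    {"Chardonnay", "Riesling", "Merlot"},
]


def cooccurrence(baskets):
    """Cell [a][b] counts baskets containing both a and b; the diagonal
    cell [a][a] counts baskets containing a (a single-item count)."""
    items = sorted(set().union(*baskets))
    matrix = {a: {b: 0 for b in items} for a in items}
    for basket in baskets:
        for a in basket:
            matrix[a][a] += 1
        for a, b in combinations(sorted(basket), 2):
            matrix[a][b] += 1  # the matrix is symmetric
            matrix[b][a] += 1
    return items, matrix


items, matrix = cooccurrence(BASKETS)
for a in items:
    print(a, [matrix[a][b] for b in items])
# Chardonnay [2, 2, 1]
# Merlot [2, 4, 2]
# Riesling [1, 2, 2]
```

The diagonal still matters, however: dividing an off-diagonal count [a][b] by the diagonal count [a][a] gives conf(a => b).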
Generate Association Rules in Excel
lec09_wine.xlsx – “AssociationRules” Worksheet
4. Highlight “interesting” rules using Conditional Formatting
5. Interpret the selected rules
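As a rough programmatic analogue of the Conditional Formatting step, this sketch scores every one-wine => one-wine rule by support, confidence and lift. It then keeps a rule as “interesting” when the rule clears hypothetical thresholds suppmin = 0.4 and confmin = 0.7 and has lift > 1. The baskets, wine names and threshold values are all made up for illustration.

```python
from itertools import permutations

SUPP_MIN, CONF_MIN = 0.4, 0.7  # hypothetical thresholds

# Hypothetical baskets (one set of wines per customer).
BASKETS = [
    {"Merlot", "Chardonnay"},
    {"Merlot", "Chardonnay", "Riesling"},
    {"Riesling"},
    {"Merlot", "Chardonnay"},
    {"Riesling", "Shiraz"},
]


def pair_rules(baskets):
    """Score every one-item => one-item rule by support, confidence and lift."""
    n = len(baskets)
    items = sorted(set().union(*baskets))

    def supp(*itemset):
        return sum(1 for b in baskets if set(itemset) <= b) / n

    rules = []
    for x, y in permutations(items, 2):
        s_xy = supp(x, y)
        if s_xy == 0:
            continue  # the two wines never appear in the same basket
        conf = s_xy / supp(x)
        lift = s_xy / (supp(x) * supp(y))
        rules.append((x, y, s_xy, conf, lift))
    return rules


# Keep ("highlight") rules that clear both thresholds and have lift > 1.
interesting = [
    (x, y)
    for x, y, s, conf, lift in pair_rules(BASKETS)
    if s >= SUPP_MIN and conf >= CONF_MIN and lift > 1
]
print(interesting)  # [('Chardonnay', 'Merlot'), ('Merlot', 'Chardonnay')]
```

Shiraz => Riesling has confidence 1.0 and lift of about 1.67 but only 0.2 support, so it is not highlighted. Step 5 still requires a human to judge whether a highlighted rule is actionable.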
From 2-item Sets to 3-item Sets
lec09_wine.xlsx – “AssociationRules” Worksheet
1. Generate 3-item sets by combining two 2-item sets with a shared item (with the help of the search box and conditional highlighting)
3. Generate rules if any itemset is frequent
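Steps 1 and 3 can be sketched in code. Two 2-item sets that share an item are merged into a 3-item candidate. Apriori pruning discards the candidate if any of its 2-item subsets is infrequent, and a frequent 3-item set is then split into every possible rule. The frequent pairs below are hypothetical. Step 2 is not in the text, so support counting of the candidate is omitted here, and each generated rule would still need its confidence checked against confmin.

```python
from itertools import combinations


def join_pairs(frequent_pairs):
    """Merge two frequent 2-item sets sharing an item into a 3-item candidate,
    keeping it only if all of its 2-item subsets are frequent (Apriori pruning)."""
    pairs = {frozenset(p) for p in frequent_pairs}
    candidates = set()
    for a, b in combinations(pairs, 2):
        union = a | b
        if len(union) == 3 and all(
            frozenset(sub) in pairs for sub in combinations(union, 2)
        ):
            candidates.add(union)
    return candidates


def rules_from_itemset(itemset):
    """Every rule X => Y with X ∪ Y = itemset and X, Y non-empty and disjoint."""
    items = sorted(itemset)
    rules = []
    for k in range(1, len(items)):
        for lhs in combinations(items, k):
            rhs = tuple(i for i in items if i not in lhs)
            rules.append((lhs, rhs))
    return rules


# Hypothetical frequent 2-item sets.
FREQUENT_PAIRS = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D")]
candidates = join_pairs(FREQUENT_PAIRS)
print([sorted(c) for c in candidates])  # [['A', 'B', 'C']]
for lhs, rhs in rules_from_itemset({"A", "B", "C"}):
    print(lhs, "=>", rhs)
```

{A, B, D} and {B, C, D} are never counted because {A, D} and {C, D} are not frequent. A 3-item set yields 2^3 - 2 = 6 candidate rules.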
Summary
✓Market Basket Analysis
✓Association Rule Mining
✓Apriori Algorithm