
MBA in Python - 4

This document discusses various visualization techniques for analyzing market basket data in Python, including heatmaps, scatterplots, and parallel coordinates plots. It explains how to generate association rules from transaction data using the Apriori algorithm, and how to prepare the rules data for each type of visualization. Heatmaps are used to show the support values of rule antecedents and consequents, scatterplots can visualize multiple metrics like support and confidence, and parallel coordinates plots allow examining individual rules without visual clutter. The document provides code examples for generating each visualization from market basket data.

Heatmaps

MARKET BASKET ANALYSIS IN PYTHON

Isaiah Hull
Economist
MovieLens dataset
import pandas as pd

# Load ratings data.
ratings = pd.read_csv('datasets/movie_ratings.csv')
print(ratings.head())

   userId  movieId                         title
0    3149    54286  Bourne Ultimatum, The (2007)
1    3149     1220    Blues Brothers, The (1980)
2    3149     4007            Wall Street (1987)
3    3149     7156         Fog of War: Eleven...
4    3149    97304                   Argo (2012)

Creating "transactions" from ratings
# Recover unique user IDs.
user_id = ratings['userId'].unique()

# Create library of highly rated movies for each user.
libraries = [list(ratings[ratings['userId'] == u].title) for u in user_id]

# Print example library.
print(libraries[0])

['Battlestar Galactica (2003)',
 'Gorgon, The (1964)',
 'Under the Skin (2013)',
 'Upstream Color (2013)',
 'Destry Rides Again (1939)',
 'Dr. Phibes Rises Again (1972)']
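
As a possible alternative to filtering the ratings once per user, the same libraries can be built with a single groupby pass. This is only a sketch: the toy ratings table below is made up for illustration and is not taken from the MovieLens file.

```python
import pandas as pd

# Toy stand-in for the ratings table (illustrative values, not MovieLens data).
ratings = pd.DataFrame({
    'userId': [1, 1, 2, 2, 2],
    'title': ['Argo (2012)', 'Wall Street (1987)',
              'Argo (2012)', 'Inception (2010)', 'Wall Street (1987)'],
})

# Group titles by user in one pass. sort=False keeps users in order of
# first appearance, the same order ratings['userId'].unique() returns.
libraries = ratings.groupby('userId', sort=False)['title'].apply(list).tolist()

# Print example library.
print(libraries[0])
```

The list comprehension on the slide scans the whole table once per user, so the groupby version may be noticeably faster on large rating files.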

One-hot encoding transactions
from mlxtend.preprocessing import TransactionEncoder

# Instantiate transaction encoder.
encoder = TransactionEncoder()

# One-hot encode libraries.
onehot = encoder.fit(libraries).transform(libraries)

# Use movie titles as column headers.
onehot = pd.DataFrame(onehot, columns = encoder.columns_)

# Print onehot header.
print(onehot.head())

One-hot encoding transactions

   (500) Days of Summer (2009)  .45 (2006)  10 Things I Hate About You (1999)
0                        False       False                              False
1                        False       False                              False
2                        False       False                              False
3                        False       False                              False
4                        False       False                              False
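
To make the shape of this table concrete, here is a minimal pure-Python sketch of one-hot encoding. It does not use mlxtend; the transactions are hypothetical, and the sorted column order is a choice made here for reproducibility.

```python
# Toy transactions (hypothetical libraries, for illustration only).
libraries = [
    ['Argo (2012)', 'Inception (2010)'],
    ['Inception (2010)'],
    ['Argo (2012)', 'Wall Street (1987)'],
]

# Column labels: every distinct title, sorted for a stable column order.
columns = sorted({title for library in libraries for title in library})

# One row per transaction: True where the title appears in that library.
onehot_rows = [[title in library for title in columns] for library in libraries]

print(columns)
for row in onehot_rows:
    print(row)
```

Each row corresponds to one user's library and each column to one title, which is the layout the Apriori step expects.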

What is a heatmap?

Preparing the data
1. Generate the rules.
    Use Apriori algorithm and association rules.

2. Convert antecedents and consequents into strings.
    Stored as frozen sets by default in mlxtend.

3. Convert rules into matrix format.
    Suitable for use in heatmaps.

Preparing the data
from mlxtend.frequent_patterns import association_rules, apriori
import seaborn as sns

# Apply the apriori algorithm
frequent_itemsets = apriori(onehot, min_support=0.10,
                            use_colnames=True, max_len=2)

# Recover the association rules
rules = association_rules(frequent_itemsets)

Generating a heatmap
# Convert antecedents and consequents into strings
rules['antecedents'] = rules['antecedents'].apply(lambda a: ','.join(list(a)))
rules['consequents'] = rules['consequents'].apply(lambda a: ','.join(list(a)))

# Print example.
print(rules[['antecedents','consequents']])

            antecedents                    consequents
0  Batman Begins (2005)  Dark Knight Rises, The (2012)
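
With max_len=2 each antecedent and consequent holds a single item, so the join above yields one title per label. For itemsets with several items, set iteration order is not guaranteed, so a sorted join may give more stable labels. The helper name itemset_to_string below is hypothetical, and the example itemsets are made up.

```python
def itemset_to_string(itemset):
    """Join the items of a (frozen)set into one label, sorted for a stable order."""
    return ','.join(sorted(itemset))

# Single-item itemset, as produced when max_len=2.
single = frozenset(['Batman Begins (2005)'])

# Hypothetical two-item itemset, to show the effect of sorting.
pair = frozenset(['Inception (2010)', 'Argo (2012)'])

print(itemset_to_string(single))
print(itemset_to_string(pair))
```

It could be applied in the same way as the lambda on the slide, for example rules['antecedents'].apply(itemset_to_string).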

Generating a heatmap
# Transform antecedent, consequent, and support columns into matrix
support_table = rules.pivot(index='consequents', columns='antecedents',
                            values='support')

# Generate heatmap
sns.heatmap(support_table)
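
The sketch below shows what the pivot step produces, using a small hand-made rules table (the item labels A, B, C and the support values are hypothetical). Pairs with no rule become NaN, which would show up as empty cells in a heatmap.

```python
import pandas as pd

# Hand-made rules table (hypothetical items and support values).
rules = pd.DataFrame({
    'antecedents': ['A', 'B', 'A', 'C'],
    'consequents': ['B', 'A', 'C', 'A'],
    'support': [0.30, 0.30, 0.20, 0.20],
})

# Rows are consequents, columns are antecedents, cells hold support.
support_table = rules.pivot(index='consequents', columns='antecedents',
                            values='support')

print(support_table)
```

Note that pivot raises an error if the same antecedent and consequent pair appears more than once, so the string conversion has to yield unique pairs.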

Generating a heatmap

Customizing heatmaps
sns.heatmap(support_table, annot=True, cbar=False, cmap='ocean')

Let's practice!

Scatterplots

Introduction to scatterplots

Introduction to scatterplots
A scatterplot displays pairs of values.
    Antecedent and consequent support.
    Confidence and lift.

No model is assumed.
    No trend line or curve needed.

Can provide starting point for pruning.
    Identify patterns in data and rules.

Support versus confidence

Support versus confidence

1 Bayardo Jr., R.J. and Agrawal, R. (1999). Mining the Most Interesting Rules. In Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 145-154).

Generating a scatterplot
import pandas as pd
import seaborn as sns
from mlxtend.frequent_patterns import association_rules, apriori

# Load one-hot encoded MovieLens data
onehot = pd.read_csv('datasets/movies_onehot.csv')

# Generate frequent itemsets using Apriori
frequent_itemsets = apriori(onehot, min_support=0.01, use_colnames=True, max_len=2)

# Generate association rules
rules = association_rules(frequent_itemsets, metric='support', min_threshold=0.0)

# Plot antecedent support against consequent support
sns.scatterplot(x="antecedent support", y="consequent support", data=rules)

Generating a scatterplot

Adding a third metric
 

sns.scatterplot(x="antecedent support",
                y="consequent support",
                size="lift",
                data=rules)

Adding a third metric

What can we learn from scatterplots?
Identify natural thresholds in data.
    Not possible with heatmaps or other visualizations.

Visualize entire dataset.
    Not limited to small number of rules.

Use findings to prune.
    Use natural thresholds and patterns to prune.
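
Once a scatterplot suggests thresholds, pruning can be done with an ordinary DataFrame filter. The sketch below uses a hand-made rules table and arbitrary cutoffs (min_support and min_lift are illustrative values, not recommendations from the slides).

```python
import pandas as pd

# Hand-made rules table with the column names used in the slides.
rules = pd.DataFrame({
    'antecedent support': [0.40, 0.35, 0.12, 0.50],
    'consequent support': [0.30, 0.45, 0.10, 0.20],
    'support': [0.20, 0.18, 0.02, 0.09],
    'lift': [1.67, 1.14, 1.67, 0.90],
})

# Illustrative thresholds, assumed to have been read off a scatterplot.
min_support = 0.05
min_lift = 1.0

# Keep rules that are frequent enough and positively associated.
pruned = rules[(rules['support'] >= min_support) & (rules['lift'] > min_lift)]

print(len(rules), '->', len(pruned))
```

Here the third rule is dropped for low support and the fourth for lift below 1, leaving two rules.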

Let's practice!

Parallel coordinates plot

What is a parallel coordinates plot?

When to use parallel coordinate plots
   

Parallel coordinates vs. heatmap.
    Don't need intensity information.
    Only want to know whether rule exists.
    Want to reduce visual clutter.

Parallel coordinates vs. scatterplot.
    Want individual rule information.
    Not interested in multiple metrics.
    Only want to examine final rules.

Preparing the data
 

import pandas as pd
from mlxtend.frequent_patterns import association_rules, apriori

# Load the one-hot encoded data
onehot = pd.read_csv('datasets/movies_onehot.csv')

# Generate frequent itemsets
frequent_itemsets = apriori(onehot, min_support = 0.10, use_colnames = True, max_len = 2)

# Generate association rules
rules = association_rules(frequent_itemsets, metric = 'support', min_threshold = 0.00)

Converting rules to coordinates
# Convert rules to coordinates.
rules['antecedent'] = rules['antecedents'].apply(lambda antecedent: list(antecedent)[0])
rules['consequent'] = rules['consequents'].apply(lambda consequent: list(consequent)[0])
rules['rule'] = rules.index

# Define coordinates and label
coords = rules[['antecedent','consequent','rule']]

# Print example
print(coords.head(1))

                antecedent        consequent  rule
0  Dark Knight, The (2008)  Inception (2010)     0
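
A later slide calls a helper named rules_to_coordinates without showing its definition. One plausible version, which simply wraps the conversion steps above, is sketched here. It is an assumption rather than the course's actual helper, and it expects exactly one item on each side of every rule (as with max_len=2).

```python
import pandas as pd

def rules_to_coordinates(rules):
    """Return antecedent, consequent, and rule label columns for plotting.

    Assumed sketch: takes the first (only) item of each frozenset and
    leaves the input DataFrame unmodified.
    """
    coords = pd.DataFrame(index=rules.index)
    coords['antecedent'] = rules['antecedents'].apply(lambda itemset: list(itemset)[0])
    coords['consequent'] = rules['consequents'].apply(lambda itemset: list(itemset)[0])
    coords['rule'] = rules.index
    return coords

# Minimal rules table with frozenset columns, mimicking mlxtend's output format.
rules = pd.DataFrame({
    'antecedents': [frozenset(['Dark Knight, The (2008)'])],
    'consequents': [frozenset(['Inception (2010)'])],
})

coords = rules_to_coordinates(rules)
print(coords)
```

Unlike the inline version on the slide, this sketch builds a new DataFrame instead of adding columns to rules.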

Generating a parallel coordinates plot
 

from pandas.plotting import parallel_coordinates

# Generate parallel coordinates plot
parallel_coordinates(coords, 'rule', colormap = 'ocean')

Generating a parallel coordinates plot

Refining a parallel coordinates plot
# Generate frequent itemsets
frequent_itemsets = apriori(onehot, min_support = 0.01, use_colnames = True, max_len = 2)

# Generate association rules
rules = association_rules(frequent_itemsets, metric = 'lift', min_threshold = 1.00)

# Generate coordinates. rules_to_coordinates is not defined in these slides;
# it presumably wraps the steps under "Converting rules to coordinates".
coords = rules_to_coordinates(rules)

# Generate parallel coordinates plot
parallel_coordinates(coords, 'rule')

Refining a parallel coordinates plot

Let's practice!

Congratulations!

Isaiah Hull
Instructor
Transactions and itemsets
Transactions

TID   Transaction
1     MILK, BREAD, BISCUIT
...   ...
20    TEA, MILK, COFFEE, CEREAL

Itemsets

{MILK, BREAD}
{MILK, COFFEE, CEREAL}
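
To illustrate how itemsets relate to transactions, the sketch below counts the support of every itemset of size one or two in a small list of transactions. The first and third transactions come from the slide; the other two are hypothetical fillers for the elided rows.

```python
from collections import Counter
from itertools import combinations

# Transactions: rows 1 and 3 are from the slide, rows 2 and 4 are made up.
transactions = [
    ['MILK', 'BREAD', 'BISCUIT'],
    ['MILK', 'BREAD'],
    ['TEA', 'MILK', 'COFFEE', 'CEREAL'],
    ['BREAD', 'TEA'],
]

# Count how many transactions contain each itemset of size 1 or 2.
counts = Counter()
for transaction in transactions:
    items = sorted(set(transaction))
    for size in (1, 2):
        for itemset in combinations(items, size):
            counts[frozenset(itemset)] += 1

# Support is the share of transactions that contain the itemset.
support = {itemset: n / len(transactions) for itemset, n in counts.items()}

print(support[frozenset({'MILK', 'BREAD'})])
```

Enumerating every itemset like this grows quickly with the number of items, which is why pruning steps such as Apriori matter on real data.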

Association rules and metrics
Association Rules
    Use if-then structure.
        If A then B.
    Have antecedent(s) and consequent(s).
    Many association rules.

Metrics
    Measure strength of association.
        Support, lift, confidence, conviction.
    Used to prune itemsets and rules.
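
As a reminder of how the metrics relate to one another, the sketch below computes confidence, lift, and conviction for a rule "if A then B" from three support values, using the standard definitions. The function name rule_metrics and the input numbers are hypothetical.

```python
def rule_metrics(support_a, support_b, support_ab):
    """Compute metrics for the rule A -> B from support values.

    support_a and support_b are the supports of A and B on their own;
    support_ab is the support of A and B appearing together.
    """
    confidence = support_ab / support_a
    lift = confidence / support_b
    # Conviction is unbounded when the rule never fails (confidence of 1).
    if confidence == 1:
        conviction = float('inf')
    else:
        conviction = (1 - support_b) / (1 - confidence)
    return {
        'support': support_ab,
        'confidence': confidence,
        'lift': lift,
        'conviction': conviction,
    }

# Illustrative supports: A in 50% of baskets, B in 40%, both in 25%.
metrics = rule_metrics(support_a=0.5, support_b=0.4, support_ab=0.25)
print(metrics)
```

A lift above 1 and a conviction above 1 both suggest that A and B occur together more often than independence would predict.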

Pruning and aggregation

The Apriori algorithm

Visualizing rules

Congratulations!