MBA in Python - 4
MARKET BASKET ANALYSIS IN PYTHON
Isaiah Hull
Economist
MovieLens dataset
import pandas as pd
   (500) Days of Summer (2009)  .45 (2006)  10 Things I Hate About You (1999)
0                        False       False                              False
1                        False       False                              False
2                        False       False                              False
3                        False       False                              False
4                        False       False                              False
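The table above is a one-hot encoding: one row per user, one column per movie title, and True where that user's transaction contains the title. A minimal sketch of how such a table can be built, using only the standard library rather than the slide's own (unshown) loading code. The `transactions` list is hypothetical toy data, not the MovieLens data.

```python
# Hypothetical toy data: each inner list holds the titles in one transaction.
transactions = [
    ["Batman Begins (2005)", "Dark Knight Rises, The (2012)"],
    ["(500) Days of Summer (2009)"],
    ["Batman Begins (2005)", "Dark Knight Rises, The (2012)",
     "(500) Days of Summer (2009)"],
]

# Column order: every distinct title, sorted.
columns = sorted({title for t in transactions for title in t})

# One row per transaction; True where the title appears in it.
onehot = [{c: (c in t) for c in columns} for t in transactions]

for i, row in enumerate(onehot):
    print(i, [row[c] for c in columns])
```

Each row can then be read column by column, exactly like the printed DataFrame.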
# Print example.
print(rules[['antecedents','consequents']])
            antecedents                    consequents
0  Batman Begins (2005)  Dark Knight Rises, The (2012)
import seaborn as sns

# Generate heatmap
sns.heatmap(support_table)
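The heatmap needs a two-dimensional table of support values, indexed by antecedent and consequent. The slides do not show how `support_table` is built, so the following is only a sketch under assumptions: it computes pairwise support directly from hypothetical toy transactions (items "A", "B", "C") and stores it in a nested dict, not a DataFrame.

```python
from itertools import permutations

# Hypothetical toy transactions.
transactions = [
    {"A", "B"},
    {"A", "B", "C"},
    {"A", "C"},
    {"B"},
]

items = sorted(set().union(*transactions))
n = len(transactions)

# support_table[a][b] = share of transactions containing both a and b.
support_table = {a: {} for a in items}
for a, b in permutations(items, 2):
    both = sum(1 for t in transactions if a in t and b in t)
    support_table[a][b] = both / n

for a in items:
    print(a, support_table[a])
```

Pairwise support is symmetric, so the resulting table mirrors itself across the diagonal.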
Introduction to scatterplots
No model is assumed.
No trend line or curve needed.
1 Bayardo Jr., R.J. and Agrawal, R. (1999). Mining the Most Interesting Rules. In Proceedings of the Fifth ACM SIGKDD
International Conference on Knowledge Discovery and Data Mining (pp. 145-154).
sns.scatterplot(x="antecedent support",
                y="consequent support",
                size="lift",
                data=rules)
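The scatterplot positions each rule by its antecedent support and consequent support and sizes the point by lift. As a rough illustration of where those three numbers come from, here is a standard-library sketch using the usual definitions of support and lift. The helper names `support` and `rule_metrics` and the toy transactions are hypothetical, and the code assumes both sides of the rule occur at least once.

```python
def support(itemset, transactions):
    """Share of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def rule_metrics(antecedent, consequent, transactions):
    """Metrics for the rule antecedent -> consequent.

    Assumes both sides have non-zero support (no zero-division guard).
    """
    a = support(antecedent, transactions)
    c = support(consequent, transactions)
    both = support(antecedent | consequent, transactions)
    return {
        "antecedent support": a,
        "consequent support": c,
        "support": both,
        "lift": both / (a * c),
    }


# Hypothetical toy transactions.
transactions = [{"A", "B"}, {"A", "B", "C"}, {"A", "C"}, {"B"}]
metrics = rule_metrics({"A"}, {"B"}, transactions)
print(metrics)
```

A lift below 1, as in this toy rule, means the two items appear together less often than independence would predict.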
What is a parallel coordinates plot?
Only want to know whether rule exists.
Not interested in multiple metrics.
# Print example
print(coords.head(1))
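A parallel coordinates plot needs one row per rule with a single antecedent label, a single consequent label, and a rule identifier. The slides do not show how `coords` is produced, so this is a sketch under assumptions: `rules_to_coordinates` is a hypothetical helper, rules are plain dicts rather than DataFrame rows, and every rule has exactly one item on each side.

```python
# One rule, taken from the printed example earlier in the deck.
rules = [
    {
        "antecedents": frozenset({"Batman Begins (2005)"}),
        "consequents": frozenset({"Dark Knight Rises, The (2012)"}),
    },
]


def rules_to_coordinates(rules):
    """Flatten single-item rules into antecedent / consequent / rule rows."""
    coords = []
    for i, rule in enumerate(rules):
        coords.append({
            # Assumption: exactly one item per side, so take the only element.
            "antecedent": next(iter(rule["antecedents"])),
            "consequent": next(iter(rule["consequents"])),
            "rule": i,
        })
    return coords


coords = rules_to_coordinates(rules)
print(coords[0])
```

Each row then becomes one line in the plot, running from its antecedent to its consequent.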
Isaiah Hull
Instructor
Transactions and itemsets
Transactions
TID  Transaction
1    MILK, BREAD, BISCUIT
...  ...

Itemsets
{MILK, BREAD}
{MILK, COFFEE, CEREAL}
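An itemset is any collection of items, and a transaction contains every itemset that is a subset of it. As a small illustration, the sketch below lists every non-empty itemset contained in transaction 1 from the table, using `itertools.combinations`. The variable names are illustrative only.

```python
from itertools import combinations

# Transaction 1 from the table above.
transaction = ["MILK", "BREAD", "BISCUIT"]

# Every non-empty subset of the transaction is an itemset it contains.
itemsets = [
    frozenset(combo)
    for size in range(1, len(transaction) + 1)
    for combo in combinations(transaction, size)
]

for itemset in itemsets:
    print(sorted(itemset))
```

A transaction with k items contains 2^k - 1 non-empty itemsets, which is why the number of candidate itemsets grows so quickly with basket size.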