Cluster Analysis in Python: Chapter 2
CLUSTER ANALYSIS IN PYTHON
Shaumik Daityari
Business Analyst
Creating a linkage matrix using linkage()
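# Signature of scipy.cluster.hierarchy.linkage with its default arguments
#   observations: the data to cluster (SciPy calls this parameter y)
#   method: how the distance between two clusters is computed
#   metric: the distance metric between individual points
#   optimal_ordering: reorder the output so successive leaves are as close as possible
scipy.cluster.hierarchy.linkage(observations,
                                method='single',
                                metric='euclidean',
                                optimal_ordering=False)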
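As a minimal sketch, the call below runs linkage() on six hand-picked 2-D points (hypothetical data) and inspects the result. linkage() returns an (n - 1) x 4 matrix: each row records the two clusters merged at that step, the distance between them, and the number of observations in the new cluster. method='ward' is used here only as one option; the default shown in the signature is 'single'.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

# Six hypothetical 2-D observations forming two well-separated groups
observations = np.array([
    [1.0, 1.0], [1.5, 1.2], [1.2, 0.8],
    [8.0, 8.0], [8.4, 7.6], [7.8, 8.3],
])

# Agglomerative clustering with Ward's method on Euclidean distances
Z = linkage(observations, method='ward', metric='euclidean')

# Each row: [cluster a, cluster b, merge distance, size of the merged cluster]
print(Z.shape)  # (5, 4): n - 1 merges for n = 6 observations
print(Z[-1])    # last row: the final merge, producing a cluster of size 6
```

Reading the third column from top to bottom shows the merge distances growing as ever larger clusters are joined.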
Why visualize clusters?
Try to make sense of the clusters formed
seaborn, a Python data visualization library built on matplotlib, contains functions that make data visualization tasks easy in the context of data analytics
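import matplotlib.pyplot as plt
colors = {1: 'red', 2: 'blue'}  # hypothetical mapping from cluster label to color
# pandas/matplotlib scatter plot, one color per cluster label
df.plot.scatter(x='x', y='y',
                c=df['labels'].apply(lambda label: colors[label]))
plt.show()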
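import matplotlib.pyplot as plt
import seaborn as sns

# seaborn scatter plot; hue assigns a color to each cluster label
sns.scatterplot(x='x', y='y', hue='labels', data=df)
plt.show()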
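Both plotting snippets above assume a DataFrame df with x, y and labels columns, plus a colors dictionary for the matplotlib version. A minimal sketch of how such a df could be built, assuming hypothetical random data and using SciPy's fcluster() to cut the hierarchy into two flat clusters:

```python
import random

import pandas as pd
from scipy.cluster.hierarchy import fcluster, linkage

random.seed(42)  # reproducible hypothetical data

# Two hypothetical groups of 20 points each, centered near (2, 2) and (8, 8)
xs = [random.gauss(2, 0.5) for _ in range(20)] + [random.gauss(8, 0.5) for _ in range(20)]
ys = [random.gauss(2, 0.5) for _ in range(20)] + [random.gauss(8, 0.5) for _ in range(20)]
df = pd.DataFrame({'x': xs, 'y': ys})

# Build the hierarchy, then cut it into at most 2 flat clusters (labels 1 and 2)
Z = linkage(df[['x', 'y']], method='ward', metric='euclidean')
df['labels'] = fcluster(Z, 2, criterion='maxclust')

# Color lookup as used by the matplotlib version: one color per cluster label
colors = {1: 'red', 2: 'blue'}
point_colors = df['labels'].apply(lambda label: colors[label])
print(df['labels'].value_counts())
```

With this df in place, either plotting snippet can be run unchanged; the seaborn version picks its own palette from the labels column instead of using colors.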
Introduction to dendrograms
Strategy so far: decide the number of clusters by visual inspection
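from scipy.cluster.hierarchy import dendrogram, linkage
import matplotlib.pyplot as plt

# Build the hierarchy on the whitened (unit-variance) columns
Z = linkage(df[['x_whiten', 'y_whiten']],
            method='ward',
            metric='euclidean')
# Draw the dendrogram: each merge is a link drawn at the height of its merge distance
dn = dendrogram(Z)
plt.show()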
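The dendrogram code above assumes df already holds whitened columns x_whiten and y_whiten. A minimal sketch, assuming hypothetical random points, that creates them with scipy.cluster.vq.whiten (which divides each feature by its standard deviation) and inspects the dendrogram's structure without drawing it, via no_plot=True. Cutting the tree at a chosen height with fcluster(..., criterion='distance') gives the flat clusters that a horizontal line at that height would separate on the plot; the cut height of 2.0 is an arbitrary illustrative choice.

```python
import random

import pandas as pd
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage
from scipy.cluster.vq import whiten

random.seed(0)  # reproducible hypothetical data
points = 30
df = pd.DataFrame({'x': random.sample(range(0, 100), points),
                   'y': random.sample(range(0, 100), points)})

# whiten() divides each column by its standard deviation
whitened = whiten(df[['x', 'y']].to_numpy(dtype=float))
df['x_whiten'] = whitened[:, 0]
df['y_whiten'] = whitened[:, 1]

Z = linkage(df[['x_whiten', 'y_whiten']], method='ward', metric='euclidean')

# no_plot=True computes the dendrogram layout without drawing it
dn = dendrogram(Z, no_plot=True)
print(len(dn['ivl']))  # one leaf per observation: 30

# Cut the tree at height 2.0: merges above this height stay separate clusters
labels = fcluster(Z, t=2.0, criterion='distance')
print(len(set(labels)), 'clusters below height 2.0')
```

Trying a few cut heights this way is a programmatic counterpart to reading the number of clusters off the plotted dendrogram.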
Measuring speed in hierarchical clustering
Use the timeit module to measure how long linkage() takes
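import random
import pandas as pd
# 100 random points; random.sample() draws without replacement, so coordinates are distinct
points = 100
df = pd.DataFrame({'x': random.sample(range(0, points), points),
                   'y': random.sample(range(0, points), points)})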
Result reported by IPython's %timeit magic when timing the clustering step on these points:
1.02 ms ± 133 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
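The %timeit result above comes from IPython. Outside a notebook, the standard-library timeit module can produce a comparable measurement. A minimal sketch, assuming the same random-point setup and Ward linkage (the helper names make_points and time_linkage, the point counts, and the repetition count are hypothetical choices), that times linkage() at a few sizes to show how run time grows with the number of points:

```python
import random
import timeit

import pandas as pd
from scipy.cluster.hierarchy import linkage


def make_points(points):
    """Return a DataFrame of `points` random points with distinct integer coordinates."""
    return pd.DataFrame({'x': random.sample(range(0, points), points),
                         'y': random.sample(range(0, points), points)})


def time_linkage(points, number=20):
    """Average seconds per linkage() call on `points` random points."""
    df = make_points(points)
    total = timeit.timeit(
        lambda: linkage(df[['x', 'y']], method='ward', metric='euclidean'),
        number=number)
    return total / number


for n in (100, 200, 400):
    print(f"{n} points: {time_linkage(n) * 1000:.2f} ms per call")
```

Absolute timings depend on the machine, so the useful signal is how the per-call time changes as the number of points doubles.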