Clustering Documentation Python Code
data=cdata.drop(["Index"],axis=1)
=>Dropping the Index column because it holds categorical data (state names) that cannot be normalized
def norm_func(i):
    # min-max scaling: rescales each column to the range [0, 1]
    x=(i-i.min()) / (i.max()-i.min())
    return (x)
df_norm=norm_func(data.iloc[:,:])
df_norm.describe()
Murder Assault UrbanPop Rape
count 50.000000 50.000000 50.000000 50.000000
mean 0.420964 0.430685 0.568475 0.360000
std 0.262380 0.285403 0.245335 0.242025
min 0.000000 0.000000 0.000000 0.000000
25% 0.197289 0.219178 0.381356 0.200904
50% 0.388554 0.390411 0.576271 0.330749
75% 0.629518 0.698630 0.775424 0.487726
max 1.000000 1.000000 1.000000 1.000000
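The effect of norm_func can be checked on a small frame. The sketch below is only an illustration: the frame toy and its values are hypothetical and are not taken from the crime data set.

```python
import pandas as pd

def norm_func(i):
    # Min-max scaling: (value - column min) / (column max - column min)
    return (i - i.min()) / (i.max() - i.min())

# Hypothetical toy values, used only to show the arithmetic
toy = pd.DataFrame({"a": [2.0, 4.0, 6.0], "b": [10.0, 15.0, 30.0]})
toy_norm = norm_func(toy)
print(toy_norm)
```

Every column ends up with min 0 and max 1, which matches the describe() output above. A column whose values are all equal would divide 0 by 0 and come out as NaN, so such columns are best dropped before normalizing.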
=>Importing linkage from scipy.cluster.hierarchy and plotting the dendrogram
import matplotlib.pylab as plt
from scipy.cluster.hierarchy import linkage
import scipy.cluster.hierarchy as sch
z = linkage(df_norm, method="complete",metric="euclidean")
plt.figure(figsize=(20,5));plt.title("Hierarchical Clustering Dendrogram")
plt.xlabel('index');plt.ylabel('distance')
sch.dendrogram(z,
    leaf_rotation = 90,   # rotate the x-axis labels
    leaf_font_size = 10   # font size of the x-axis labels
)
plt.show()
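Besides reading the dendrogram by eye, the linkage matrix can also be cut into a fixed number of clusters with scipy's fcluster. This is a minimal sketch, not part of the original workflow: the array points and the choice of 2 clusters are assumptions made for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical toy data: two well-separated groups of three points each
points = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                   [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])

# Same settings as in the document: complete linkage, euclidean distance
z_toy = linkage(points, method="complete", metric="euclidean")

# One row per merge: the two merged clusters, the merge distance, the new cluster size
print(z_toy.shape)

# Cut the tree so that at most 2 flat clusters remain (labels start at 1)
labels = fcluster(z_toy, t=2, criterion="maxclust")
print(labels)
```

A large jump in the merge distance (third column of the linkage matrix) corresponds to a long vertical line in the dendrogram and suggests a natural place to cut.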
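# agglomerative clustering
import pandas as pd
from sklearn.cluster import AgglomerativeClustering
=>h_complete is the fitted AgglomerativeClustering model; its labels_ attribute holds one cluster label per row
cluster_labels = pd.Series(h_complete.labels_)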
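The step that creates h_complete does not appear in the listing. A fit along the following lines would produce it. This is a hedged sketch: the frame toy stands in for df_norm, and n_clusters=2 is an assumed value, since the number of clusters used originally is not stated.

```python
import pandas as pd
from sklearn.cluster import AgglomerativeClustering

# Hypothetical stand-in for df_norm: two well-separated groups of rows
toy = pd.DataFrame({"x": [0.0, 0.1, 0.0, 0.9, 1.0, 0.9],
                    "y": [0.0, 0.0, 0.1, 1.0, 0.9, 0.9]})

# Complete linkage as in the dendrogram step; n_clusters is an assumption.
# The distance defaults to euclidean, so it is not passed explicitly.
h_complete = AgglomerativeClustering(n_clusters=2, linkage="complete").fit(toy)

# One label per row, in the same order as the rows of the input
cluster_labels = pd.Series(h_complete.labels_)
print(cluster_labels.value_counts())
```

In practice the number of clusters would be chosen from the dendrogram before fitting.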
=>Adding the new column to the data set
cdata['clust'] = cluster_labels # creating new column
=>Moving the clust column to the front
data=cdata.iloc[: ,[5,0,1,2,3,4]]
data.head()
clust Index Murder Assault UrbanPop Rape
0 3 Alabama 13.2 236 58 21.2
1 4 Alaska 10.0 263 48 44.5
2 1 Arizona 8.1 294 80 31.0
3 0 Arkansas 8.8 190 50 19.5
4 1 California 9.0 276 91 40.6
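=>Computing the mean of each numeric column per cluster (the Index column is text, so it is skipped)
data.iloc[: , 2:].groupby(data.clust).mean()
=>Saving the data
data.to_csv("crimedata.csv", encoding = "utf-8")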
import os
os.getcwd()
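The per-cluster summary and the export can be tried on a small frame first. The sketch below is illustrative only: the rows in toy are hypothetical, and index=False is an optional choice that keeps the row numbers out of the CSV.

```python
import pandas as pd

# Hypothetical rows that mimic the layout of the clustered data
toy = pd.DataFrame({
    "clust": [0, 0, 1, 1],
    "Index": ["A", "B", "C", "D"],
    "Murder": [2.0, 4.0, 10.0, 12.0],
    "Assault": [100, 120, 250, 270],
})

# Mean of every numeric column per cluster; the text column is dropped first
summary = toy.drop(columns=["Index"]).groupby("clust").mean()
print(summary)

# Without a file name, to_csv returns the CSV text instead of writing a file
csv_text = toy.to_csv(index=False)
print(csv_text)
```

Comparing the cluster means is a quick way to describe what each cluster represents.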
Python libraries :
import pandas as pd
import matplotlib.pylab as plt
from scipy.cluster.hierarchy import linkage
import scipy.cluster.hierarchy as sch
from sklearn.cluster import AgglomerativeClustering
df_norm=norm_func(air1.iloc[:,:])
=>Normalizing the data so that every column is rescaled to the range [0, 1]
df_norm.describe()
Balance Qual_miles ... Days_since_enroll Award?
count 3999.000000 3999.000000 ... 3999.000000 3999.000000
mean 0.043172 0.012927 ... 0.496330 0.370343
std 0.059112 0.069399 ... 0.248991 0.482957
min 0.000000 0.000000 ... 0.000000 0.000000
25% 0.010868 0.000000 ... 0.280685 0.000000
50% 0.025279 0.000000 ... 0.493610 0.000000
75% 0.054201 0.000000 ... 0.697914 1.000000
max 1.000000 1.000000 ... 1.000000 1.000000
from scipy.cluster.hierarchy import linkage
z = linkage(df_norm, method="complete",metric="euclidean")
=>Plotting the dendrogram to find the number of clusters
plt.figure(figsize=(20,20));plt.title("Hierarchical Clustering Dendrogram")
plt.xlabel('index');plt.ylabel('distance')
sch.dendrogram(z,
    leaf_rotation = 90,   # rotate the x-axis labels
    leaf_font_size = 10   # font size of the x-axis labels
)
plt.show()
# agglomerative clustering
from sklearn.cluster import AgglomerativeClustering
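With 3999 rows the full dendrogram has 3999 leaves and is hard to read. One option, sketched here as a suggestion rather than as part of the original code, is scipy's truncate_mode="lastp", which collapses the lower merges and keeps only the top of the tree. The random array toy stands in for df_norm and p=30 is an arbitrary choice.

```python
import numpy as np
import scipy.cluster.hierarchy as sch
from scipy.cluster.hierarchy import linkage

# Hypothetical stand-in for df_norm: 200 random rows with 4 columns in [0, 1)
rng = np.random.default_rng(0)
toy = rng.random((200, 4))

z_toy = linkage(toy, method="complete", metric="euclidean")

# Keep only the top of the tree; lower merges are collapsed into single leaves.
# no_plot=True returns the layout as a dict without drawing anything;
# remove it (and call plt.show()) to draw the truncated dendrogram.
info = sch.dendrogram(z_toy,
                      truncate_mode="lastp",
                      p=30,
                      leaf_rotation=90,
                      leaf_font_size=10,
                      no_plot=True)

# Number of leaves left after truncation, far fewer than the 200 rows
print(len(info["ivl"]))
```

Leaf labels in parentheses give the number of original rows collapsed into that leaf.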
import os
os.getcwd()
Python libraries :
import pandas as pd
import matplotlib.pylab as plt
from scipy.cluster.hierarchy import linkage
import scipy.cluster.hierarchy as sch
from sklearn.cluster import AgglomerativeClustering