
K-Means Clustering Using Python

K-means clustering is an unsupervised machine learning algorithm that groups unlabeled data points into a specified number of clusters (k) based on their similarity. It partitions the data space into Voronoi cells around cluster centers (centroids), so that each data point is assigned to the cluster of its nearest centroid. The number of clusters k needs to be chosen beforehand. The elbow method, based on the within-cluster sum of squares (WCSS), can help identify a suitable number of clusters: the elbow point marks the number beyond which adding more clusters does not significantly improve the model. Random initialization of centroids can affect the clustering results, so the algorithm is typically run multiple times.

Uploaded by

meghanaghogare
Copyright
© All Rights Reserved

K-means Clustering

Meghana Tribhuwan
K-means

• K-means clears the confusion as to how many groups
• Continue until you reach a point where no reassignment is needed
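
The loop that ends when no reassignment is needed can be sketched in a few lines of NumPy. This is a minimal illustration, not the implementation used in the slides: the function name `kmeans`, the choice of starting centroids (k distinct data points picked at random) and the toy data are all assumptions made for the example.

```python
import numpy as np

def kmeans(points, k, seed=0, max_iter=100):
    """Assign each point to its nearest centroid, recompute centroids, repeat."""
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    # Assumption: start from k distinct data points chosen at random.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    labels = np.full(len(points), -1)
    for _ in range(max_iter):
        # Distance from every point to every centroid, shape (n_points, k).
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # no point changed cluster, so no reassignment is needed
        labels = new_labels
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)
    return labels, centroids

# Hypothetical toy data: two small, well-separated groups.
points = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]])
labels, centroids = kmeans(points, k=2)
print(labels)
print(centroids)
```

The first three points should end up sharing one label and the last three the other; which group is called 0 and which is called 1 depends on the random start.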
2nd example Random Initialization
We take more appropriate clusters
End result
What will happen if we have a bad random (centroid) initialization?
• Assume that we perform all of the steps again.
Final result after performing the said steps
• [Figure: before vs. after]

Selection of the centroids has a huge impact on the clusters, and assigning centroids is random… then
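
To make the effect of the starting centroids concrete, the sketch below runs the same assign-and-update loop twice on four hypothetical points, once from a sensible start and once from a poor one. The starting centroids are passed in by hand (rather than drawn at random) so that the outcome is reproducible; `run_kmeans` and the data are assumptions for illustration only.

```python
import numpy as np

def run_kmeans(points, init_centroids, max_iter=100):
    """K-means from given starting centroids; returns labels and total squared distance."""
    points = np.asarray(points, dtype=float)
    centroids = np.asarray(init_centroids, dtype=float).copy()
    labels = None
    for _ in range(max_iter):
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments are stable
        labels = new_labels
        for j in range(len(centroids)):
            members = points[labels == j]
            if len(members) > 0:
                centroids[j] = members.mean(axis=0)
    # Sum of squared distances from each point to its own centroid.
    total_sq_dist = float(((points - centroids[labels]) ** 2).sum())
    return labels, total_sq_dist

# Four corners of a wide rectangle: the natural grouping is left pair vs. right pair.
points = [[0, 0], [0, 2], [10, 0], [10, 2]]

good_labels, good_total = run_kmeans(points, [[0, 1], [10, 1]])
bad_labels, bad_total = run_kmeans(points, [[5, 0], [5, 2]])

print(good_labels, good_total)  # [0 0 1 1] 4.0
print(bad_labels, bad_total)    # [0 1 0 1] 100.0
```

The poor start converges too, but to a bottom/top split with a much larger total squared distance. This is why the algorithm is typically run several times from different starts and the best run is kept; scikit-learn's `KMeans`, for instance, exposes this through its `n_init` parameter.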
Algorithm to identify (decide) the right number of clusters
• If we determine the clusters to be 3, then after applying the K-means algorithm
How do we know which will perform better, whether 3 or 4 or 10 clusters?
• Formula to choose the right number of clusters
• Within-Cluster-Sum-of-Squares (WCSS)
Within-Cluster-Sum-of-Squares (WCSS)

• Calculate each point's distance from its centroid, square it, and add up the squared distances
• What if we take only one big cluster?
• The distances between the points and the centroid will be larger, and so will the WCSS value
• WCSS decreases when we make 2 clusters
• WCSS has decreased further
• The question is: how many clusters can we have?
• The maximum number of clusters equals the number of data points, e.g. 50 points, 50 clusters
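
The WCSS calculation described above can be written as a short function. This is a sketch under stated assumptions: the name `wcss`, the six data points and the hand-written cluster labels are all hypothetical, chosen so the arithmetic is easy to check by hand.

```python
import numpy as np

def wcss(points, labels):
    """Within-cluster sum of squares for a given assignment of points to clusters."""
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels)
    total = 0.0
    for j in np.unique(labels):
        members = points[labels == j]
        centroid = members.mean(axis=0)              # centroid = mean of the cluster
        total += ((members - centroid) ** 2).sum()   # squared distances, added up
    return float(total)

# Three tight pairs of points spaced along the x-axis.
points = [[0, 0], [0, 2], [10, 0], [10, 2], [20, 0], [20, 2]]

print(wcss(points, [0, 0, 0, 0, 0, 0]))  # one big cluster  -> 406.0
print(wcss(points, [0, 0, 0, 0, 1, 1]))  # two clusters     -> 106.0
print(wcss(points, [0, 0, 1, 1, 2, 2]))  # three clusters   -> 6.0
```

Each extra cluster shortens the distances to the nearest centroid, so the total drops from 406 to 106 to 6.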

• Pause the video, think, and tell me: what will be the value of WCSS?
• The answer is that it will be zero
• Every point will be its own centroid, so the distance between each point and its centroid will be 0. Squaring gives 0, and adding them up also gives 0.
• The lower the WCSS, the better our goodness of fit will be.
• But how do we find the optimum goodness of fit?
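
The zero-WCSS case is easy to confirm numerically. The sketch below uses 50 randomly generated points (hypothetical data, as is the helper name `wcss`) and puts every point in its own cluster.

```python
import numpy as np

def wcss(points, labels):
    """Within-cluster sum of squares for a given assignment of points to clusters."""
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels)
    total = 0.0
    for j in np.unique(labels):
        members = points[labels == j]
        total += ((members - members.mean(axis=0)) ** 2).sum()
    return float(total)

rng = np.random.default_rng(0)
points = rng.normal(size=(50, 2))        # 50 hypothetical data points

one_cluster = wcss(points, np.zeros(50, dtype=int))   # everything in one cluster
own_cluster = wcss(points, np.arange(50))             # 50 points, 50 clusters

print(one_cluster)
print(own_cluster)  # 0.0
```

A WCSS of zero is the best possible fit by this measure, yet 50 clusters for 50 points says nothing useful about the data, which is why the lowest WCSS cannot be the only criterion.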
Elbow method
• But it is an arbitrary method, not a very precise one.
• The elbow method is a hint; ultimately you have to try 2, then 3, then 4 clusters and decide for yourself, as you are the one who is analysing the data
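
One common way to apply the elbow method in Python is to fit K-means for a range of k values and record the WCSS of each fit. The sketch below assumes scikit-learn is installed and uses synthetic data from `make_blobs`; the dataset, the range 1 to 10 and the variable names are illustrative choices, not taken from the slides. In scikit-learn the WCSS of a fitted model is available as the `inertia_` attribute.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Hypothetical data: 300 points generated around 3 centers.
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.8, random_state=42)

wcss_values = []
for k in range(1, 11):
    # n_init=10 reruns K-means from 10 random starts and keeps the best run.
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    wcss_values.append(model.inertia_)   # inertia_ is the WCSS of the fitted model

for k, value in enumerate(wcss_values, start=1):
    print(k, round(value, 1))
```

Plotting `wcss_values` against k gives the elbow curve: look for the k after which the drop in WCSS flattens out, then compare the neighbouring values of k on the actual clusters before settling on one.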
