K-Means Clustering Using Python

K-means clustering is an unsupervised machine learning algorithm that groups unlabeled data points into a specified number of clusters (k) based on their similarity. It partitions the data space into Voronoi cells around the cluster centers (centroids), so that each data point is assigned to the cluster whose centroid is nearest to it. The number of clusters k must be chosen beforehand. The elbow method, which plots the within-cluster sum of squares (WCSS) against k, can help identify a good number of clusters: the elbow point marks where adding more clusters no longer improves the model significantly. Because the random initialization of the centroids affects the clustering result, the algorithm is typically run multiple times.

Uploaded by meghanaghogare


K-means Clustering

Meghana Tribhuwan
K-means

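• The "K" in K-means settles how many groups to form: you choose the number of clusters k in advance.
• Repeat the steps (assign each point to its nearest centroid, then recompute the centroids) until you reach a point where no reassignment is needed.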
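The loop described above (assign each point to its nearest centroid, move each centroid to the mean of its points, repeat until no reassignment is needed) can be sketched in Python with NumPy. This is a minimal illustration of the standard iteration, not the code from the original slides; the function name `kmeans`, its parameters, and the toy data are hypothetical.

```python
import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    """Minimal K-means sketch. X: (n_points, n_features) array; k chosen in advance.

    Returns (labels, centroids).
    """
    rng = np.random.default_rng(seed)
    # Random initialization: use k distinct data points as the starting centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = None
    for _ in range(max_iter):
        # Assignment step: distance from every point to every centroid, shape (n, k)
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        # Stop once no point needs to be reassigned
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Update step: move each centroid to the mean of the points assigned to it
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # leave the centroid in place if its cluster is empty
                centroids[j] = members.mean(axis=0)
    return labels, centroids

# Toy data: two obvious groups of three points each
X = np.array([[1.0, 1.0], [1.5, 2.0], [1.2, 0.8],
              [8.0, 8.0], [8.5, 9.0], [7.8, 8.2]])
labels, centroids = kmeans(X, k=2)
print(labels)
print(centroids)
```

The starting centroids are drawn at random from the data, so a different `seed` can produce a different run; that sensitivity is exactly what the following slides explore.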
Second example: random initialization
We take more appropriate clusters.
End result
What happens if we have a bad random initialization of the centroids?
• Assume that we perform all of the steps again.
Final result after performing these steps:
[Figure: Before and After]

The selection of the centroids has a huge impact on the resulting clusters, and the initial centroids are assigned at random…
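Since the outcome depends on the random starting centroids, a common remedy (the one mentioned in the summary: run the algorithm multiple times) is to repeat K-means from several random starts and keep the run with the lowest WCSS. This is a minimal NumPy sketch under that assumption; `kmeans_once`, `kmeans_restarts`, and the synthetic three-blob data are illustrative, not part of the original material.

```python
import numpy as np

def kmeans_once(X, k, rng, max_iter=100):
    """One K-means run from random starting centroids; returns (labels, centroids, wcss)."""
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(max_iter):
        # Squared distance from every point to every centroid, shape (n, k)
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Move each centroid to the mean of its points (keep it if the cluster is empty)
        new_centroids = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                  else centroids[j] for j in range(k)])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    # Final assignment and WCSS for the converged centroids
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    return labels, centroids, d2.min(axis=1).sum()

def kmeans_restarts(X, k, n_runs=10, seed=0):
    """Run K-means n_runs times from different random starts; keep the lowest-WCSS run."""
    rng = np.random.default_rng(seed)
    runs = [kmeans_once(X, k, rng) for _ in range(n_runs)]
    return min(runs, key=lambda run: run[2]), [run[2] for run in runs]

# Synthetic data: three blobs of 20 points each
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(20, 2)) for c in [(0, 0), (5, 5), (0, 10)]])
(best_labels, best_centroids, best_wcss), all_wcss = kmeans_restarts(X, k=3)
print(sorted(round(w, 1) for w in all_wcss))  # runs trapped by a bad start show a larger WCSS
print(round(best_wcss, 1))
```

scikit-learn's `KMeans` applies the same idea through its `n_init` parameter (several runs with different centroid seeds, keeping the best by inertia), and by default it seeds with k-means++ instead of purely random starts.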
Algorithm to decide the right number of clusters
• Suppose we set the number of clusters to 3 and apply the K-means algorithm.
How do we know which will perform better: 3, 4, or 10 clusters?
• We need a formula to choose the right number of clusters:
• Within-Cluster Sum of Squares (WCSS)
Within-Cluster-Sum-of-Squares (WCSS)

• Calculate each point's distance from its centroid and square it; WCSS is the sum of these squared distances over all points in all clusters.
• What if we take only one big cluster?
• The distances between the points and the centroid will be large, and so will the WCSS value.
• WCSS decreases when we make 2 clusters.
• As we add more clusters, WCSS decreases further.
• The question is: how many clusters can we have?
• The maximum number of clusters equals the number of data points, e.g. 50 points allow at most 50 clusters.
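Written as a formula, WCSS = Σⱼ Σ_{x ∈ Cⱼ} ‖x − μⱼ‖², where Cⱼ is the j-th cluster and μⱼ its centroid. The sketch below uses hypothetical helper names and four made-up points to compute it for one big cluster and for two clusters, showing the drop described above.

```python
import numpy as np

def wcss(X, labels, centroids):
    """Sum over all points of the squared distance to the centroid of the point's own cluster."""
    return float(((X - centroids[labels]) ** 2).sum())

def centroids_from_labels(X, labels, k):
    """Centroid of each cluster = mean of the points assigned to it."""
    return np.array([X[labels == j].mean(axis=0) for j in range(k)])

# Four toy points: two on the left, two on the right
X = np.array([[1.0, 1.0], [1.0, 3.0], [9.0, 1.0], [9.0, 3.0]])

one = np.array([0, 0, 0, 0])  # one big cluster, centroid (5, 2)
two = np.array([0, 0, 1, 1])  # left pair and right pair, centroids (1, 2) and (9, 2)

print(wcss(X, one, centroids_from_labels(X, one, 1)))  # 68.0
print(wcss(X, two, centroids_from_labels(X, two, 2)))  # 4.0
```

In the one-cluster case each point is at squared distance 16 + 1 = 17 from (5, 2); with two clusters each point is at squared distance 1 from its own centroid.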

• Pause the video, think, and tell me: with 50 clusters for 50 points, what will the value of WCSS be?
• The answer is zero.
• Every point is its own centroid, so the distance between each point and its centroid is 0. Squaring gives 0, and the sum is also 0.
• The lower the WCSS, the better the goodness of fit.
• But how do we find the optimum goodness of fit?
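A quick numerical check of the zero-WCSS answer, assuming 50 randomly generated points (synthetic data for illustration only): when every point is its own centroid, every squared distance is 0. For contrast, the same points treated as one big cluster give a positive WCSS.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))  # 50 synthetic points

# k = 50: every point is its own centroid
labels = np.arange(len(X))
centroids = X.copy()
wcss = float(((X - centroids[labels]) ** 2).sum())
print(wcss)  # 0.0

# k = 1 for comparison: one big cluster around the overall mean (WCSS > 0)
wcss_one = float(((X - X.mean(axis=0)) ** 2).sum())
print(wcss_one)
```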
Elbow method
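• Plot WCSS against the number of clusters k; the point where the curve bends like an elbow, after which adding more clusters reduces WCSS only slightly, suggests a good k.
• But it is an arbitrary method, not a precise one.
• The elbow method is only a hint: ultimately you have to try 2, then 3, then 4 clusters and decide for yourself, as you are the one analysing the data.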
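The elbow method can be sketched as: compute WCSS for k = 1, 2, 3 and so on, then look for the k where the curve bends. The sketch below assumes NumPy, synthetic data with three blobs (so we would expect the bend near k = 3), and a hypothetical helper `kmeans_wcss` that keeps the best of several random-start runs per k. It only prints the curve and leaves the final choice to the analyst, as the slide recommends.

```python
import numpy as np

def kmeans_wcss(X, k, n_runs=10, seed=0, max_iter=100):
    """Lowest WCSS over n_runs random-start K-means runs for a given k."""
    rng = np.random.default_rng(seed)
    best = np.inf
    for run in range(n_runs):
        centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
        for it in range(max_iter):
            d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
            labels = d2.argmin(axis=1)
            # Recompute centroids; an empty cluster keeps its previous centroid
            new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                            else centroids[j] for j in range(k)])
            if np.allclose(new, centroids):
                break
            centroids = new
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        best = min(best, float(d2.min(axis=1).sum()))
    return best

# Synthetic data: three blobs of 30 points each
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc=c, scale=0.6, size=(30, 2)) for c in [(0, 0), (6, 0), (3, 6)]])

ks = range(1, 9)
curve = [kmeans_wcss(X, k) for k in ks]
for k, w in zip(ks, curve):
    print(f"k={k}: WCSS={w:.1f}")
```

To see the elbow visually, plot `curve` against `list(ks)` with matplotlib's `plt.plot`. With scikit-learn, the same values are usually read from the `inertia_` attribute of a fitted `KMeans(n_clusters=k)` model.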
