
ML Clustering

- The document describes the k-means clustering algorithm, which groups data points into k clusters.
- It involves randomly selecting k data points as initial cluster centers, assigning each remaining point to the closest center, then recalculating the centers as the means of the points in each cluster.
- This process repeats iteratively until the cluster centers stabilize and no longer change with additional iterations.

clustering

- group together data

- divide data into clusters

- kmeans

- dbscan

--------------------------
row = record = tuple = observation = instance = datapoint

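col1  col2  col3   group        label

12    24    20     G2           Dp1
14    10    20     G1           Dp2
17    18    25     G2           Dp3
17    24    10     * Cr1 (G1)   Dp4
11    10    10     G1
12    28    15     G1
18    24    10     G2
12    31    10     G2
15    12    35     * Cr2 (G2)
14    23    30     G1
19    50    10     G2
11    21    15     G1

* = row picked as an initial center
G1/G2 labels here only illustrate what a grouping looks like;
they are not computed from the distances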

k=2
- divide above records into k groups
1. pick any k rows as centers, randomly - Cr1, Cr2
2. find dist between Dp1 to Cr1
   and Dp1 to Cr2

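d1 = (12-17)**2 + (24-24)**2 + (20-10)**2 = 25 + 0 + 100 = 125

   squared dist from Dp1 to Cr1 (center of G1)

d2 = (12-15)**2 + (24-12)**2 + (20-35)**2 = 9 + 144 + 225 = 378

   squared dist from Dp1 to Cr2 (center of G2)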

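distance in 1 dimension:

   |5 - 9| = 4

distance in 2 dimensions:

   (4,5)      (7,9)
   (x1,y1)    (x2,y2)

   sqrt( (x1-x2)**2 + (y1-y2)**2 )  =  sqrt( 9 + 16 )  =  5

why d1 and d2 can skip the sqrt:

   for non-negative a and b,
   if a > b is true
   then a**2 > b**2 is also true (and the other way round)

   sqrt(5) > sqrt(3)
   5 > 3
   5**2 > 3**2

   so comparing squared distances picks the same closest center

distance in 3 dimensions:

   (x1,y1,z1)    (x2,y2,z2)

   sqrt( (x1-x2)**2 + (y1-y2)**2 + (z1-z2)**2 )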
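
The distance formulas above can be sketched in Python. This is a minimal illustration, not part of the original notes; the function names `squared_distance` and `euclidean_distance` are made up for this example.

```python
import math

def squared_distance(p, q):
    """Sum of squared coordinate differences (no square root)."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def euclidean_distance(p, q):
    """Straight-line distance; works for 1, 2, 3 or more coordinates."""
    return math.sqrt(squared_distance(p, q))

print(euclidean_distance((5,), (9,)))      # 4.0
print(euclidean_distance((4, 5), (7, 9)))  # 5.0

# Dp1 against the two initial centers from the table
dp1, cr1, cr2 = (12, 24, 20), (17, 24, 10), (15, 12, 35)
d1 = squared_distance(dp1, cr1)
d2 = squared_distance(dp1, cr2)
print(d1, d2)  # 125 378

# The comparison gives the same answer with or without the sqrt
print((d1 < d2) == (math.sqrt(d1) < math.sqrt(d2)))  # True
```

Skipping the square root is only a shortcut for comparing distances; use `euclidean_distance` when the actual distance value is needed.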

3. assign Dp1 to the group whose center
   is closest (distance is lowest)
4. repeat steps 2 and 3
   for all datapoints
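
Steps 2 to 4 can be sketched as a small Python function. This is an illustrative sketch under assumptions: the names `rows`, `centers` and `assign` are hypothetical, and ties go to the lower-numbered group. The labels it computes come purely from the distances, so they need not match the G1/G2 labels written by hand in the table (for example, Dp1 lands in G1 because 125 < 378).

```python
# The 12 rows from the table, in the same order
rows = [
    (12, 24, 20), (14, 10, 20), (17, 18, 25), (17, 24, 10),
    (11, 10, 10), (12, 28, 15), (18, 24, 10), (12, 31, 10),
    (15, 12, 35), (14, 23, 30), (19, 50, 10), (11, 21, 15),
]
centers = [(17, 24, 10), (15, 12, 35)]  # Cr1, Cr2

def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def assign(rows, centers):
    """For each row, return the index of its closest center (0 = G1, 1 = G2)."""
    labels = []
    for row in rows:
        dists = [squared_distance(row, c) for c in centers]
        labels.append(dists.index(min(dists)))  # first minimum wins on a tie
    return labels

labels = assign(rows, centers)
for row, label in zip(rows, labels):
    print(row, "-> G%d" % (label + 1))
```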

same rows, now listed group by group

col1  col2  col3   group        label

11    21    15     G1
14    10    20     G1           Dp2
17    24    10     * Cr1 (G1)   Dp4
11    10    10     G1
12    28    15     G1
14    23    30     G1

15    12    35     * Cr2 (G2)
19    50    10     G2
12    24    20     G2           Dp1
17    18    25     G2           Dp3
18    24    10     G2
12    31    10     G2

squared dist between two rows: subtract column by column, square, add

(18-12)**2 + (24-31)**2 + (10-11)**2

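5. find the new center of each group
   by taking the mean (avg) of each column within the group

for G1 => new center is

   (11+14+17+11+12+14) / 6  =  79 / 6  ≈ 13.17
   (21+10+24+10+28+23) / 6  = 116 / 6  ≈ 19.33
   (15+20+10+10+15+30) / 6  = 100 / 6  ≈ 16.67

for G2 => new center is

   (15+19+12+17+18+12) / 6  =  93 / 6  = 15.50
   (12+50+24+18+24+31) / 6  = 159 / 6  = 26.50
   (35+10+20+25+10+10) / 6  = 110 / 6  ≈ 18.33

the new center is a computed point; it need not be one of the rows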
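
The mean step can be checked with a few lines of Python. This is a minimal sketch; `mean_center` is a hypothetical helper name, and `g1` and `g2` hold the rows as they are grouped in the table above.

```python
def mean_center(points):
    """Column-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))

g1 = [(11, 21, 15), (14, 10, 20), (17, 24, 10),
      (11, 10, 10), (12, 28, 15), (14, 23, 30)]
g2 = [(15, 12, 35), (19, 50, 10), (12, 24, 20),
      (17, 18, 25), (18, 24, 10), (12, 31, 10)]

print([round(v, 2) for v in mean_center(g1)])  # [13.17, 19.33, 16.67]
print([round(v, 2) for v in mean_center(g2)])  # [15.5, 26.5, 18.33]
```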

6. repeat steps 2,3,4,5
   again and again
   until the centers stop changing
   and no datapoint changes its group
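
Putting steps 1 to 6 together, the whole loop might look like the sketch below. It is written in plain Python for readability and makes some assumptions that the notes do not state: the initial centers are passed in rather than picked at random, an empty group keeps its old center, and `max_iter` is a safety cap. All function names are hypothetical.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def assign(rows, centers):
    """Steps 2-4: index of the closest center for every row."""
    labels = []
    for row in rows:
        dists = [squared_distance(row, c) for c in centers]
        labels.append(dists.index(min(dists)))
    return labels

def update(rows, labels, centers):
    """Step 5: new center = column-wise mean of each group's rows."""
    new_centers = []
    for i, old in enumerate(centers):
        members = [r for r, lab in zip(rows, labels) if lab == i]
        if members:
            new_centers.append(
                tuple(sum(col) / len(members) for col in zip(*members)))
        else:
            new_centers.append(old)  # assumption: empty group keeps old center
    return new_centers

def kmeans(rows, centers, max_iter=100):
    """Step 6: repeat until neither the labels nor the centers change."""
    centers = [tuple(float(v) for v in c) for c in centers]
    labels = None
    for _ in range(max_iter):
        new_labels = assign(rows, centers)
        new_centers = update(rows, new_labels, centers)
        if new_labels == labels and new_centers == centers:
            break
        labels, centers = new_labels, new_centers
    return labels, centers

rows = [
    (12, 24, 20), (14, 10, 20), (17, 18, 25), (17, 24, 10),
    (11, 10, 10), (12, 28, 15), (18, 24, 10), (12, 31, 10),
    (15, 12, 35), (14, 23, 30), (19, 50, 10), (11, 21, 15),
]
labels, centers = kmeans(rows, [(17, 24, 10), (15, 12, 35)])
print(labels)
print(centers)
```

The final groups depend on which rows are chosen as the starting centers, so a different random pick in step 1 can give a different result.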

=============== ============== K-MEANS ============== =================

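N = 100 data points


d1 d2 d3 d4 d5 ... d100

k = 3 (number of clusters)
pick k centroids at random, e.g.

d1 d2 d3

dist d4 to d1 = 5
dist d4 to d2 = 7
dist d4 to d3 = 3

d4 belongs to the cluster of d3 (lowest distance)

dist d5 to d1 = 3
dist d5 to d2 = 7
dist d5 to d3 = 5

d5 belongs to the cluster of d1 (lowest distance)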

d6

...

d100
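
Picking the closest centroid from a list of distances is just a minimum lookup. The sketch below uses only the example distances given for d4 and d5; the dictionary layout and the name `membership` are assumptions made for this illustration.

```python
# Distances from each point to the three starting centroids d1, d2, d3
distances = {
    "d4": {"d1": 5, "d2": 7, "d3": 3},
    "d5": {"d1": 3, "d2": 7, "d3": 5},
}

# For each point, keep the centroid with the smallest distance
membership = {
    point: min(to_centroids, key=to_centroids.get)
    for point, to_centroids in distances.items()
}

for point, centroid in membership.items():
    print(point, "belongs to", centroid)
# d4 belongs to d3
# d5 belongs to d1
```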
-----------------------------------

c1             c2             c3
d1 d3* d9..    d2 d5 d7..     d4 d6 d8 ...

calc mean of c1 data points
