This document discusses different types of clustering analysis techniques in data mining. It describes clustering as the task of grouping similar objects together. The document outlines several key clustering algorithms including k-means clustering and hierarchical clustering. It provides an example to illustrate how k-means clustering works by randomly selecting initial cluster centers and iteratively assigning data points to clusters and recomputing cluster centers until convergence. The document also discusses limitations of k-means and how hierarchical clustering builds nested clusters through sequential merging of clusters based on a similarity measure.
Hierarchical clustering is a method of partitioning a set of data into meaningful sub-classes or clusters. It involves two approaches - agglomerative, which successively links pairs of items or clusters, and divisive, which starts with the whole set as a cluster and divides it into smaller partitions. Agglomerative Nesting (AGNES) is an agglomerative technique that merges clusters with the least dissimilarity at each step, eventually combining all clusters. Divisive Analysis (DIANA) is the inverse, starting with all data in one cluster and splitting it until each data point is its own cluster. Both approaches can be visualized using dendrograms to show the hierarchical merging or splitting of clusters.
Deepak George provides a presentation on unsupervised learning techniques including K-Means clustering, hierarchical clustering, and DBSCAN. He has experience in data science roles at companies like GE and Mu Sigma. Deepak earned degrees from IIM Bangalore and College of Engineering Trivandrum and lists passions in deep learning, photography, and football. The presentation covers key concepts in clustering algorithms and includes visual explanations and recommendations for applying clustering.
K-Means clustering is performed on the bank.arff data file in Weka. The file is opened and SimpleKMeans clustering is selected from the Cluster menu. The number of clusters is configured by clicking the text box next to the "Choose" button, which brings up a pop-up window to specify the number of clusters.
This document provides information about clustering and cluster analysis. It begins by defining clustering as the process of grouping objects into classes of similar objects. It then discusses what a cluster is and different types of clustering techniques, including partitioning methods like k-means clustering. K-means clustering is explained as an algorithm that assigns objects to clusters based on minimizing distance between objects and cluster centers, then updating the cluster centers. Examples are provided to demonstrate how k-means clustering works on a sample dataset.
The document discusses K-means clustering, an unsupervised learning technique where the model works independently to discover patterns in unlabeled data. It clusters data points into K groups based on their distance from initial cluster centers. The example shows 8 points clustered into 3 groups using K-means. It calculates distances from points to initial and new cluster centers over iterations, assigning points to the closest center each time, until cluster assignments stop changing.
This document discusses clustering and the k-means clustering algorithm. It defines clustering as grouping a set of data objects into clusters so that objects within the same cluster are similar to each other but dissimilar to objects in other clusters. The k-means algorithm is described as an iterative process that assigns each object to one of k predefined clusters based on the object's distance from the cluster's centroid, then recalculates the centroid, repeating until cluster assignments no longer change. A worked example demonstrates how k-means partitions 7 objects into 2 clusters over 3 iterations. The k-means algorithm is noted to be efficient but requires specifying k and can be impacted by outliers, noise, and non-convex cluster shapes.
K-means clustering groups data points into k clusters by minimizing the distance between points and cluster centroids. It works by randomly assigning points to initial centroids and then iteratively reassigning points to centroids until clusters are stable. Hierarchical clustering builds a dendrogram showing the relationship between clusters by either recursively merging or splitting clusters. Both are unsupervised learning techniques that group similar data points together without labels.
1. The document discusses unsupervised machine learning techniques for classification including cluster seeking algorithms like k-means and maximin as well as cluster refinement algorithms.
2. It provides examples of using the k-means algorithm to determine tentative clusters in a 2D feature space by calculating distances between data points and cluster centers.
3. The k-means algorithm is then shown refining the initial cluster centers through iterative reassignment of data points to clusters and recalculation of cluster centers until cluster membership stabilizes.
This document discusses two types of clustering algorithms: partitional and hierarchical clustering. It provides details on K-means, a popular partitional clustering algorithm, including the pseudocode and an example. It also discusses hierarchical clustering, including different cluster distance measures, the agglomerative algorithm, and provides an example of applying the agglomerative approach. Evaluation of K-means performance using sum of squared errors is also covered.
Clustering is a fundamental concept in machine learning and data analysis: the grouping of similar data points together based on shared criteria or patterns. It is used to discover inherent structures, relationships, or similarities within a dataset when there are no predefined labels or categories, and it is widely employed in domains including marketing, biology, image analysis, and recommendation systems. The following explanation of clustering covers its principles, methods, applications, and key considerations.
Table of Contents
Introduction to Clustering
Key Concepts and Terminology
Types of Clustering
3.1. Partitioning Clustering
3.2. Hierarchical Clustering
3.3. Density-Based Clustering
3.4. Model-Based Clustering
Distance Metrics and Similarity Measures
Common Clustering Algorithms
5.1. K-Means Clustering
5.2. Hierarchical Agglomerative Clustering
5.3. DBSCAN (Density-Based Spatial Clustering of Applications with Noise)
5.4. Gaussian Mixture Models (GMM)
Evaluation of Clusters
Applications of Clustering
7.1. Customer Segmentation
7.2. Image Segmentation
7.3. Anomaly Detection
7.4. Document Clustering
7.5. Recommender Systems
7.6. Genomic Clustering
Challenges and Considerations
8.1. Determining the Number of Clusters (K)
8.2. Handling High-Dimensional Data
8.3. Initial Centroid Selection
8.4. Scaling and Normalization
8.5. Interpretation of Results
Best Practices in Clustering
Future Trends and Advances
Conclusion
1. Introduction to Clustering
Clustering, in the context of data analysis and machine learning, refers to the process of grouping a set of data points into subsets, or clusters, such that points within the same cluster are more similar to one another than to points in other clusters.
The k-means clustering algorithm partitions n observations into k clusters where each observation belongs to the cluster with the nearest mean. It works by assigning every observation to a cluster whose mean yields the least within-cluster sum of squares, then recalculating the means to be the centroids of the new clusters. The algorithm iterates between these two steps until convergence is achieved. K-means clustering is commonly used for data mining and machine learning applications such as image segmentation.
Fuzzy c-means clustering protocol for wireless sensor networks (mourya chandra)
This document discusses clustering techniques for wireless sensor networks. It describes hierarchical routing protocols that involve clustering sensor nodes into cluster heads and non-cluster heads. It then explains fuzzy c-means clustering, which allows data points to belong to multiple clusters to different degrees, unlike hard clustering methods. Finally, it proposes using fuzzy c-means clustering as an energy-efficient routing protocol for wireless sensor networks due to its ability to handle uncertain or incomplete data.
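As a rough sketch of the fuzzy c-means idea described above (soft memberships that sum to 1 per point, unlike hard clustering), here is a minimal 1-D implementation; the sample data, the naive initialization, and the fuzzifier m = 2 are illustrative assumptions, not details from the document:

```python
def fuzzy_c_means(xs, c, m=2.0, iters=50):
    """Fuzzy c-means on 1-D data. Every point x_j has a membership
    u[j][i] in [0, 1] for each cluster i, and the memberships of a
    point sum to 1. Updates use the standard rules:
    u_ij = 1 / sum_k (d_ij / d_kj)^(2/(m-1)),
    v_i  = sum_j u_ij^m x_j / sum_j u_ij^m."""
    centers = list(xs[:c])                            # naive initialization
    for _ in range(iters):
        u = []
        for x in xs:
            d = [abs(x - v) or 1e-12 for v in centers]  # guard divide-by-zero
            u.append([
                1.0 / sum((d[i] / d[k]) ** (2 / (m - 1)) for k in range(c))
                for i in range(c)
            ])
        # membership-weighted means become the new cluster centers
        centers = [
            sum(u[j][i] ** m * xs[j] for j in range(len(xs)))
            / sum(u[j][i] ** m for j in range(len(xs)))
            for i in range(c)
        ]
    return centers, u

centers, u = fuzzy_c_means([1.0, 1.2, 0.8, 8.0, 8.2, 7.8], 2)
```

On this toy data the two centers drift toward the two obvious groups (near 1 and near 8), while each point retains a graded membership in both clusters.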
4. What is Good Clustering?
• Minimize intra-cluster distances
• Maximize inter-cluster distances
(Figure: a clustering annotated to show inter-cluster distances maximized and intra-cluster distances minimized.)
5. Types of Clustering
• Partitional Clustering
– A division of data objects into non-overlapping subsets (clusters) such that each data object is in exactly one subset
(Figure: original points alongside a partitional clustering of them.)
6. Types of Clustering
• Hierarchical clustering
– A set of nested clusters organized as a hierarchical tree
(Figure: two hierarchical clusterings of points p1-p4, each shown with its traditional dendrogram.)
7. Types of Clustering
• Exclusive versus non-exclusive
– In non-exclusive clusterings, points may belong to multiple clusters.
– Can represent multiple classes or 'border' points
• Fuzzy versus non-fuzzy
– In fuzzy clustering, a point belongs to every cluster with some weight between 0 and 1
– Weights must sum to 1
– Probabilistic clustering has similar characteristics
• Partial versus complete
– In some cases, we only want to cluster some of the data
• Heterogeneous versus homogeneous
– Clusters of widely different sizes, shapes, and densities
8. Characteristics of Cluster
• Well-Separated Clusters:
– A cluster is a set of points such that any point in a cluster is closer (or more similar) to every other point in the cluster than to any point not in the cluster.
(Figure: 3 well-separated clusters.)
9. Characteristics of Cluster
• Center-based
– A cluster is a set of objects such that an object in a cluster is closer (more similar) to the "center" of its cluster than to the center of any other cluster.
– The center of a cluster is often a centroid, the average of all the points in the cluster, or a medoid, the most "representative" point of the cluster.
(Figure: 4 center-based clusters.)
10. Characteristics of Cluster
• Density-based
– A cluster is a dense region of points, separated from other regions of high density by regions of low density.
– Used when the clusters are irregular, and when noise and outliers are present.
(Figure: 6 density-based clusters.)
11. Characteristics of Cluster
• Shared Property or Conceptual Clusters
– Finds clusters that share some common property or represent a particular concept.
(Figure: 2 overlapping circles.)
14. K-means Clustering Algorithm
Algorithm: the k-means algorithm for partitioning, based on the mean value of the objects in each cluster.
Input: the number of clusters k and a database containing n objects.
Output: a set of k clusters that minimizes the squared-error criterion.
15. K-means Clustering Algorithm
Method
1) Randomly choose k objects as the initial cluster centers (centroids);
2) Repeat
3) (Re)assign each object to the cluster to which it is most similar, based on the mean value of the objects in the cluster;
4) Update the cluster means: calculate the mean value of the objects for each cluster;
5) Until the centroids (center points) no longer change.
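The method above can be sketched in a few lines of Python; the eight 2-D points used below are the ones recoverable from the worked example on slide 25, while the random seed and iteration cap are illustrative assumptions:

```python
import random

def kmeans(points, k, max_iter=100, seed=0):
    """Minimal k-means following the slide's method: random initial
    centroids, then alternate (re)assignment and mean-update until
    the assignments stop changing."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # 1) randomly choose k objects as centroids
    assignment = None
    for _ in range(max_iter):          # 2) repeat ...
        # 3) (re)assign each object to the cluster with the nearest centroid
        new_assignment = [
            min(range(k),
                key=lambda c: sum((p - q) ** 2 for p, q in zip(pt, centroids[c])))
            for pt in points
        ]
        if new_assignment == assignment:  # 5) ... until nothing changes
            break
        assignment = new_assignment
        # 4) update each cluster mean to the centroid of its members
        for c in range(k):
            members = [pt for pt, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(d) / len(members) for d in zip(*members))
    return centroids, assignment

# The eight points from the worked example later in the slides
centroids, assignment = kmeans(
    [(2, 10), (2, 5), (8, 4), (5, 8), (7, 5), (6, 4), (1, 2), (4, 9)], k=3)
```

Note that the result depends on the random initial centroids, which is one of the limitations of k-means the slides mention.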
25. Example: K-Means Clustering
• Re-compute the new cluster centers (means) by taking the mean of all points in each cluster.
• For Cluster 1, we only have one point, A1(2, 10), which was the old mean, so the cluster center remains the same.
• For Cluster 2, we have ( (8+5+7+6+4)/5, (4+8+5+4+9)/5 ) = (6, 6).
• For Cluster 3, we have ( (2+1)/2, (5+2)/2 ) = (1.5, 3.5).
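The centroid arithmetic on this slide can be checked with a short snippet; the cluster memberships are taken directly from the slide:

```python
def mean_point(points):
    # Centroid = coordinate-wise mean of the member points.
    return tuple(sum(coord) / len(points) for coord in zip(*points))

cluster2 = [(8, 4), (5, 8), (7, 5), (6, 4), (4, 9)]
cluster3 = [(2, 5), (1, 2)]

print(mean_point(cluster2))  # (6.0, 6.0)
print(mean_point(cluster3))  # (1.5, 3.5)
```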
30. Distance Functions
• Minkowski distance:
  d(i, j) = ( |x_{i1} - x_{j1}|^q + |x_{i2} - x_{j2}|^q + ... + |x_{ip} - x_{jp}|^q )^{1/q}
• q = 1: Manhattan distance:
  d(i, j) = |x_{i1} - x_{j1}| + |x_{i2} - x_{j2}| + ... + |x_{ip} - x_{jp}|
• q = 2: Euclidean distance:
  d(i, j) = sqrt( |x_{i1} - x_{j1}|^2 + |x_{i2} - x_{j2}|^2 + ... + |x_{ip} - x_{jp}|^2 )
31. Evaluating K-means Clusters
• The most common measure is the Sum of Squared Error (SSE).
– For each point, the error is the distance to the nearest cluster center.
– To get the SSE, we square these errors and sum them:
  SSE = \sum_{i=1}^{K} \sum_{x \in C_i} dist(m_i, x)^2
where
– x is a data point in cluster C_i
– m_i is the centroid of cluster C_i
• One can show that m_i corresponds to the mean of the points in cluster C_i.
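The SSE formula translates directly to code; the sample cluster below reuses Cluster 3 from the earlier worked example, an illustrative choice:

```python
def sse(points, centroids, assignment):
    """SSE = sum_{i=1..K} sum_{x in C_i} dist(m_i, x)^2, using
    squared Euclidean distance."""
    return sum(
        sum((p - q) ** 2 for p, q in zip(pt, centroids[a]))
        for pt, a in zip(points, assignment)
    )

# Cluster 3 from the example: points (2,5) and (1,2), centroid (1.5, 3.5)
print(sse([(2, 5), (1, 2)], [(1.5, 3.5)], [0, 0]))  # 5.0
```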
41. Agglomerative Clustering Algorithm
The basic algorithm is straightforward:
1. Compute the proximity matrix
2. Let each data point be a cluster
3. Repeat
4. Merge the two closest clusters
5. Update the proximity matrix
6. Until only a single cluster remains
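The steps above can be sketched in plain Python. Single link (minimum pairwise distance) is assumed here as the cluster-distance measure, and the toy points are illustrative; a real implementation would maintain the proximity matrix incrementally rather than recomputing it:

```python
def euclid(a, b):
    # Euclidean distance between two points.
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def agglomerative(points, dist):
    """Steps 1-6 from the slide: start with singleton clusters and
    repeatedly merge the two closest clusters (single link) until
    only one cluster remains, recording each merge."""
    clusters = [[p] for p in points]      # 2) each data point is a cluster
    merges = []
    while len(clusters) > 1:              # 6) until a single cluster remains
        best = None
        for i in range(len(clusters)):    # 1)/5) proximities recomputed each pass
            for j in range(i + 1, len(clusters)):
                d = min(dist(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((list(clusters[i]), list(clusters[j]), d))
        clusters[i] = clusters[i] + clusters[j]  # 4) merge the two closest
        del clusters[j]
    return merges

merges = agglomerative([(0, 0), (0, 1), (5, 5), (5, 6)], euclid)
```

The recorded merge sequence, with its merge distances, is exactly the information a dendrogram visualizes.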
74. Hierarchical Clustering: Group Average
• A compromise between single link and complete link
• Strengths
– Less susceptible to noise and outliers
• Limitations
– Biased towards globular clusters
77. Internal Measures: Cohesion and Separation (graph-based clusters)
• A graph-based cluster approach can be evaluated by cohesion and separation measures.
– Cluster cohesion is the sum of the weights of all links within a cluster.
– Cluster separation is the sum of the weights of the links between nodes in the cluster and nodes outside the cluster.
(Figure: a pair of graph-based clusters illustrating cohesion and separation.)
78. Cohesion and Separation (central-based clusters)
• A central-based cluster approach can be evaluated by cohesion and separation measures.
79. Cohesion and Separation (central-based clustering)
• Cluster cohesion: measures how closely related the objects in a cluster are.
– Cohesion is measured by the within-cluster sum of squares (SSE):
  WSS = \sum_{i} \sum_{x \in C_i} (x - m_i)^2
• Cluster separation: measures how distinct or well-separated a cluster is from other clusters.
– Separation is measured by the between-cluster sum of squares:
  BSS = \sum_{i} |C_i| (m - m_i)^2
where |C_i| is the size of cluster i, m_i is its centroid, and m is the overall mean.
80. Example: Cohesion and Separation
Example: WSS + BSS = Total SSE (constant)
Data points 1, 2, 4, 5 on a line; overall mean m = 3.

K = 1 cluster (mean m = 3):
WSS = (1 - 3)^2 + (2 - 3)^2 + (4 - 3)^2 + (5 - 3)^2 = 10
BSS = 4 x (3 - 3)^2 = 0
Total = 10 + 0 = 10

K = 2 clusters ({1, 2} with mean m1 = 1.5; {4, 5} with mean m2 = 4.5):
WSS = (1 - 1.5)^2 + (2 - 1.5)^2 + (4 - 4.5)^2 + (5 - 4.5)^2 = 1
BSS = 2 x (3 - 1.5)^2 + 2 x (3 - 4.5)^2 = 9
Total = 1 + 9 = 10
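The example can be verified in code; the helper below recomputes WSS and BSS from the cluster assignments given above:

```python
def wss_bss(clusters):
    """Compute WSS = sum_i sum_{x in C_i} (x - m_i)^2 and
    BSS = sum_i |C_i| (m - m_i)^2 for 1-D clusters, where m is the
    overall mean and m_i the mean of cluster i."""
    points = [x for c in clusters for x in c]
    m = sum(points) / len(points)                  # overall mean
    means = [sum(c) / len(c) for c in clusters]    # per-cluster means
    wss = sum((x - mi) ** 2 for c, mi in zip(clusters, means) for x in c)
    bss = sum(len(c) * (m - mi) ** 2 for c, mi in zip(clusters, means))
    return wss, bss

print(wss_bss([[1, 2, 4, 5]]))    # K=1: (10.0, 0.0)
print(wss_bss([[1, 2], [4, 5]]))  # K=2: (1.0, 9.0)
```

In both cases WSS + BSS = 10, confirming that the total SSE is constant regardless of how the points are partitioned.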
82. HW#8
• What is a cluster?
• What is good clustering?
• How many types of clustering are there?
• How many characteristics of clusters are there?
• What is k-means clustering?
• What are the limitations of k-means?
• Please explain the method of hierarchical clustering.
84. LAB 8
• Use the Weka program to construct a clustering model from the given file.
• In the Weka Explorer, open the file bank.arff.
• On the Cluster tab, click the Choose button and select SimpleKMeans. Next, click on the text box to the right of the "Choose" button to get the pop-up window where the number of clusters can be configured.