A New Index of Cluster Validity: Mu-Chun Su
Major Difficulties
∆(A_i) = max{ d(x_j, x_k) | x_j, x_k ∈ A_i }
where d is a distance function and A_i is the set whose elements are the data points assigned to the ith cluster.
The main drawback of a direct implementation of Dunn’s index is computational cost: evaluating all pairwise distances makes the index very expensive to compute as c and n increase.
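As a concrete illustration, here is a minimal sketch of a direct Dunn’s-index computation. It assumes Euclidean distance and the usual full definition of Dunn’s index (minimum between-cluster point distance divided by the maximum cluster diameter ∆); the helper names are illustrative, not from the slides.

```python
from math import dist  # Euclidean distance between two points

def diameter(cluster):
    # Delta(A_i): largest pairwise distance within one cluster.
    return max(dist(x, y) for x in cluster for y in cluster)

def set_distance(a, b):
    # Smallest distance between a point of cluster a and a point of cluster b.
    return min(dist(x, y) for x in a for y in b)

def dunn_index(clusters):
    # Dunn's index: min between-cluster separation / max within-cluster diameter.
    # The nested loops over all point pairs are exactly what makes a direct
    # implementation expensive as c and n grow.
    sep = min(set_distance(a, b)
              for i, a in enumerate(clusters)
              for b in clusters[i + 1:])
    diam = max(diameter(c) for c in clusters)
    return sep / diam

print(dunn_index([[(0, 0), (0, 1)], [(5, 0), (5, 1)]]))  # -> 5.0
```

Larger values indicate compact, well-separated clusters.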
Davies-Bouldin’s Index (1)
• Its major difference from Dunn’s index is that it considers the
average case by using the average error of each class.
• This index is a function of the ratio of the sum of within-cluster
scatter to between-cluster separation; it uses both the clusters and
their sample means.
• First, define the within-cluster scatter of the ith cluster and the
distance between the ith and jth clusters as
S_{i,q} = [ (1/|A_i|) ∑_{x∈A_i} ||x − v_i||₂^q ]^{1/q}

d_{ij,t} = [ ∑_{s=1}^{p} |v_{si} − v_{sj}|^t ]^{1/t} = ||v_i − v_j||_t
Davies-Bouldin’s Index (2)
• where v_i is the ith cluster center, q, t ≥ 1, q is an integer, and q and t
can be selected independently of each other. |A_i| is the number of
elements in A_i.
• Next, define
R_{i,qt} = max_{j∈{1,…,c}, j≠i} { (S_{i,q} + S_{j,q}) / d_{ij,t} }
• Finally, the Davies-Bouldin index can be defined as

DB(c) = (1/c) ∑_{i=1}^{c} R_{i,qt}
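Putting the three definitions together, a minimal sketch with q = t = 2 (RMS scatter, Euclidean center distance) might look like the following; the function and variable names are illustrative, not from the slides.

```python
from math import dist

def db_index(clusters, centers):
    # S_{i,2}: root-mean-square distance of the points in A_i to center v_i.
    def scatter(points, v):
        return (sum(dist(x, v) ** 2 for x in points) / len(points)) ** 0.5

    s = [scatter(pts, v) for pts, v in zip(clusters, centers)]
    c = len(clusters)
    # R_i: worst-case ratio of combined scatter to center separation.
    r = [max((s[i] + s[j]) / dist(centers[i], centers[j])
             for j in range(c) if j != i)
         for i in range(c)]
    # DB(c): average of the R_i; smaller values indicate a better partition.
    return sum(r) / c

print(db_index([[(0, 0), (0, 2)], [(10, 0), (10, 2)]],
               [(0, 1), (10, 1)]))  # -> 0.2
```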
Partition Coefficient (PC)
• Bezdek designed the partition coefficient (PC) to
measure the amount of “overlap” between clusters.
• He defined the partition coefficient (PC) as follows.
PC(c) = (1/N) ∑_{i=1}^{c} ∑_{j=1}^{N} (u_{ij})²
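A minimal sketch of PC, assuming U is a c × N fuzzy membership matrix whose columns sum to 1 (the names are illustrative):

```python
def partition_coefficient(U):
    # PC(c) = (1/N) * sum of squared memberships; it ranges from 1/c
    # (maximally fuzzy, heavy overlap) up to 1 (crisp partition, no overlap).
    N = len(U[0])
    return sum(u ** 2 for row in U for u in row) / N

print(partition_coefficient([[1.0, 0.0], [0.0, 1.0]]))  # crisp -> 1.0
print(partition_coefficient([[0.5, 0.5], [0.5, 0.5]]))  # fuzzy -> 0.5
```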
Xie-Beni’s Index (S)

S(c) = [ ∑_{i=1}^{c} ∑_{j=1}^{N} u_{ij}² d(x_j, v_i)² ] / [ N · min_{m,n=1,…,c, m≠n} {d(v_m, v_n)}² ]

     = [ ∑_{i=1}^{c} ∑_{j=1}^{N} u_{ij}² d(x_j, v_i)² ] / [ N · (d_min)² ]
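A minimal sketch of S(c), assuming Euclidean distance, a data list X, cluster centers V, and a c × N membership matrix U (illustrative names, not from the slides):

```python
from math import dist

def s_index(X, V, U):
    # Numerator: fuzzy within-cluster compactness.
    num = sum(U[i][j] ** 2 * dist(X[j], V[i]) ** 2
              for i in range(len(V)) for j in range(len(X)))
    # Denominator: N times the squared minimum distance between two centers.
    d_min = min(dist(V[m], V[n])
                for m in range(len(V)) for n in range(len(V)) if m != n)
    # Smaller S(c) means compact and well-separated clusters.
    return num / (len(X) * d_min ** 2)

print(s_index([(0, 0), (4, 0)], [(0, 0), (4, 0)],
              [[0.9, 0.1], [0.1, 0.9]]))
```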
The Proposed CS Index

CS(c) = [ (1/c) ∑_{i=1}^{c} (1/|A_i|) ∑_{x_j∈A_i} max_{x_k∈A_i} {d(x_j, x_k)} ] / [ (1/c) ∑_{i=1}^{c} min_{j∈{1,…,c}, j≠i} {d(v_i, v_j)} ]

      = [ ∑_{i=1}^{c} (1/|A_i|) ∑_{x_j∈A_i} max_{x_k∈A_i} {d(x_j, x_k)} ] / [ ∑_{i=1}^{c} min_{j∈{1,…,c}, j≠i} {d(v_i, v_j)} ]
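The CS ratio can be sketched directly for a crisp partition, assuming Euclidean distance (the names are illustrative):

```python
from math import dist

def cs_index(clusters, centers):
    # Numerator: for each point, the distance to the farthest point in its own
    # cluster, averaged within each cluster and summed over the c clusters.
    num = sum(sum(max(dist(x, y) for y in pts) for x in pts) / len(pts)
              for pts in clusters)
    # Denominator: each center's distance to its nearest other center, summed.
    c = len(centers)
    den = sum(min(dist(centers[i], centers[j]) for j in range(c) if j != i)
              for i in range(c))
    # Smaller CS values indicate compact, well-separated clusters.
    return num / den

print(cs_index([[(0, 0), (0, 1)], [(5, 0), (5, 1)]],
               [(0, 0.5), (5, 0.5)]))  # -> 0.2
```

Note that the 1/c factors in the numerator and denominator cancel, so they are omitted here.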
Four Spherical Clusters (1)

Fig. 3(a). The data set in example 1: it contains a mixture of compact spherical and ellipsoidal clusters.
Fig. 3(b). The final clustering result achieved by the FCM algorithm at c = 4.
Four Spherical Clusters (2)

c      2      3      4      5      6      7      8      9      10
FHV    1.124  1.047  0.780  0.851  0.927  0.929  1.075  0.975  0.901

Fig. 4(a). The data set in example 2: it contains five compact clusters.
Fig. 4(b). The final clustering result achieved by the Gustafson-Kessel algorithm at
A Mixture of Spherical and Ellipsoidal Clusters (2)

c      2      3      4      5      6      7      8      9      10
FHV    1.858  1.570  1.253  0.921  1.044  1.061  1.073  1.055  1.083

Fig. 5(a). The data set in example 3: it contains data distributed on five clusters.
Fig. 5(b). The final clustering result achieved by the FCM algorithm at
Five Clusters (2)

c      2      3      4      5      6      7      8      9      10
FHV    1.957  1.725  1.072  0.925  0.933  0.754  0.725  0.653  0.751