
Pattern Recognition 43 (2010) 1346-1360
doi:10.1016/j.patcog.2009.10.020

A robust dynamic niching genetic algorithm with niche migration for automatic clustering problem

Dong-Xia Chang a,c,*, Xian-Da Zhang a, Chang-Wen Zheng b, Dao-Ming Zhang a

a Tsinghua National Laboratory for Information Science and Technology, State Key Laboratory on Intelligent Technology and Systems, Department of Automation, Tsinghua University, Beijing 100084, China
b National Key Lab of Integrated Information System Technology, Institute of Software, Chinese Academy of Sciences, Beijing 100080, China
c Institute of Information Science, Beijing Jiaotong University, Beijing 100044, China
* Corresponding author. E-mail address: chang_dongxia@hotmail.com (D.-X. Chang).

Article history: Received 16 January 2009; received in revised form 1 October 2009; accepted 4 October 2009.

Keywords: Clustering; Genetic algorithms; Niching method; Niche migration; Remote sensing image.

Abstract. In this paper, a genetic clustering algorithm based on dynamic niching with niche migration (DNNM-clustering) is proposed. It is an effective and robust approach to clustering on the basis of a similarity function related to an approximate density shape estimation. In the new algorithm, a dynamic identification of the niches with niche migration is performed at each generation to automatically evolve the optimal number of clusters as well as the cluster centers of the data set, without invoking cluster validity functions. The niches can move slowly under the migration operator, which makes the dynamic niching method independent of the radius of the niches. Compared to other existing methods, the proposed clustering method exhibits the following robust characteristics: (1) robustness to initialization, (2) robustness to cluster volumes (the ability to detect clusters of different volumes), and (3) robustness to noise. Moreover, it is free of the radius of the niches and does not need to pre-specify the number of clusters. Several data sets with widely varying characteristics are used to demonstrate its superiority. An application of the DNNM-clustering algorithm to unsupervised classification of a multispectral remote sensing image is also provided.

(c) 2009 Elsevier Ltd. All rights reserved.

1. Introduction

Clustering analysis [1,2] is a core problem in data mining, with innumerable applications spanning many fields. The primary objective of clustering analysis is to partition a given set of data or objects into groups or clusters so that objects in the same cluster are similar in some sense and differ from those of other clusters in the same sense. In the past, many clustering methods were proposed [1-5]. Generally, these algorithms can be broadly divided into two classes [3]: hierarchical and partitional. Hierarchical clustering proceeds successively, either by merging smaller clusters into larger ones or by splitting larger clusters; the hierarchical clustering algorithms can be subdivided into agglomerative methods [6-8], which proceed by a series of fusions of the objects into groups, and divisive methods [9,10], which separate the objects successively until all clusters are singletons. Partitional clustering attempts to decompose the data set directly into several disjoint clusters based on some criterion. The most common criterion adopted by partitional clustering is minimizing some measure of dissimilarity among the samples within each cluster while maximizing the dissimilarity of different clusters. Among the partitional clustering methods, the K-means algorithm [1] is one of the most widely used.

However, most hierarchical and partitional clustering methods have the drawback that the number of clusters needs to be specified a priori. For hierarchical clustering, the problem of selecting the number of clusters is equivalent to deciding at which level to cut the tree; partitional clustering algorithms typically require the number of clusters as user input. Since such a priori knowledge is generally not available, estimating the number of clusters from the data set under review is required in some circumstances. The classical approach to determining the number of clusters involves the use of some validity measure [11-13]: within a range of cluster numbers, a certain validity function of the clustering result is evaluated for each given cluster number, and the number that optimizes the validity measure is then chosen. The number of clusters found by this method depends on the selected clustering algorithm, and the performance of the selected algorithm may rely on the initialization. Some other methods of estimating the number of clusters are based on the idea of cluster removal and merging. In progressive clustering [14,15], the number of clusters is over-specified; after convergence, spurious clusters are eliminated and compatible clusters are merged. The main challenge of this
method lies in how to define the spurious and compatible clusters. Moreover, although overspecification of the cluster number can reduce the effect of the initial cluster centers, there is no way to guarantee that all clusters in the data set will be found. An alternative version of progressive clustering is to seek one cluster at a time until no more "good" clusters can be found [16,17]. The performances of these techniques are also dependent on the validity functions used to evaluate the individual clusters.

Since the global optimum of a validity function corresponds to the most "valid" solution with respect to that function, stochastic clustering algorithms based on genetic algorithms (GAs) [18-21] have been reported to be able to optimize validity functions so as to determine the number of clusters and the partitioning of the data set simultaneously. In these GA-based algorithms, the validity function is regarded as the fitness function that evaluates each individual and guides the evolution toward the "valid" solution. In recent years, several clustering algorithms based on the simple GA or its variants have been developed [22-36]. These algorithms fall into two broad categories based on their representations of clustering solutions. The first category uses a straightforward encoding in which the chromosome is a string of length n, where n is the number of data points and each element of the chromosome denotes the cluster number that the corresponding data point belongs to, as used in Refs. [22-24]. The desired number of clusters must be specified in advance. Moreover, this approach does not reduce the size of the search space, and searching for the optimal solution can be onerous when the data points proliferate. It is for this reason that some researchers opt for a relatively indirect approach in which the chromosome encodes the centers of the clusters and each datum is subsequently assigned to the closest cluster center [25-36]. This kind of algorithm can be subdivided into fixed-length encoding algorithms [25-31], which use a fixed-length string to describe the cluster centers with the number of clusters specified a priori, and variable-length encoding algorithms [32-36], which use a variable-length string to describe the cluster centers with the number of clusters evolved automatically. Although the number of cluster centers need not be given in advance in the variable-length encoding algorithms, the initial number of cluster centers is constrained to lie between 2 and k_max, where k_max is an upper bound on the number of clusters that must be specified beforehand. Because traditional GAs are suited to locating the optimum of unimodal functions, converging to a single solution of the search space, all these GA-based clustering algorithms treat the clustering problem as a unimodal problem: each chromosome is described by a sequence of cluster centers, and when every cluster center is contained in the chromosome, the fitness function reaches its global optimum. However, a simpler way is to treat the clustering problem as a multimodal problem in which each cluster center corresponds to a local optimum of the fitness function. In this setting, each chromosome represents a single cluster center, and all the local optima of the fitness function should be found. Algorithms that allow the formation and maintenance of different solutions can be used to solve this multimodal problem.

In order to preserve the population diversity, which prevents GAs from being trapped by a single optimum, niching methods have been developed. The basic idea of niching methods is based upon natural ecosystems, which maintain population diversity and permit the GA to investigate many optima in parallel. In nature, an ecosystem is typically composed of different physical niches that exhibit different features and allow both the formation and the maintenance of different types of life (species). It is assumed that a species is made up of individuals with similar biological features, capable of interbreeding among themselves but unable to breed with individuals of other species [37]. By analogy, in artificial systems a niche corresponds to a local optimum of the fitness function, and the individuals in one niche exhibit similar features in terms of a given metric. Among niching methods, fitness sharing (FS) and implicit fitness sharing are the best known and most widely used [38-42]. In the former, the fitness represents the resource for which the individuals belonging to the same niche compete [38], while in the latter [40,41] the sharing effects are achieved by means of a sample-and-match procedure.

In FS, the fitness of an individual is reduced if there are many other individuals near it, and so the GA is forced to maintain diversity in the population [38]. This method has to define a similarity metric on the search space and an appropriate niche radius, representing the maximal distance among individuals that are considered similar and therefore belong to the same niche. In most circumstances, it is difficult to give an effective value for the niche radius without any a priori knowledge. Deb and Goldberg proposed a criterion for estimating the niche radius given the heights of the peaks and their distances [39], but since in most real applications there is very little prior knowledge about the fitness landscape, it remains difficult to estimate the niche radius. In implicit fitness sharing [40], sharing is accomplished by inducing competition for limited and explicit resources, and there is no specific limitation on the distance between peaks. This method avoids the difficulty of appropriately choosing the niche radius and can deal with problems in which the peaks are not equally spaced [40-42], so one of the most important limitations of FS seems to be removed. In fact, some other parameters, such as the size of the sample of individuals that compete, the number of competition cycles and the definition of a matching procedure, need to be set. In order to improve the performance of the FS methods, several dynamic niching methods were proposed [46,47]. These methods are based upon a dynamic, explicit identification of the species discovered at each generation, and the FS mechanism is restricted to individuals belonging to the same species. However, the performance of these algorithms depends on the niche radius: when a wrong value of the niche radius is selected, the algorithm cannot find all the niches. In Ref. [48], a species conserving genetic algorithm (SCGA) was proposed which does not consider any sharing mechanism. Once a new species is discovered, its fittest individual is retained in the next generations until a fitter individual for that species is generated. Therefore, each species populating a region of the fitness landscape survives during the entire evolution, whether or not it corresponds to an actual niche. Moreover, the performance of this algorithm also depends on the niche radius. In addition, none of these algorithms is robust to noise: when the data set contains noise points, their performance is poor.

In this paper, a new clustering algorithm based on dynamic niching with niche migration (DNNM-clustering) is proposed which is robust to noise and to cluster volumes. Within DNNM-clustering, a dynamic niching with niche migration is developed to preserve the diversity of the population. A simpler representation is adopted, whereby each individual represents a single cluster center. All the niches present in the population at each generation are automatically and explicitly identified, and the application of FS is then limited to individuals belonging to the same niche. In order to overcome the dependence on the niche radius, a niche migration is introduced. This makes the algorithm work properly, independently of the niche radius, even if some noise points exist and the peaks are not equally spaced and have different cluster volumes.

The rest of this paper is organized as follows. Section 2 provides the fitness function of the clustering problem used in the
algorithm. The dynamic niching with niche migration is presented in Section 3. Section 4 describes the evolutionary clustering algorithm. Section 5 analyzes the robustness of the method to noise. Experimental results on several artificial data sets and on a remote sensing image, which demonstrate the effectiveness of the DNNM-clustering algorithm, are given in Section 6. Finally, conclusions are drawn in Section 7.

2. The fitness function

Let X = {x_1, x_2, ..., x_n} be a finite subset of an N-dimensional vector space, let K be the number of clusters, and let S(x_j, c_i) denote the similarity measure between x_j and the i-th cluster center c_i. Our clustering goal is to find the c_i that maximize the total similarity measure J(c), with

    J(c) = \sum_{i=1}^{K} \sum_{j=1}^{n} \left[ \exp\left( -\frac{\|x_j - c_i\|^2}{\beta} \right) \right]^{\gamma},    (1)

where c = (c_1, c_2, ..., c_K) and \beta can be defined by

    \beta = \frac{1}{n} \sum_{j=1}^{n} \|x_j - \bar{x}\|^2, \qquad \bar{x} = \frac{1}{n} \sum_{j=1}^{n} x_j.    (2)

According to the analysis of \gamma in Ref. [49], \gamma determines the locations of the peaks of the objective function J_s(c), while the value of \beta is not sensitive to the peaks. Let \tilde{J}_s(x_k) be the total similarity of the data point x_k to all data points, with

    \tilde{J}_s(x_k) = \sum_{j=1}^{n} \left[ \exp\left( -\frac{\|x_j - x_k\|^2}{\beta} \right) \right]^{\gamma}, \qquad k = 1, 2, \ldots, n.    (3)

This function is closely related to the density shape of the data points in the neighborhood of x_k: a large value of \tilde{J}_s(x_k) means that the data point x_k is close to some cluster center and has many data points around it. A good estimate of \gamma gives a good estimate of the peaks of \tilde{J}_s(x_k). Here, we use the data set shown in Fig. 1(a) to illustrate the influence of \gamma on (3); a more detailed explanation can be found in Ref. [49]. The markers in Fig. 1 denote the values of \tilde{J}_s(x_k) at the data points x_k, k = 1, 2, ..., n. According to Fig. 1(b), only two clusters will be found when \gamma = 1, while all five peaks become separated when \gamma increases to 5 and 10, as shown in Figs. 1(c) and (d).

Here, the CCA algorithm [49] is used to estimate \gamma. For convenience, it is presented in the following (a code sketch is given below):

1. Set m = 1 and \epsilon_1 = 0.97.
2. Calculate the correlation between the values of \tilde{J}_s(x_k) computed with \gamma_m and \gamma_{m+1}.
3. If the correlation is greater than or equal to the specified \epsilon_1, then choose \gamma_m as the estimate of \gamma; otherwise set m = m + 1 and go to step 2.

After the estimate of \gamma is obtained, the function \tilde{J}_s(x_k) becomes a multimodal function whose number of peaks equals the number of clusters. Therefore, the clustering problem can be transformed into a multimodal problem through this objective function. In the following, our new algorithm will be used to estimate all the local optima of \tilde{J}_s(x_k): the number of local optima is the number of clusters, and the local optima are the cluster centers.

Fig. 1. (a) Five-clusters data set. (b), (c) and (d) are plots of (3) (the approximate density shapes) with \gamma = 1, 5 and 10, respectively.
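To make the construction concrete, the following is a minimal NumPy sketch of the total-similarity function (3)/(17) and of the CCA-style selection of \gamma described above. The candidate grid mirrors the values appearing in Table 3 and \epsilon_1 = 0.97 follows step 1; the function names and the grid itself are illustrative choices of ours, not code from the paper.

```python
import numpy as np

def total_similarity(X, c, beta, gamma):
    """Eq. (3)/(17): total similarity of a candidate center c to all points of X."""
    d2 = np.sum((X - c) ** 2, axis=1)            # squared distances ||x_j - c||^2
    return np.sum(np.exp(-d2 / beta) ** gamma)   # sum_j [exp(-d2/beta)]^gamma

def estimate_gamma(X, candidates=(1, 5, 10, 15, 20), eps1=0.97):
    """CCA-style estimate of gamma: return the first candidate whose density
    shape correlates with the next candidate's at a level of at least eps1."""
    beta = np.sum((X - X.mean(axis=0)) ** 2) / len(X)        # Eq. (2)
    J = [np.array([total_similarity(X, x, beta, g) for x in X])
         for g in candidates]                                # J~_s(x_k) per gamma
    for m in range(len(candidates) - 1):
        if np.corrcoef(J[m], J[m + 1])[0, 1] >= eps1:
            return candidates[m], beta
    return candidates[-1], beta
```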
3. The dynamic niching with niche migration

Niching methods have been developed to minimize the effect of the genetic drift resulting from the selection operator in the traditional GA, in order to allow the parallel investigation of many solutions in the population. We begin this section with a brief overview of the fitness sharing method, which is a representative niching method. In order to overcome the drawbacks of FS, a dynamic niching with niche migration is then proposed.

3.1. Fitness sharing

Fitness sharing modifies the search landscape by reducing the fitness of an individual in densely populated regions. It works by derating the fitness of each individual by an amount related to the number of similar individuals in the population. Specifically, the shared fitness f_{sh,t}(i) of an individual i at generation t is given by

    f_{sh,t}(i) = \frac{f_t(i)}{m_t(i)},    (4)

where f_t(i) is the raw fitness of the individual and m_t(i) is the niche count, which depends on the number and the relative positions of the individuals within the population. The niche count is calculated as

    m_t(i) = \sum_{j=1}^{P} sh(d_{ij}),    (5)

where P is the population size, d_{ij} is the distance between individuals i and j, and sh(d_{ij}) is the sharing function, which measures the similarity between two individuals. The most commonly used form of sh is

    sh(d_{ij}) = \begin{cases} 1 - (d_{ij}/\sigma_{sh})^{\alpha_{sh}} & \text{if } d_{ij} < \sigma_{sh}, \\ 0 & \text{otherwise}, \end{cases}    (6)

where \sigma_{sh} is the niche radius and \alpha_{sh} is a constant parameter that regulates the shape of the sharing function. The value of \alpha_{sh} is commonly set to 1, yielding a triangular form for the sharing function [50]. The distance d_{ij} between individuals i and j is implemented by defining a metric on either the genotypic or the phenotypic space.

It has been proved that, when the number of individuals within the population is large enough and the niche radius is properly set, FS provides as many niches in the population as there are peaks in the fitness landscape [51,52]. But there are several problems with the fitness sharing approach. In order to ensure that subpopulations are steadily formed and maintained, only the individuals belonging to the same niche should share the resources of that niche. This assumption is not generally true for the FS methods [53], because each individual in the population shares its fitness with all the individuals located at a distance smaller than the niche radius, no matter to which actual peak, i.e., to which niche, they belong. As a consequence, individuals belonging to different peaks may share their fitness when they should not. Moreover, the radius of the niches must be specified, and this requires a priori knowledge of how far apart the optima are. However, no information about the search space and the distance between the optima is available in practical optimization problems. When the niche radius is wrong, the algorithm cannot find all the niches. In order to overcome these drawbacks, a dynamic niching with niche migration is proposed.
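As a reference point for the modifications introduced below, here is a small NumPy sketch of plain fitness sharing, Eqs. (4)-(6); the vectorized pairwise-distance formulation is our own.

```python
import numpy as np

def sharing_function(d, sigma_sh, alpha_sh=1.0):
    """Eq. (6): sh(d) = 1 - (d / sigma_sh)^alpha_sh inside the radius, else 0."""
    return np.where(d < sigma_sh, 1.0 - (d / sigma_sh) ** alpha_sh, 0.0)

def shared_fitness(pop, raw_fitness, sigma_sh):
    """Eqs. (4)-(5): divide each raw fitness by the niche count m_t(i)."""
    diff = pop[:, None, :] - pop[None, :, :]        # pairwise differences
    d = np.sqrt((diff ** 2).sum(axis=-1))           # distances d_ij
    m = sharing_function(d, sigma_sh).sum(axis=1)   # niche counts (>= 1, since sh(0) = 1)
    return raw_fitness / m
```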
3.2. Dynamic niching with niche migration

In this section, we propose a dynamic niching method which is independent of the niche radius. From Refs. [38-48], we can see that the radius of the niches plays a crucial role in the identification of the niches. If the chosen niche radius is too small, too many niches may be found in every generation. On the other hand, a large value of the radius will make many solutions indistinguishable, which means that too few niches will be conserved. If the radius is so large that only one niche master is found, the algorithm degenerates into a simple genetic algorithm and finds only the one optimum with the largest fitness value.

In our algorithm, two strategies, the initialization of the niche radius and the migration of the niche candidates, are used to achieve independence of the initial niche radius. After the initialization of the niche radius, the dynamic niching method attempts to find the niches according to this radius at each generation, and the identified niche candidates change their locations under the migration operator. The skeleton of the dynamic niching algorithm is presented in Table 1 and the initialization of the niche radius in Table 2.

When a niche radius is input, a preprocessing of the input by the algorithm shown in Table 2 is conducted to ensure that the niche candidates will be sufficiently diverse in the first generation. Here \eta is a constant and \eta \in (1, 2).

Table 1
The dynamic niching algorithm with niche migration (a code sketch of Phase I follows the table).

Input: the population Pop_t at generation t; the population size P; the niche radius \sigma (obtained by the algorithm shown in Table 2)

Sort the current population according to the raw fitness
v(t) = 0 (the number of actual niches at generation t)
u(t) = 0 (the number of niche master candidates)
NC = {} (the niche master candidate set)
DN = {} (the dynamic niche set)

Phase I: Identification of the niche master candidates.
For i = 1 to P do
    if the i-th individual is not marked then
        u(t) = u(t) + 1
        N(u(t)) = 1 (the number of individuals in the u(t)-th niche candidate set)
        For j = i + 1 to P do
            if (d(i, j) < \sigma) and (the j-th individual is not marked) then
                insert the j-th individual into the niche master candidate set NC
                N(u(t)) = N(u(t)) + 1
            end if
        end for
        If (N(u(t)) > 1) then
            v(t) = v(t) + 1
            mark the i-th individual as the niche master of the v(t)-th niche
            insert the pair (i-th individual, N(u(t))) in DN
        end if
    end if
End for

Phase II: The migration of the niches.
Calculate the distances between the niche master candidates
For l = 1 to u(t)
    If the j-th niche is the nearest neighbor of the l-th niche, determine the communication edge between these two niches according to Theorem 1.
    If there exists communication between the two niches and F_l < F_j, then niche l migrates toward niche j; otherwise niche l keeps its station.
End for
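A compact Python reading of Phase I of Table 1 is sketched below; the bookkeeping sets NC and DN are folded into plain lists, so this is an illustrative sketch rather than the authors' code.

```python
import numpy as np

def identify_niches(pop, raw_fitness, sigma):
    """Phase I of Table 1: scan individuals best-first; each unmarked individual
    collects the unmarked individuals within distance sigma. Groups with more
    than one member become actual niches; singletons remain isolated."""
    order = np.argsort(-raw_fitness)                # sort by raw fitness, best first
    marked = np.zeros(len(pop), dtype=bool)
    masters, members = [], []
    for i in order:
        if marked[i]:
            continue
        marked[i] = True
        group = [i]
        for j in order:
            if not marked[j] and np.linalg.norm(pop[i] - pop[j]) < sigma:
                marked[j] = True
                group.append(j)
        if len(group) > 1:                          # N(u(t)) > 1: an actual niche
            masters.append(i)
            members.append(group)
    return masters, members
```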
Table 2
The initialization of the niche radius.

Input: Pop_1, the population at generation 1; \sigma_init, an input niche radius

Phase I described in Table 1 is used to determine the number of master candidates u(1) and the number of actual niche masters v(1).
If (v(1) <= 2 and u(1) > P/3) then
    \sigma = \eta \sigma_init
Else if (v(1) = 1 or (u(1) - v(1)) < 1) then
    \sigma = \sigma_init / \eta
End

In phase II of the algorithm shown in Table 1, a migration operator is introduced. In reality, if a city is prosperous and its citizens lead comfortable lives, it will attract the people living nearby to migrate to it. For the clustering problem, the effect of this migration operation is to change the relative positions of the niches in the entire population. Based on this analogy between our society and a clustering problem, a migration operator is introduced and explained in the following. First, several definitions used in the migration operator are given.

Definition 1 (Niche attraction). Suppose c_1, c_2, ..., c_m are the m individuals in a niche, and their fitness values are f_1, f_2, ..., f_m, respectively. The attraction one niche exerts on another niche is defined as

    F = (f_1 + f_2 + \cdots + f_m)/m.    (7)

Definition 2 (Migration principle). Let N_i and N_j be two niches, and let the niche attractions of the two niches be F_i and F_j, respectively. If F_i > F_j, then N_j will migrate toward N_i. Otherwise, N_i will migrate toward N_j.

Definition 3 (Distance of niches). Let M_i and M_j be the masters of two niches N_i and N_j; then the distance of these two niches is defined as

    d_N(N_i, N_j) = d(M_i, M_j) = \|M_i - M_j\|^2.    (8)

For the niche candidate sets identified in phase I of the algorithm shown in Table 1, the nearest neighbor of each niche should be found. Here, a u(t) x u(t) matrix D is used to indicate the nearest neighbor of the niche candidates:

    D_{ij} = \begin{cases} 1 & \text{if } d_N(N_i, N_j) = \min_{k \ne j,\, k = 1, 2, \ldots, u(t)} d_N(N_k, N_j), \\ 0 & \text{otherwise}, \end{cases}    (9)

where d_N(N_i, N_j) is the distance between niche i and niche j. If D_{ij} = 1, the two niches are given the ability to communicate. The communication topology is specified by a matrix S: S_{ij} = 1 means that a communication edge exists between the two niches, and S_{ij} = 0 indicates no communication edge between them. The value of S_{ij} is determined by Theorem 1.

Theorem 1. Let N_i and N_j be two niches, and let M_i and M_j be the niche masters of these two niches with fitness values f_i and f_j. A line that intersects the two niche masters can be written as

    x = M_i + k (M_j - M_i), \qquad k \in [0, 1].    (10)

A series of points x_1, x_2, ..., x_l is generated along this line, and the fitness of those points is calculated by Eq. (4). If there exists m \in [1, l] satisfying

    f(x_m) < \min(f_i, f_j),    (11)

then a valley lies between N_i and N_j; in that case there is no communication between them and S_{ij} = 0. Otherwise, communication exists and S_{ij} = 1.

The concept of Theorem 1 is simple. Given two end points in Euclidean space, choose a number of points along the line between them and calculate the fitness of those points. In this way, it is possible to determine whether a valley lies between the two end points (i.e., M_i and M_j). If a valley lies between the two niches (i.e., there exists m \in [1, l] satisfying inequality (11)), then the two niches can be seen as two different species and they should not communicate with each other. If no point has lower fitness than either of the end points, then no valley lies between the two niches, and they can communicate with each other. The implementation of Theorem 1 terminates on the first point discovered that has lower fitness than either of the two end points. This gives a simple and powerful test for deciding whether two niches communicate.

In fact, inequality (11) used in Theorem 1 is not robust to noise. For example, given two end points M_1 and M_4 and two samples M_2 and M_3 in between (see Fig. 2), we can see that f(M_3) < f(M_4). Then, according to Theorem 1, there is no communication between M_1 and M_4. However, the slight difference between the function values of M_3 and M_4 can be seen as a result of noise. In order to overcome the influence of noise, we define a noise tolerance factor \rho (0.8 <= \rho <= 1), and inequality (11) is modified as

    f(x_m) < \rho \min(f_i, f_j).    (12)

Inequality (12) is then used in the determination of communication between two points (a sketch of this test, together with the migration step defined below, is given at the end of this section).

Fig. 2. An example of the influence of noise (the points M_1, M_2, M_3 and M_4 lie along the line between two niche masters).

After the determination of the communication, the magnitude of migration is defined.

Definition 4 (Migration magnitude). Let N_i and N_j be two niches identified in one generation, and let the niche attractions of the two niches be F_i and F_j, respectively. The distance of these two niches is d_N(N_i, N_j), and the niche masters are M_i and M_j. If F_i > F_j, then the migration magnitude of an individual in N_j is defined as

    \Delta_l = \delta \, \frac{F_i}{d_N^2(N_i, N_j)} \, r,    (13)

where r is the direction vector from the individual in N_j to M_i, and \delta is a small constant greater than 0, called the migrating rate. We imagine here that the niches are migrating with negligible magnitude.

After the dynamic identification of the niche masters of the population Pop_t at generation t, the species belonging to a niche master candidate can be defined as a nonempty subset S_t^i of individuals in the population Pop_t which have a distance from the master candidate less than the niche radius and do not belong to other species. If the number of individuals in S_t^i is larger than 1, this subset is taken to be an actual niche; otherwise, the single individual in the subset is considered an isolated individual, and all the isolated individuals form the subset \bar{S}_t. The population Pop_t at generation t is thus partitioned into a number v(t) of species, say S_t^1, S_t^2, ..., S_t^{v(t)}, and a number of isolated individuals:

    Pop_t = \left( \bigcup_{i \in \{1, 2, \ldots, v(t)\}} S_t^i \right) \cup \bar{S}_t.    (14)
After the identification of the niches, the shared fitness of each individual is calculated according to (4). Here, the shared fitness value for an individual within a dynamic niche (identified by the dynamic niching algorithm) is its raw fitness value divided by the niche count; otherwise, the individual belongs to the isolated category and its fitness is not modified. The niche count in (5) is modified as

    m_t(i) = \sum_{p_j \in S_t^i} sh(d_{ij}),    (15)

where sh(d_{ij}) is computed according to (6). Here, only the individuals belonging to the same niche share their fitness, and the fitness of the isolated individuals is not modified.

After all the niches have been found, the new population is constructed by applying the usual genetic operators. Since some niche masters may not survive during the evolution, a species elitist strategy is implemented to enable the niche masters to survive. Here, only the actual masters are conserved.

4. The DNNM-clustering algorithm

In this section, we propose the dynamic niching with niche migration clustering algorithm (DNNM-clustering), which optimizes the objective function to automatically evolve the proper number of clusters as well as an appropriate partition of the data set.

4.1. Chromosome representation and initialization

For any GA, a chromosome representation is needed to describe each chromosome in the population. The representation method determines how the problem is structured in the algorithm and which genetic operators are used. Each chromosome is made up of a sequence of genes from a certain alphabet. An alphabet can consist of binary digits (0 and 1), floating-point numbers, integers, symbols (e.g., A, B, C, D), etc. In early GAs, the binary digit was used. It has been shown that more natural representations yield more efficient and better solutions; Michalewicz [20] has performed extensive experiments comparing real-valued and binary GAs and shown that the real-valued GA is more efficient in terms of CPU time. Therefore, in this paper, real-valued numbers are used to describe the chromosome.

Here the chromosome encodes the center of a cluster. Each chromosome is described by a sequence of N real-valued numbers, where N is the dimension of the feature space. That is to say, the chromosome of the algorithm is written as

    c = [c_1, c_2, \ldots, c_N].    (16)

An initial population of size P for the DNNM-clustering algorithm is usually chosen at random. In this paper, P data points randomly chosen from the data set, with the restriction that no two may be the same, are used to initialize the P chromosomes.

4.2. Fitness function

The fitness function assigns a fitness value to each candidate solution. Here, the fitness function of the chromosome, f, is defined as

    f(c) = \tilde{J}_s(c) = \sum_{j=1}^{n} \left[ \exp\left( -\frac{\|x_j - c\|^2}{\beta} \right) \right]^{\gamma},    (17)

where x_j, j = 1, 2, ..., n, are the data points of the data set to be clustered.

4.3. Evolutionary operators

Any combination of standard selection, crossover and mutation operators can be employed by our algorithm. Here, intermediate recombination and uniform neighborhood mutation are used.

For two randomly chosen parents c_1 and c_2, the offspring c of the intermediate recombination crossover (applied with probability p_c) is

    c = c_1 + r (c_1 - c_2),    (18)

where r is a uniformly distributed random number over [0, 1].

Each chromosome undergoes mutation with probability p_m. Let the minimum and maximum values of the data set along the q-th dimension be c_min^q and c_max^q, respectively. If the position to be mutated is the q-th dimension of a cluster center with value c^q, then after uniform neighborhood mutation the value becomes

    c'^q = c^q + r_m R (c_max^q - c_min^q),    (19)

where R is a uniformly distributed random number over [-1, 1] and r_m \in (0, 1).
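The two operators of Eqs. (18) and (19) are straightforward to implement; a sketch follows, with r_m = 0.2, the value used in the experiments of Section 6.

```python
import numpy as np

rng = np.random.default_rng()

def intermediate_crossover(c1, c2):
    """Eq. (18): c = c1 + r (c1 - c2), with r uniform on [0, 1]."""
    return c1 + rng.uniform(0.0, 1.0) * (c1 - c2)

def uniform_neighborhood_mutation(c, c_min, c_max, r_m=0.2):
    """Eq. (19): perturb one randomly chosen dimension q by r_m * R * range_q,
    with R uniform on [-1, 1] and range_q the data range along dimension q."""
    q = rng.integers(len(c))
    child = c.copy()
    child[q] += r_m * rng.uniform(-1.0, 1.0) * (c_max[q] - c_min[q])
    return child
```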
4.4. Description of the algorithm

In our DNNM-clustering algorithm, a chromosome represents one cluster center and is evaluated using the fitness function described in Section 4.2. The niches are identified by the dynamic niching algorithm at each generation, and fitness sharing is computed within every niche. The evolutionary operators, selected on the basis of a probability distribution, can be crossover or mutation: the former transforms two individuals (parents) into offspring by combining parts from each parent, while the latter operates on a single individual and creates an offspring by mutating that individual. The elitist strategy [21] is implemented by replacing the worst chromosomes of the current population with the niche masters found at each generation. The process terminates after some number of generations, fixed either by the user or determined dynamically by the program itself, and the niche masters obtained are taken to be the solution.

The DNNM-clustering algorithm is described as follows (a code sketch follows the list):

1. Initialize a group of cluster centers of size P.
2. Evaluate each chromosome.
3. Apply the dynamic niching algorithm and apply fitness sharing among the individuals belonging to the same niche.
4. If the termination condition is not reached, go to Step 5. Otherwise, select the niche masters from the population as the final cluster centers.
5. Apply the selection operator.
6. Apply the crossover operator to the selected individuals based on the crossover probability.
7. Apply the mutation operator to the selected individuals based on the mutation probability.
8. Evaluate the newly generated candidates.
9. Apply the elitist strategy.
10. Go back to Step 3.
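Putting Steps 1-10 together, a skeleton of the whole loop might look as follows. It reuses the sketches given earlier (estimate_gamma, total_similarity, identify_niches, shared_fitness and the two operators); binary tournament selection is our own simplification, and the migration operator of Section 3.2 is omitted for brevity.

```python
import numpy as np

def dnnm_clustering(X, P=100, G=200, pc=0.8, pm=0.005, sigma=1.0):
    """Sketch of Steps 1-10: evolve single-center chromosomes, identify niches
    each generation, share fitness within niches, keep masters by elitism."""
    rng = np.random.default_rng()
    gamma, beta = estimate_gamma(X)
    fit = lambda c: total_similarity(X, c, beta, gamma)       # Eq. (17)
    pop = X[rng.choice(len(X), size=P, replace=False)].copy() # Step 1
    for t in range(G):
        raw = np.array([fit(c) for c in pop])                 # Steps 2 / 8
        masters, _ = identify_niches(pop, raw, sigma)         # Step 3
        elites = pop[masters].copy()
        fsh = shared_fitness(pop, raw, sigma)
        children = []
        for _ in range(P):                                    # Steps 5-7
            a, b = rng.integers(P, size=2)
            child = pop[a] if fsh[a] > fsh[b] else pop[b]     # binary tournament
            if rng.random() < pc:
                child = intermediate_crossover(child, pop[rng.integers(P)])
            if rng.random() < pm:
                child = uniform_neighborhood_mutation(child, X.min(0), X.max(0))
            children.append(child)
        pop = np.array(children)
        if len(elites):                                       # Step 9: elitism
            worst = np.argsort(np.array([fit(c) for c in pop]))[:len(elites)]
            pop[worst] = elites
    raw = np.array([fit(c) for c in pop])                     # Step 4: final masters
    masters, _ = identify_niches(pop, raw, sigma)
    return pop[masters]
```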

Fig. 4. Data 1.

Fig. 5. The average number of clusters and its standard error by (a) DNS, (b) SCGA, (c) DFS, (d) DNNM-clustering. In all the experiments P = 100.
Fig. 6. The cluster centers obtained by using (a) DNS, (b) SCGA, (c) DFS, (d) DNNM-clustering. In all the experiments P = 100.

5. The robust property to noise

A good clustering method should be robust in the sense that it can determine good clusters for noisy data sets. Several different classes of robust methods (such as the M, R and L estimators) exist [43-45]. Here, the influence function [45] is used to show that our method is robust to noise. Let x = {x_1, ..., x_n} be an observed data set of real numbers and \theta an unknown parameter to be estimated. We consider x_i and \theta to be scalars; an M estimator [45] is generated by minimizing the form

    \hat{\theta} = \arg\min_{\theta} \sum_{i=1}^{n} \rho(x_i - \theta),    (20)

where \rho is a function that measures the loss between x_i and \theta. If we let \psi(x_i - \theta) be the derivative of \rho(x_i - \theta), i.e., \psi(x_i - \theta) = \partial \rho(x_i - \theta) / \partial \theta, then the M estimator is obtained by solving

    \sum_{i=1}^{n} \psi(x_i - \theta) = 0.    (21)

The influence function (IF) helps us to assess the relative influence of individual observations on the estimated value. It has been shown that the influence function of an M estimator is proportional to its \psi function [45]. For a location M estimator, we have the influence function

    IF_{\hat{\theta}}(x; F, \theta) = \frac{\psi(x - \theta)}{E[\psi'(x - \theta)]},    (22)

where F is the distribution of x. If the influence function of an estimator is unbounded, an outlier might cause trouble. In the following, we show the boundedness of the IF. Let

    \rho(x_i - c) = 1 - \left[ \exp(-\beta^{-1}(x_i - c)^2) \right]^{\gamma}.    (23)

Minimizing (20) with (23) is equivalent to minimizing

    n - \sum_{i=1}^{n} \left[ \exp(-\beta^{-1}(x_i - c)^2) \right]^{\gamma},    (24)

and this is also equivalent to maximizing

    \sum_{i=1}^{n} \left[ \exp(-\beta^{-1}(x_i - c)^2) \right]^{\gamma},    (25)

which is the objective function of our genetic algorithm with one cluster. Our estimate of the cluster center c (by (1)) is thus equivalent to an M estimator with the sum in (20) replaced by the dissimilarity measure (23), and the \psi function of our estimator is

    \psi(x_i - c) = \frac{-2 \beta^{-1} (x_i - c)}{\exp(\beta^{-1}(x_i - c)^2)}.    (26)

Fig. 7. Data 2.
Fig. 8. The cluster centers obtained by using (a) DNS, (b) SCGA, (c) DFS, (d) DNNM-clustering. In all the experiments P = 100.

Since the influence function IF_{\hat{\theta}}(x; F, \theta) is proportional to \psi(x_i - c) according to (22), we need only analyze the term \psi(x_i - c). By applying L'Hospital's rule, we have

    \lim_{x_i \to +\infty} \psi(x_i - c) = \lim_{x_i \to -\infty} \psi(x_i - c) = 0.    (27)

Thus, IF(x_i; F, c) = 0 when x_i tends to positive or negative infinity. From Eq. (27), there exists M > 0 such that |\psi(x_i - c)| < 1 whenever |x_i| >= M. Since \psi(x_i - c) is continuous on the interval [-M, M], there exists K > 0 such that |\psi(x_i - c)| <= K for all x in [-M, M]. Let G = max{1, K}; then for all x in (-\infty, +\infty) we have |\psi(x_i - c)| <= G. Accordingly, the function \psi(x_i - c) with (23) is bounded and continuous, as shown in Fig. 3. Therefore, the influence of an extremely large or small x_i on our estimator is very small according to (27). In fact, (27) also shows that an extremely large or small x_i can be thought of as a new observation that has no influence (i.e., IF(x; F, \theta) = 0) on our estimator.

Fig. 3. The \psi function of our estimator.

Fig. 9. Data 3.

From the analysis above, we can deduce that our estimator has a bounded and continuous influence function. Hence, it is robust to noise from the robust statistical point of view.
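The decay in Eq. (27) is easy to verify numerically; below is a small sketch of the \psi function of Eq. (26), with \beta = 1 assumed purely for illustration.

```python
import numpy as np

def psi(u, beta=1.0):
    """Eq. (26) with u = x_i - c: psi(u) = -2 u / (beta * exp(u^2 / beta))."""
    return -2.0 * u / (beta * np.exp(u ** 2 / beta))

# Eq. (27): psi vanishes for extreme observations, so a gross outlier
# contributes (almost) nothing to the location estimate.
for u in (0.5, 2.0, 10.0, 25.0):
    print(f"psi({u}) = {psi(u):.3e}")
```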
6. Experiments results

In order to validate the proposed algorithm, we have performed a set of experiments on several data sets with widely varying characteristics and on a multispectral remote sensing image. The experiments show that DNNM-clustering has high performance and flexibility.

6.1. Experiments on artificial data sets

In this section, the performances of DNS [46], SCGA [48], DFS [47] and DNNM-clustering are compared through experiments.
In the experiments, the crossover and mutation probabilities used by all algorithms are p_c = 0.8 and p_m = 0.005, respectively. The population size is taken to be 150 for Data 3, since it has more clusters than the other data sets, while it is taken to be 100 for the other data sets. The parameter used in the mutation operator is r_m = 0.2. The solution acceptance threshold used by SCGA is r_f = 0.95. The total number of generations G is equal to 200. For all the experiments, the entire evolution of G generations, i.e., the run, has been repeated R = 30 times with different initial populations, in order to reduce the well-known effects of the randomness embedded in GAs.

In order to evaluate the performance, the average number of niches and its standard error [47] can be used. For each evolution, we compute at each generation the number of niches v(t) (the population size P and the niche radius \sigma being fixed) and store it in an R x G matrix W, with one row for each evolution and one column for each generation. At the end of each experiment, consisting of R runs, we compute the average number of niches discovered at each generation by averaging the R values v(t) in each column:

    \langle v(t) \rangle = \frac{1}{R} \sum_{i=1}^{R} W_{it}, \qquad t = 1, 2, \ldots, G.    (28)

The values \langle v(1) \rangle, ..., \langle v(G) \rangle then represent the average behavior of the algorithm for the assigned values of P and \sigma. Finally, we compute the standard errors

    e(\langle v(t) \rangle) = \sqrt{ \frac{1}{R} \cdot \frac{\sum_{i=1}^{R} (W_{it} - \langle v(t) \rangle)^2}{R - 1} },    (29)

of \langle v(t) \rangle, for all t in {1, 2, ..., G}. A code sketch of these run statistics is given below.

In the experiments, five artificial data sets with widely varying characteristics are used for comparison. All the algorithms are run with two different radii. The number of niches (i.e., the number of clusters) and the cluster centers obtained are given; here, we only give the number of niches for the first data set as an example. In the experiments, the value of \gamma in (3) should be determined by the CCA algorithm. The correlations for the five data sets are shown in Table 3.

Table 3
The values of CCA (correlations of \tilde{J}_s between successive candidate values of \gamma).

Data set | \gamma = 5 and 10 | \gamma = 10 and 15 | \gamma = 15 and 20 | Selected \gamma
Data 1   | 0.9000 | 0.9972 | 0.9993 | 10
Data 2   | 0.9824 | 0.9925 | 0.9954 | 5
Data 3   | 0.8277 | 0.9990 | 0.9996 | 10
Data 4   | 0.9779 | 0.9927 | 0.9962 | 5
Data 5   | 0.9759 | 0.9948 | 0.9976 | 5

Fig. 10. The cluster centers obtained by using (a) DNS, (b) SCGA, (c) DFS, (d) DNNM-clustering. In all the experiments P = 150.

Fig. 11. Data 4.
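Eqs. (28) and (29) amount to column means and standard errors of the R x G matrix W; a sketch:

```python
import numpy as np

def niche_run_statistics(W):
    """Eqs. (28)-(29): W[i, t] is the niche count v(t) of run i at generation t.
    Returns the per-generation average <v(t)> and its standard error."""
    R = W.shape[0]
    avg = W.mean(axis=0)                                         # Eq. (28)
    err = np.sqrt(((W - avg) ** 2).sum(axis=0) / (R * (R - 1)))  # Eq. (29)
    return avg, err
```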
Fig. 12. The cluster centers obtained by using (a) DNS, (b) SCGA, (c) DFS, (d) DNNM-clustering. In all the experiments P = 100.

Data 1: This data set consists of 300 two-dimensional data points distributed over six disjoint clusters, where each cluster contains 50 data points. The data set is shown in Fig. 4. The CCA result is shown in the first row of Table 3; \gamma = 10 is a good estimate. The final number of clusters and the standard errors obtained by DNS, SCGA, DFS and DNNM-clustering are given in Figs. 5(a), (b), (c) and (d), respectively. From Fig. 5, we can see that all the algorithms are able to find the number of optima of the fitness function (i.e., the cluster centers) when the radius is properly selected; but for a small niching radius, \sigma = 0.5, only the DNNM-clustering algorithm works properly. The cluster centers obtained by all algorithms are shown in Fig. 6.

Data 2: This data set consists of 250 two-dimensional data points distributed over five spherically shaped clusters, as shown in Fig. 7. The clusters present here are highly overlapping, each consisting of 50 data points. The CCA result is shown in the second row of Table 3; \gamma = 5 is a good estimate. As is evident, all algorithms succeed in providing the number of clusters as well as the cluster centers with \sigma = 2, while DNS, SCGA and DFS fail to do so with a small radius. The cluster centers obtained by all algorithms are shown in Fig. 8.

Data 3: This data set consists of 16 clusters, as shown in Fig. 9. The CCA result is shown in the third row of Table 3; \gamma = 10 is a good estimate. As before, DNS, SCGA and DFS succeed with a proper radius and fail with a small radius. Only the DNNM-clustering algorithm is insensitive to the choice of the initial radius. The cluster centers obtained by all algorithms are shown in Fig. 10.

Data 4: This is a two-dimensional data set with clusters of different volumes, as shown in Fig. 11. There is one large cluster and two small clusters. The CCA result is shown in the fourth row of Table 3; \gamma = 5 is a good estimate. The cluster centers obtained by all algorithms are shown in Fig. 12. For this data set, the volumes of the clusters are different, so the peaks of the fitness function are not identical; in fact, the three peaks are 106.3, 39.3 and 34.9, respectively. With a smaller radius, DNS and DFS produce some false optima around the local optimum corresponding to the larger cluster, and they succeed with a larger radius. For SCGA, the two small peaks are discarded during the global optima identification phase because of their small values compared with the largest one. Moreover, this result also shows that the DNNM-clustering algorithm is robust to cluster volumes.

Fig. 13. Data 5.
Fig. 14. The cluster centers obtained by using (a) DNS, (b) SCGA, (c) DFS, (d) DNNM-clustering. In all the experiments P = 100.

Table 4
The mean of the number of clusters obtained by DNS, SCGA, DFS and DNNM-clustering applied to the two real-world data sets; AC denotes the actual number of clusters present in the data set.

Data set      | AC | DNS   | SCGA | DFS   | DNNM-clustering
Iris          | 3  | 9.65  | 1.85 | 9.50  | 2
Breast cancer | 2  | 11.80 | 5.65 | 11.95 | 2

Data 5: This data set consists of 500 two-dimensional data points distributed over five clusters. A very noisy background, consisting of 100 data points uniformly distributed within the region [-1, 1] x [-1, 1], is added to this data set. The data set is shown in Fig. 13. The CCA result is shown in the fifth row of Table 3; \gamma = 5 is a good estimate. The cluster centers obtained by all algorithms are shown in Fig. 14. From Fig. 14, it is seen that only DNNM-clustering succeeds in providing the number of clusters as well as the cluster centers, while DNS, SCGA and DFS fail miserably to do so. For SCGA, the false optima and the optima with the smallest fitness were discarded through SCGA's principle for identifying global optima; but for DNS and DFS, it is difficult to discard these values, and the false optima they obtain depend on the initialization of the algorithm.

Fig. 15. The pseudocolor image of a part of MiYun obtained from a Landsat-7 multispectral scanner composite, displaying band 5 as red, band 4 as green, and band 3 as blue. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

In the following, two real-world high-dimensional data sets (Iris and Breast Cancer) from the UCI Machine Learning Repository [54] are used to test whether the DNNM-clustering algorithm works well for high-dimensional real data sets. Table 4 shows the cluster numbers found on the two real data sets. It can be seen from Table 4 that the DNNM-clustering algorithm can be used to determine the number of clusters in a data set. For the Iris data, none of the algorithms provides the correct number of clusters, but only the DNNM-clustering algorithm can find two clusters.
The DNNM-clustering algorithm has separated the first class, Setosa, from the others. It is known that two classes (Versicolor and Virginica) have a large amount of overlap, while the class Setosa is linearly separable from the other two; in fact, some researchers hold that the Iris data set can be classed into two classes [55,56]. For the Breast Cancer data, only the DNNM-clustering algorithm provides the correct number of clusters, and its classification error is 3.53 percent. DNS, DFS and SCGA are again misled in the clustering because of the aforementioned problem in the selection of the niche radius.

From the experimental results on these data sets, it is seen that the DNNM-clustering algorithm is robust to the initialization: all the experiments have been repeated R = 30 times with different initializations, and in all cases the correct estimate of the cluster centers is derived.

6.2. Experiment on remote sensing image clustering

Remote sensing image analysis is attracting growing interest in real-world applications, and the design of robust and efficient clustering algorithms has become one of the most important issues addressed by the remote sensing community. In this section, we apply DNS, SCGA, DFS and DNNM-clustering to the clustering of a multispectral remote sensing image based on the spectral data of its pixels. Although remote sensing images usually have a large number of overlapping clusters, the experimental results show that the multispectral image can be effectively grouped into several clusters by the proposed method.

In this experiment, the algorithms are used to partition different landcover regions in the remote sensing image. A 512 x 512 remote sensing image of a part of MiYun obtained from Landsat-7 has been chosen. The image considered has three bands in the multispectral mode: band 3, the red band, wavelength 0.63-0.69 um; band 4, the near-infrared band, wavelength 0.76-0.94 um; and band 5, the shortwave infrared band, wavelength 1.55-1.75 um. The pseudocolor image is shown in Fig. 15. From the pseudocolor image, it can be seen that the landcovers of the image mainly contain five classes: water, vegetation (Veg), mountain (Moun), residential areas (RA) and blank regions (BR). In the experiment, we expect the four algorithms to partition the remote sensing image into visually distinct clusters automatically. The population size is set to 600 and the maximum generation to 200. The crossover and mutation probabilities are the same as those used in the first experiment.

Fig. 16. The clustering results of the remote sensing image using: (a) DNS (water; moun + veg; RA + BR); (b) SCGA (water; moun + veg; veg; RA + BR); (c) DFS (water; moun + veg; RA + BR); (d) DNNM-clustering (water; moun; veg; veg + BR; RA; BR).

The clustering results for the image are shown in Fig. 16 in gray scale. The numbers of clusters identified by DNS, SCGA, DFS and DNNM-clustering are 3, 4, 3 and 6, respectively. As seen from Fig. 16,
the water and the rivers in the residential areas are distinctly demarcated from the rest by all four algorithms. For DNS, SCGA and DFS, there is some confusion between the residential areas and the blank regions, and between the mountain and the vegetation. For the DNNM-clustering algorithm, most of the landcover categories have been correctly distinguished: for example, the vegetation at the top left of the image, the residential areas and many other structures are identified by the DNNM-clustering algorithm. So we can conclude that the DNNM-clustering algorithm is an efficient clustering algorithm for differentiating the various landcover types present in the image.

6.3. Effect of niche radius

As mentioned earlier, the performance of the DNNM-clustering algorithm is independent of the initial niche radius. To examine this claim, we conduct a series of experiments in which we vary the value of the niche radius \sigma and count the number of niches found; a sketch of this protocol is given below. For these runs we use p_c = 0.8 and p_m = 0.005, set the population size P = 100, and set the number of generations G = 200. The results are averaged over 30 runs for each value of \sigma. The results obtained for Data 1, Data 4 and Data 5 are shown in Figs. 17, 18 and 19, respectively. In the experiments, the maximum value of the niche radius is set to the largest distance between the data points.

The figures show that, as expected, as the niche radius is increased, the number of niches found by DNNM-clustering remains equal to the number of clusters. The DNNM-clustering algorithm can consistently find all global optima of the data sets, while the other three algorithms succeed only for some values of the radius.

Fig. 17. The number of clusters obtained by using the DNS, SCGA, DFS and DNNM-clustering algorithms for Data 1.

Fig. 18. The number of clusters obtained by using the DNS, SCGA, DFS and DNNM-clustering algorithms for Data 4.

Fig. 19. The number of clusters obtained by using the DNS, SCGA, DFS and DNNM-clustering algorithms for Data 5.
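This protocol can be scripted directly on top of the loop sketched in Section 4.4; the following is an illustrative sketch, where `data` and the radius grid are placeholders.

```python
import numpy as np

def radius_sweep(data, radii, runs=30):
    """Section 6.3 protocol: for each niche radius, average over independent
    runs the number of niches reported by dnnm_clustering (sketched earlier)."""
    return {sigma: np.mean([len(dnnm_clustering(data, P=100, G=200, sigma=sigma))
                            for _ in range(runs)])
            for sigma in radii}
```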

6
7. Conclusion

In this paper, a robust clustering algorithm based on dynamic niching with niche migration (DNNM-clustering) has been developed for solving clustering problems with an unknown number of clusters. The DNNM-clustering algorithm can find the optimal number of clusters as well as the cluster centers automatically. As the number of clusters is not known a priori in most practical circumstances, the DNNM-clustering algorithm can be used more widely. In the DNNM-clustering algorithm, each chromosome encodes the center of one cluster as a sequence of real-valued numbers. This is more natural and simpler than the representations used by other clustering algorithms based on GAs. The dynamic niching is accomplished without assuming any prior knowledge of the number of niches or the niche radius, and the introduction of the niche migration makes the DNNM-clustering algorithm insensitive to the choice of the initial radius. The superiority of the DNNM-clustering algorithm over the DNS, SCGA and DFS algorithms has been demonstrated by the experiments. All the experimental results described in this paper have shown that our algorithm is effective, because it provides all the actual cluster centers. Moreover, DNNM-clustering has been applied to a multispectral remote sensing image for clustering the pixels into several classes, which also illustrates its effectiveness and superiority.

Although the results presented here are extremely encouraging, there is an issue that deserves in-depth study in the future. The population size is undoubtedly crucial to the performance of the algorithm: in order to steadily maintain the actual number of clusters, we should estimate the minimum population size needed by our method.
ARTICLE IN PRESS
1360 D.-X. Chang et al. / Pattern Recognition 43 (2010) 1346–1360

About the Author—DONGXIA CHANG received the B.S. and M.S. degrees in mathematics from Xidian University, in 2000 and 2003, respectively. She is currently
pursuing the Ph.D. degree at the Department of Automation, Tsinghua University. Her current research interests include evolutionary computation, clustering and
intelligent signal processing.
About the Author—XIANDA ZHANG received the B.S. degree in radar engineering from Xidian University, in 1969, the M.S. degree in instrument engineering from Harbin
Institute of Technology in 1982, and the Ph.D. degree in electrical engineering from Tohoku University, Sendai, Japan, in 1987. Since 1992, he has been with the Department
of Automation, Tsinghua University. His current research interests are signal processing with applications in radar and communications and intelligent signal processing.
About the Author—CHANGWEN ZHENG received the B.S. degree in mathematics and Ph.D. degree in control science and engineering from Huazhong Normal University in
1992 and 2003, respectively. He has been with the General Software Laboratory, Institute of Software, Chinese Academy of Sciences since 2003. His current research interests
include route planning, evolutionary computation and neural networks.
About the Author—DAOMING ZHANG received the B.S. and M.S. degrees in physics from National University of Defense Technology, in 2000 and 2003, respectively. He is
currently pursuing the Ph.D. degree at the Department of Automation, Tsinghua University. His current research interests include image fusion and intelligent signal
processing.