
Applied Soft Computing 31 (2015) 91–102

Contents lists available at ScienceDirect

journal homepage: www.elsevier.com/locate/asoc
Pattern Matching based Classification using Ant Colony Optimization based Feature Selection

N.K. Sreeja a,∗, A. Sankar b

a Department of Computer Applications, Sri Krishna College of Technology, Coimbatore 641042, India
b Department of Computer Applications, PSG College of Technology, Coimbatore 641004, India

∗ Corresponding author. Tel.: +91 9659425507. E-mail addresses: sreeja.n.krishnan@gmail.com (N.K. Sreeja), dras@mca.psgtech.ac.in (A. Sankar).

http://dx.doi.org/10.1016/j.asoc.2015.02.036

Article history:
Received 15 October 2012
Received in revised form 18 January 2015
Accepted 25 February 2015
Available online 5 March 2015

Keywords:
Classification
Pattern matching
Feature selection
Ant Colony Optimization

Abstract
Classification is a method of accurately predicting the target class for an unlabelled sample by learning from instances described by a set of attributes and a class label. Instance based classifiers are attractive due to their simplicity and performance. However, many of these are susceptible to noise and become unsuitable for real world problems. This paper proposes a novel instance based classification algorithm called Pattern Matching based Classification (PMC). The underlying principle of PMC is that it classifies unlabelled samples by matching for patterns in the training dataset. The advantage of PMC in comparison with other instance based methods is its simple classification procedure together with high performance. To improve the classification accuracy of PMC, an Ant Colony Optimization based Feature Selection algorithm based on the idea of PMC has been proposed. The classifier is evaluated on 35 datasets. Experimental results demonstrate that PMC is competent with many instance based classifiers. The results are also validated using nonparametric statistical tests. Also, the evaluation time of PMC is less when compared to the gravitation based methods used for classification.

© 2015 Elsevier B.V. All rights reserved.

1. Introduction

Machine learning is the development of algorithms that allow computers to learn based on empirical data. The goal of machine learning is to build computer systems that adapt and learn from their experience. Machine learning can be either supervised or unsupervised. An example of supervised learning is classification. It is defined as the task of learning from instances described by a set of features (attributes) and a class label. The result of learning is a classification model that is capable of accurately predicting the class label of unlabelled samples.

Several algorithms such as artificial neural networks [28], decision trees, support vector machines (SVMs) [42], instance based learning methods [1] and nature-inspired techniques such as genetic programming [10] have been proposed in the literature for classification. Among these, decision tree, back-propagation network (BPN) and SVM classifiers are popular, and can be applied to various areas [7,21]. However, choosing the best kernel function is necessary for SVM. The usually preferred kernel function is the Radial Basis Function (RBF). RBF gives optimal performance only when the parameters are set properly. Lin et al. [23] adopted a simulated annealing approach and a particle swarm optimization approach for parameter setting and feature selection for SVM. However, SVM and BPN classifiers do not handle missing values effectively [35].

Linear Discriminant Analysis (LDA) is a commonly used classification method. It can provide important weight information for constructing a classification model. LDA often suffers from the small sample size problem when the number of dimensions of the data is much greater than the number of data points. Lin et al. [34] have proposed a particle swarm optimization (PSO) method to enhance the classification accuracy of LDA. However, this method is sensitive to parameter settings.

Although many classification algorithms exist in the literature, instance based methods are attractive due to their simplicity. The Nearest Neighbor (NN) algorithm [6] is an instance based method which employs a simple classification procedure. Neighborhood based methods are attractive primarily due to their simplicity and good performance. However, the major problem with these methods is that they severely deteriorate with noisy data or high dimensionality; their performance becomes very slow, and their accuracy tends to deteriorate as the dimensionality increases, especially when classes are not separable or they overlap [22].

In recent years, new instance based methods have been proposed to overcome the drawbacks existing in NN classifiers. One such approach is Data Gravitation based Classification (DGC) proposed by Peng et al. [32,40,44]. The basic principle of the DGC algorithm is to classify data samples by comparing the data gravitation between the different data classes [32]. The algorithm creates data particles using the maximum distance principle. However, the drawback of this method is that it reduces the accuracy, especially in the area away from the data particle centroid and along the border between classes [2]. Cano et al. [2] have proposed another instance based method called Weighted Data Gravitation based Classification (DGC+), which is proved to achieve greater classification accuracy than the DGC approach [32]. However, the computational complexity of DGC+ is considerably higher. To overcome the drawbacks existing in NN classifiers and gravitation based models, a simple instance based algorithm based on pattern matching is proposed.

In this paper, a novel instance based algorithm called Pattern Matching based Classification (PMC) is proposed to classify unlabelled samples based on the similarity between the feature values of the instances in the dataset and the unlabelled sample. PMC classifies the unlabelled samples by matching the features of the unlabelled sample with the features of the instances in the dataset. The instances in the dataset having the maximum number of matching features are grouped. PMC votes for the majority class label in the group to classify the unlabelled sample. A probabilistic approach is used to predict the target class of the unlabelled sample when more than one class label have the same majority. To improve the classification accuracy of the PMC algorithm, an Ant Colony Optimization based Feature Selection approach based on the idea of PMC is used.

Experiments have been carried out on 35 data sets collected from the KEEL [3] and UCI [12] repositories. The experiments have been carried out for different problem domains, numbers of instances, attributes, and classes. It is shown that PMC is competent with the recent instance based algorithms, obtaining significantly better results in terms of predictive accuracy and Cohen's kappa rate [4,5]. The results of statistical analyses such as the Iman and Davenport test [24] and Bonferroni–Dunn tests [9,14,33] show that there are significant differences in the results of the algorithms. Also, the computational complexity of the PMC algorithm is less when compared to the gravitation based approaches such as DGC+ and DGC.

The paper is organized as follows. Section 2 presents the related work. Section 3 describes the proposed PMC algorithm. Section 4 describes the feature selection for PMC. Section 5 describes a case study. The experimental study is described in Section 6. Section 7 describes the results of the experiments. The time performances of PMC, DGC+ and DGC are compared in Section 8. Section 9 presents the discussion and Section 10 presents some concluding remarks.

2. Related work

This section presents an overview of the various instance based methods and the gravitation based methods developed recently for classification. The simplest among all nearest neighbor classifiers is K-Nearest Neighbor (KNN). It performs a class voting among the k closest training instances. The drawback of the standard KNN classifier is that it does not output meaningful probabilities associated with class prediction [26]. Therefore, higher values of k are considered for classification, which provides smoothing that reduces the risk of over-fitting due to noise in the training data [26]. However, choosing a higher value of k leads to misclassification of unlabelled samples having an exact pattern as that of an instance in the dataset.

Dudani proposed the distance-weighted NN rule (DW-KNN) [8] for classification. It weights the evidence of a neighbor close to an unlabelled sample more heavily than the evidence of another neighbor which is at a greater distance from the unlabelled sample. It assigns the unlabelled sample to the class for which the weights of the representatives among the K NNs sum to the greatest value.

Gao and Wang proposed the center-based NN classifier (CNN) [13]. CNN classifies the unknown sample by computing the distance between the training instances and the centers of their class to find how far the training instances are from the unlabelled sample. However, the performance of the algorithm deteriorates when the centers of the data classes overlap. Wang et al. proposed an adaptive k NN algorithm (KNN-A) [41] which weights the distance from the training instance to its nearest instance belonging to a different class. Thus the instances near the decision boundaries become more relevant.

Paredes and Vidal [29,30] proposed a class-dependent weighted dissimilarity measure in vector spaces to improve the performance of the NN classifier. It computes a dissimilarity measure such that the distances between points belonging to the same class are small while interclass distances are large. The accuracy of NN classifiers can further be improved by prototype selection [15] and prototype generation [37]. Prototype methods aim to select a relatively small number of samples from a dataset which, if well chosen, can serve as a summary of the original dataset. Paredes and Vidal extended their NN model together with a prototype reduction algorithm [37], which simultaneously trains both a reduced set of prototypes and a suitable local metric for them.

Zhou and Chen [43] proposed a Cam weighted distance (CamNN) to improve the performance of classification by optimizing the distance measure based on the analysis of inter-prototype relationships. Triguero et al. [38] applied differential evolution to the prototype selection problem as a position adjusting of prototypes. SSMA–SFLSDE [38] combines a prototype selection stage with an optimization of the position of prototypes, prior to the NN classification. This enhances the performance of SSMA [16] + SFLSDE [27].

Cano et al. [2] have shown that although the NN classifiers have achieved fame by their simplicity, they are very sensitive to irrelevant, redundant, or noisy features because all features contribute to classification. Also, it is stated that the problem worsens for high dimensional datasets. To overcome these drawbacks, instance based methods such as Data Gravitation based approaches are proposed.

Peng et al. [32] proposed the Data Gravitation based Classification (DGC) method to classify datasets. According to this model, a kind of "force" called data gravitation between two data samples is computed. Data from the same class are combined as a result of gravitation. The algorithm also employs a tentative random feature selection to calculate the weights of features by simulating the mutation operation in a genetic algorithm. DGC also achieved reasonably high accuracies, but fails to classify imbalanced datasets. As an improvement of DGC, Cano et al. [2] proposed an algorithm called Weighted DGC (DGC+) that compares the gravitational field for the different data classes to predict the class with the highest magnitude. The proposal improves previous data gravitation algorithms by learning the optimal weights of the attributes for each class and solves some of their issues such as nominal attribute handling, imbalanced data performance, and noisy data filtering. However, the computational complexity of the DGC+ algorithm is high.
3. Pattern Matching based Classification

The PMC algorithm is used to classify an unlabelled sample based on the instances in the training dataset. Consider a dataset with p instances and n features. It can be represented as the data matrix given in Eq. (1). Each instance is denoted as $X_k$ where k = 1, 2, ..., p, and each of these instances belongs to a class $Cl_i$ where i = 1, 2, ..., Q.

$\begin{bmatrix} X_{11} & X_{12} & \cdots & X_{1n} \\ X_{21} & X_{22} & \cdots & X_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ X_{p1} & X_{p2} & \cdots & X_{pn} \end{bmatrix}$   (1)

The unlabelled sample whose class label is to be predicted is given by $x = (x_1, x_2, x_3, \ldots, x_n)$, where $x_1, x_2, x_3, \ldots, x_n$ are the feature values. To classify an unlabelled sample using the PMC algorithm, the count of each class $Cl_i$ in the dataset is found. This denotes the class label count of the class $Cl_i$ as given in the following equation.

$CLC_i = \mathrm{ClassLabelCount}(Cl_i)$, where $i = 1, 2, \ldots, Q$   (2)

The feature values of the unlabelled sample are compared with the corresponding selected feature values of each instance in the dataset. The number of features of the unlabelled sample that match those of the instance in the dataset denotes the attribute match count, $\mathrm{AttributeMatchCount}(X_k) = \sum_{j=1}^{m} C_j$, where m is the number of selected features and k = 1, 2, ..., p.

$C_j = \begin{cases} 1 & \text{if } x_j = X_{kj} \\ 0 & \text{if } x_j \neq X_{kj} \end{cases}$, where $j = 1, 2, \ldots, m$   (3)

The maximum attribute match count is found as shown in the following equation.

$S = \max_k \mathrm{AttributeMatchCount}(X_k)$   (4)

The instances in the dataset having the maximum attribute match count are grouped into G as shown in the following equation.

$X_k \in G$ if $\mathrm{AttributeMatchCount}(X_k) = S$; $X_k \notin G$ if $\mathrm{AttributeMatchCount}(X_k) \neq S$   (5)

The count of each class label in G is found as given in the following equation.

$\mathrm{Count}_i = \mathrm{Count}(Cl_i \in G)\ \forall X_k \in G$, where $i = 1, 2, \ldots, Q$   (6)

The class label of the unlabelled sample is predicted as the majority class label that occurs in the group G as given in the following equation.

$R = \max_i \mathrm{Count}_i$, and $x \in Cl_i$ where $\mathrm{Count}_i = R$   (7)

If there is more than one majority class label in the group G, the probability of occurrence of each majority class label is found by dividing the majority class label count in the group G by its corresponding class label count in the training dataset as shown in the following equation.

$P(Cl_i) = \dfrac{\mathrm{Count}_i}{CLC_i}$   (8)

The class label of the unlabelled sample is then predicted as the class label with the highest probability as shown in the following equation.

$x \in Cl_i$ having $\max(P(Cl_i))$   (9)

The pseudo-code of the PMC algorithm is shown in Fig. 1.

Fig. 1. Pseudo-Code of PMC algorithm.
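To make Eqs. (2)–(9) concrete, the following is a minimal Python sketch of the PMC classification step. The function name pmc_classify and the data layout (a list of training instances plus a parallel list of class labels) are illustrative choices rather than the exact pseudo-code of Fig. 1.

```python
from collections import Counter

def pmc_classify(train_X, train_y, x, selected=None):
    """Classify the unlabelled sample x by pattern matching against the training instances.
    train_X: list of instances (lists of feature values); train_y: their class labels;
    selected: indices of the features to compare (all features if None)."""
    selected = range(len(x)) if selected is None else selected
    clc = Counter(train_y)                                   # class label counts, Eq. (2)
    match = [sum(1 for j in selected if x[j] == inst[j])     # attribute match count, Eqs. (3)
             for inst in train_X]
    s = max(match)                                           # maximum match count, Eq. (4)
    group = [y for m, y in zip(match, train_y) if m == s]    # group G of best-matching instances, Eq. (5)
    counts = Counter(group)                                  # class label counts in G, Eq. (6)
    best = max(counts.values())                              # majority count R, Eq. (7)
    tied = [c for c, v in counts.items() if v == best]
    if len(tied) == 1:
        return tied[0]
    return max(tied, key=lambda c: counts[c] / clc[c])       # probabilistic tie-break, Eqs. (8)-(9)

# Applied to the case study of Section 5 (Tables 1-3) with selected = [2, 3]
# (the 'Astigmatic' and 'Tear production rate' columns), both test samples are assigned class 3.
```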


4. Feature selection for PMC

In many classification problems, irrelevant or redundant features decrease not only the classification efficiency, but also the accuracy [32]. Therefore, choosing pivotal features aids in achieving better classification accuracy. This section gives an overview of an Ant Colony Optimization based Feature Selection algorithm for PMC (ACOFSPMC) designed to improve the performance of PMC. ACOFSPMC outputs an optimal subset of features that are relevant for the classification task.

4.1. Ant Colony Optimization

A colony of ants denoting a set of computational concurrent and asynchronous agents moves through states of the problem corresponding to partial solutions of the problem to solve [25]. They move by applying a stochastic local decision policy based on two parameters, called trails and attractiveness. By moving, each ant incrementally constructs a solution to the problem. When an ant completes a solution, or during the construction phase of the solution, the ant evaluates the solution and modifies the trail value on the components used in its solution. This pheromone information will direct the search of the future ants. Furthermore, an ACO algorithm includes two more mechanisms, namely trail evaporation and, optionally, daemon actions. Trail evaporation decreases all trail values over time, in order to avoid unlimited accumulation of trails over some component. Daemon actions can be used to implement centralized actions which cannot be performed by single ants, such as the invocation of a local optimization procedure, or the update of global information to be used to decide whether to bias the search process from a non-local perspective [39].

4.2. Ant Colony Optimization based Feature Selection for PMC

To find the optimal subset of features for the dataset, an Ant Colony Optimization based Feature Selection for PMC (ACOFSPMC) algorithm is proposed. According to this approach, a group of ant agents is chosen. Each ant agent finds the optimal subset of features by depositing pheromone. The group of ant agents has a global value which is initialized to zero. Each ant agent has a tabu list denoting the memory. The tabu list maintains a storage list and a solution list. Initially, all the features of the instances in the dataset are stored in the storage list. The solution list stores the features in the pheromone deposition and the energy value. Initially, the solution list is empty. The ant agent has a positive and a negative position to deposit pheromone. The random number chosen for the positive position must be between 0 and n + 1, where n denotes the maximum number of features in the storage list. The random number generated for the negative position must be between 0 and q + 1, where q denotes the maximum number of features in the solution list. The number in the positive position denotes the number of features to be chosen from the storage list. The number in the negative position denotes the number of features to be chosen from the solution list.

Initially, the random number in the negative position is 0. Depending on the random number in the positive position, the ant agent chooses a group of positions at random from the storage list. The features at these positions are added to the pheromone deposition and removed from the storage list. The features in the pheromone deposition should not be repetitive. To evaluate the selected features in the pheromone deposition, the ant agent computes the energy value by finding the average classification accuracy of the PMC algorithm using leave-one-out cross-validation [31] for the selected features. The average classification accuracy denotes the energy value of the ant agent. The features in the pheromone deposition of the ant agent and the energy value are stored in the solution list. This denotes the first trail of the ant agent.

The ant agent moves to the next trail by pheromone evaporation. To update the pheromone, the ant agent chooses two random numbers in the positive and negative position. Based on the random number present in the positive position, the ant agent chooses a group of positions at random from the storage list. The features at these positions in the storage list are grouped. Based on the random number present in the negative position, the ant agent chooses a group of positions at random from the solution list. The features at the positions in the solution list which are not chosen by the ant agent are grouped. The features grouped from the storage list and the solution list denote the pheromone deposition of the ant agent. To evaluate the selected features, the energy value of the ant agent is computed by finding the average classification accuracy of PMC for the selected features denoted by the pheromone deposition using leave-one-out cross-validation. If the energy value of the ant agent in the current trail is greater than or equal to the energy value in the solution list of the ant agent, the solution list is updated with the features in the pheromone deposition and the energy value. The features added to the pheromone deposition from the storage list are deleted from the storage list. The features in the positions chosen by the ant agent in the solution list are added to the storage list. If the energy value is less than the energy value in the solution list, then the newly chosen features from the storage list and the solution list are ignored. The process is repeated for 50 trails if the number of features in the dataset is greater than 10 and for 10 trails otherwise. If the energy value in the global value is less than the energy value of the ant agent, the features in the solution list along with the energy value are updated in the global value. This is repeated by all ant agents and the feature subset in the global value denotes the selected features in the dataset used for classification. Thus the ACOFSPMC algorithm obtains an optimal subset of features for the dataset to achieve higher classification accuracy for the PMC algorithm. Fig. 2 shows the pseudocode for the ACOFSPMC algorithm.

5. Case study

A case study is discussed for selected instances of the Contact Lenses dataset [12]. The dataset contains 11 instances, 4 features and 2 classes. Table 1 shows the training dataset describing the type of lens suitable for the patients.

5.1. Classification of unlabelled samples

The unlabelled samples for which the class label is to be predicted using the PMC algorithm are shown in Table 2. The features 'Astigmatic' and 'Tear Production rate' are selected for comparison using the ACOFSPMC algorithm as explained in Section 5.2. The shaded columns in Table 3 denote the features that are selected for comparison. The classification of unlabelled sample number 1 using the PMC algorithm is as follows. According to the PMC algorithm, the values of the features 'Astigmatic' and 'Tear Production rate' of unlabelled sample number '1' are compared with the corresponding feature values of each instance in the training dataset. Table 3 shows the attribute match count for each training instance with the unlabelled sample number 1.

Fig. 2. Pseudocode of ACOFSPMC algorithm.
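As a complement to Fig. 2, below is a minimal Python sketch of the ACOFSPMC search described in Section 4.2. The scorer pmc_loocv_accuracy is an assumed callable that returns the leave-one-out PMC accuracy for a feature subset (it can be built from the pmc_classify sketch given after Fig. 1); the list handling follows the verbal description in Section 4.2, not the exact pseudocode of Fig. 2.

```python
import random

def acofspmc(features, pmc_loocv_accuracy, n_ants=2, seed=0):
    """Sketch of ACOFSPMC: each ant agent keeps a storage list and a solution list and
    accepts a new pheromone deposition (feature subset) only when its energy value
    (leave-one-out PMC accuracy) does not decrease."""
    rng = random.Random(seed)
    n_trails = 50 if len(features) > 10 else 10               # trail budget from Section 4.2
    best_features, best_energy = list(features), 0.0           # global value shared by all ants

    for _ in range(n_ants):
        storage = list(features)                               # all features start in the storage list
        solution, solution_energy = [], 0.0                    # solution list is initially empty

        for _ in range(n_trails):
            take = rng.randint(0, len(storage))                # positive position: count taken from storage
            drop = rng.randint(0, len(solution))               # negative position: count left out of solution
            from_storage = rng.sample(storage, take)
            excluded = rng.sample(solution, drop) if solution else []
            deposition = sorted(set(from_storage + [f for f in solution if f not in excluded]))
            if not deposition:
                continue
            energy = pmc_loocv_accuracy(deposition)            # energy value of the ant agent
            if energy >= solution_energy:                      # accept: update the tabu lists
                storage = [f for f in storage if f not in from_storage] + excluded
                solution, solution_energy = deposition, energy

        if solution_energy > best_energy:                      # the best ant updates the global value
            best_features, best_energy = solution, solution_energy

    return best_features, best_energy
```

With two ant agents and a PMC-based scorer on the training data of Table 1, a run of this kind of search can end with the subset {3, 4} ('Astigmatic', 'Tear production rate'), matching the outcome tabulated in Table 6.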

Table 1
Training dataset.

Training instance number Age Spectacle prescription Astigmatic Tear production rate Class

1 1 1 2 2 1
2 1 2 2 2 1
3 2 1 2 2 1
4 2 2 2 2 3
5 3 2 2 2 3
6 1 1 1 1 3
7 1 1 2 1 3
8 1 2 2 1 3
9 3 1 2 2 1
10 2 1 2 1 3
11 2 2 2 1 3

Table 2
Test dataset.

Unlabelled sample number Age Spectacle prescription Astigmatic Tear production rate

1 2 2 2 1
2 2 2 1 1

Table 3
Training instances and their attribute match count.

Training instance number Age Spectacle prescription Astigmatic Tear production rate Class Attribute match count

1 1 1 2 2 1 1
2 1 2 2 2 1 1
3 2 1 2 2 1 1
4 2 2 2 2 3 1
5 3 2 2 2 3 1
6 1 1 1 1 3 1
7 1 1 2 1 3 2
8 1 2 2 1 3 2
9 3 1 2 2 1 1
10 2 1 2 1 3 2
11 2 2 2 1 3 2

Table 4
Grouped training instances with maximum attribute match count.

Training instance number Age Spectacle prescription Astigmatic Tear production rate Class Attribute match count

7 1 1 2 1 3 2
8 1 2 2 1 3 2
10 2 1 2 1 3 2
11 2 2 2 1 3 2

Table 5
Predicted class for the unlabelled samples.

Unlabelled sample number Age Spectacle prescription Astigmatic Tear production rate Predicted class

1 2 2 2 1 3
2 2 2 1 1 3

The training instances having the maximum attribute match count are grouped as shown in Table 4. As observed from Table 4, the majority class label having the highest attribute match count is class '3'. Therefore the class of unknown sample '1' is predicted as '3'. Similarly, the class label of the new unlabelled sample number 2 is predicted as '3' as shown in Table 5.

5.2. Feature selection using ACOFSPMC algorithm

This section explains the selection of the optimal subset of features for the training dataset using the ACOFSPMC algorithm. Table 6 shows the ACO based Feature Selection for the training dataset in Table 1. A group of ant agents is taken where each ant agent deposits pheromone to find the optimal subset of features for classification. The ant agents have a global value which is initially zero. Each of the ant agents has a positive and a negative position to deposit pheromone. Each ant agent has a tabu list which includes the storage list and the solution list. Initially, the storage list in each ant agent has the set of all the features of the training set. In this case study, two ant agents are taken to find the optimal subset of features. The process of finding the optimal subset of features by ant agent I is explained. The features Age, Spectacle prescription, Astigmatic and Tear Production rate of the dataset are numbered as 1, 2, 3 and 4 respectively. Initially, the storage list of the ant agent has four features {1, 2, 3, 4}.

During trail-1, the ant agent I chooses two random numbers 2 and 0 in the positive and negative position respectively. Since the positive position has the random number 2, the ant agent randomly chooses two positions {1, 2} from the storage list. Thus in trail-1, the features '1' and '2' at positions 1 and 2 in the storage list are chosen as the pheromone deposition and deleted from the storage list. The storage list has the features {3, 4}. The energy value of the ant agent I is computed by estimating the leave-one-out cross-validation classification accuracy of PMC on the dataset as described in Section 4.2 for the feature subset {1, 2} denoted in the pheromone deposition of the ant agent I. The leave-one-out cross-validation classification accuracy of the PMC algorithm for the feature subset {1, 2} is found to be 27.3%. Thus the energy value of the ant agent I in trail-1 is 27.3%. The features in the pheromone deposition along with the energy value are stored in the solution list. Thus the solution list has the features {1, 2} with the energy value 27.3%. The ant agent I moves to the next trail and the pheromone deposition evaporates.

To deposit pheromone in trail-2, the ant agent I randomly chooses the numbers 0 and 1 in the positive and negative position respectively. Since the number in the positive position is 0, no features are chosen from the storage list. For the negative position, the ant agent randomly chooses position 1. Thus all the features except the feature '1' at position 1 stored in the solution list are grouped. Thus the feature '2' in the solution list is grouped. Since there are no features chosen from the storage list, the feature '2' is added to the pheromone deposition. Thus the pheromone deposition of the ant agent I in trail-2 is {2}.

Table 6
ACO based approach for feature selection.

Columns: Trail; Positive, negative position; Position of features included from the storage list; Position of features excluded from the solution list; Pheromone deposition; Classification accuracy in percentage; Tabu list (Solution list, Storage list); Features selected.

Ant Agent-I
Initial set of features {1,2,3,4} {3,4}
Trail-1 2,0 1,2 – 1,2 27.3 1,2 {27.3} {3,4}
Trail-2 0,1 – 1 2 36.4 2 {36.4} {1,3,4}
Trail-3 1,0 2 – 2,3 63.6 2,3 {63.6} {1,4}
Trail-4 1,1 2 2 2,4 72.7 2,4 {72.7} {1,3}
Trail-5 1,1 2 1 3,4 81.8 3,4 {81.8} {1,2}
Trail-6 1,0 1 – 1,3,4 63.6 3,4 {81.8} {1,2}
Trail-7 2,1 1,2 1 1,2,4 72.7 3,4 {81.8} {1,2}
Trail-8 1,1 1 1 1,4 63.6 3,4 {81.8} {1,2}
Trail-9 1,0 2 – 2,3,4 72.7 3,4 {81.8} {1,2}
Trail-10 0,1 – 2 3 54.5 3,4 {81.8} {1,2}

Ant Agent-II
Initial set of features {1,2,3,4} {2,4}
Trail-1 1,0 2 – 2 36.4 2 {36.4} {1,3,4}
Trail-2 1,1 2 1 3 54.5 3 {54.5} {1,2,4}
Trail-3 1,1 1 1 1 27.3 3 {54.5} {1,2,4}
Trail-4 1,0 1 – 1 27.3 3 {54.5} {1,2,4}
Trail-5 2,1 1,2 1 1,2 27.3 3 {54.5} {1,2,4}
Trail-6 3,1 1,2,3 1 1,2,4 72.7 1,2,4 {72.7} {3}
Trail-7 1,2 1 1,3 2,3 63.6 1,2,4 {72.7} {3}
Trail-8 0,1 – 2 1,4 63.6 1,2,4 {72.7} {3}
Trail-9 0,1 – 3 1,2 27.3 1,2,4 {72.7} {3}
Trail-10 0,1 – 1 2,4 72.7 2,4 {72.7} {1,3}
Global value 3,4 {81.8}
The energy value of the ant agent I is found by estimating the leave-one-out cross-validation classification accuracy of PMC, which is found to be 36.4%. Since the energy value of the ant agent in trail-2 is greater than the energy value in trail-1, the features in the pheromone deposition and the energy value of the ant agent are stored in the solution list. The feature '1' at position 1 chosen from the solution list is added to the storage list. Thus the features in the storage list are now {1, 3, 4}. The ant agent moves to the next trail and the pheromone deposition evaporates. To deposit pheromone in trail-3, the ant agent randomly chooses the numbers 1 and 0 in the positive and negative position respectively. For the positive position, the ant agent randomly chooses position 2 and the feature '3' at position 2 in the storage list is grouped. This feature '3' is added to the pheromone deposition. Since the negative position is 0, no feature is chosen from the solution list. Hence the features in the solution list which are not chosen by the ant agent I are grouped and added to the pheromone deposition. Thus the pheromone deposition of the ant agent in trail-3 is {2, 3}. The energy value is found by estimating the leave-one-out cross-validation classification accuracy of PMC, which is found to be 63.6%. Since the energy value in trail-3 is greater than the energy value in trail-2, the solution list is updated with the features {2, 3} and the energy value. The feature '3' is deleted from the storage list. Thus the features in the storage list are {1, 4}.

To deposit pheromone in trail-4, the ant agent randomly chooses the numbers 1 and 1 in the positive and negative position respectively. For the positive position, the ant agent randomly chooses the number 2. Hence the feature '4' at position 2 in the storage list is grouped and added to the pheromone deposition. For the negative position, the ant agent randomly chooses the number 2. Hence all the features except the feature '3' at position 2 in the solution list are grouped and added to the pheromone deposition. Thus the features {2, 4} denote the pheromone deposition of the ant agent I in trail-4. The energy value of the ant agent in trail-4 is 72.7%. The process is repeated and the energy value of the ant agent in trail-5 is 81.8%. The solution list has the features {3, 4} along with the energy value 81.8. The process is repeated for 10 trails. Since the energy value obtained in the successive trails was less than the energy value in the solution list for the previous trails, the features selected in the successive trails were ignored. Thus the subset of features {3, 4} is stored in the solution list of the ant agent I. Since the energy value is greater than that of the global value, the ant agent I updates the global value with the features {3, 4} and the energy value 81.8.

Similarly, ant agent II finds the solution and the subset of features {2, 4} along with the energy value 72.7 is stored in the solution list. Since the energy value of the ant agent II is less than the energy value 81.8 in the global value, the ant agent II does not update the global value. Thus the features {3, 4}, denoting the features 'Astigmatic' and 'Tear Production rate' respectively in the dataset in Table 1, are selected for comparison with the corresponding features of the unlabelled samples.

6. Experimental study

This section describes the details of experiments performed on 35 datasets pertaining to various problem domains, the performance measure used to evaluate the algorithm and the statistical analysis used to validate the results.

6.1. Problem domain

The datasets used for the experiments were collected from the KEEL [3] and UCI [12] machine learning repositories. These datasets have varied numbers of instances, features and classes. The number of instances varies between 101 for the zoo dataset and 12,690 for the Nursery dataset. The number of features ranges between 2 for the banana dataset and 60 for the sonar dataset. The number of classes ranges between 2 and 11. Table 7 shows the details of the datasets used for experiments. Since the values of the features of the datasets such as Contraceptive, Dermatology, Flare, Glass, HayesRoth, Heart, Iris, NewThyroid, Phoneme, Pima, Saheart, Vehicle and Wine fall in a large range, they were scaled in the range [0,1] using min–max normalization. Min–max normalization performs a linear transformation on the original data. Suppose that $min_A$ and $max_A$ are the minimum and maximum values of an attribute A. Min–max normalization maps a value v of A to v' in the range $[new\_min_A, new\_max_A]$, where $new\_min_A$ and $new\_max_A$ denote the range of values for normalization. v' is computed by the formula given in the following equation.

$v' = \dfrac{v - min_A}{max_A - min_A}(new\_max_A - new\_min_A) + new\_min_A$   (10)
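As a small illustration of Eq. (10), the helper below rescales one attribute column; the function name and its column-wise use are assumptions for illustration, not code from the paper.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Linearly rescale a list of attribute values into [new_min, new_max] (Eq. (10))."""
    lo, hi = min(values), max(values)
    if hi == lo:                       # constant attribute: map every value to new_min
        return [new_min for _ in values]
    scale = (new_max - new_min) / (hi - lo)
    return [(v - lo) * scale + new_min for v in values]

# e.g. min_max_normalize([2, 4, 6, 10]) -> [0.0, 0.25, 0.5, 1.0]
```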
6.2. Performance measure

Classification algorithms may be evaluated based on various performance measures depending on the classification subfield problem [18,36]. Different measures allow us to observe different behaviors, which increases the strength of the empirical study in such a way that more complete conclusions can be obtained from different (not opposite, yet complementary) deductions [11,19]. These measures are based on the values of the confusion matrix obtained. Each column of the confusion matrix represents the count of instances in a predicted class, while each row represents the number of instances in an actual class.

The performance of a classifier is a compound characteristic, whose most important component is the classification accuracy [24]. Classification accuracy is defined as the ratio of successful predictions to the total number of instances. To validate the accuracy of the algorithm, 10 fold cross-validation [31] is used. In 10 fold cross-validation, each data set was split randomly into 10 mutually exclusive subsets, and all samples of each class are uniformly assigned to these subsets. Then one subset was used as the testing set and the other nine subsets as a whole training set, and one validation was performed for this pair of training and testing sets. The procedure was repeated ten times so that each subset was used as the test set exactly once.

However, in certain cases good classification accuracy is obtained due to random hits. Therefore, the classification algorithm has to be evaluated based on another criterion. Cohen's kappa rate [4,5] is an alternative measure to accuracy since it compensates for random hits. The kappa measure evaluates the actual hits that can be attributed to the classifier and not to mere chance. Cohen's kappa statistic ranges from −1 (total disagreement) through 0 (random classification) to 1 (total agreement). Cohen's kappa rate for a dataset can be calculated from the following equation,

$\mathrm{Kappa} = \dfrac{P\sum_{i=1}^{Q} x_{ii} - \sum_{i=1}^{Q} x_{i.}\,x_{.i}}{P^{2} - \sum_{i=1}^{Q} x_{i.}\,x_{.i}}$   (11)

where $x_{ii}$ denotes the count of cases in the main diagonal of the confusion matrix, P is the total number of samples, Q is the number of classes, and $x_{.i}$ and $x_{i.}$ are the column and row total counts, respectively. Kappa is very useful for multiclass problems, measuring a classifier's accuracy while compensating for random successes [2].
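A minimal sketch of Eq. (11) in Python, computing kappa directly from a confusion matrix whose rows are actual classes and columns are predicted classes (the helper name and the toy matrix in the example are illustrative):

```python
def cohens_kappa(confusion):
    """Cohen's kappa from a Q x Q confusion matrix (rows: actual, columns: predicted), Eq. (11)."""
    P = sum(sum(row) for row in confusion)                      # total number of samples
    diag = sum(confusion[i][i] for i in range(len(confusion)))  # successful predictions
    chance = sum(sum(confusion[i]) *                            # sum of row total x column total
                 sum(row[i] for row in confusion)
                 for i in range(len(confusion)))
    return (P * diag - chance) / (P * P - chance)

# A perfectly classified two-class problem gives kappa = 1.0:
print(cohens_kappa([[5, 0], [0, 5]]))   # -> 1.0
```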

Table 7
Datasets used for classification.

Data set Number of instances Number of features Type of features Number of classes Missing values Origin Source
(real/integer/nominal)

Appendicitis 106 7 (7/0/0) 2 No Real world KEEL


Australian 690 14 (3/5/6) 2 No Real world KEEL
Balance 625 4 (4/0/0) 3 No Real world UCI
Banana 5300 2 (2/0/0) 2 No Laboratory KEEL
Bupa 345 6 (1/5/0) 2 No Real world KEEL
Car 1728 6 (0/0/6) 4 No Real world UCI
Contraceptive 1473 9 (0/9/0) 3 No Real world UCI
Dermatology 366 34 (0/34/0) 6 Yes Real world KEEL
E-coli 336 7 (7/0/0) 8 No Real world KEEL
Flare 1066 11 (0/0/11) 6 No Real world KEEL
German 1000 20 (0/7/13) 2 No Real world UCI
Glass 214 9 (9/0/0) 7 No Real world UCI
Haberman 306 3 (0/3/0) 2 No Real world KEEL
Hayes-Roth 160 4 (0/4/0) 3 No Laboratory KEEL
Heart 270 13 (1/12/0) 2 No Real world UCI
Hepatitis 155 19 (2/17/0) 2 Yes Real world UCI
Ionosphere 351 34 (32/1/0) 2 No Real world UCI
Iris 150 4 (4/0/0) 3 No Real world KEEL
Lymphography 148 18 (0/3/15) 4 No Real world KEEL
Monk-2 432 6 (0/6/0) 2 No Laboratory KEEL
New-thyroid 215 5 (4/1/0) 3 No Real world KEEL
Nursery 12,690 8 (0/0/8) 5 No Real world UCI
Page-blocks 5473 10 (4/6/0) 5 No Real world UCI
Phoneme 5404 5 (5/0/0) 2 No Real world KEEL
Pima 768 8 (8/0/0) 2 Yes Laboratory UCI
Saheart 462 9 (5/3/1) 2 No Real world KEEL
Sonar 208 60 (60/0/0) 2 No Real world UCI
Tae 151 5 (0/5/0) 3 No Real world UCI
Thyroid 7200 21 (6/15/0) 3 No Real world KEEL
Tic-tac-toe 958 9 (0/0/9) 2 No Real world KEEL
Vehicle 846 18 (0/18/0) 4 No Real world KEEL
Vowel 990 13 (10/3/0) 11 No Real world KEEL
Wine 178 13 (13/0/0) 3 No Real world UCI
Yeast 1484 8 (8/0/0) 10 No Real world KEEL
Zoo 101 16 (0/0/16) 7 No Laboratory UCI

6.3. Statistical analysis for comparison of algorithms

To analyze and validate the results obtained from the experiments, some nonparametric statistical tests are used [14,17].

6.3.1. Iman and Davenport test

The Iman and Davenport test is performed to show that the results of the algorithms are significantly different. The Iman and Davenport test is a nonparametric alternative to the Analysis-Of-Variance (ANOVA) test. Consider there are M classifiers. The classifier models are ranked on each of the N data sets. The best classifier receives rank 1 and the worst receives rank M. Tied ranks are shared equally. Let $r_i^j$ be the rank of classifier j on the dataset i, where i = 1, 2, ..., N and j = 1, 2, ..., M. Let $Rank_j$ be the average rank of the model j. The average rank may be found using the following equation.

$Rank_j = \dfrac{\sum_{i=1}^{N} r_i^j}{N}$   (12)

The test statistic $\chi_F^2$ is found using the following equation.

$\chi_F^2 = \dfrac{12N}{M(M+1)}\left(\sum_{j=1}^{M} Rank_j^2 - \dfrac{M(M+1)^2}{4}\right)$   (13)

Demsar [20] advocates an amendment of the test statistic proposed by Iman and Davenport as in the following equation,

$F_F = \dfrac{(N-1)\,\chi_F^2}{N(M-1) - \chi_F^2}$   (14)

which follows the F-distribution with (M − 1) and (M − 1)(N − 1) degrees of freedom. The statistic's value is compared with the tabled critical values for the F-distribution. If $F_F$ is larger, it indicates a difference between the classifiers.

6.3.2. Bonferroni–Dunn post hoc test

The Bonferroni–Dunn post hoc test [9] is used to compare the quality of the classifiers. The quality of two classifiers is significantly different if the corresponding difference of average rankings is greater than or equal to the critical difference [20]. The critical difference can be computed using the formula given in the following equation,

$\mathrm{CriticalDifference} = q_\alpha\sqrt{\dfrac{k(k+1)}{6N}}$   (15)

where $q_\alpha$ is the critical value for α = 0.05, k is the number of algorithms and N is the number of datasets.
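The sketch below strings Eqs. (12)–(15) together for a score matrix with one row per dataset and one column per algorithm. It is an illustrative implementation, not code from the paper; the default q_alpha = 2.724 is the two-tailed Bonferroni–Dunn critical value for nine algorithms at α = 0.05, which with N = 35 reproduces the critical difference of about 1.7833 used in Section 7.

```python
import math

def friedman_iman_davenport(scores, q_alpha=2.724):
    """scores[i][j] = accuracy of algorithm j on dataset i (higher is better).
    Returns average ranks (Eq. (12)), the Iman-Davenport statistic F_F (Eqs. (13)-(14))
    and the Bonferroni-Dunn critical difference (Eq. (15), with k = M algorithms)."""
    N, M = len(scores), len(scores[0])
    ranks = [[0.0] * M for _ in range(N)]
    for i, row in enumerate(scores):
        order = sorted(range(M), key=lambda j: -row[j])     # best score gets rank 1
        pos = 0
        while pos < M:                                      # tied ranks are shared equally
            tied = [j for j in order[pos:] if row[j] == row[order[pos]]]
            avg = sum(range(pos + 1, pos + len(tied) + 1)) / len(tied)
            for j in tied:
                ranks[i][j] = avg
            pos += len(tied)
    avg_rank = [sum(ranks[i][j] for i in range(N)) / N for j in range(M)]      # Eq. (12)
    chi2 = 12 * N / (M * (M + 1)) * (sum(r * r for r in avg_rank)
                                     - M * (M + 1) ** 2 / 4)                   # Eq. (13)
    ff = (N - 1) * chi2 / (N * (M - 1) - chi2)                                 # Eq. (14)
    cd = q_alpha * math.sqrt(M * (M + 1) / (6 * N))                            # Eq. (15)
    return avg_rank, ff, cd
```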

Table 8
Classification accuracy of datasets.

Dataset PMC DGC+ DGC KNN KNN-A DW-KNN Cam-NN CNN SSMA + SFLSDE

Appendicitis 0.8962 0.8409 0.8713 0.8427 0.8882 0.8227 0.8491 0.7736 0.8509
Australian 0.8696 0.8374 0.8496 0.8478 0.8391 0.8319 0.8522 0.8174 0.8594
Balance 0.9232 0.8966 0.8999 0.8337 0.8943 0.856 0.867 0.7855 0.8896
Banana 0.8538 0.8952 0.8931 0.8864 0.894 0.8836 0.8858 0.5725 0.8892
Bupa 0.6725 0.6744 0.6527 0.6066 0.6257 0.6376 0.5962 0.6316 0.6426
Car 0.9549 0.9523 0.9126 0.9231 0.8797 0.8148 0.9045 0.8773 0.9097
Contraceptive 0.5343 0.4945 0.4954 0.4495 0.4827 0.4488 0.4719 0.4203 0.4969
Dermatology 0.9809 0.9544 0.917 0.969 0.9521 0.9633 0.8966 0.9578 0.941
E-coli 0.7619 0.8217 0.7672 0.8067 0.8247 0.8248 0.7824 0.6789 0.8009
Flare 0.7655 0.6655 0.6733 0.6229 0.4634 0.7148 0.6041 0.6492 0.7102
German 0.735 0.7322 0.703 0.696 0.727 0.721 0.712 0.708 0.71
Glass 0.7336 0.7036 0.6893 0.7011 0.6498 0.7218 0.6102 0.7094 0.7176
Haberman 0.7614 0.7171 0.7277 0.7058 0.6927 0.6863 0.6963 0.6865 0.6994
Hayesroth 0.7625 0.84 0.7738 0.25 0.375 0.7063 0.475 0.4625 0.7562
Heart 0.837 0.8452 0.8119 0.7741 0.8111 0.763 0.7704 0.7667 0.8296
Hepatitis 0.871 0.8628 0.8343 0.8251 0.8508 0.8108 0.8651 0.8233 0.7617
Ionosphere 0.9373 0.9311 0.6724 0.8518 0.9372 0.8747 0.7379 0.8917 0.9088
Iris 0.96 0.9533 0.9533 0.94 0.9533 0.94 0.9467 0.9267 0.9533
Lymphography 0.8716 0.814 0.8033 0.7739 0.8085 0.7855 0.753 0.7599 0.795
Monk2 1 0.9995 0.9982 0.9629 0.7058 0.8437 0.8494 0.749 0.9681
New thyroid 0.9721 0.9786 0.8684 0.9537 0.9721 0.9677 0.8693 0.8753 0.9677
Nursery 0.9748 0.9696 0.9378 0.9254 0.861 0.8131 0.8637 0.7875 0.8489
Pageblock 0.954 0.9508 0.9268 0.9591 0.9629 0.9624 0.8942 0.9543 0.953
Phoneme 0.8658 0.8718 0.8471 0.8849 0.8901 0.8999 0.8679 0.883 0.8533
Pima 0.7565 0.7451 0.6662 0.7319 0.7476 0.7163 0.728 0.6994 0.7463
Saheart 0.7251 0.7118 0.7105 0.6818 0.6927 0.6752 0.7099 0.6407 0.686
Sonar 0.9087 0.8487 0.7694 0.8307 0.8798 0.8648 0.7743 0.894 0.8079
Tae 0.7086 0.6715 0.6709 0.4113 0.4375 0.5704 0.5112 0.3983 0.5371
Thyroid 0.9467 0.9704 0.9256 0.9389 0.9396 0.9368 0.93 0.925 0.9463
Tic-Tae-Toe 0.9917 0.8549 0.6906 0.7756 0.7672 0.6848 0.7629 0.7317 0.7348
Vehicle 0.6974 0.7116 0.6572 0.7175 0.6879 0.7258 0.617 0.7423 0.6456
Vowel 0.9111 0.9824 0.9788 0.9778 0.9687 0.9909 0.8879 0.9929 0.898
Wine 0.9755 0.9731 0.9706 0.9549 0.9663 0.9438 0.9497 0.9663 0.9438
Yeast 0.5088 0.5926 0.5151 0.5317 0.5539 0.5418 0.4515 0.467 0.5708
Zoo 0.9901 0.9553 0.9348 0.9281 0.9447 0.9447 0.8919 0.9281 0.91
Average accuracy 0.8448 0.8349 0.7991 0.7849 0.7865 0.7969 0.7667 0.7581 0.8040
Average rank 2.3286 3.0714 5.2429 5.6286 4.6571 5.5714 6.7143 6.8 4.9857

The bold values denote the highest value.

7. Experimental results

This section presents the results of experiments carried out on 35 data sets. The classification accuracy and the Cohen's kappa rate of PMC have been compared with instance based classifiers such as DGC+, DGC, KNN, KNN-A, DW-KNN, Cam-NN, CNN and SSMA + SFLSDE. The results are validated using nonparametric statistical tests such as the Iman and Davenport test and the Bonferroni–Dunn post hoc test.

7.1. Classification accuracy

The average classification accuracy of PMC using 10 fold cross-validation is compared with the other instance based classifiers using the results reported in Ref. [2]. It is observed from Table 8 that PMC outperforms the gravitation based methods, namely DGC+ and DGC, and the other NN methods for 22 datasets out of the 35 datasets. It is shown that PMC outperforms other instance based algorithms for classifying datasets like Dermatology, Hepatitis and Pima which contain missing values. It is observed that in datasets such as Contraceptive, Flare and Haberman, there are two or more instances with the same feature values belonging to different classes. It is found from Table 8 that the classification accuracy of PMC for these datasets having overlapping classes is significantly higher when compared to the instance based methods. Also, it obtains competitive accuracy for a noisy dataset like HayesRoth.

Also, it is shown in Table 8 that the average classification accuracy of PMC is much higher compared to classifiers such as DGC, KNN, KNN-A, Cam-NN, DW-KNN, CNN and SSMA + SFLSDE. Though the average classification accuracies of PMC and DGC+ are close, it is shown that PMC obtains higher classification accuracy than DGC+ for 24 out of 35 datasets. Also, it is shown that the average rank of PMC is significantly better when compared to the instance based methods.

To evaluate whether there are significant differences in the results of the algorithms, the Iman and Davenport test was performed. The critical value of the F-distribution using the Iman and Davenport test is 1.97 for a significance level of α = 0.05. The Iman and Davenport statistic for the classification accuracy (distributed according to the F-distribution with 8 and 272 degrees of freedom) is found to be 14.4942. Therefore, the null hypothesis is rejected and it is proved that the classification accuracy of the classifiers is significantly different.

The Bonferroni–Dunn post hoc test was performed on the classification accuracy with α = 0.05, and the critical difference is 1.7833. Fig. 3 summarizes the ranking obtained by the Friedman test and the critical difference of the Bonferroni–Dunn test with α = 0.05. The horizontal axis represents the various instance based algorithms with PMC as the control algorithm and the vertical axis represents the average rank for classification accuracy obtained for each algorithm. Since PMC obtains the best rank among all the algorithms, its average rank is added to the critical difference value 1.7833. The result is represented using a straight line that goes through all the algorithms represented in the graph. Those algorithms whose average rank falls above the straight line perform significantly worse than PMC. Therefore, all algorithms but DGC+ perform worse than the PMC algorithm.

Fig. 3. Bonferroni–Dunn test for classification accuracy.

7.2. Cohen's kappa rate

The average kappa results obtained from 10 fold cross-validation are presented in Table 9. They are compared with the kappa rates of the other methods as reported in Ref. [2].

Table 9
Cohen’s kappa rate for datasets.

Dataset PMC DGC+ DGC KNN KNN-A DW-KNN Cam-NN CNN SSMA + SFLSDE

Appendicitis 0.6412 0.4549 0.471 0.4565 0.588 0.465 0.3808 0.2773 0.4573
Australian 0.7378 0.6724 0.6976 0.6917 0.6719 0.6607 0.7035 0.6313 0.7146
Balance 0.8662 0.8083 0.8144 0.7004 0.8042 0.7403 0.7559 0.6271 0.7953
Banana 0.7039 0.7873 0.7825 0.7699 0.785 0.7642 0.7675 0.1367 0.7955
Bupa 0.3259 0.3076 0.222 0.1944 0.2021 0.2645 0.1024 0.2571 0.2731
Car 0.8997 0.8981 0.8025 0.8264 0.7009 0.5255 0.7935 0.7457 0.8023
Contraceptive 0.2912 0.1994 0.1952 0.1358 0.173 0.1517 0.1396 0.1079 0.2112
Dermatology 0.976 0.9425 0.8939 0.9608 0.9394 0.9538 0.8689 0.9468 0.9257
E-coli 0.6715 0.7498 0.6585 0.73 0.7539 0.7579 0.6882 0.562 0.7245
Flare 0.6976 0.5662 0.5738 0.5101 0.2757 0.6314 0.4753 0.5518 0.6261
German 0.3562 0.2812 0.0092 0.2194 0.2539 0.3015 0.1651 0.288 0.256
Glass 0.6391 0.5834 0.5548 0.5887 0.5224 0.6194 0.4229 0.6037 0.6069
Haberman 0.2249 0.0558 0.0315 0.1362 0.0752 0.1374 0.1417 0.1832 0.0935
Hayesroth 0.613 0.7468 0.6294 −0.2326 0.0441 0.5217 0.1835 0.1721 0.6146
Heart 0.6689 0.6826 0.6094 0.542 0.6168 0.5192 0.542 0.5291 0.6501
Hepatitis 0.5547 0.5512 0.2 0.4683 0.4069 0.4084 0.3615 0.4672 0.3451
Ionosphere 0.8604 0.8487 0.1142 0.6494 0.8595 0.7083 0.5145 0.7526 0.7986
Iris 0.94 0.93 0.93 0.91 0.93 0.91 0.92 0.89 0.93
Lymphography 0.7551 0.6289 0.6026 0.5507 0.6162 0.5765 0.4993 0.5333 0.5974
Monk2 1 0.9991 0.9963 0.9254 0.3987 0.6818 0.6965 0.4968 0.9357
New thyroid 0.9389 0.9544 0.6566 0.8957 0.9362 0.9282 0.757 0.7073 0.931
Nursery 0.9631 0.9551 0.9083 0.8907 0.7963 0.7279 0.8001 0.686 0.7786
Pageblock 0.7396 0.7152 0.4457 0.7655 0.789 0.7887 0.5792 0.7445 0.7246
Phoneme 0.6756 0.6898 0.5889 0.7172 0.7296 0.7539 0.6764 0.7144 0.6379
Pima 0.4353 0.4063 0.0702 0.3892 0.4259 0.3576 0.3838 0.3191 0.4196
Saheart 0.3519 0.3081 0.2708 0.2646 0.2929 0.251 0.3335 0.2056 0.2687
Sonar 0.8164 0.6943 0.5187 0.6554 0.7549 0.7248 0.5364 0.7861 0.61
Tae 0.5630 0.5062 0.5049 0.1171 0.1559 0.3563 0.265 0.0946 0.3017
Thyroid 0.5874 0.7709 −0.0002 0.4012 0.3593 0.4235 0.1049 0.3996 0.4608
Tic-Tae-Toe 0.9815 0.6607 0.1472 0.4149 0.3998 0.1153 0.4243 0.4054 0.3941
Vehicle 0.5966 0.6152 0.5437 0.6233 0.5844 0.6342 0.4905 0.6563 0.5273
Vowel 0.9022 0.9807 0.9767 0.9756 0.9656 0.99 0.8767 0.9922 0.8878
Wine 0.9659 0.959 0.9552 0.9318 0.9491 0.9152 0.9228 0.9491 0.9145
Yeast 0.3642 0.4695 0.3309 0.3942 0.4225 0.407 0.2413 0.3176 0.443
Zoo 0.9869 0.9426 0.9127 0.9041 0.9254 0.9255 0.8553 0.9043 0.887
Average kappa value 0.6941 0.6664 0.5320 0.5735 0.5744 0.5885 0.5249 0.5326 0.6097
Average rank 2.2571 3.2571 5.8857 5.5714 4.9286 5.0143 6.8143 6.2143 4.9143

The bold values denote the highest value.

It can be observed from Table 9 that PMC outperforms the other instance based methods for 23 out of 35 datasets. The kappa rate of PMC for the datasets Appendicitis, Australian, Balance, Contraceptive, Dermatology, Flare, German, Glass, Haberman, Ionosphere, Lymphography, Sonar, Tae, Tic-Tae-Toe and Zoo is significantly higher when compared to the other methods, indicating the notable performance of PMC. It is worth mentioning that the kappa value of PMC for the dataset Monk2 is 1. Interestingly, it can be noted from Tables 8 and 9 that although the classification accuracies of datasets such as Australian, Bupa and German were close for both PMC and DGC+, the kappa rate of DGC+ is much lower than that of PMC, indicating the efficiency of PMC. Also, the average kappa rate and average rank of the PMC algorithm are considerably better when compared to the instance based methods.

To evaluate whether there are significant differences in the results of the algorithms, the Iman and Davenport test was performed. The critical value of the F-distribution using the Iman and Davenport test is 1.97 for a significance level of α = 0.05. The Iman and Davenport statistic for the kappa rate (distributed according to the F-distribution with 8 and 272 degrees of freedom) is found to be 11.3555. Therefore, the null hypothesis is rejected and it is proved that the kappa rate of the classifiers is significantly different.

The Bonferroni–Dunn post hoc test was performed on the kappa rate with α = 0.05, and the critical difference is 1.7833. Fig. 4 summarizes the ranking obtained by the Friedman test and the critical difference of the Bonferroni–Dunn test with α = 0.05. The horizontal axis represents the various instance based algorithms with PMC as the control algorithm and the vertical axis represents the average rank obtained for the kappa rate for each algorithm. Since PMC obtains the best rank among all the algorithms, its average rank is added to the critical difference value 1.7833. The result is represented using a straight line that goes through all the algorithms represented in the graph. Those algorithms whose average rank falls above the straight line perform significantly worse than PMC. Therefore, all algorithms but DGC+ perform worse than the PMC algorithm for the kappa rate.

Fig. 4. Bonferroni–Dunn test for kappa rate.

8. Time performance

The evaluation time to classify an unlabelled sample using the PMC algorithm is compared with the evaluation time of the gravitation based methods such as DGC and DGC+ reported in Ref. [2].
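A simple way to reproduce this kind of per-sample timing in Python is sketched below; pmc_classify, train_X and test_X refer to the illustrative names from the earlier sketches, and the millisecond averaging mirrors how Table 10 reports evaluation times.

```python
import time

def average_evaluation_time_ms(classify, test_samples):
    """Average wall-clock time in milliseconds to classify one unlabelled sample."""
    start = time.perf_counter()
    for sample in test_samples:
        classify(sample)
    elapsed = time.perf_counter() - start
    return 1000.0 * elapsed / len(test_samples)

# e.g. average_evaluation_time_ms(lambda s: pmc_classify(train_X, train_y, s), test_X)
```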

Table 10
Comparison of evaluation time between PMC, DGC+ and DGC.

Dataset PMC (ms) DGC+ (ms) DGC (ms)
Appendicitis 0.0018 0.3248 0.2869
Australian 0.0028 3.404 3.1751
Balance 0.0002 1.2662 0.5935
Banana 5.6604 8.1707 0.0784
Bupa 0.0017 0.877 0.5057
Car 0.0021 4.2234 3.1972
Contraceptive 0.0032 5.941 3.9952
Dermatology 0.0024 5.3921 5.3739
E-coli 0.0041 1.3645 1.0586
Flare 0.0023 5.6916 1.971
German 0.0039 8.2059 8.1356
Glass 0.0014 0.9352 0.5338
Haberman 0.0013 0.5011 0.1745
Hayesroth 0.0006 0.3507 0.1619
Heart statlog 0.0022 1.2681 1.2623
Hepatitis 0.0013 0.557 0.5551
Ionosphere 0.0031 5.1187 4.8494
Iris 0.0027 0.3381 0.1433
Lymphography 0.0020 1.034 1.016
Monk2 0.0014 1.0772 0.6516
New thyroid 0.0018 0.5152 0.2236
Nursery 17.3365 107.3627 61.6712
Pageblock 5.4745 33.0672 1.7722
Phoneme 5.5453 16.2879 0.8975
Pima 0.0019 2.2839 2.1357
Saheart 0.0021 1.7139 1.7108
Sonar 0.0038 8.032 7.9831
Tae 0.0006 0.3959 0.1983
Thyroid 8.3333 87.0068 12.2842
Tic-Tae-Toe 0.0029 3.9907 3.9885
Vehicle 0.0035 6.5558 6.4839
Vowel 0.0021 7.3149 4.4131
Wine 0.0017 0.8732 0.8741
Yeast 0.0028 7.053 3.2389
Zoo 0.0018 0.6748 0.4077

It is found from Table 10 that PMC has less evaluation time for classification than DGC+ for all the datasets. Also, the evaluation time of PMC is less than that of DGC for all datasets but Banana, Pageblock and Phoneme.

9. Discussion

This section analyses the results of the experiments conducted on 35 datasets. Out of the 35 datasets, PMC achieves the first best results in 23 datasets and the second best results in 4 datasets when compared to the instance based methods. It is also shown that PMC obtains high classification accuracy for datasets such as Contraceptive, Flare and Haberman having overlapping classes in comparison with the instance based methods. It is apparent from Table 8 that PMC achieves very high classification accuracy for datasets such as Appendicitis, Balance, Contraceptive, Dermatology, Flare, Lymphography, Tae, Tic-Tae-Toe and Zoo when compared to the DGC+, DGC and NN methods. It can also be noted from Tables 8 and 9 that the classification accuracy and kappa rate of PMC for the dataset Monk2 are 1. Also, the average kappa rate of PMC is significantly higher than those of the other instance based methods.

PMC fails to classify linearly inseparable datasets with very few instances, such as the XOR problem. When PMC is applied to classify a new unlabelled sample which is not present in such a dataset, the maximum match occurs with instances belonging to a class different from the target class, as there is no pattern that is partially or exactly similar to the new unlabelled sample. However, for such datasets, PMC would classify an unlabelled sample correctly if it is already present in the dataset. It is also shown that PMC has less evaluation time when compared to DGC+ for all datasets.

10. Conclusion

A Pattern Matching based Classification algorithm has been proposed for the classification of datasets. To improve the classification accuracy of PMC, an Ant Colony Optimization based Feature Selection algorithm based on the idea of PMC has been proposed. The advantage of PMC in comparison with other instance based methods is its simple classification procedure together with high performance. It is shown that the proposal achieved better classification accuracies and kappa rates for most of the datasets when compared to the instance based methods. Statistical tests were also performed to support the better performance of the proposal. It is also proved that the computation time of PMC is less when compared to DGC+ and DGC.

References

[1] D.W. Aha, D. Kibler, M.K. Albert, Instance-based learning algorithms, Mach. Learn. 6 (January (1)) (1991) 37–66.
[2] A. Cano, A. Zafra, S. Ventura, Weighted data gravitation classification for standard and imbalanced data, IEEE Trans. Cybern. 43 (December (6)) (2013).
[3] J. Alcalá-Fdez, A. Fernandez, J. Luengo, J. Derrac, S. García, L. Sánchez, F. Herrera, KEEL data-mining software tool: data set repository, integration of algorithms and experimental analysis framework, J. Mult.-Valued Log. Soft Comput. 17 (2011) 255–287.
[4] A. Ben-David, Comparison of classification accuracy using Cohen's weighted kappa, Expert Syst. Appl. 34 (February (2)) (2008) 825–832.
[5] A. Ben-David, About the relationship between ROC curves and Cohen's kappa, Eng. Appl. Artif. Intell. 21 (September (6)) (2008) 874–882.
[6] T. Cover, P. Hart, Nearest neighbor pattern classification, IEEE Trans. Inf. Theory IT-13 (January (1)) (1967) 21–27.
[7] J. Diederich, A. Al-Ajmi, P. Yellowlees, Ex-ray: data mining and mental health, Appl. Soft Comput. 7 (2007) 923–928.
[8] S.A. Dudani, The distance-weighted k-nearest-neighbor rule, IEEE Trans. Syst. Man Cybern. B: Cybern. SMC-6 (April (4)) (1976) 325–327.
[9] O.J. Dunn, Multiple comparisons among means, J. Amer. Stat. Assoc. 56 (March (293)) (1961) 52–64.
[10] P.G. Espejo, S. Ventura, F. Herrera, A survey on the application of genetic programming to classification, IEEE Trans. Syst. Man Cybern. C: Appl. Rev. 40 (March (2)) (2010) 121–144.
[11] A. Fernández, S. García, J. Luengo, E. Bernado-Mansilla, F. Herrera, Genetics-based machine learning for rule induction: state of the art, taxonomy, and comparative study, IEEE Trans. Evol. Comput. 14 (December (6)) (2010) 913–941.
[12] A. Frank, A. Asuncion, UCI Machine Learning Repository, Univ. California, School Inf. Comput. Sci., Irvine, CA, 2015, Available at: http://archive.ics.uci.edu/ml/citation policy.html
[13] Q. Gao, Z. Wang, Center-based nearest neighbor classifier, Pattern Recognit. 40 (January (1)) (2007) 346–349.
[14] S. García, A. Fernández, J. Luengo, F. Herrera, Advanced nonparametric tests for multiple comparisons in the design of experiments in computational intelligence and data mining: experimental analysis of power, Inf. Sci. 180 (May (10)) (2010) 2044–2064.
[15] S. Garcia, J. Derrac, J.R. Cano, F. Herrera, Prototype selection for nearest neighbor classification: taxonomy and empirical study, IEEE Trans. Pattern Anal. Mach. Intell. 34 (March (3)) (2012) 417–435.
[16] S. García, J.R. Cano, F. Herrera, A memetic algorithm for evolutionary prototype selection: a scaling up approach, Pattern Recognit. 41 (August (8)) (2008) 2693–2709.
[17] S. García, D. Molina, M. Lozano, F. Herrera, A study on the use of non-parametric tests for analyzing the evolutionary algorithms' behaviour: a case study, J. Heurist. 15 (December (6)) (2009) 617–644.
[18] Q. Gu, L. Zhu, Z. Cai, Evaluation measures of the classification performance of imbalanced data sets, Commun. Comput. Inf. Sci. 51 (October (1)) (2009) 461–471.
[19] M. Galar, A. Fernández, E. Barrenechea, H. Bustince, F. Herrera, An overview of ensemble methods for binary classifiers in multi-class problems: experimental study on one-vs-one and one-vs-all schemes, Pattern Recognit. 44 (August (8)) (2011) 1761–1776.
[20] J. Demsar, Statistical comparisons of classifiers over multiple data sets, J. Mach. Learn. Res. 7 (2006) 1–30.
[21] K. Lee, D. Booth, P. Alam, A comparison of supervised and unsupervised neural networks in predicting bankruptcy of Korean firms, Expert Syst. Appl. 29 (2005) 1–16.
[22] B. Li, Y.W. Chen, Y.Q. Chen, The nearest neighbor algorithm of local probability centers, IEEE Trans. Syst. Man Cybern. B: Cybern. 38 (February (1)) (2008) 141–154.
[23] S.W. Lin, Z.J. Lee, S.C. Chen, T.Y. Tseng, Parameter determination of support vector machines and feature selection using simulated annealing approach, Appl. Soft Comput. 8 (2008) 1505–1512.
[24] L. Kuncheva, Combining Pattern Classifiers: Methods and Algorithms, 2nd ed., John Wiley and Sons, 2013.
[25] M. Dorigo, T. Stutzle, Ant Colony Optimization, Prentice Hall of India Private Limited, New Delhi, 2005, pp. 37–38.
[26] K. Ming Leung, K-Nearest Neighbor Algorithm for Classification, Polytechnic University, Department of Computer Science/Finance and Risk Engineering, 2007.
[27] F. Neri, V. Tirronen, Scale factor local search in differential evolution, Memet. Comput. 1 (June (2)) (2009) 153–171.
[28] M. Paliwal, U.A. Kumar, Neural networks and statistical techniques: a review of applications, Expert Syst. Appl. 36 (June (1)) (2009) 2–17.
[29] R. Paredes, E. Vidal, Learning weighted metrics to minimize nearest neighbor classification error, IEEE Trans. Pattern Anal. Mach. Intell. 28 (July (7)) (2006) 1100–1110.
[30] R. Paredes, E. Vidal, Learning prototypes and distances: a prototype reduction technique based on nearest neighbor error minimization, Pattern Recognit. 39 (February (2)) (2006) 180–188.
[31] P. Refaeilzadeh, L. Tang, H. Liu, Cross-Validation, Arizona State University, 2008.
[32] L. Peng, B. Peng, Y. Chen, A. Abraham, Data gravitation based classification, Inf. Sci. 179 (March (6)) (2009) 809–819.
[33] D.J. Sheskin, Handbook of Parametric and Nonparametric Statistical Procedures, Chapman & Hall, London, UK, 2007.
[34] S.-W. Lin, S.-C. Chen, PSOLDA: a particle swarm optimization approach for enhancing classification accuracy rate of linear discriminant analysis, Appl. Soft Comput. 9 (2009) 1008–1015.
[35] S.-C. Chen, S.-W. Lin, S.-Y. Chou, Enhancing the classification accuracy by scatter-search-based ensemble approach, Appl. Soft Comput. 11 (2011) 1021–1028.
[36] M. Sokolova, G. Lapalme, A systematic analysis of performance measures for classification tasks, Inf. Process. Manage. 45 (July (4)) (2009) 427–437.
[37] I. Triguero, J. Derrac, S. Garcia, F. Herrera, A taxonomy and experimental study on prototype generation for nearest neighbor classification, IEEE Trans. Syst. Man Cybern. C: Appl. Rev. 42 (January (1)) (2012) 86–100.
[38] I. Triguero, S. García, F. Herrera, Differential evolution for optimizing the positioning of prototypes in nearest neighbor classification, Pattern Recognit. 44 (April (4)) (2011) 901–916.
[39] V. Maniezzo, L.M. Gambardella, F. de Luigi, Ant Colony Optimization, 2004, http://www.idsia.ch/~luca/aco.pdf
[40] C. Wang, Y.Q. Chen, Improving nearest neighbor classification with simulated gravitational collapse, in: Proc. ICNC, vol. 3612, 2005, pp. 845–854.
[41] J. Wang, P. Neskovic, L.N. Cooper, Improving nearest neighbor rule with a simple adaptive distance measure, Pattern Recognit. Lett. 28 (January (2)) (2007) 207–213.
[42] A. Widodo, B.S. Yang, Support vector machine in machine condition monitoring and fault diagnosis, Mech. Syst. Signal Process. 21 (August (6)) (2007) 2560–2574.
[43] C. Zhou, Y. Chen, Improving nearest neighbor classification with cam weighted distance, Pattern Recognit. 39 (April (4)) (2006) 635–645.
[44] Y. Zong-Chang, A vector gravitational force model for classification, Pattern Anal. Appl. 11 (May (2)) (2008) 169–177.
