BeeOWA: A novel approach based on ABC algorithm and induced OWA operators for constructing one-class classifier ensembles
Neurocomputing
Article info

Article history:
Received 3 September 2014
Received in revised form 10 January 2015
Accepted 17 March 2015
Available online 10 April 2015
Communicated by A. Suarez

Keywords:
One-class classification
Classifier ensemble
Classifier pruning
Classifier fusion
Binary artificial bee colony
Induced OWA operator

Abstract

In recent years, classifier ensembles have received increasing attention in the machine learning and pattern recognition communities. However, constructing classifier ensembles for one-class classification problems remains a challenging research topic. To pursue this line of research, we need to address the issues of how to generate a set of diverse one-class classifiers that are individually accurate and how to combine their outputs in an effective way. In this paper, we present BeeOWA, a novel approach to construct highly accurate one-class classifier ensembles. It uses a novel binary artificial bee colony algorithm, called BeePruner, to prune an initial one-class classifier ensemble and find a near-optimal sub-ensemble of base classifiers in a reasonable computational time. To evaluate the fitness of an ensemble solution, BeePruner uses two different measures: an exponential consistency measure and a non-pairwise diversity measure based on the Kappa inter-rater agreement. After one-class classifier pruning, BeeOWA uses a novel exponential induced OWA (ordered weighted averaging) operator, called EIOWA, to combine the outputs of base classifiers in the sub-ensemble. The results of experiments carried out on a number of benchmark datasets show that BeeOWA can outperform several state-of-the-art approaches, both in terms of classification performance and statistical significance.

© 2015 Elsevier B.V. All rights reserved.
http://dx.doi.org/10.1016/j.neucom.2015.03.051
368 E. Parhizkar, M. Abadi / Neurocomputing 166 (2015) 367–381
increase the classification accuracy [17,18]. Hence, another important step in constructing a classifier ensemble is to choose a good strategy for combining the outputs of base classifiers, using a process also known as classifier fusion. In the literature, several classifier fusion strategies have been proposed, which can be categorized according to the level of classifier outputs into abstract level, rank level, and measurement level [19,20]. Measurement-level outputs provide more information than the other types of outputs, and a number of aggregation functions or fusion rules, such as mean, max, and product, are employed for combining them [21].

Aggregation of different pieces of information obtained from different sources is a common aspect of any fusion system. A very interesting class of powerful aggregation operators is the ordered weighted averaging (OWA) operators [22]. An OWA operator takes multiple values as input and returns a single value that is a weighted sum based on an order ranking of all input values. Classifier fusion using OWA operators seems more robust than simple weighted averaging, where the coefficients are derived based on the classifier accuracy [23].

In recent years, there has been a substantial amount of research conducted in the field of one-class classification, resulting in different one-class classifiers, including one-class SVM (OCSVM) [24], support vector data description (SVDD) [25], and so on. The goal of one-class classification is to distinguish a set of target objects from all the other possible objects [26]. Since in one-class classification problems we have only the information of the target class, constructing highly accurate one-class classifier ensembles is more challenging than constructing multi-class classifier ensembles.

As mentioned before, to construct classifier ensembles, we need to address the issues of how to generate a set of accurate and diverse base classifiers and how to combine their outputs using an effective fusion rule. Although these issues have been adequately addressed in multi-class classifier ensembles, relatively little work has been reported in the literature to address them in one-class classifier ensembles. In this paper, we present BeeOWA, a novel approach to construct highly accurate one-class classifier ensembles. BeeOWA uses a novel binary artificial bee colony algorithm, called BeePruner, to prune an initial one-class classifier ensemble and find a near-optimal sub-ensemble of base classifiers. More precisely, the goal of BeePruner is to exclude the non-diverse base classifiers from the initial ensemble and, at the same time, keep the classification accuracy. In the subsequent step, if the fusion rule does not properly utilize the ensemble diversity, then no benefit arises from the classifier fusion. Considering this fact, BeeOWA uses a novel exponential induced OWA operator, called EIOWA, to combine the outputs of base classifiers in the sub-ensemble.

The major contributions of this paper are listed as follows:

- We present a novel artificial bee colony algorithm for one-class classifier pruning that utilizes two measures simultaneously: an exponential consistency measure and a non-pairwise diversity measure based on the Kappa inter-rater agreement.
- To the best of our knowledge, the most widely used fusion rules in one-class classification problems are fixed rules, such as majority voting, mean, max, and product. We propose a novel exponential induced OWA operator for one-class classifier fusion and experimentally show that it can outperform the fixed rules.
- We conduct extensive experiments on benchmark datasets to evaluate the performance of BeeOWA and show that it performs significantly better than state-of-the-art approaches in the literature.

The rest of this paper is organized as follows. Section 2 is dedicated to the background; short descriptions of classifier ensembles, OWA operators, and the ABC algorithm are included in this section. Section 3 presents the main steps of BeeOWA. Section 4 provides an overview of current techniques for constructing one-class classifier ensembles. The experimental results are described in Section 5. Finally, conclusions are given in Section 6.

2. Background

In this section, we give a brief introduction to some basic concepts used throughout this paper.

2.1. Classifier ensemble

To improve the performance of different classifiers, which may differ in complexity or training algorithm, an ensemble of classifiers is a viable solution. This may serve to increase both the performance and the robustness of the classification. The underlying idea of ensemble learning for classification problems is to build a number of base classifiers and then combine their outputs using a fusion rule. It has been shown that classifier ensembles outperform single classifiers for a wide range of classification problems [27–29]. The reason is that a combination of multiple classifiers reduces the risk associated with choosing an insufficient single classifier.

In general, the process of constructing an ensemble of base classifiers consists of three main steps: classifier generation, classifier pruning, and classifier fusion. In the following, we briefly describe each of these steps.

2.1.1. Classifier generation

One of the key challenges for a classifier ensemble is to generate a set of diverse base classifiers. For this purpose, we can use different strategies. One effective strategy is to train homogeneous classifiers with different datasets. To do this, we can divide the original dataset into partitions or generate a number of subsets through data splitting, bagging, or boosting [6,7,30], in the hope that different classification models are generated for different distributions of the original dataset. Another strategy, also known as ensemble feature selection, is to train homogeneous classifiers with different subsets of features. It has been shown that simple random selection of feature subsets, called random subspacing (RS) or the random subspace method (RSM), is an effective technique for ensemble feature selection [8]. The last and most intuitive strategy is to train heterogeneous classifiers with the same dataset. It should be noted that we can combine the above strategies to take advantage of all of them; for example, we can train heterogeneous classifiers with different datasets or different subsets of features [31,32].

2.1.2. Classifier pruning

Classifier pruning, also known as ensemble pruning, is a useful technique for reducing the ensemble size by selecting only a subset of the base classifiers that are both accurate and diverse in their outputs [4]. It is well known that combining identical base classifiers does not contribute anything apart from increasing the computational complexity of the classification. Also, combining diverse but too weak base classifiers is unlikely to bring any benefit in classification accuracy. Therefore, classifier pruning can be considered as a search for an optimal subset of base classifiers, making a trade-off between classification accuracy and diversity. For small ensembles, the optimal subset can be found through exhaustive search. For large ensembles, a near-optimal subset can be found using meta-heuristic optimization algorithms such as genetic algorithms [33,34].

2.1.3. Classifier fusion

An important step in constructing an effective ensemble is to choose a good strategy for combining the outputs of the base
as input, in which the first values are used to induce an ordering over the second values, which are then aggregated.

2.5. Artificial bee colony

Artificial bee colony (ABC) [12] is a recently introduced swarm intelligence algorithm that simulates the intelligent foraging behavior of honeybees. It consists of two main concepts: food sources and artificial bees. Artificial bees search for food sources having high nectar amounts. The position of a food source represents a possible solution to an optimization problem, and its nectar amount corresponds to the quality (or fitness) of the solution. The colony of artificial bees contains three groups of bees: employed bees, onlooker bees, and scout bees. Employed bees search for new food sources having more nectar in the vicinity of their food sources and share the obtained information with onlooker bees by dancing in the dance area. The number of employed bees is equal to the number of food sources. After watching the dance, the onlooker bees probabilistically select some of the food sources and search their vicinity to find new food sources with more nectar, just like the employed bees. However, the onlooker bees tend to select richer food sources. Employed bees whose food sources cannot be improved through a predetermined number of trials become scout bees and start to search for new food sources randomly.

At the initialization step, the scout bees randomly initialize a population P of food source positions. Afterward, P is subjected to repeated cycles of the search processes of the artificial bees. Let SN be the number of food sources and D be the number of optimization parameters. Each employed bee generates a candidate food source position v_i in the vicinity of its food source position z_i ∈ P according to

    v_ij = z_ij + φ_ij (z_ij − z_kj),   (24)

where k ∈ {1, …, SN} and j ∈ {1, …, D} are two randomly chosen indices, and φ_ij is a random value in the range [−1, 1]. After generating v_i, its fitness is calculated and a greedy selection is applied between v_i and z_i.

Subsequently, each onlooker bee randomly selects a food source i with a food source selection probability p_i, which is calculated as

    p_i = f_i / Σ_{k=1}^{SN} f_k,   (25)

where f_i is the fitness of the food source i.

After selecting a food source, each onlooker bee generates a candidate food source position in the vicinity of its food source position, exactly the same as the employed bees. After that, the food sources that are not improved during a predetermined number of cycles, as controlled by a limit parameter (LM), are detected and abandoned. These food sources are replaced with randomly selected food sources by the scout bees.

Since ABC has few control parameters and is also easy to implement, it has been widely used in many optimization applications, such as dynamic anomaly detection in MANETs [15], stacking ensemble [16], classification rule discovery [50], and so on.

3. BeeOWA

In this section, we present BeeOWA, a novel approach for constructing one-class classifier ensembles. It assumes that we are given an initial ensemble of base one-class classifiers. Therefore, it consists of two main steps: one-class classifier pruning and one-class classifier fusion.

3.1. One-class classifier pruning

As previously mentioned, the aim of classifier pruning is to reduce the ensemble size by selecting only a subset of the base classifiers. Because of the complexity of the problem and the size of the solution space, finding an optimal subset of the base classifiers is difficult and time-consuming. Therefore, we use a novel binary ABC algorithm, called BeePruner, to find a near-optimal solution in a reasonable computational time. In the following, we describe the algorithm in more detail.

Unlike real-valued optimization problems, where the candidate solutions can be represented by vectors of real values, the candidate solutions to the classifier pruning problem are represented in binary space. To adapt the ABC algorithm to this problem, we consider a binary vector y_i corresponding to each food source position z_i ∈ P. Each element of y_i can take on a binary value (0 or 1); formally, y_ij ∈ {0, 1}. The element z_ij ∈ z_i is then interpreted as the probability for y_ij to be 0 or 1. For example, z_ij = 0.4 implies a 40% chance for y_ij to be 1 and a 60% chance to be 0. This means that the elements of food source positions are restricted to the range [0,1] so that they can be interpreted as probabilities. Therefore, we set the lower and upper bounds of each element to 0 and 1, respectively. An artificial bee may then be seen to move to near and far corners of a hypercube when searching for food sources.

The aim of BeePruner is to find a near-optimal subset (sub-ensemble) of base one-class classifiers. Hence, we refer to food source positions and their binary interpretations as ensemble positions and ensemble solutions, respectively.

The pseudo-code of BeePruner is shown in Algorithm 1. Given an initial ensemble S of base one-class classifiers, the artificial bees explore and exploit them to obtain a sub-ensemble C ⊆ S that achieves the maximum possible fitness. BeePruner first sends the scout bees to randomly initialize a population P of ensemble positions (Line 1). Each ensemble position z_i ∈ P is a vector of D real values, each in the range [0,1], where D is set to the total number of base classifiers in S. BeePruner then applies a binary interpretation operator to each ensemble position z_i ∈ P to form the ensemble solution y_i (Line 2):

    y_ij = 1 if ψ_ij < z_ij, and 0 otherwise,   (26)

where ψ_ij is a random value in the range [0,1]. The element z_ij is now a probability for y_ij to be 0 or 1. In fact, y_i indicates a possible sub-ensemble of base one-class classifiers in S. If y_ij = 1, this means that the base classifier C_j ∈ S is part of the sub-ensemble to be evaluated. On the other hand, if y_ij = 0, it means that C_j is not part of the sub-ensemble. Note that y_ij can change even if the value of z_ij does not change, due to the random value ψ_ij. For example, let S = {C_1, …, C_5} be an ensemble of five base one-class classifiers. Then the ensemble solution y_i = [0, 0, 1, 0, 1] indicates the sub-ensemble {C_3, C_5}.

Subsequently, BeePruner calculates the fitness of each ensemble solution y_i and repeats the following steps until a termination criterion is met (Lines 3–27): each employed bee i generates a new ensemble position z′_i in the vicinity of its own ensemble position z_i ∈ P and applies the binary interpretation operator to z′_i to form the ensemble solution y′_i. It then calculates the fitness of y′_i and performs a greedy selection between y_i and y′_i. After that, all employed bees share their information about ensemble positions with onlooker bees by dancing in the dance area (Lines 5–13). According to the information provided by the employed bees, each onlooker bee i chooses an ensemble position z_k ∈ P using binary tournament selection and generates a new ensemble position z′_k in the vicinity of z_k. It then applies the binary interpretation operator to z′_k to form the ensemble solution y′_k and performs a greedy selection between y_k and y′_k similar
to that of the employed bees (Lines 15–23). Some ensemble positions in P may not be improved through a pre-specified number of trials by the employed and onlooker bees. Therefore, scout bees are sent to determine these so-called infertile ensemble positions and replace them with new ensemble positions (Line 25).

Algorithm 1. BeePruner.

1:  Initialize a population P of ensemble positions
2:  Apply a binary interpretation operator to each ensemble position z_i ∈ P to form the ensemble solution y_i and then calculate the fitness value f(y_i)
3:  repeat
4:    // Employed bee phase
5:    for each employed bee i = 1 to SN do
6:      Generate a new ensemble position z′_i in the vicinity of the ensemble position z_i ∈ P
7:      Apply the binary interpretation operator to z′_i to form the ensemble solution y′_i
8:      Calculate the fitness value f(y′_i)
9:      if f(y′_i) > f(y_i) then
10:       z_i ← z′_i
11:       y_i ← y′_i
12:     end if
13:   end for
14:   // Onlooker bee phase
15:   for each onlooker bee i = 1 to SN do
16:     Choose an ensemble position z_k ∈ P using binary tournament selection and then generate a new ensemble position z′_k in the vicinity of it
17:     Apply the binary interpretation operator to z′_k to form the ensemble solution y′_k
18:     Calculate the fitness value f(y′_k)
19:     if f(y′_k) > f(y_k) then
20:       z_k ← z′_k
21:       y_k ← y′_k
22:     end if
23:   end for
24:   // Scout bee phase
25:   Determine any infertile ensemble position in P and replace it with a new ensemble position
26:   Memorize the best ensemble solution y*
27: until a termination criterion is met

3.1.1. Fitness function

The fitness function evaluates the quality of the ensemble solutions. Ideally, an ensemble solution should consist of base classifiers that are both accurate and diverse. More precisely, each base classifier in the ensemble should have high individual classification accuracy as well as high diversity when compared to the other base classifiers. Although classification accuracy is a main concern, it cannot be used for one-class classifier ensembles. As an alternative, we modified the unsupervised consistency measure in [51].

Given an ensemble solution y_i, the fitness of y_i is defined as

    f(y_i) = ω ν(y_i) + (1 − ω) υ(y_i),   (27)

where 0 ≤ ω ≤ 1 controls the trade-off between consistency and diversity. In the following, we describe the exponential consistency and non-pairwise diversity measures in detail.

Exponential consistency measure. The exponential consistency measure indicates how consistent a sub-ensemble of base classifiers is in rejecting a fraction of the target objects. Given an ensemble solution y_i, we compute the exponential consistency of y_i as

    ν(y_i) = e^(−α Δ_i),   (28)

where α ≥ 1 is a constant and Δ_i is the average consistency of the base classifiers in the sub-ensemble indicated by y_i, given by

    Δ_i = (1 / M_i) Σ_{j=1}^{D} y_ij δ_j,   (29)

where D is the total number of base classifiers in S, M_i = Σ_{j=1}^{D} y_ij is the number of base classifiers in y_i, and δ_j is the consistency of the base classifier C_j ∈ S:

    δ_j = |ε̂_j − ε_j|,   (30)

where ε_j and ε̂_j are the designated and estimated rejection rates of C_j on the target objects, respectively. Note that to compute the exponential consistency measure, we only need the information of the target class, which makes it suitable for one-class classification problems.

Non-pairwise diversity measure. Previous research has shown that the success of classifier ensembles depends not only on a set of appropriate base classifiers, but also on the diversity inherent in the base classifiers [52,53]. Statistical diversity measures can be divided into two categories [54]: pairwise and non-pairwise. Pairwise diversity measures are designed to measure the diversity between all possible pairings of base classifiers, and non-pairwise ones are designed to measure the diversity among all base classifiers in a classifier ensemble. Although there is a substantial number of diversity measures in the literature, few of them have been applied to one-class classifier pruning [55]. To compute the diversity of ensemble solutions, we use a non-pairwise diversity measure based on the Kappa inter-rater agreement [54,56]. This measure assumes that the output of each base classifier represents a correct/incorrect decision. This type of classifier output is also known as the oracle output, because it assumes that we know the correct labels of input feature vectors. Note that in the case of one-class classification problems, we have only the oracle outputs for target feature vectors. Given an ensemble solution y_i and a dataset X_T of target feature vectors, let Ω_j(x_k) be an indicator function that takes the value 1 if the base classifier C_j ∈ S correctly classifies a feature vector x_k ∈ X_T as belonging to the target class ω_T, and 0 otherwise:

    Ω_j(x_k) = 1 if C_j(x_k) = ω_T, and 0 otherwise.   (31)

We compute the non-pairwise diversity of y_i as

    υ(y_i) = M_i Σ_{k=1}^{N} l_i(x_k) (1 − l_i(x_k)) / ( N (M_i − 1) R (1 − R) ),   (32)

where N = |X_T| is the number of target feature vectors in X_T, l_i(x_k) is the proportion of base classifiers in y_i that correctly classify x_k, and R is the average true positive rate of the individual base classifiers in y_i.
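As an illustration, the fitness computation of Eqs. (27)–(32) can be sketched in a few lines of Python. The function below is an illustrative re-implementation, not the authors' code; the argument names (`delta` for the per-classifier consistencies of Eq. (30), `oracle` for the 0/1 oracle-output matrix of Eq. (31)) are ours.

```python
import math

def fitness(y, delta, oracle, alpha=1.0, omega=0.5):
    """Sketch of BeePruner's fitness, Eq. (27): f = omega*nu + (1-omega)*upsilon.

    y      : 0/1 list of length D selecting base classifiers (ensemble solution)
    delta  : per-classifier consistencies |eps_hat_j - eps_j|, Eq. (30)
    oracle : D x N 0/1 matrix of oracle outputs Omega_j(x_k), Eq. (31)
    """
    sel = [j for j, yj in enumerate(y) if yj == 1]
    M = len(sel)                       # M_i, number of selected classifiers
    if M < 2:                          # diversity needs at least two members
        return 0.0
    # Exponential consistency, Eqs. (28)-(29): nu = exp(-alpha * Delta_i)
    nu = math.exp(-alpha * sum(delta[j] for j in sel) / M)
    # Kappa-based non-pairwise diversity, Eq. (32)
    N = len(oracle[0])
    l = [sum(oracle[j][k] for j in sel) / M for k in range(N)]   # l_i(x_k)
    R = sum(l) / N                     # average true positive rate
    if R == 0.0 or R == 1.0:           # degenerate case: nothing to normalize by
        ups = 0.0
    else:
        ups = M * sum(lk * (1 - lk) for lk in l) / (N * (M - 1) * R * (1 - R))
    return omega * nu + (1 - omega) * ups
```

For example, a two-member sub-ensemble whose members disagree on every target vector (l_i(x_k) = 0.5 everywhere) attains the maximum diversity M_i/(M_i − 1) = 2.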
If there is no variation in the values of l_i(x_k) for all target feature vectors x_k ∈ X_T, then there is more diversity among the base classifiers in y_i. In this case, υ(y_i) may be seen to assume its maximum value of M_i/(M_i − 1). On the other hand, if the value of l_i(x_k) is 0 or 1 for all target feature vectors x_k ∈ X_T, then there is no diversity among the base classifiers in y_i and υ(y_i) may be seen to assume its minimum value of 0.

3.1.2. Time complexity

Here, we provide a complexity analysis for BeePruner. For any dataset X_T of N target feature vectors and an initial ensemble S of base one-class classifiers, suppose we are given the consistency of each base classifier in S and the outputs of this base classifier for all target feature vectors in X_T. The aim of BeePruner is to obtain a sub-ensemble C ⊆ S that achieves the maximum possible fitness. BeePruner first sends the scout bees to randomly initialize a population P of SN ensemble positions of length D, where D is the total number of base classifiers in S. It then applies a binary interpretation operator to each ensemble position of P to form its corresponding ensemble solution. Therefore, the time complexity of population initialization and binary interpretation is essentially O(SN·D). Next, BeePruner calculates the fitness of the ensemble solutions based on two different measures: exponential consistency and non-pairwise diversity. Given an ensemble solution, these measures use the average consistency and the average true positive rate of all base classifiers in the sub-ensemble indicated by that ensemble solution, respectively. Hence, their respective time complexities for each ensemble solution are bounded by O(D) and O(N·D), resulting in a time complexity of O(N·D) for the fitness function. After that, BeePruner repeatedly sends the employed and onlooker bees to explore and exploit ensemble solutions to find the best one. Therefore, the worst-case time complexity of these steps is O(CN·SN·N·D), where CN is the maximum cycle number. Since we usually set SN to a small value, the overall time complexity of BeePruner becomes O(CN·N·D).

3.2. One-class classifier fusion

performance of the ensemble. Recall that orness is a measure of the degree of optimism: the larger the orness degree is, the more optimistic the aggregator is.

Using EIOWA, the decision rule for x_k can be written as

    y_O(x_k) = ω_T if O_{ε,ρ}(β_1(x_k), …, β_L(x_k)) ≥ O_{ε,ρ}(ξ_1, …, ξ_L), and ω_O otherwise,   (36)

where ξ_j is the decision threshold for C_j.

To define the EIOWA weights, we use the following exponential function:

    g(z) = (1 − z) e^(−λz) for 0 ≤ z ≤ 1, and 0 otherwise,   (37)

where λ ≥ 0 is a constant value. In this case, each weight w_j is computed as

    w_j = g(j/L) / Σ_{k=1}^{L} g(k/L) = (L − j) e^(−λj/L) / Σ_{k=1}^{L} (L − k) e^(−λk/L),   (38)

where L is the total number of weights.

For different values of λ, we get EIOWA weights with different levels of orness. Furthermore, it can be proved that the degree of orness is always in the range (0.5, 1]. For example, for λ = 0, the degree of orness is given by

    ∨(w_1, …, w_L) = (1/(L − 1)) Σ_{j=1}^{L} (L − j) w_j = (2L² − 3L + 1) / (3(L − 1)²),   (39)

which converges to 2/3 as the number of weights increases infinitely, that is, lim_{L→+∞} ∨(w_1, …, w_L) = 2/3.

Lemma 1. The orness degree of EIOWA is always in the range (0.5, 1].

Proof. It has been shown that for any collection of OWA weights having the property that w_j ≥ w_k for j < k, the degree of orness is always in the range (0.5, 1] [57]. Therefore, we only need to show that the EIOWA weights have the property that w_j ≥ w_{j+1} for all j = 1, …, L − 1. Since for λ ≥ 0 and for all j = 1, …, L − 1, we have

    −λj/L ≥ −λ(j+1)/L,   (40)

and hence e^(−λj/L) ≥ e^(−λ(j+1)/L). Since, in addition, L − j > L − (j+1) ≥ 0, it follows from Eq. (38) that w_j ≥ w_{j+1}, which completes the proof. □
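The weight construction of Eqs. (37)–(38) and the orness property of Lemma 1 are easy to verify numerically. The sketch below is ours, not the authors' implementation.

```python
import math

def eiowa_weights(L, lam):
    """EIOWA weights from g(z) = (1 - z) * exp(-lam * z), Eqs. (37)-(38)."""
    g = [(L - j) * math.exp(-lam * j / L) for j in range(1, L + 1)]
    s = sum(g)
    return [gj / s for gj in g]

def orness(w):
    """Degree of orness of an OWA weight vector (cf. Eq. (39))."""
    L = len(w)
    return sum((L - j) * wj for j, wj in enumerate(w, start=1)) / (L - 1)
```

For λ = 0 and L = 3 this gives w = (2/3, 1/3, 0) and an orness of 5/6, in agreement with Eq. (39), and for any λ ≥ 0 the weights are non-increasing, so the orness stays in (0.5, 1].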
Over the last few years, one-class classifier ensembles have been used in various domains, such as information security [58,59], signature verification [60,61], image retrieval [62], and so on. As previously mentioned, the two challenging steps in constructing classifier ensembles are classifier pruning and classifier fusion. These steps have proven to be promising research directions for one-class classifier ensembles. Hence, in the following, we give a brief overview of the state-of-the-art techniques for one-class classifier pruning and one-class classifier fusion.

4.1. One-class classifier pruning

Given a number of base classifiers, most conventional approaches combine all of them to construct a classifier ensemble. However, many researchers have suggested that using a sub-ensemble of base classifiers may be better than using the ensemble as a whole [9,63]. Cheplygina and Tax [63] proposed to apply pruning to random subspaces of one-class classifiers using the supervised AUC measure [64] or an unsupervised consistency measure [51]. This method, also known as PRSM, is similar to RSM, already described in Section 2.1.1, but with the difference that it prunes inaccurate base classifiers using the AUC or consistency measures. However,

Fig. 1. Effect of the parameter SN on the average AUC of BeeOWA.
Table 1
Summary of the one-class datasets used in the experiments.

Dataset    Total objects    Target objects    Features
Fig. 2. Effect of the parameter λ on the average AUC of BeeOWA. The plotted average AUC values are:

λ                0     5     10    15    20    25    30    35    40    45    50    60    70    80    90    100
C.H. disease     75.9  74.4  76.3  75.4  69.1  69.3  73.1  70.2  65.6  66.2  66.8  65.3  65.9  65.1  66.2  64.7
Cancer non-ret   57.4  59.2  55.7  56.9  55.3  56.8  56.6  56.4  55.7  56.0  56.0  56.8  55.8  55.2  55.6  52.8
Imports          83.9  87.1  86.1  84.2  86.2  85.3  84.7  83.4  83.3  81.7  82.2  82.3  82.1  81.8  81.4  79.9
Vehicle          91.9  90.0  89.9  89.7  90.1  89.4  89.2  87.6  88.7  87.1  88.2  88.2  88.2  86.4  86.2  83.6
Table 2
Comparison of BeePruner with other pruning algorithms in terms of AUC.

                      BeePruner     TS            SA            SGA
Overall average AUC   86.5 (1.13)   85.1 (2.13)   82.4 (3.33)   83.9 (3.40)

Table 3
Results of the Friedman/Nemenyi test for ranking the pruning algorithms in Table 2.

F-statistic   Critical value   Significant differences   Critical difference   BeePruner   TS     SA     SGA
34.41         2.83             Yes                       1.21                  1.13        2.13   3.33   3.40

Fig. 3. Results of the Nemenyi test for the pairwise comparison of the pruning algorithms in Table 2.
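The reported critical differences (1.21 in Table 3 and 2.33 in Table 7) are consistent with the standard Nemenyi formula CD = q_α · sqrt(k(k+1)/(6N)), assuming α = 0.05 and N = 15 benchmark datasets; the q constants below are the standard Nemenyi values, assumed here rather than taken from the paper.

```python
import math

# Two-tailed Nemenyi q constants at alpha = 0.05, indexed by the
# number k of compared algorithms.
Q_05 = {2: 1.960, 3: 2.343, 4: 2.569, 5: 2.728, 6: 2.850, 7: 2.949}

def nemenyi_cd(k, n_datasets):
    """Critical difference between average ranks over n_datasets datasets."""
    return Q_05[k] * math.sqrt(k * (k + 1) / (6.0 * n_datasets))
```

With k = 4 pruning algorithms and N = 15 this gives 1.21, and with k = 7 ensemble approaches it gives 2.33, matching the two tables.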
ensembles is majority voting [58,62]. Perdisci et al. [58] used an ensemble of one-class SVM classifiers to build a payload-based anomaly detection system. During the detection phase, a payload was classified as target (or normal) if it was labeled as target by the majority of the classifiers; otherwise it was classified as outlier (or anomalous). Wu and Chung [62] proposed to construct an ensemble of one-class SVM classifiers for content-based image retrieval. They segmented training images into different groups on which a number of one-class SVM classifiers were separately trained. The final decision of the ensemble for given images was made based on the majority voting of all the classifiers. Liu et al. [65] presented FENOC, a framework to construct one-class classifier ensembles for malware detection. The core algorithm of FENOC is CosTOC, a one-class learning algorithm, which uses a pair of one-class classifiers to describe the malicious and benign classes, respectively.

In addition to majority voting, other fusion rules, such as mean, max, and product, are used in the literature [45]. These rules are applied to the thresholded probabilistic outputs or directly to the probabilistic outputs generated by base one-class classifiers. Nanni [60] designed a one-class classifier ensemble using the random subspace method [8] for on-line signature verification, in which the max rule was used for combining the outputs of base classifiers. The same fusion rule was also used for fingerprint matching [66]. Giacinto et al. [59] proposed an unsupervised network anomaly detection system based on a modular ensemble of base one-class classifiers. Each module was designed to model a particular group of similar protocols or network services. For all modules, the performance was reported after combining base
Table 4
Comparison of EIOWA with other fusion rules in terms of AUC.

Dataset               EIOWA       Majority voting   Max        Mean        Product
Arrhythmia            79.7 (1)    79.0 (3)          77.8 (4)   79.5 (2)    72.3 (5)
Cancer non-ret        59.2 (2)    56.4 (5)          61.6 (1)   58.7 (3)    57.1 (4)
Cancer ret            59.4 (1)    58.6 (3)          54.2 (5)   57.8 (4)    59.0 (2)
Concordia 2           90.6 (1)    88.9 (3)          87.5 (4)   90.2 (2)    76.3 (5)
Concordia 3           93.8 (1)    93.2 (3)          91.8 (4)   93.7 (2)    80.9 (5)
Glass                 98.4 (1)    82.5 (4)          70.9 (5)   85.0 (3)    96.7 (2)
Imports               87.1 (1)    80.4 (4)          84.2 (2)   82.6 (3)    80.0 (5)
Ionosphere            97.0 (1)    96.7 (2)          95.4 (5)   96.6 (3)    96.0 (4)
Pump 2 2 noisy        90.1 (1)    87.5 (3)          78.9 (5)   88.2 (2)    83.7 (4)
Pump 1 3 noisy        94.1 (1)    90.0 (3)          84.4 (4)   91.1 (2)    70.1 (5)
Sonar mines           84.5 (1)    75.9 (5)          83.4 (2)   80.2 (3)    76.6 (4)
Sonar rocks           69.1 (1)    61.6 (5)          66.7 (3)   66.3 (4)    68.2 (2)
Spectf normal         96.3 (1)    95.0 (4)          95.2 (3)   95.6 (2)    94.6 (5)
Vowel 4               99.2 (2.5)  98.7 (4)          94.0 (5)   99.2 (2.5)  99.4 (1)
Wine 2                98.4 (1)    93.0 (4)          87.2 (5)   94.3 (2)    93.7 (3)
Overall average AUC   86.5 (1.17) 82.5 (3.67)       80.9 (3.80) 83.9 (2.63) 80.3 (3.73)
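As a consistency check, the overall average ranks reported in the last row of Table 4 follow directly from the per-dataset ranks shown in parentheses (column order assumed to be EIOWA, majority voting, max, mean, product, as in Table 5):

```python
ranks = {  # per-dataset ranks, read off the parenthesized values in Table 4
    "EIOWA":           [1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2.5, 1],
    "Majority voting": [3, 5, 3, 3, 3, 4, 4, 2, 3, 3, 5, 5, 4, 4, 4],
    "Max":             [4, 1, 5, 4, 4, 5, 2, 5, 5, 4, 2, 3, 3, 5, 5],
    "Mean":            [2, 3, 4, 2, 2, 3, 3, 3, 2, 2, 3, 4, 2, 2.5, 2],
    "Product":         [5, 4, 2, 5, 5, 2, 5, 4, 4, 5, 4, 2, 5, 1, 3],
}
avg_rank = {name: round(sum(r) / len(r), 2) for name, r in ranks.items()}
```

This yields 1.17, 3.67, 3.80, 2.63, and 3.73, matching the reported row.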
Table 5
Results of the Friedman/Nemenyi test for ranking the fusion rules in Table 4.

F-statistic   Critical value   Significant differences   Critical difference   EIOWA   Majority voting   Max   Mean   Product

Fig. 5. Results of the Nemenyi test for the pairwise comparison of the fusion rules in Table 4.

stacking technique, and as such, it uses a single meta-classifier for classifier fusion. However, contrary to the stacking technique, where a meta-classifier is directly trained on the outputs of base classifiers in an ensemble, TUPSO trains its meta-classifier on a series of aggregations obtained from the outputs of the base classifiers. More clearly, it evaluates the base classifiers using some performance measures and translates the resulting
Table 6
Comparison of the average AUC of BeeOWA with that of RSM and PRSM [63] (ranks in parentheses).

Dataset            BeeOWA      RSM-Gauss   RSM-NN      RSM-KM      PRSM-Gauss  PRSM-NN     PRSM-KM
Arrhythmia         79.7 (1)    78.0 (2)    75.5 (4)    75.2 (7)    77.8 (3)    75.3 (5.5)  75.3 (5.5)
Cancer non-ret     59.2 (1)    53.6 (2)    53.2 (4)    51.7 (7)    53.4 (3)    53.1 (5)    51.9 (6)
Cancer ret         59.4 (3)    60.9 (1)    57.6 (6.5)  58.6 (4)    59.6 (2)    58.0 (5)    57.6 (6.5)
Concordia 2        90.6 (1)    87.0 (2)    80.3 (5)    69.8 (6)    86.3 (3)    80.4 (4)    69.5 (7)
Concordia 3        93.8 (1.5)  93.8 (1.5)  87.2 (4)    81.9 (6)    93.2 (3)    86.6 (5)    81.5 (7)
Glass              98.4 (1)    74.7 (2)    73.7 (4)    73.9 (3)    73.3 (5)    72.5 (6.5)  72.5 (6.5)
Imports            87.1 (1)    68.5 (7)    86.5 (2)    72.3 (5)    71.0 (6)    86.3 (3)    73.1 (4)
Ionosphere         97.0 (3.5)  96.9 (5)    96.1 (6.5)  97.3 (1.5)  97.0 (3.5)  96.1 (6.5)  97.3 (1.5)
Pump 2 2 noisy     90.1 (1)    83.1 (3)    76.0 (5)    72.6 (6)    83.8 (2)    76.3 (4)    72.3 (7)
Pump 1 3 noisy     94.1 (1)    84.0 (3)    77.2 (4)    69.5 (6)    86.2 (2)    76.9 (5)    69.4 (7)
Sonar mines        84.5 (1)    64.1 (4)    68.8 (2)    63.6 (6)    63.5 (7)    68.2 (3)    63.8 (5)
Sonar rocks        69.1 (3)    64.7 (7)    72.3 (2)    67.4 (4.5)  65.0 (6)    72.4 (1)    67.4 (4.5)
Spectf normal      96.3 (1)    88.3 (5)    95.6 (3)    87.2 (7)    88.4 (4)    95.8 (2)    87.4 (6)
Vowel 4            99.2 (1.5)  95.6 (6)    99.2 (1.5)  98.5 (4)    94.5 (7)    99.1 (3)    98.2 (5)
Wine 2             98.4 (1)    92.9 (2)    91.7 (4.5)  92.5 (3)    90.4 (7)    90.5 (6)    91.7 (4.5)
Overall average AUC  86.5 (1.50)  79.1 (3.50)  79.4 (3.87)  75.5 (5.07)  78.9 (4.23)  79.2 (4.30)  75.3 (5.53)
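The parenthesized ranks in Table 6 (and the average ranks fed into the Friedman test) can be reproduced by ranking each dataset row from best to worst AUC, with ties sharing the average of the positions they span. A small self-contained sketch:

```python
def average_ranks(auc_rows):
    """Per-dataset ranks (1 = best AUC, ties share the average rank),
    averaged over datasets, as in the parenthesized ranks of Table 6."""
    n_models = len(auc_rows[0])
    totals = [0.0] * n_models
    for row in auc_rows:
        for i, v in enumerate(row):
            better = sum(1 for w in row if w > v)
            ties = sum(1 for w in row if w == v)  # includes v itself
            # rank = (number strictly better) + average position among the ties
            totals[i] += better + (ties + 1) / 2
    return [t / len(auc_rows) for t in totals]

# Two rows from Table 6 (BeeOWA, RSM-Gauss, RSM-NN, RSM-KM, PRSM-Gauss,
# PRSM-NN, PRSM-KM); the two 93.8 values tie and both receive rank 1.5.
rows = [
    [93.8, 93.8, 87.2, 81.9, 93.2, 86.6, 81.5],   # Concordia 3
    [97.0, 96.9, 96.1, 97.3, 97.0, 96.1, 97.3],   # Ionosphere
]
print(average_ranks(rows))
```

Averaging such per-dataset ranks over all fifteen datasets yields the overall average ranks in the last row of the table.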
Table 7
Results of the Friedman/Nemenyi test for ranking the ensemble approaches in Table 6.
F-statistic Critical value Significant differences Critical difference BeeOWA RSM-Gauss RSM-NN RSM-KM PRSM-Gauss PRSM-NN PRSM-KM
8.11 2.21 Yes 2.33 1.50 3.50 3.87 5.07 4.23 4.30 5.53
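The critical difference in Table 7 follows the standard Nemenyi formula CD = q_α · sqrt(k(k+1)/(6N)). A quick check, assuming α = 0.05, k = 7 compared approaches, and N = 15 datasets, with q_α taken from the usual two-tailed Nemenyi tables:

```python
import math

# q values for the two-tailed Nemenyi test at alpha = 0.05, indexed by the
# number of compared approaches k (standard tabulated values).
Q_005 = {2: 1.960, 3: 2.343, 4: 2.569, 5: 2.728, 6: 2.850, 7: 2.949}

def nemenyi_cd(k, n_datasets, q_table=Q_005):
    """Critical difference in average ranks for the Nemenyi post hoc test."""
    return q_table[k] * math.sqrt(k * (k + 1) / (6.0 * n_datasets))

# Table 7 compares k = 7 ensemble approaches over N = 15 datasets.
print(round(nemenyi_cd(7, 15), 2))  # -> 2.33, the critical difference in Table 7
```

The same formula with k = 4 pruning algorithms and N = 15 datasets gives 1.21, the critical difference quoted for the pruning-algorithm comparison.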
Table 8
Comparison of the average AUC of BeeOWA with that of TUPSO, ESBE, and RC [68] (ranks in parentheses).

Dataset              BeeOWA       TUPSO        ESBE         RC
Overall average AUC  86.7 (1.28)  80.0 (1.88)  72.2 (3.16)  67.8 (3.68)
378 E. Parhizkar, M. Abadi / Neurocomputing 166 (2015) 367–381
estimates into static weights, which are then used to train the meta-classifier.

As mentioned above, most of the rules used for one-class classifier fusion are simple fixed rules, such as majority voting, mean, max, min, and product. All of these rules suffer from an intrinsic limitation: they do not take into account the properties of the base classifiers in the ensemble.

5. Experiments

… exponential consistency measure in Section 3.1.1 to compute the order-inducing values of the base one-class classifiers.

BeePruner employs four control parameters: the number of food sources (SN), which is equal to the number of employed or onlooker bees; the value of limit (LM); the maximum cycle number (CN); and the consistency coefficient (α). For all experiments, we set SN to 25, LM to 100, CN to 1000, and α to 10. However, it has been shown that the ABC algorithm does not need fine tuning of its control parameters to obtain satisfactorily good results [75]. Moreover, for EIOWA, we set the control parameter λ to 5.
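As a rough illustration of how an induced OWA operator combines the base outputs: the arguments are reordered by their order-inducing values (here, per-classifier consistency estimates), not by their own magnitude, and then weighted. The exponentially decaying weights parameterized by λ below are a stand-in assumption for the exact EIOWA weight definition given earlier in the paper:

```python
import numpy as np

def induced_owa(scores, inducing_values, lam=5.0):
    """Illustrative induced OWA aggregation (not the exact EIOWA weights).

    scores: base one-class classifier outputs for one test object.
    inducing_values: order-inducing values (e.g. per-classifier consistency
    estimates); scores are reordered by these, not by their own magnitude.
    lam: decay parameter; larger lam concentrates weight on the classifiers
    with the highest inducing values.
    """
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(inducing_values)[::-1]      # most consistent first
    w = np.exp(-lam * np.arange(len(scores)) / len(scores))
    w /= w.sum()                                   # normalize weights to sum to 1
    return float(np.dot(w, scores[order]))

# The score of the classifier with the highest inducing value (0.9) dominates.
print(induced_owa([0.2, 0.8, 0.5], inducing_values=[0.3, 0.9, 0.6], lam=5.0))
```

With λ = 0 the weights become uniform and the operator reduces to the plain mean; increasing λ shifts the aggregation toward the most consistent base classifiers.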
F-statistic  Critical value  Significant differences  Critical difference  BeeOWA  TUPSO  ESBE  RC
68.36        2.73            Yes                      0.94                 1.28    1.88   3.16  3.68

Fig. 7. Results of the Nemenyi test for the pairwise comparison of the ensemble approaches in Table 8.
Fig. 6. Results of the Nemenyi test for the pairwise comparison of the ensemble approaches in Table 6.
performance by 1.6%, 5.0%, and 3.1%, respectively. The statistical significance of the ranking differences was determined by the Friedman test, followed by the Nemenyi post hoc test for pairwise comparisons. The results are summarized in Table 3. The F-statistic indicates whether or not the ranking differences are statistically significant. More specifically, significant differences are found if the F-statistic is greater than a critical value, 2.83 in this case. In such a situation, the Nemenyi test is used to compare any two pruning algorithms. A significant difference between two pruning algorithms occurs when the difference in their rankings is greater than a critical difference, 1.21 in this case. Fig. 3 illustrates the results of the pairwise comparisons performed using the Nemenyi test. The horizontal axis indicates the average ranks of the pruning algorithms. The colored lines on the top of the axis connect the pruning algorithms that are not significantly different. Fig. 4 shows the average number of base one-class classifiers in the sub-ensembles found by the pruning algorithms. SA finds larger sub-ensembles than BeePruner, incurring more computational cost in subsequent steps. In contrast, SGA is able to find smaller sub-ensembles; however, its average AUC is significantly less than that of BeePruner. Moreover, on the basis of the obtained results, we conclude that BeePruner can help to reach a better trade-off between ensemble complexity and classification performance.

Table 4 shows the average AUC and the ranks for all the fusion rules. Clearly, EIOWA has the best average rank among all the fusion rules. The next best fusion rule, by a large margin, is mean. Moreover, in comparison with majority voting, max, mean, and product, EIOWA improves the overall average AUC by 4.8%, 6.9%, 3.1%, and 7.7% (relative improvements), respectively. Table 5 and Fig. 5 summarize the results of the Friedman/Nemenyi test for ranking the fusion rules. Based on the obtained results, we find that EIOWA has the best average rank, 1.17, which is significantly better than that of majority voting, max, and product.

Accordingly, we conclude that BeePruner and EIOWA can be considered good candidates for one-class classifier pruning and one-class classifier fusion, respectively.

Subsequently, we conducted experiments to compare the classification performance of BeeOWA with that of RSM and PRSM as reported in [63] and that of TUPSO, ESBE, and RC as reported in

Fig. 8. Absolute improvements in the overall average AUC when comparing BeeOWA to the other approaches.

6. Conclusions

It has been shown that using only the best classifier and discarding the classifiers with poorer performance might waste valuable information [45]. For this reason, we commonly use classifier ensembles to improve the classification accuracy. In general, the process of constructing a classifier ensemble consists of three main steps: classifier generation, classifier pruning, and classifier fusion. Although classifier pruning and classifier fusion have been extensively studied for multi-class classifier ensembles, so far little work has been reported in the literature addressing them in one-class classifier ensembles.

In this paper, we presented BeeOWA, a novel approach to construct highly accurate one-class classifier ensembles. It uses a novel binary artificial bee colony algorithm, called BeePruner, for one-class classifier pruning and a novel exponential induced OWA operator, called EIOWA, for one-class classifier fusion. We evaluated the performance of BeeOWA using a wide range of benchmark datasets and compared it with that of several state-of-the-art approaches in the literature, such as RSM, PRSM [63], and TUPSO [68]. The results of the experiments showed that BeeOWA can significantly improve the average AUC as compared to RSM, PRSM, and TUPSO.

References

[1] J.J. Rodriguez, L.I. Kuncheva, C.J. Alonso, Rotation forest: a new classifier ensemble method, IEEE Trans. Pattern Anal. Mach. Intell. 28 (10) (2006) 1619–1630. http://dx.doi.org/10.1109/TPAMI.2006.211.
[2] C. Zhang, Q. Cai, Y. Song, Boosting with pairwise constraints, Neurocomputing 73 (4–6) (2010) 908–919. http://dx.doi.org/10.1016/j.neucom.2009.09.013.
[3] L. Li, B. Zou, Q. Hu, X. Wu, D. Yu, Dynamic classifier ensemble using classification confidence, Neurocomputing 99 (2013) 581–591. http://dx.doi.org/10.1016/j.neucom.2012.07.026.
[4] Z.-H. Zhou, Ensemble Methods: Foundations and Algorithms, CRC Press, Boca Raton, FL, USA, 2012.
[5] C. Lin, W. Chen, C. Qiu, Y. Wu, S. Krishnan, Q. Zou, LibD3C: ensemble classifiers with a clustering and dynamic selection strategy, Neurocomputing 123 (2014) 424–435. http://dx.doi.org/10.1016/j.neucom.2013.08.004.
[6] L. Breiman, Bagging predictors, Mach. Learn. 24 (2) (1996) 123–140. http://dx.doi.org/10.1023/A:1018054314350.
[7] R.E. Schapire, The boosting approach to machine learning: an overview, in: D.D. Denison, M.H. Hansen, C.C. Holmes, B. Mallick, B. Yu (Eds.), Nonlinear Estimation and Classification, Lecture Notes in Statistics, vol. 171, Springer, New York, NY, USA, 2003, pp. 149–171. http://dx.doi.org/10.1007/978-0-387-21579-2_9.
[8] T.K. Ho, The random subspace method for constructing decision forests, IEEE Trans. Pattern Anal. Mach. Intell. 20 (8) (1998) 832–844. http://dx.doi.org/10.1109/34.709601.
[9] Z.-H. Zhou, J. Wu, W. Tang, Ensembling neural networks: many could be better than all, Artif. Intell. 137 (1–2) (2002) 239–263. http://dx.doi.org/10.1016/S0004-3702(02)00190-X.
[10] R. Lysiak, M. Kurzynski, T. Woloszynski, Optimal selection of ensemble classifiers using measures of competence and diversity of base classifiers, Neurocomputing 126 (2014) 29–35. http://dx.doi.org/10.1016/j.neucom.2013.01.052.
[11] B. Krawczyk, M. Woźniak, Optimization algorithms for one-class classification ensemble pruning, in: N.T. Nguyen, B. Attachoo, B. Trawiński, K. Somboonviwat (Eds.), Intelligent Information and Database Systems, Lecture Notes in Computer Science, vol. 8398, Springer International Publishing, Switzerland, 2014, pp. 127–136. http://dx.doi.org/10.1007/978-3-319-05458-2_14.
[12] D. Karaboga, B. Basturk, A powerful and efficient algorithm for numerical function optimization: artificial bee colony (ABC) algorithm, J. Global Optim. 39 (3) (2007) 459–471. http://dx.doi.org/10.1007/s10898-007-9149-x.
[13] J. Kennedy, Particle swarm optimization, in: C. Sammut, G.I. Webb (Eds.), Encyclopedia of Machine Learning, Springer, New York, NY, USA, 2010, pp. 760–766. http://dx.doi.org/10.1007/978-0-387-30164-8_630.
[14] R. Storn, K. Price, Differential evolution: a simple and efficient heuristic for global optimization over continuous spaces, J. Global Optim. 11 (4) (1997) 341–359. http://dx.doi.org/10.1023/A:1008202821328.
[15] F. Barani, M. Abadi, An ABC-AIS hybrid approach to dynamic anomaly detection in AODV-based MANETs, in: Proceedings of the 10th IEEE International Conference on Trust, Security and Privacy in Computing and Communications, IEEE, Washington, DC, USA, 2011, pp. 714–720. http://dx.doi.org/10.1109/TrustCom.2011.92.
[16] P. Shunmugapriya, S. Kanmani, Optimization of stacking ensemble configurations through artificial bee colony algorithm, Swarm Evol. Comput. 12 (2013) 24–32. http://dx.doi.org/10.1016/j.swevo.2013.04.004.
[17] C. Brodley, T. Lane, Creating and exploiting coverage and diversity, in: Proceedings of the AAAI-96 Workshop on Integrating Multiple Learned Models, Citeseer, 1996, pp. 8–14.
[18] A. Tsymbal, M. Pechenizkiy, P. Cunningham, Diversity in search strategies for ensemble feature selection, Inf. Fusion 6 (1) (2005) 83–98. http://dx.doi.org/10.1016/j.inffus.2004.04.003.
[19] L. Xu, A. Krzyzak, C.Y. Suen, Methods of combining multiple classifiers and their applications to handwriting recognition, IEEE Trans. Syst. Man Cybern. 22 (3) (1992) 418–435. http://dx.doi.org/10.1109/21.155943.
[20] C.Y. Suen, L. Lam, Multiple classifier combination methodologies for different output levels, in: Multiple Classifier Systems, Lecture Notes in Computer Science, vol. 1857, Springer, Berlin, Heidelberg, Germany, 2000, pp. 52–66. http://dx.doi.org/10.1007/3-540-45014-9_5.
[21] R.P.W. Duin, D.M.J. Tax, Experiments with classifier combining rules, in: Multiple Classifier Systems, Lecture Notes in Computer Science, vol. 1857, Springer, Berlin, Heidelberg, Germany, 2000, pp. 16–29. http://dx.doi.org/10.1007/3-540-45014-9_2.
[22] R.R. Yager, On ordered weighted averaging aggregation operators in multicriteria decision making, IEEE Trans. Syst. Man Cybern. 18 (1) (1988) 183–190. http://dx.doi.org/10.1109/21.87068.
[23] L.I. Kuncheva, Fuzzy Classifier Design, Studies in Fuzziness and Soft Computing, vol. 49, Physica-Verlag, Heidelberg, Germany, 2000. http://dx.doi.org/10.1007/978-3-7908-1850-5.
[24] B. Schölkopf, A.J. Smola, Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond, MIT Press, Cambridge, MA, USA, 2001.
[25] D.M.J. Tax, R.P.W. Duin, Support vector data description, Mach. Learn. 54 (1) (2004) 45–66. http://dx.doi.org/10.1023/B:MACH.0000008084.60811.49.
[26] D.M.J. Tax, One-class classification (Ph.D. thesis), Delft University of Technology, 2001.
[27] W. Khreich, E. Granger, A. Miri, R. Sabourin, Iterative boolean combination of classifiers in the ROC space: an application to anomaly detection with HMMs, Pattern Recognit. 43 (8) (2010) 2732–2752. http://dx.doi.org/10.1016/j.patcog.2010.03.006.
[28] M. Ghannad-Rezaie, H. Soltanian-Zadeh, H. Ying, M. Dong, Selection-fusion approach for classification of datasets with missing values, Pattern Recognit. 43 (6) (2010) 2340–2350. http://dx.doi.org/10.1016/j.patcog.2009.12.003.
[29] T. Woloszynski, M. Kurzynski, A probabilistic model of classifier competence for dynamic ensemble selection, Pattern Recognit. 44 (10–11) (2011) 2656–2668. http://dx.doi.org/10.1016/j.patcog.2011.03.020.
[30] R.E. Schapire, Y. Freund, Boosting: Foundations and Algorithms, MIT Press, Cambridge, MA, USA, 2012.
[31] Q.L. Zhao, Y.H. Jiang, M. Xu, Incremental learning by heterogeneous bagging ensemble, in: L. Cao, J. Zhong, Y. Feng (Eds.), Advanced Data Mining and Applications, Lecture Notes in Computer Science, vol. 6441, Springer, Berlin, Heidelberg, Germany, 2010, pp. 1–12. http://dx.doi.org/10.1007/978-3-642-17313-4_1.
[32] A.L.V. Coelho, D.S.C. Nascimento, On the evolutionary design of heterogeneous bagging models, Neurocomputing 73 (16–18) (2010) 3319–3322. http://dx.doi.org/10.1016/j.neucom.2010.07.008.
[33] S. Sheen, S.V. Aishwarya, R. Anitha, S.V. Raghavan, S.M. Bhaskar, Ensemble pruning using harmony search, in: E. Corchado, V. Snášel, A. Abraham, M. Woźniak, M. Graña, S.-B. Cho (Eds.), Hybrid Artificial Intelligent Systems, Lecture Notes in Computer Science, vol. 7209, Springer, Berlin, Heidelberg, Germany, 2012, pp. 13–24. http://dx.doi.org/10.1007/978-3-642-28931-6_2.
[34] L. Abdi, S. Hashemi, GAB-EPA: a GA based ensemble pruning approach to tackle multiclass imbalanced problems, in: A. Selamat, N.T. Nguyen, H. Haron (Eds.), Intelligent Information and Database Systems, Lecture Notes in Computer Science, vol. 7802, Springer, Berlin, Heidelberg, Germany, 2013, pp. 246–254. http://dx.doi.org/10.1007/978-3-642-36546-1_26.
[35] J. Kittler, M. Hatef, R.P.W. Duin, J. Matas, On combining classifiers, IEEE Trans. Pattern Anal. Mach. Intell. 20 (3) (1998) 226–239. http://dx.doi.org/10.1109/34.667881.
[36] R.P.W. Duin, The combining classifier: to train or not to train?, in: Proceedings of the 16th International Conference on Pattern Recognition, IEEE, Washington, DC, USA, 2002, pp. 765–770. http://dx.doi.org/10.1109/ICPR.2002.1048415.
[37] M. Grabisch, J.-M. Nicolas, Classification by fuzzy integral: performance and tests, Fuzzy Sets Syst. 65 (2–3) (1994) 255–271. http://dx.doi.org/10.1016/0165-0114(94)90023-X.
[38] S.-B. Cho, J.H. Kim, Combining multiple neural networks by fuzzy integral for robust classification, IEEE Trans. Syst. Man Cybern. 25 (2) (1995) 380–384. http://dx.doi.org/10.1109/21.364825.
[39] M. Reformat, R.R. Yager, Building ensemble classifiers using belief functions and OWA operators, Soft Comput. 12 (6) (2008) 543–558. http://dx.doi.org/10.1007/s00500-007-0227-2.
[40] D.M.J. Tax, R.P.W. Duin, Uniform object generation for optimizing one-class classifiers, J. Mach. Learn. Res. 2 (2002) 155–173.
[41] R.O. Duda, P.E. Hart, D.G. Stork, Pattern Classification, John Wiley & Sons, Hoboken, NJ, USA, 2012.
[42] S.-W. Lee, J. Park, S.-W. Lee, Low resolution face recognition based on support vector data description, Pattern Recognit. 39 (9) (2006) 1809–1812. http://dx.doi.org/10.1016/j.patcog.2006.04.033.
[43] A. Banerjee, P. Burlina, C. Diehl, A support vector method for anomaly detection in hyperspectral imagery, IEEE Trans. Geosci. Remote Sens. 44 (8) (2006) 2282–2291. http://dx.doi.org/10.1109/TGRS.2006.873019.
[44] A.K. Jain, R.C. Dubes, Algorithms for Clustering Data, Prentice Hall, Upper Saddle River, NJ, USA, 1988.
[45] D.M.J. Tax, R.P.W. Duin, Combining one-class classifiers, in: J. Kittler, F. Roli (Eds.), Multiple Classifier Systems, Lecture Notes in Computer Science, vol. 2096, Springer, Berlin, Heidelberg, Germany, 2001, pp. 299–308. http://dx.doi.org/10.1007/3-540-48219-9_30.
[46] R.R. Yager, A note on weighted queries in information retrieval systems, J. Am. Soc. Inf. Sci. 38 (1) (1987) 23–24.
[47] R.R. Yager, D.P. Filev, Fuzzy logic controllers with flexible structures, in: Proceedings of the 2nd International Conference on Fuzzy Sets and Neural Networks, 1992, pp. 317–320.
[48] R. Fuller, On obtaining OWA operator weights: a short survey of recent developments, in: Proceedings of the 5th IEEE International Conference on Computational Cybernetics, IEEE, Washington, DC, USA, 2007, pp. 241–244. http://dx.doi.org/10.1109/ICCCYB.2007.4402042.
[49] R.R. Yager, D.P. Filev, Induced ordered weighted averaging operators, IEEE Trans. Syst. Man Cybern. 29 (2) (1999) 141–150. http://dx.doi.org/10.1109/3477.752789.
[50] M. Talebi, M. Abadi, BeeMiner: a novel artificial bee colony algorithm for classification rule discovery, in: Proceedings of the 2014 Iranian Conference on Intelligent Systems, IEEE, Washington, DC, USA, 2014, pp. 1–5. http://dx.doi.org/10.1109/IranianCIS.2014.6802576.
[51] D.M.J. Tax, K. Muller, A consistency-based model selection for one-class classification, in: Proceedings of the 17th International Conference on Pattern Recognition, IEEE, Washington, DC, USA, 2004, pp. 363–366. http://dx.doi.org/10.1109/ICPR.2004.1334542.
[52] A.J.C. Sharkey, N.E. Sharkey, Combining diverse neural nets, Knowl. Eng. Rev. 12 (3) (1997) 231–247.
[53] D. Ruta, B. Gabrys, Classifier selection for majority voting, Inf. Fusion 6 (1) (2005) 63–81. http://dx.doi.org/10.1016/j.inffus.2004.04.008.
[54] L.I. Kuncheva, C.J. Whitaker, Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy, Mach. Learn. 51 (2) (2003) 181–207. http://dx.doi.org/10.1023/A:1022859003006.
[55] B. Krawczyk, M. Woźniak, Diversity measures for one-class classifier ensembles, Neurocomputing 126 (2014) 36–44. http://dx.doi.org/10.1016/j.neucom.2013.01.053.
[56] J.L. Fleiss, B. Levin, M.C. Paik, Statistical Methods for Rates and Proportions, John Wiley & Sons, Hoboken, NJ, USA, 2003. http://dx.doi.org/10.1002/0471445428.
[57] D. Filev, R.R. Yager, Analytic properties of maximum entropy OWA operators, Inf. Sci. 85 (1–3) (1995) 11–27. http://dx.doi.org/10.1016/0020-0255(94)00109-O.
[58] R. Perdisci, G. Gu, W. Lee, Using an ensemble of one-class SVM classifiers to harden payload-based anomaly detection systems, in: Proceedings of the 6th IEEE International Conference on Data Mining, IEEE, Washington, DC, USA, 2006, pp. 488–498. http://dx.doi.org/10.1109/ICDM.2006.165.
[59] G. Giacinto, R. Perdisci, M. Del Rio, F. Roli, Intrusion detection in computer networks by a modular ensemble of one-class classifiers, Inf. Fusion 9 (1) (2008) 69–82. http://dx.doi.org/10.1016/j.inffus.2006.10.002.
[60] L. Nanni, Experimental comparison of one-class classifiers for online signature verification, Neurocomputing 69 (7–9) (2006) 869–873. http://dx.doi.org/10.1016/j.neucom.2005.06.007.
[61] M.R.P. Souza, G.D.C. Cavalcanti, T.I. Ren, Off-line signature verification: an approach based on combining distances and one-class classifiers, in: Proceedings of the 22nd IEEE International Conference on Tools with Artificial Intelligence, IEEE, Washington, DC, USA, 2010, pp. 7–11. http://dx.doi.org/10.1109/ICTAI.2010.10.
[62] R.-S. Wu, W.-H. Chung, Ensemble one-class support vector machines for content-based image retrieval, Expert Syst. Appl. 36 (3) (2009) 4451–4459. http://dx.doi.org/10.1016/j.eswa.2008.05.037.
[63] V. Cheplygina, D.M.J. Tax, Pruned random subspace method for one-class classifiers, in: C. Sansone, J. Kittler, F. Roli (Eds.), Multiple Classifier Systems, Lecture Notes in Computer Science, vol. 6713, Springer, Berlin, Heidelberg, Germany, 2011, pp. 96–105. http://dx.doi.org/10.1007/978-3-642-21557-5_12.
[64] A.P. Bradley, The use of the area under the ROC curve in the evaluation of machine learning algorithms, Pattern Recognit. 30 (7) (1997) 1145–1159. http://dx.doi.org/10.1016/S0031-3203(96)00142-2.
[65] J. Liu, J. Song, Q. Miao, Y. Cao, FENOC: an ensemble one-class learning framework for malware detection, in: Proceedings of the 9th International Conference on Computational Intelligence and Security, IEEE, Washington, DC, USA, 2013, pp. 523–527. http://dx.doi.org/10.1109/CIS.2013.116.
[66] L. Nanni, A. Lumini, Random bands: a novel ensemble for fingerprint matching, Neurocomputing 69 (13–15) (2006) 1702–1705. http://dx.doi.org/10.1016/j.neucom.2006.01.011.
[67] B. Cyganek, Image segmentation with a hybrid ensemble of one-class support vector machines, in: M. Graña Romay, E. Corchado, M.T. Garcia Sebastian (Eds.), Hybrid Artificial Intelligence Systems, Lecture Notes in Computer Science, vol. 6076, Springer, Berlin, Heidelberg, Germany, 2010, pp. 254–261. http://dx.doi.org/10.1007/978-3-642-13769-3_31.
[68] E. Menahem, L. Rokach, Y. Elovici, Combining one-class classifiers via meta learning, in: Proceedings of the 22nd ACM International Conference on Information & Knowledge Management, ACM, New York, NY, USA, 2013, pp. 2435–2440. http://dx.doi.org/10.1145/2505515.2505619.
[69] F. van der Heijden, R.P.W. Duin, D. de Ridder, D.M.J. Tax, Classification, Parameter Estimation and State Estimation: An Engineering Approach Using MATLAB, John Wiley & Sons, Hoboken, NJ, USA, 2004.
[70] D.M.J. Tax, DDtools, the data description toolbox for Matlab, version 2.1.1, 2014. URL: 〈http://prlab.tudelft.nl/david-tax/dd_tools.html〉.
[71] M. Friedman, The use of ranks to avoid the assumption of normality implicit in the analysis of variance, J. Am. Stat. Assoc. 32 (200) (1937) 675–701. http://dx.doi.org/10.1080/01621459.1937.10503522.
[72] P. Nemenyi, Distribution-free multiple comparisons (Ph.D. thesis), Princeton University, 1963.
[73] J. Demšar, Statistical comparisons of classifiers over multiple data sets, J. Mach. Learn. Res. 7 (2006) 1–30.
[74] M. Lichman, UCI machine learning repository, 2013. URL: 〈http://archive.ics.uci.edu/ml〉.
[75] B. Akay, D. Karaboga, Parameter tuning for the artificial bee colony algorithm, in: N.T. Nguyen, R. Kowalczyk, S.-M. Chen (Eds.), Computational Collective Intelligence, Semantic Web, Social Networks and Multiagent Systems, Lecture Notes in Computer Science, vol. 5796, Springer, Berlin, Heidelberg, Germany, 2009, pp. 608–619. http://dx.doi.org/10.1007/978-3-642-04441-0_53.

Elham Parhizkar received the M.Sc. degree in computer engineering, with first class honors, from Tarbiat Modares University in 2014, where she worked on anomaly detection in web traffic as her master thesis. Her main research interests are in the field of machine learning, particularly in the areas of one-class classification and outlier detection.

Mahdi Abadi received the B.Sc. degree in computer engineering from Ferdowsi University of Mashhad in 1998. He also received the M.Sc. and Ph.D. degrees from Tarbiat Modares University in 2001 and 2008, respectively. Since 2009, he has been an assistant professor in the Faculty of Electrical and Computer Engineering at Tarbiat Modares University. His main research interests are network security, evolutionary computation, and data mining.