
Neurocomputing 166 (2015) 367–381

Contents lists available at ScienceDirect

Neurocomputing
journal homepage: www.elsevier.com/locate/neucom

BeeOWA: A novel approach based on ABC algorithm and induced OWA operators for constructing one-class classifier ensembles

Elham Parhizkar, Mahdi Abadi*
Faculty of Electrical and Computer Engineering, Tarbiat Modares University, P.O. Box 14115-194, Tehran, Iran

ARTICLE INFO

Article history:
Received 3 September 2014
Received in revised form 10 January 2015
Accepted 17 March 2015
Available online 10 April 2015
Communicated by A. Suarez

Keywords:
One-class classification
Classifier ensemble
Classifier pruning
Classifier fusion
Binary artificial bee colony
Induced OWA operator

ABSTRACT

In recent years, classifier ensembles have received increasing attention in the machine learning and pattern recognition communities. However, constructing classifier ensembles for one-class classification problems has remained a challenging research topic. To pursue this line of research, we need to address issues on how to generate a set of diverse one-class classifiers that are individually accurate and how to combine their outputs in an effective way. In this paper, we present BeeOWA, a novel approach to construct highly accurate one-class classifier ensembles. It uses a novel binary artificial bee colony algorithm, called BeePruner, to prune an initial one-class classifier ensemble and find a near-optimal sub-ensemble of base classifiers in a reasonable computational time. To evaluate the fitness of an ensemble solution, BeePruner uses two different measures: an exponential consistency measure and a non-pairwise diversity measure based on the Kappa inter-rater agreement. After one-class classifier pruning, BeeOWA uses a novel exponential induced OWA (ordered weighted averaging) operator, called EIOWA, to combine the outputs of base classifiers in the sub-ensemble. The results of experiments carried out on a number of benchmark datasets show that BeeOWA can outperform several state-of-the-art approaches, both in terms of classification performance and statistical significance.

© 2015 Elsevier B.V. All rights reserved.

* Corresponding author. Tel.: +98 21 82884935; fax: +98 21 82884325.
E-mail addresses: e.parhizkar@modares.ac.ir (E. Parhizkar), abadi@modares.ac.ir (M. Abadi).
http://dx.doi.org/10.1016/j.neucom.2015.03.051

1. Introduction

Combining multiple classifiers, also known as classifier ensemble, is an effective technique for solving classification problems using an ensemble of individual base classifiers. It has been theoretically and empirically demonstrated that classifier ensembles can substantially improve the classification accuracy of their constituent members [1–3].

In addition to the accuracy of base classifiers, the success of a classifier ensemble also relies on the diversity inherent in the base classifiers [4,5]. Diversity ensures that all the base classifiers are as different from each other as possible and so can make uncorrelated errors. To achieve an initial diversity, a common technique is to train a set of homogeneous or heterogeneous base classifiers on different or the same training datasets using techniques such as bagging [6], boosting [7], or random subspacing [8]. Bagging generates several different training datasets with bootstrap sampling from the original training dataset and then trains a base classifier from each of those training datasets. Boosting generates a sequence of base classifiers whose training datasets are different and determined by the accuracy of former base classifiers. Random subspacing trains base classifiers independently on the same training dataset using different random subsets of features.

Using a subset, or sub-ensemble, of base classifiers could provide higher diversity and accuracy than using the whole set, or ensemble. Thus, one of the most important issues in constructing a classifier ensemble is to decide which of the base classifiers to choose [9,10]. This process, also known as classifier pruning or ensemble pruning [11], can be considered as an optimization problem with two objectives, classification accuracy and diversity, that both need to be maximized. When the size of a classifier ensemble is relatively large, classifier pruning is computationally expensive or even prohibitive. One solution to this problem is to use meta-heuristic algorithms, such as artificial bee colony (ABC) [12]. These algorithms can find near-optimal solutions in a short time. ABC is a swarm-based meta-heuristic algorithm that was initially proposed for solving numerical optimization problems. It is as simple as particle swarm optimization (PSO) [13] and differential evolution (DE) [14], and uses only common control parameters, such as population size and maximum cycle number. ABC has shown promising results in the field of optimization [15,16].

Furthermore, it has been shown that increasing the coverage of a classifier ensemble through classifier pruning is not enough to
increase the classification accuracy [17,18]. Hence, another important step in constructing a classifier ensemble is to choose a good strategy for combining the outputs of base classifiers, using a process also known as classifier fusion. In the literature, several classifier fusion strategies have been proposed, which can be categorized according to the level of classifier outputs into abstract level, rank level, and measurement level [19,20]. Measurement level outputs provide more information than the other types of outputs, and a number of aggregation functions or fusion rules, such as mean, max, and product, are employed for combining them [21].

Aggregation of different pieces of information obtained from different sources is a common aspect of any fusion system. A very interesting class of powerful aggregation operators is called the ordered weighted averaging (OWA) [22]. An OWA operator takes multiple values as input and returns a single value that is a weighted sum based on an order ranking of all input values. Classifier fusion using OWA operators seems more robust than simple weighted averaging, where the coefficients are derived based on the classifier accuracy [23].

In recent years, there has been a substantial amount of research conducted in the field of one-class classification, resulting in different one-class classifiers, including one-class SVM (OCSVM) [24], support vector data description (SVDD) [25], and so on. The goal of one-class classification is to distinguish a set of target objects from all the other possible objects [26]. Since in one-class classification problems we have only the information of the target class, constructing highly accurate one-class classifier ensembles is more challenging than constructing multi-class classifier ensembles.

As mentioned before, to construct classifier ensembles, we need to address issues on how to generate a set of accurate and diverse base classifiers and how to combine their outputs using an effective fusion rule. Although these issues have been adequately addressed in multi-class classifier ensembles, relatively little work has been reported in the literature to address them in one-class classifier ensembles. In this paper, we present BeeOWA, a novel approach to construct highly accurate one-class classifier ensembles. BeeOWA uses a novel binary artificial bee colony algorithm, called BeePruner, to prune an initial one-class classifier ensemble and find a near-optimal sub-ensemble of base classifiers. More precisely, the goal of BeePruner is to exclude the non-diverse base classifiers from the initial ensemble and, at the same time, keep the classification accuracy. In the subsequent step, if the fusion rule does not properly utilize the ensemble diversity, then no benefit arises from the classifier fusion. Considering this fact, BeeOWA uses a novel exponential induced OWA operator, called EIOWA, to combine the outputs of base classifiers in the sub-ensemble.

The major contributions of this paper are listed as follows:

• We present a novel artificial bee colony algorithm for one-class classifier pruning that utilizes two measures simultaneously, an exponential consistency measure and a non-pairwise diversity measure based on the Kappa inter-rater agreement.
• To the best of our knowledge, the most widely used fusion rules in one-class classification problems are fixed rules, such as majority voting, mean, max, and product. We propose a novel exponential induced OWA operator for one-class classifier fusion and experimentally show it can outperform the fixed rules.
• We conduct extensive experiments on benchmark datasets to evaluate the performance of BeeOWA and show that it performs significantly better than state-of-the-art approaches in the literature.

The rest of this paper is organized as follows. Section 2 is fully dedicated to the background. Short descriptions of classifier ensembles, OWA operators, and the ABC algorithm are included in this section. Section 3 presents the main steps of BeeOWA. Section 4 provides an overview of current techniques for constructing one-class classifier ensembles. The experimental results are described in Section 5. Finally, conclusions are given in Section 6.

2. Background

In this section, we give a brief introduction to some basic concepts used throughout this paper.

2.1. Classifier ensemble

To improve the performance of different classifiers, which may differ in complexity or training algorithm, an ensemble of classifiers is a viable solution. This may serve to increase the performance and also the robustness of the classification. The underlying idea of ensemble learning for classification problems is to build a number of base classifiers and then combine their outputs using a fusion rule. It has been shown that classifier ensembles outperform single classifiers for a wide range of classification problems [27–29]. The reason is that a combination of multiple classifiers reduces the risks associated with choosing an insufficient single classifier.

In general, the process of constructing an ensemble of base classifiers consists of three main steps: classifier generation, classifier pruning, and classifier fusion. In the following, we briefly describe each of these steps.

2.1.1. Classifier generation

One of the key challenges for the classifier ensemble is to generate a set of diverse base classifiers. For this purpose, we can use different strategies. One effective strategy is to train homogeneous classifiers with different datasets. To do this, we can divide the original dataset into partitions or generate a number of subsets through data splitting, bagging, or boosting [6,7,30], in the hope that different classification models are generated for different distributions of the original dataset. Another strategy, also known as ensemble feature selection, is to train homogeneous classifiers with different subsets of features. It has been shown that simple random selection of feature subsets, called random subspacing (RS) or the random subspace method (RSM), is an effective technique for ensemble feature selection [8]. The last and most intuitive strategy is to train heterogeneous classifiers with the same dataset. It should be noted that we can combine the above strategies to take advantage of all of them; for example, we can train heterogeneous classifiers with different datasets or different subsets of features [31,32].
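To make the random subspace strategy concrete, the following Python sketch builds a pool of homogeneous one-class classifiers on random feature subsets. It is illustrative only and not the authors' implementation; the use of scikit-learn's OneClassSVM as the base learner and all function and parameter names are our own assumptions.

```python
# Illustrative sketch: generating a pool of homogeneous one-class classifiers
# with the random subspace method (RSM), assuming scikit-learn's OneClassSVM.
import numpy as np
from sklearn.svm import OneClassSVM

def generate_rsm_pool(X_target, n_classifiers=25, subspace_size=None, seed=0):
    """Train each base classifier on a random subset of features of the target data."""
    rng = np.random.default_rng(seed)
    n_features = X_target.shape[1]
    subspace_size = subspace_size or max(1, n_features // 2)
    pool = []
    for _ in range(n_classifiers):
        features = rng.choice(n_features, size=subspace_size, replace=False)
        clf = OneClassSVM(kernel="rbf", nu=0.1).fit(X_target[:, features])
        pool.append((clf, features))  # remember the subspace for later scoring
    return pool

def score_pool(pool, X):
    """Decision scores of every base classifier for the objects in X."""
    return np.column_stack([clf.decision_function(X[:, f]) for clf, f in pool])
```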
2.1.2. Classifier pruning

Classifier pruning, also known as ensemble pruning, is a useful technique for reducing the ensemble size by selecting only a subset of the base classifiers that are both accurate and have diversity in their outputs [4]. It is well known that combining the same base classifiers does not contribute anything apart from increasing the computational complexity of the classification. Also, combining diverse but too weak base classifiers is unlikely to bring any benefits in the classification accuracy. Therefore, classifier pruning can be considered as a search for an optimal subset of base classifiers, making a trade-off between the classification accuracy and diversity. For small ensembles, the optimal subset can be found through exhaustive search. For large ensembles, a near-optimal subset can be found using meta-heuristic optimization algorithms like genetic algorithms [33,34].

2.1.3. Classifier fusion

An important step in constructing an effective ensemble is to choose a good strategy for combining the outputs of the base
classifiers. It has been shown that increasing the coverage of an ensemble through diversity is not enough to increase the classification accuracy of the ensemble [18]. Several fusion rules, such as majority voting, mean, max, and product, have been proposed in the literature to combine the outputs of the base classifiers. Majority voting is only used when the outputs are the class labels [35]. On the other hand, if continuous outputs like posterior probabilities are given, the mean or some other linear fusion rules are applied [21,36]. Also, if the outputs are interpreted as fuzzy membership values, fuzzy rules [37,38] and belief functions [39] are used.

2.2. One-class classification

The goal of one-class classification, also known as outlier detection, is to distinguish a set of objects, the target objects, from all the other possible objects, the outlier objects [26,40]. A number of one-class classification techniques have been proposed in the literature that can be grouped as density-based, boundary-based, and reconstruction-based techniques [26]. One-class classification techniques are particularly useful when the information of the target class is available and nothing is known about the outlier class. A situation may occur, for instance, where we want to detect attacks against web-based applications. We can easily capture legitimate web requests, and in doing so, form the target class. On the other hand, most severe attacks remain unknown at the time of development; hence we may have little or no training objects for the outlier class. Moreover, we may not want to wait for these attacks to occur as they may damage the applications.

In the following, we briefly introduce a popular one-class classifier from each category, namely Parzen-window (PW), support vector data description (SVDD), and K-means (KM).

2.2.1. Parzen-window

Parzen-window (PW) [41] is a kernel-based nonparametric technique that can be used to estimate the density of the target class:

p(x \mid \omega_T) = \frac{1}{N \sigma^M} \sum_{k=1}^{N} \vartheta\left( \frac{x - x_k}{\sigma} \right),   (1)

where ω_T is the target class, N is the total number of training objects belonging to ω_T, x_k is the kth training object, M is the dimensionality of the feature space, ϑ is a window or kernel function, and σ is the window width or smoothing parameter. It is assumed that ∫ ϑ(z) dz = 1 and ϑ is symmetric, that is, ϑ(z) = ϑ(−z). Hence, we can use a Gaussian kernel, defined as

\vartheta(z) = \frac{1}{(2\pi)^{M/2}} \exp\left( -\frac{1}{2} \lVert z \rVert^2 \right),   (2)

where ‖·‖ is the Euclidean norm.
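A minimal sketch of the Parzen-window estimate of Eqs. (1) and (2) follows; it is not the authors' code, and the choice of the smoothing parameter σ is left to the caller.

```python
# Minimal sketch of the Parzen-window target-density estimate of Eqs. (1)-(2)
# with a Gaussian kernel.
import numpy as np

def parzen_density(x, X_target, sigma):
    """p(x | omega_T) estimated from the N training target objects in X_target."""
    N, M = X_target.shape
    diff = (x - X_target) / sigma                     # (x - x_k) / sigma for every k
    kernel = np.exp(-0.5 * np.sum(diff**2, axis=1))   # unnormalised Gaussian, Eq. (2)
    kernel /= (2.0 * np.pi) ** (M / 2.0)
    return kernel.sum() / (N * sigma**M)              # Eq. (1)
```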
2.2.2. Support vector data description

Support vector data description (SVDD) [25] is a one-class classification technique that obtains a spherically shaped boundary around the target class, such that it accepts as much of the target objects as possible, while minimizing the chance of accepting outlier objects. SVDD has been successfully applied in a wide variety of applications such as face recognition [42] and anomaly detection in hyperspectral images [43].

Given a dataset of target objects, to find a minimum enclosing hypersphere that contains the dataset, we need to solve the following constrained optimization problem:

\min_{c, R, \xi_k} \; R^2 + C \sum_{k=1}^{N} \xi_k,   (3)

subject to

\lVert x_k - c \rVert^2 \le R^2 + \xi_k, \quad \xi_k \ge 0 \quad \text{for all } k = 1, \ldots, N,   (4)

where x_k is the kth training object and N is the total number of training objects. Moreover, c and R are the center and radius of the hypersphere, respectively. Also, ξ_k are slack variables, and C is a parameter that controls the trade-off between the volume of the hypersphere and the errors.

This optimization problem can be solved using Lagrange multipliers. After that, a test object x is classified as belonging to the target class ω_T if its squared distance from c is less than or equal to R²:

\lVert x - c \rVert^2 = (x \cdot x) - 2 \sum_{k=1}^{N} \alpha_k (x \cdot x_k) + \sum_{k=1}^{N} \sum_{r=1}^{N} \alpha_k \alpha_r (x_k \cdot x_r) \le R^2.   (5)

In this case, R is the distance from c to one of the support vectors on the boundary (i.e., a support vector x_s for which 0 < α_s < C).

If we define a mapping function φ that transforms the original feature space into a higher dimensional feature space, we can compute the inner product in the image of φ using a Gaussian kernel:

\psi(x, y) = \phi(x) \cdot \phi(y) = \exp\left( -\frac{1}{2\sigma^2} \lVert x - y \rVert^2 \right),   (6)

where σ is the kernel parameter. When this kernel is used, the first term of (5) equals 1.0 and so the testing function reduces to a weighted sum of Gaussians [25]:

\sum_{k=1}^{N} \alpha_k \, \psi(x, x_k) \ge -R^2/2 + C_R,   (7)

where C_R only depends on the support vectors and not on x.

According to the above discussion, we conclude that when the Gaussian kernel is used, the output of SVDD can be interpreted as a probability estimate:

p(x \mid \omega_T) = \frac{1}{(2\pi\sigma^2)^{M/2}} \sum_{k=1}^{N} \alpha_k \, \psi(x, x_k).   (8)
2.2.3. K-means

K-means (KM) [44] is a well-known clustering algorithm that can be used to build reconstruction-based one-class classifiers. The main idea is to partition the training objects belonging to the target class ω_T into K clusters so that a sum of squared errors is minimized. Then the distance of a test object x to ω_T is defined as the distance between x and the nearest cluster:

d(x, \omega_T) = \min_r \lVert x - \mu_r \rVert^2,   (9)

where μ_r is the centroid of the rth cluster.

It is often hard to combine the output of K-means with that of a density-based one-class classifier (e.g., Parzen-window). In order to allow this algorithm to produce an output that can be interpreted as a probability estimate, we can use all the K distances between x and the centroids μ_r as

p(x \mid \omega_T) = \frac{1}{K \varrho^M} \sum_{r=1}^{K} \vartheta\left( \frac{x - \mu_r}{\varrho} \right),   (10)

where

\varrho = \frac{2}{K(K-1)} \sum_{r=1}^{K-1} \sum_{s=r+1}^{K} \lVert \mu_r - \mu_s \rVert.   (11)

To put it more clearly, we model the distribution of the target class by a mixture of K normal densities, each one centered on a centroid μ_r. A heuristic is used to compute ϱ as the average distance between the K centroids.
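The following sketch mirrors Eqs. (9)-(11); the centroids are assumed to be produced by any standard K-means run (with K ≥ 2) on the target objects, and the code is illustrative rather than the authors' implementation.

```python
# Sketch of the K-means one-class output of Eqs. (9)-(11).
import numpy as np

def km_distance(x, centroids):
    """Eq. (9): distance of x to the target class = distance to the nearest centroid."""
    return np.min(np.sum((x - centroids) ** 2, axis=1))

def km_density(x, centroids):
    """Eqs. (10)-(11): mixture of K Gaussians centred on the centroids, with a
    common width equal to the average inter-centroid distance (assumes K >= 2)."""
    K, M = centroids.shape
    pair_dists = [np.linalg.norm(centroids[r] - centroids[s])
                  for r in range(K - 1) for s in range(r + 1, K)]
    rho = 2.0 * np.sum(pair_dists) / (K * (K - 1))                   # Eq. (11)
    diff = (x - centroids) / rho
    kernel = np.exp(-0.5 * np.sum(diff**2, axis=1)) / (2 * np.pi) ** (M / 2)
    return kernel.sum() / (K * rho**M)                               # Eq. (10)
```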
2.3. One-class classifier ensemble

In recent years, classifier ensembles have received widespread attention in the machine learning literature. However, constructing classifier ensembles for one-class classification problems is still considered a challenging task. For one-class classifiers, we have only the information of one of the classes, the target class; therefore, constructing a highly accurate ensemble of one-class classifiers is more difficult than that of multi-class classifiers. The challenging problems of one-class classifier ensembles are to decide (1) which of the base one-class classifiers to choose and (2) how to combine the results produced by them.

Let C = {C_1, …, C_L} be an ensemble (pool, committee, team) of base one-class classifiers, Ω = {ω_T, ω_O} be a set of class labels, and x ∈ R^M be an M-dimensional test object to be labeled in Ω. When one-class classifiers in C are to be combined based on posterior probabilities, we should compute the posterior probabilities p_j(ω_T | x) and p_j(ω_O | x) for each base classifier C_j ∈ C:

p_j(\omega_T \mid x) = \frac{p_j(x \mid \omega_T)\, p(\omega_T)}{p_j(x \mid \omega_T)\, p(\omega_T) + p_j(x \mid \omega_O)\, p(\omega_O)}, \qquad
p_j(\omega_O \mid x) = \frac{p_j(x \mid \omega_O)\, p(\omega_O)}{p_j(x \mid \omega_T)\, p(\omega_T) + p_j(x \mid \omega_O)\, p(\omega_O)},   (12)

where ω_T is the target class and ω_O is the outlier class. Because the outlier distribution p_j(x | ω_O) is unknown, we cannot calculate p_j(ω_T | x) and p_j(ω_O | x) directly. The problem is solved when an outlier distribution is assumed. When p_j(x | ω_O) has a uniform distribution in the area of the feature space that we are considering, p_j(ω_T | x) and p_j(ω_O | x) can be estimated as

p_j(\omega_T \mid x) = \frac{p_j(x \mid \omega_T)\, p(\omega_T)}{p_j(x \mid \omega_T)\, p(\omega_T) + \theta_j\, p(\omega_O)}, \qquad
p_j(\omega_O \mid x) = \frac{\theta_j\, p(\omega_O)}{p_j(x \mid \omega_T)\, p(\omega_T) + \theta_j\, p(\omega_O)},   (13)

where θ_j is a constant value. If we set ξ_j = θ_j p(ω_O) / p(ω_T), the above posterior probabilities can be written as

p_j(\omega_T \mid x) = \frac{p_j(x \mid \omega_T)}{p_j(x \mid \omega_T) + \xi_j}, \qquad
p_j(\omega_O \mid x) = \frac{\xi_j}{p_j(x \mid \omega_T) + \xi_j}.   (14)

Therefore, the decision rule for C_j can be written as

C_j(x) = \begin{cases} \omega_T, & p_j(\omega_T \mid x) \ge p_j(\omega_O \mid x), \\ \omega_O & \text{otherwise}, \end{cases}   (15)

or

C_j(x) = \begin{cases} \omega_T, & p_j(x \mid \omega_T) \ge \xi_j, \\ \omega_O & \text{otherwise}. \end{cases}   (16)

It is worth noting that ξ_j represents the decision threshold for C_j. In practice, we can set ξ_j such that a given rejection rate is produced by C_j.

As previously mentioned, some one-class classifiers are boundary-based or reconstruction-based, and so do not produce a posterior probability for the target class. Hence, we cannot directly make use of the output of these classifiers in (14). To address this problem, Tax et al. [45] proposed a heuristic mapping to transform the output of distance-based one-class classifiers into a probability estimate.

After estimating the posterior probabilities, one-class classifiers in C can be combined by using many different fusion rules, such as mean and product. The mean rule is given by

\mu(\omega_T \mid x) = \frac{1}{L} \sum_{j=1}^{L} p_j(\omega_T \mid x), \qquad
\mu(\omega_O \mid x) = \frac{1}{L} \sum_{j=1}^{L} p_j(\omega_O \mid x),   (17)

where p_j(ω_T | x) and p_j(ω_O | x) are the posterior probabilities for C_j. Therefore, the decision rule for the classifier ensemble can be written as

y_\mu(x) = \begin{cases} \omega_T, & \mu(\omega_T \mid x) \ge \mu(\omega_O \mid x), \\ \omega_O & \text{otherwise}, \end{cases}   (18)

or

y_\mu(x) = \begin{cases} \omega_T, & \sum_{j=1}^{L} p_j(x \mid \omega_T) \ge \sum_{j=1}^{L} \xi_j, \\ \omega_O & \text{otherwise}. \end{cases}   (19)

The product rule is given by

\Pi(\omega_T \mid x) = \prod_{j=1}^{L} p_j(\omega_T \mid x), \qquad
\Pi(\omega_O \mid x) = \prod_{j=1}^{L} p_j(\omega_O \mid x).   (20)

Therefore, the decision rule for the classifier ensemble can be written as

y_\Pi(x) = \begin{cases} \omega_T, & \prod_{j=1}^{L} p_j(x \mid \omega_T) \ge \prod_{j=1}^{L} \xi_j, \\ \omega_O & \text{otherwise}. \end{cases}   (21)

This approach can be extended to other fusion rules.
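As an illustration of the rules above, the sketch below implements the posterior estimate of Eq. (14) and the mean and product fusion rules of Eqs. (17)-(21); it is a hedged sketch with our own function and argument names, not the authors' code. Here p_x_given_target holds p_j(x | ω_T) for the L base classifiers and xi holds their decision thresholds ξ_j.

```python
# Sketch of Eq. (14) and the mean/product fusion rules of Eqs. (17)-(21).
import numpy as np

def posteriors(p_x_given_target, xi):
    """Eq. (14): p_j(omega_T | x) and p_j(omega_O | x) under a uniform outlier model."""
    p_target = p_x_given_target / (p_x_given_target + xi)
    return p_target, 1.0 - p_target

def mean_rule_posterior(p_x_given_target, xi):
    """Eqs. (17)-(18): accept x when the average target posterior is at least
    the average outlier posterior."""
    p_t, p_o = posteriors(p_x_given_target, xi)
    return p_t.mean() >= p_o.mean()

def mean_rule_threshold(p_x_given_target, xi):
    """Eq. (19): accept x when the summed supports reach the summed thresholds."""
    return np.sum(p_x_given_target) >= np.sum(xi)

def product_rule(p_x_given_target, xi):
    """Eqs. (20)-(21): accept x when the product of supports reaches the
    product of thresholds."""
    return np.prod(p_x_given_target) >= np.prod(xi)
```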
2.4. Ordered weighted averaging operators

The ordered weighted averaging (OWA) operators [22] are powerful operators to aggregate multiple input values obtained from different sources. Due to their simplicity and robustness, they have been widely used in many applications, such as database systems [46], fuzzy logic controllers [47], classifier ensembles [39], and so on. In the following, we give a brief introduction to the OWA operators.

An OWA operator of dimension L is a mapping O : R^L → R such that

O(a_1, \ldots, a_L) = \sum_{j=1}^{L} w_j \, b_j,   (22)

where b_j is the jth largest value in the set {a_1, …, a_L} and w_j is a weight such that w_j ∈ [0, 1] and \sum_{j=1}^{L} w_j = 1.

The result of the aggregation performed by an OWA operator largely depends upon its associated weights. A number of techniques have been suggested for obtaining the weights [48]. Moreover, a measure, called orness, has been presented to characterize the associated weights of an OWA operator [22]. This measure is defined as

\vee(w_1, \ldots, w_L) = \frac{1}{L-1} \sum_{j=1}^{L} (L - j)\, w_j.   (23)

The orness measures the degree to which the aggregation is like an or (max) operation and can be viewed as a measure of optimism of the aggregator. More precisely, when the degree of orness is greater than 0.5, the OWA operator is considered as more orlike than andlike. In this case, the larger input values play a more important role in the aggregation process. Generally speaking, an orness degree greater than 0.5 indicates that the aggregator is optimistic, while an orness degree less than 0.5 indicates that it is pessimistic.
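A small sketch of a plain OWA aggregation (Eq. (22)) and its orness degree (Eq. (23)) follows; the function names are ours and the code is only meant to make the definitions concrete.

```python
# Sketch of an OWA aggregation (Eq. (22)) and its orness degree (Eq. (23)).
import numpy as np

def owa(values, weights):
    """Weighted sum of the input values sorted in descending order."""
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    assert np.isclose(weights.sum(), 1.0), "OWA weights must sum to one"
    return np.dot(weights, np.sort(values)[::-1])

def orness(weights):
    """Degree to which the aggregation behaves like a max (or) operation."""
    L = len(weights)
    j = np.arange(1, L + 1)
    return np.dot(L - j, weights) / (L - 1)

# Special cases: weights (1, 0, ..., 0) give the max (orness 1),
# uniform weights give the mean (orness 0.5).
```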
The induced OWA (IOWA) operators [49] are a more general type of OWA operator. These operators take a set of paired values
as input, in which the first values are used to induce an ordering over the second values, which are then aggregated.

2.5. Artificial bee colony

Artificial bee colony (ABC) [12] is a recently introduced swarm intelligence algorithm that simulates the intelligent foraging behavior of honeybees. It consists of two main concepts: food sources and artificial bees. Artificial bees search for food sources having high nectar amounts. The position of a food source represents a possible solution to an optimization problem and its nectar amount corresponds to the quality (or fitness) of the solution. The colony of artificial bees contains three groups of bees: employed bees, onlooker bees, and scout bees. Employed bees search for new food sources having more nectar in the vicinity of their food sources and share the obtained information with onlooker bees by dancing in the dance area. The number of employed bees is equal to the number of food sources. After watching the dance, the onlooker bees probabilistically select some of the food sources and search their vicinity to find new food sources with more nectar, just like the employed bees. However, the onlooker bees are more inclined to select richer food sources. Employed bees whose food sources cannot be improved through a predetermined number of trials become scout bees and start to search for new food sources randomly.

At the initialization step, the scout bees randomly initialize a population P of food source positions. Afterward, P is subjected to repeated cycles of the search processes of the artificial bees. Let SN be the number of food sources and D be the number of optimization parameters. Each employed bee generates a candidate food source position v_i in the vicinity of its food source position z_i ∈ P according to

v_{ij} = z_{ij} + \varphi_{ij}\, (z_{ij} - z_{kj}),   (24)

where k ∈ {1, …, SN} and j ∈ {1, …, D} are two randomly chosen indices and φ_{ij} is a random value in the range [−1, 1]. After generating v_i, its fitness is calculated and a greedy selection is applied between v_i and z_i.

Subsequently, each onlooker bee randomly selects a food source i with a food source selection probability p_i, which is calculated as

p_i = \frac{f_i}{\sum_{k=1}^{SN} f_k},   (25)

where f_i is the fitness of the food source i.

After selecting a food source, each onlooker bee generates a candidate food source position in the vicinity of its food source position, exactly the same as the employed bees. After that, the food sources that have not been improved during a predetermined number of cycles, as controlled by a parameter limit (LM), are detected and abandoned. These food sources are replaced with randomly selected food sources by the scout bees.

Since ABC has few control parameters and is also easy to implement, it has been widely used in many optimization applications, such as dynamic anomaly detection in MANETs [15], stacking ensemble [16], classification rule discovery [50], and so on.
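The two update rules of Eqs. (24) and (25) can be sketched as follows; this is an illustrative fragment of a generic ABC implementation, not the authors' code, and it follows the common ABC convention that the partner food source k differs from i.

```python
# Sketch of the ABC neighbourhood search of Eq. (24) and the fitness-proportional
# food-source selection of Eq. (25).
import numpy as np

rng = np.random.default_rng(0)

def neighbour(positions, i):
    """Eq. (24): perturb one randomly chosen dimension of food source i towards
    (or away from) another randomly chosen food source k."""
    SN, D = positions.shape
    k = rng.choice([s for s in range(SN) if s != i])   # a different food source
    j = rng.integers(D)                                # one random dimension
    phi = rng.uniform(-1.0, 1.0)
    v = positions[i].copy()
    v[j] = positions[i, j] + phi * (positions[i, j] - positions[k, j])
    return v

def selection_probabilities(fitness):
    """Eq. (25): probability with which the onlooker bees pick each food source."""
    fitness = np.asarray(fitness, dtype=float)
    return fitness / fitness.sum()
```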
and applies the binary interpretation operator to z0i to form the
ensemble solution y0i . It then calculates the fitness of y0i and performs
a greedy selection between yi and y0i . After that, all employed bees
3. BeeOWA share their information about ensemble positions with onlooker bees
by dancing in the dance area (Lines 5–13). According to the informa-
In this section, we present BeeOWA, a novel approach for tion provided by the employed bees, each onlooker bee i chooses an
constructing one-class classifier ensembles. It assumes that we ensemble position zk A P using the binary tournament selection and
are given an initial ensemble of base one-class classifiers. There- generates a new ensemble position z0k in the vicinity of zk. It then
fore, it consists of two main steps: one-class classifier pruning and applies the binary interpretation operator to z0k to form the ensemble
one-class classifier fusion. solution y0k and performs a greedy selection between yk and y0k similar
to that of the employed bees (Lines 15–23). Some ensemble positions in P may not be improved through a pre-specified number of trials by the employed and onlooker bees. Therefore, scout bees are sent to determine these so-called infertile ensemble positions and replace them with new ensemble positions (Line 25).

Algorithm 1. BeePruner.

1: Initialize a population P of ensemble positions
2: Apply a binary interpretation operator to each ensemble position z_i ∈ P to form the ensemble solution y_i and then calculate the fitness value f(y_i)
3: repeat
4:   // Employed bee phase
5:   for each employed bee i = 1 to SN do
6:     Generate a new ensemble position z'_i in the vicinity of the ensemble position z_i ∈ P
7:     Apply the binary interpretation operator to z'_i to form the ensemble solution y'_i
8:     Calculate the fitness value f(y'_i)
9:     if f(y'_i) > f(y_i) then
10:      z_i ← z'_i
11:      y_i ← y'_i
12:    end if
13:  end for
14:  // Onlooker bee phase
15:  for each onlooker bee i = 1 to SN do
16:    Choose an ensemble position z_k ∈ P using binary tournament selection and then generate a new ensemble position z'_k in the vicinity of it
17:    Apply the binary interpretation operator to z'_k to form the ensemble solution y'_k
18:    Calculate the fitness value f(y'_k)
19:    if f(y'_k) > f(y_k) then
20:      z_k ← z'_k
21:      y_k ← y'_k
22:    end if
23:  end for
24:  // Scout bee phase
25:  Determine any infertile ensemble position in P and replace it with a new ensemble position
26:  Memorize the best ensemble solution y*
27: until a termination criterion is met
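The following Python sketch condenses Algorithm 1 into a single function; it is a simplified illustration with our own structure and names, with the binary interpretation operator of Eq. (26) inlined and the fitness function of Eq. (27) passed in as a callable. The default control parameters mirror the values reported later in the experiments (SN = 25, LM = 100, CN = 1000).

```python
# Condensed sketch of Algorithm 1 (BeePruner), not the authors' implementation.
import numpy as np

rng = np.random.default_rng(0)

def interpret(z):
    """Eq. (26): y_ij = 1 with probability z_ij."""
    return (rng.random(z.shape) < z).astype(int)

def bee_pruner(fitness, D, SN=25, LM=100, CN=1000):
    Z = rng.random((SN, D))                       # ensemble positions in [0, 1]^D
    Y = interpret(Z)                              # ensemble solutions
    F = np.array([fitness(y) for y in Y])
    trials = np.zeros(SN, dtype=int)
    best_y, best_f = Y[np.argmax(F)].copy(), F.max()
    for _ in range(CN):
        for phase in ("employed", "onlooker"):    # Lines 5-13 and 15-23
            for b in range(SN):
                # onlooker bees pick a position by binary tournament selection
                i = b if phase == "employed" else max(rng.integers(SN, size=2), key=lambda t: F[t])
                k = rng.choice([s for s in range(SN) if s != i])
                j = rng.integers(D)
                z_new = Z[i].copy()
                z_new[j] = np.clip(Z[i, j] + rng.uniform(-1, 1) * (Z[i, j] - Z[k, j]), 0.0, 1.0)
                y_new = interpret(z_new)
                f_new = fitness(y_new)
                if f_new > F[i]:                  # greedy selection
                    Z[i], Y[i], F[i], trials[i] = z_new, y_new, f_new, 0
                else:
                    trials[i] += 1
        for i in np.where(trials > LM)[0]:        # Line 25: replace infertile positions
            Z[i] = rng.random(D)
            Y[i] = interpret(Z[i])
            F[i] = fitness(Y[i])
            trials[i] = 0
        if F.max() > best_f:                      # Line 26: memorize the best solution
            best_f, best_y = F.max(), Y[np.argmax(F)].copy()
    return best_y
```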
3.1.1. Fitness function

The fitness function evaluates the quality of the ensemble solutions. Ideally, an ensemble solution should consist of base classifiers that are both accurate and diverse. More precisely, each base classifier in the ensemble should have high individual classification accuracy as well as high diversity when compared to other base classifiers. Although classification accuracy is a main concern, it cannot be used for one-class classifier ensembles. As an alternative, we modified the unsupervised consistency measure in [51].

Given an ensemble solution y_i, the fitness of y_i is defined as

f(y_i) = \omega \cdot \nu(y_i) + (1 - \omega) \cdot \upsilon(y_i),   (27)

where ω ∈ [0, 1] is a user-defined parameter, and ν(y_i) and υ(y_i) are the exponential consistency and non-pairwise diversity of y_i, respectively. Generally, the fitness function f allows for simultaneously searching for base classifiers having high consistency and diversity. This way we prevent our ensemble from consisting of base classifiers that are too weak or too similar to each other.

In the following, we describe the exponential consistency and non-pairwise diversity measures in detail.

• Exponential consistency measure. The exponential consistency measure indicates how consistent a sub-ensemble of base classifiers is in rejecting a fraction of the target objects. Given an ensemble solution y_i, we compute the exponential consistency of y_i as

\nu(y_i) = e^{-\alpha \Delta_i},   (28)

where α ≥ 1 is a constant and Δ_i is the average consistency of base classifiers in the sub-ensemble indicated by y_i and is given by

\Delta_i = \frac{1}{M_i} \sum_{j=1}^{D} y_{ij}\, \delta_j,   (29)

where D is the total number of base classifiers in S, M_i = \sum_{j=1}^{D} y_{ij} is the number of base classifiers in y_i, and δ_j is the consistency of the base classifier C_j ∈ S:

\delta_j = \lvert \hat{\varepsilon}_j - \varepsilon_j \rvert,   (30)

where ε_j and ε̂_j are the designated and estimated rejection rates of C_j on the target objects, respectively. Note that to compute the exponential consistency measure, we only need the information of the target class, which makes it suitable for one-class classification problems.

• Non-pairwise diversity measure. Previous research has shown that the success of classifier ensembles depends not only on a set of appropriate base classifiers, but also on the diversity inherent in the base classifiers [52,53]. Statistical diversity measures can be divided into two categories [54]: pairwise and non-pairwise. Pairwise diversity measures are designed to measure the diversity between all possible pairings of base classifiers, and non-pairwise ones are designed to measure the diversity among all base classifiers in a classifier ensemble. Although there is a substantial number of diversity measures in the literature, few of them have been applied for one-class classifier pruning [55]. To compute the diversity of ensemble solutions, we use a non-pairwise diversity measure based on the Kappa inter-rater agreement [54,56]. This measure assumes that the output of each base classifier represents a correct/incorrect decision. This type of classifier output is also known as the oracle output, because it assumes that we know the correct labels of input feature vectors. Note that in the case of one-class classification problems, we have only the oracle outputs for target feature vectors. Given an ensemble solution y_i and a dataset X_T of target feature vectors, let Ω_j(x_k) be an indicator function that takes the value 1 if the base classifier C_j ∈ S correctly classifies a feature vector x_k ∈ X_T as belonging to the target class ω_T and 0 otherwise:

\Omega_j(x_k) = \begin{cases} 1, & C_j(x_k) = \omega_T, \\ 0 & \text{otherwise}. \end{cases}   (31)

We compute the non-pairwise diversity of y_i as

\upsilon(y_i) = \frac{M_i \sum_{k=1}^{N} l_i(x_k)\, (1 - l_i(x_k))}{N\, (M_i - 1)\, R\, (1 - R)},   (32)

where N = |X_T| is the number of target feature vectors in X_T, l_i(x_k) is the proportion of base classifiers in y_i that correctly classify x_k, and R is the average true positive rate of individual base classifiers in y_i:

l_i(x_k) = \frac{1}{M_i} \sum_{j=1}^{D} y_{ij} \cdot \Omega_j(x_k),   (33)

R = \frac{1}{N} \sum_{k=1}^{N} l_i(x_k).   (34)
If there is no variation in the values of l_i(x_k) for all target feature vectors x_k ∈ X_T, then there is more diversity among base classifiers in y_i; in this case, υ(y_i) may be seen to assume its maximum value of M_i / (M_i − 1). On the other hand, if the value of l_i(x_k) is 0 or 1 for all target feature vectors x_k ∈ X_T, then there is no diversity among base classifiers in y_i and υ(y_i) may be seen to assume its minimum value of 0.
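A sketch of the fitness computation of Eqs. (27)-(34) is given below; the argument names (Omega, delta) and the guard for degenerate cases are our own conventions, and the code is illustrative rather than the authors' implementation.

```python
# Sketch of the BeePruner fitness of Eqs. (27)-(34). Omega is an (N x D) 0/1
# oracle matrix (did classifier j accept target object k?), delta holds the
# per-classifier consistencies |estimated - designated rejection rate|.
import numpy as np

def fitness(y, Omega, delta, omega=0.5, alpha=10.0):
    y = np.asarray(y, dtype=float)
    M_i = y.sum()
    if M_i < 2:                        # the diversity term needs at least two members
        return 0.0
    # Exponential consistency, Eqs. (28)-(30)
    Delta_i = np.dot(y, delta) / M_i
    nu = np.exp(-alpha * Delta_i)
    # Non-pairwise (Kappa-based) diversity, Eqs. (32)-(34)
    N = Omega.shape[0]
    l = Omega @ y / M_i                # Eq. (33): proportion of correct members per object
    R = l.mean()                       # Eq. (34): average true positive rate
    if R <= 0.0 or R >= 1.0:           # degenerate case (our own guard, not in the paper)
        upsilon = 0.0
    else:
        upsilon = M_i * np.sum(l * (1.0 - l)) / (N * (M_i - 1.0) * R * (1.0 - R))
    return omega * nu + (1.0 - omega) * upsilon   # Eq. (27)
```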
3.1.2. Time complexity

Here, we provide a complexity analysis for BeePruner. For any dataset X_T of N target feature vectors and an initial ensemble S of base one-class classifiers, suppose we are given the consistency of each base classifier in S and the outputs of this base classifier for all target feature vectors in X_T. The aim of BeePruner is to obtain a sub-ensemble C ⊆ S that achieves the maximum possible fitness value. BeePruner first sends the scout bees to randomly initialize a population P of SN ensemble positions of length D, where D is the total number of base classifiers in S. It then applies a binary interpretation operator to each ensemble position of P to form its corresponding ensemble solution. Therefore, the time complexity of population initialization and binary interpretation is essentially O(SN · D). Next, BeePruner calculates the fitness of the ensemble solutions based on two different measures: exponential consistency and non-pairwise diversity. Given an ensemble solution, these measures use the average consistency and the average true positive rate of all base classifiers in the sub-ensemble indicated by that ensemble solution, respectively. Hence, their respective time complexities for each ensemble solution are bounded by O(D) and O(N · D), resulting in a time complexity of O(N · D) for the fitness function. After that, BeePruner repeatedly sends the employed and onlooker bees to explore and exploit ensemble solutions for finding the best one. Therefore, the worst-case time complexity of these steps is O(CN · SN · N · D), where CN is the maximum cycle number. Since we usually set SN to a small value, the overall time complexity of BeePruner becomes O(CN · N · D).

3.2. One-class classifier fusion

As previously mentioned, choosing a good strategy for combining the outputs of base classifiers is an important step when constructing a classifier ensemble. So far, numerous strategies have been proposed for classifier fusion, but few of them, such as majority voting, mean, and product, have been applied to one-class classification problems [45]. Hence, finding a good strategy for one-class classifier fusion is still an open problem as far as we know. To address this issue, we use an exponential IOWA operator, called EIOWA. In the following, we describe the operator in more detail.

Given a dataset X_A of target and outlier feature vectors and an ensemble C of base one-class classifiers, let ρ be an order-inducing function that assigns a real value to each base classifier in C. For a given feature vector x_k ∈ X_A, we define the EIOWA operator as

O_{\varepsilon,\rho}(\beta_1(x_k), \ldots, \beta_L(x_k)) = \sum_{j=1}^{L} w_j \cdot \gamma_j(x_k),   (35)

where L = |C| is the number of base one-class classifiers in C and β_j(x_k) = p_j(ω_T | x_k) is the posterior probability of x_k for the base classifier C_j ∈ C. Furthermore, γ_1(x_k), …, γ_L(x_k) are simply β_1(x_k), …, β_L(x_k) reordered in descending order of the order-inducing values of their associated base classifiers. It should be mentioned that the order-inducing value of a base one-class classifier can be its exponential consistency measure or its true positive rate. Therefore, the orness degree of EIOWA should always be in the range (0.5, 1]. As a result, more importance is given to base classifiers with higher classification accuracy, which improves the classification performance of the ensemble. Recall that orness is a measure of the degree of optimism: the larger the orness degree is, the more optimistic the aggregator is.

Using EIOWA, the decision rule for x_k can be written as

y_O(x_k) = \begin{cases} \omega_T, & O_{\varepsilon,\rho}(\beta_1(x_k), \ldots, \beta_L(x_k)) \ge O_{\varepsilon,\rho}(\xi_1, \ldots, \xi_L), \\ \omega_O & \text{otherwise}, \end{cases}   (36)

where ξ_j is the decision threshold for C_j.

To define the EIOWA weights, we use the following exponential function:

g(z) = \begin{cases} (1 - z)\, e^{-\lambda z}, & 0 \le z \le 1, \\ 0 & \text{otherwise}, \end{cases}   (37)

where λ ≥ 0 is a constant value. In this case, each weight w_j is computed as

w_j = \frac{g(j/L)}{\sum_{k=1}^{L} g(k/L)} = \frac{(L - j)\, e^{-\lambda j / L}}{\sum_{k=1}^{L} (L - k)\, e^{-\lambda k / L}},   (38)

where L is the total number of weights.

For different values of λ, we get EIOWA weights with different levels of orness. Furthermore, it can be proved that the degree of orness is always in the range (0.5, 1]. For example, for λ = 0, the degree of orness is given by

\vee(w_1, \ldots, w_L) = \frac{1}{L-1} \sum_{j=1}^{L} (L - j)\, w_j = \frac{2L^2 - 3L + 1}{3(L-1)^2},   (39)

which converges to 2/3 as the number of weights increases infinitely, that is, \lim_{L \to +\infty} \vee(w_1, \ldots, w_L) = 2/3.
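To make the EIOWA construction concrete, the sketch below computes the weights of Eq. (38), their orness, and the induced aggregation of Eq. (35); it is illustrative only, with our own function names, and the order-inducing values are assumed to be, for example, the consistencies or true positive rates of the base classifiers.

```python
# Sketch of the EIOWA weights (Eq. (38)), their orness, and the induced
# aggregation of Eq. (35).
import numpy as np

def eiowa_weights(L, lam=5.0):
    """Eq. (38): w_j proportional to (L - j) * exp(-lambda * j / L)."""
    j = np.arange(1, L + 1)
    w = (L - j) * np.exp(-lam * j / L)
    return w / w.sum()

def orness(w):
    L = len(w)
    return np.dot(L - np.arange(1, L + 1), w) / (L - 1)

def eiowa(values, inducing, lam=5.0):
    """Eq. (35): aggregate the values re-ordered by descending inducing value."""
    order = np.argsort(inducing)[::-1]
    return np.dot(eiowa_weights(len(values), lam), np.asarray(values)[order])

# For lambda = 0 the orness equals (2L^2 - 3L + 1) / (3(L - 1)^2), which tends
# to 2/3 as L grows (Eq. (39)); larger lambda pushes the orness towards 1.
print(orness(eiowa_weights(10, lam=0.0)))   # ~0.70 for L = 10
```

The default λ = 5 used here simply mirrors the value the authors report for EIOWA in the experimental setup later in the paper.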
Lemma 1. The orness degree of EIOWA is always in the range (0.5, 1].

Proof. It has been shown that for any collection of OWA weights having the property that w_j ≥ w_k for j < k, the degree of orness is always in the range (0.5, 1] [57]. Therefore, we only need to show that the EIOWA weights have the property that w_j ≥ w_{j+1} for all j = 1, …, L − 1. For λ ≥ 0 and for all j = 1, …, L − 1, we have

-\frac{\lambda j}{L} \ge -\frac{\lambda (j+1)}{L}.   (40)

Thus,

e^{-\lambda j / L} \ge e^{-\lambda (j+1) / L},   (41)

or

(L - j)\, e^{-\lambda j / L} \ge (L - j)\, e^{-\lambda (j+1) / L}.   (42)

Since e^{-\lambda (j+1)/L} \ge 0, we get

(L - j)\, e^{-\lambda j / L} \ge (L - j - 1)\, e^{-\lambda (j+1) / L}.   (43)

Hence, we conclude that

w_j \ge w_{j+1}. \; \square   (44)

Lemma 2. The orness degree of EIOWA is monotonically increasing with respect to λ.

Proof. To prove this lemma, we need to demonstrate that the first derivative of the orness ∨ with respect to λ is non-negative. For the sake of simplicity of notation, we only consider the degree of orness for L = 3:

\vee(w_1, w_2, w_3) = \frac{1}{2} \sum_{j=1}^{3} (3 - j)\, w_j = \frac{1}{2} \left( \frac{4 e^{-\lambda/3} + e^{-2\lambda/3}}{2 e^{-\lambda/3} + e^{-2\lambda/3}} \right).   (45)

Now for the derivative ∂∨/∂λ, we get

\frac{\partial \vee}{\partial \lambda} = \frac{\left( -\tfrac{4}{3} e^{-\lambda/3} - \tfrac{2}{3} e^{-2\lambda/3} \right) \left( 4 e^{-\lambda/3} + 2 e^{-2\lambda/3} \right) - \left( -\tfrac{4}{3} e^{-\lambda/3} - \tfrac{4}{3} e^{-2\lambda/3} \right) \left( 4 e^{-\lambda/3} + e^{-2\lambda/3} \right)}{\left( 4 e^{-\lambda/3} + 2 e^{-2\lambda/3} \right)^2}
= \frac{\tfrac{4}{3} e^{-\lambda}}{\left( 4 e^{-\lambda/3} + 2 e^{-2\lambda/3} \right)^2} > 0. \; \square   (46)
The result of this lemma indicates that it is reasonable in some cases to use λ as a replacement for the measure of orness.

EIOWA has several characteristics that make it suitable for one-class classifier fusion: (1) the base classifiers within a one-class classifier ensemble are reordered based on their exponential consistency values or their true positive rates and then the aggregation is performed. Since the orness degree of EIOWA is always greater than 0.5 (as shown in Lemma 1), more importance is given to base classifiers with higher classification accuracy, resulting in an improvement in the classification performance of the ensemble; (2) there is a simple relationship between the orness degree and the parameter λ of EIOWA (as shown in Lemma 2), therefore we are able to perform the aggregation with the desired degree of optimism.

4. Related work

Over the last few years, one-class classifier ensembles have been used in various domains, such as information security [58,59], signature verification [60,61], image retrieval [62], and so on. As previously mentioned, the two challenging steps in constructing classifier ensembles are classifier pruning and classifier fusion. These steps have proven to be promising research directions for one-class classifier ensembles. Hence, in the following, we give a brief overview of the state-of-the-art techniques for one-class classifier pruning and one-class classifier fusion.

4.1. One-class classifier pruning

Given a number of base classifiers, most conventional approaches combine all of them to construct a classifier ensemble. However, many researchers have suggested that using a sub-ensemble of base classifiers may be better than using the ensemble as a whole [9,63].

Cheplygina and Tax [63] proposed to apply pruning to random subspaces of one-class classifiers using the supervised AUC measure [64] or an unsupervised consistency measure [51]. This method, also known as PRSM, is similar to RSM, already described in Section 2.1.1, but with the difference that it prunes inaccurate base classifiers using the AUC or consistency measures. However, pruning using AUC is not suitable for one-class classifier ensembles, because to obtain an estimate of AUC, we need both target and outlier objects to be available. Based on the obtained results, they concluded that combining a few accurate base classifiers may be more beneficial than combining all available base classifiers.

Classifier pruning can be considered as searching for optimal solutions to obtain acceptable classification accuracy. Krawczyk and Woźniak [11] examined several meta-heuristic search algorithms, including genetic algorithms, simulated annealing, and tabu search, to select the best subset of base classifiers from a given one-class classifier ensemble. To evaluate the quality of sub-ensembles, they used two separate measures: consistency and diversity. The measures allowed selecting base classifiers in such a way that they, at the same time, display high classification accuracy and are not similar to each other.

4.2. One-class classifier fusion

To combine the outputs of base classifiers in an ensemble, we need to choose a good aggregation function or fusion rule.

Fig. 1. Effect of the parameter SN on the average AUC of BeeOWA (average AUC plotted against SN for the C.H. disease, Cancer non-ret, Imports, and Vehicle datasets).
Table 1
Summary of the one-class datasets used in the experiments.

Dataset Total objects Target objects Features Dataset Total objects Target objects Features

Arrhythmia 420 237 278 M-feature 400 200 240


Audiology 105 57 69 Opt-digits 1143 571 64
Breast-cancer 286 201 9 Pen digits 2288 1144 16
Cancer non-ret 198 151 33 Pump 2×2 noisy 240 64 64
Cancer ret 198 47 33 Pump 1×3 noisy 180 41 64
C.H. disease 303 165 13 Segment 660 330 19
Concordia 2 4000 400 256 Sonar mines 208 111 60
Concordia 3 4000 400 256 Sonar rocks 208 97 60
Credit-rating 690 307 15 Soybean 183 91 35
Diabetes 768 268 8 Spam-base 4601 1813 57
E-coli 220 77 7 Spectf normal 349 95 44
Glass 214 17 9 Splice 2423 768 61
Heart Statlog 270 150 13 Vehicle 435 218 18
Hepatitis 155 32 19 Vowel 4 528 48 10
Horse-colic 368 136 22 Vowel 180 90 13
Hypothyroid 3675 194 29 Waveform 3347 1655 40
Imports 159 71 25 WB-cancer 699 241 9
Ionosphere 351 225 34 Wine 2 178 71 13
Letter 1618 805 16 Zoo 61 41 17
Fig. 2. Effect of the parameter λ on the average AUC of BeeOWA. The plotted average AUC values are:

λ              0    5    10   15   20   25   30   35   40   45   50   60   70   80   90   100
C.H. disease   75.9 74.4 76.3 75.4 69.1 69.3 73.1 70.2 65.6 66.2 66.8 65.3 65.9 65.1 66.2 64.7
Cancer non-ret 57.4 59.2 55.7 56.9 55.3 56.8 56.6 56.4 55.7 56.0 56.0 56.8 55.8 55.2 55.6 52.8
Imports        83.9 87.1 86.1 84.2 86.2 85.3 84.7 83.4 83.3 81.7 82.2 82.3 82.1 81.8 81.4 79.9
Vehicle        91.9 90.0 89.9 89.7 90.1 89.4 89.2 87.6 88.7 87.1 88.2 88.2 88.2 86.4 86.2 83.6

Table 2
Comparison of BeePruner with other pruning algorithms in terms of AUC.

Dataset BeePruner SA SGA TS

Arrhythmia 79.7 (1) 79.2 (3) 79.4 (2) 78.5 (4)


Cancer non-ret 59.2 (1) 55.5 (4) 56.8 (2) 56.7 (3)
Cancer ret 59.4 (1.5) 59.4 (1.5) 54.8 (4) 58.3 (3)
Concordia 2 90.6 (1) 90.4 (2) 88.5 (3) 88.0 (4)
Concordia 3 93.8 (1) 93.2 (3) 93.6 (2) 92.5 (4)
Glass 98.4 (1) 97.6 (2) 96.6 (3) 96.1 (4)
Imports 87.1 (1) 84.3 (2) 73.8 (4) 80.2 (3)
Ionosphere 97.0 (2) 97.2 (1) 96.8 (3) 96.7 (4)
Pump 2×2 noisy 90.1 (1) 88.8 (2) 86.3 (4) 87.9 (3)
Pump 1×3 noisy 94.1 (1) 92.1 (2) 90.7 (4) 91.2 (3)
Sonar mines 84.5 (1) 84.4 (2) 74.0 (4) 80.8 (3)
Sonar rocks 69.1 (1) 67.2 (2) 62.6 (4) 66.2 (3)
Spectf normal 96.3 (1) 95.8 (2) 94.0 (4) 94.4 (3)
Vowel 4 99.2 (1.5) 99.2 (1.5) 95.9 (4) 98.5 (3)
Wine 2 98.4 (1) 92.9 (2) 92.2 (3) 92.0 (4)

Overall average AUC 86.5 (1.13) 85.1 (2.13) 82.4 (3.33) 83.9 (3.40)

Table 3
Results of the Friedman/Nemenyi test for ranking the pruning algorithms in Table 2.

Tests: F-statistic 34.41, Critical value 2.83, Significant differences Yes, Critical difference 1.21.
Average ranks: BeePruner 1.13, SA 2.13, SGA 3.33, TS 3.40.

Fig. 3. Results of the Nemenyi test for the pairwise comparison of the pruning algorithms in Table 2.
In the literature, one popular fusion rule for one-class classifier ensembles is majority voting [58,62]. Perdisci et al. [58] used an ensemble of one-class SVM classifiers to build a payload-based anomaly detection system. During the detection phase, a payload was classified as target (or normal) if it was labeled as target by the majority of the classifiers; otherwise it was classified as outlier (or anomalous). Wu and Chung [62] proposed to construct an ensemble of one-class SVM classifiers for content-based image retrieval. They segmented training images into different groups on which a number of one-class SVM classifiers were separately trained. The final decision of the ensemble for given images was made based on the majority voting of all the classifiers. Liu et al. [65] presented FENOC, a framework to construct one-class classifier ensembles for malware detection. The core algorithm of FENOC is CosTOC, a one-class learning algorithm, which uses a pair of one-class classifiers to describe the malicious and benign classes, respectively.

In addition to majority voting, other fusion rules, such as mean, max, and product, are used in the literature [45]. These rules are applied to the thresholded probabilistic outputs or directly to the probabilistic outputs generated by base one-class classifiers. Nanni [60] designed a one-class classifier ensemble using the random subspace method [8] for on-line signature verification, in which the max rule was used for combining the outputs of base classifiers. The same fusion rule was also used for fingerprint matching [66]. Giacinto et al. [59] proposed an unsupervised network anomaly detection system based on a modular ensemble of base one-class classifiers. Each module was designed to model a particular group of similar protocols or network services. For all modules, the performance was reported after combining base classifiers using different fusion rules, namely mean, max, and product. Cyganek [67] proposed an ensemble of one-class SVM classifiers for image segmentation, in which a so-called exclusive-voting rule was used to make the final decision. Using this rule, they classified each input image as target iff it was classified as target exclusively by one of the base classifiers in the ensemble. Menahem et al. [68] introduced two one-class classifier ensembles, namely ESBE and TUPSO. ESBE is a simple one-class classifier ensemble whose output is determined by a single base classifier within the ensemble. This base classifier, also known as the dominant classifier, is selected during the training phase by evaluating the classification performance of all participating base classifiers using a cross-validation procedure. Moreover, TUPSO is a one-class classifier ensemble that uses meta-learning techniques to combine the outputs of base classifiers within the ensemble.

Fig. 4. Average number of base one-class classifiers in the sub-ensembles found by the pruning algorithms (average ensemble size for BeePruner, SA, SGA, and TS).
Table 4
Comparison of EIOWA with other fusion rules in terms of AUC.

Dataset EIOWA Majority voting Max Mean Product

Arrhythmia 79.7 (1) 79.0 (3) 77.8 (4) 79.5 (2) 72.3 (5)
Cancer non-ret 59.2 (2) 56.4 (5) 61.6 (1) 58.7 (3) 57.1 (4)
Cancer ret 59.4 (1) 58.6 (3) 54.2 (5) 57.8 (4) 59.0 (2)
Concordia 2 90.6 (1) 88.9 (3) 87.5 (4) 90.2 (2) 76.3 (5)
Concordia 3 93.8 (1) 93.2 (3) 91.8 (4) 93.7 (2) 80.9 (5)
Glass 98.4 (1) 82.5 (4) 70.9 (5) 85.0 (3) 96.7 (2)
Imports 87.1 (1) 80.4 (4) 84.2 (2) 82.6 (3) 80.0 (5)
Ionosphere 97.0 (1) 96.7 (2) 95.4 (5) 96.6 (3) 96.0 (4)
Pump 2×2 noisy 90.1 (1) 87.5 (3) 78.9 (5) 88.2 (2) 83.7 (4)
Pump 1×3 noisy 94.1 (1) 90.0 (3) 84.4 (4) 91.1 (2) 70.1 (5)
Sonar mines 84.5 (1) 75.9 (5) 83.4 (2) 80.2 (3) 76.6 (4)
Sonar rocks 69.1 (1) 61.6 (5) 66.7 (3) 66.3 (4) 68.2 (2)
Spectf normal 96.3 (1) 95.0 (4) 95.2 (3) 95.6 (2) 94.6 (5)
Vowel 4 99.2 (2.5) 98.7 (4) 94.0 (5) 99.2 (2.5) 99.4 (1)
Wine 2 98.4 (1) 93.0 (4) 87.2 (5) 94.3 (2) 93.7 (3)

Overall average AUC 86.5 (1.17) 82.5 (3.67) 80.9 (3.80) 83.9 (2.63) 80.3 (3.73)

Table 5
Results of the Friedman/Nemenyi test for ranking the fusion rules in Table 4.

Tests: F-statistic 14.78, Critical value 2.54, Significant differences Yes, Critical difference 1.57.
Average ranks: EIOWA 1.17, Majority voting 3.67, Max 3.80, Mean 2.63, Product 3.73.


In fact, TUPSO is roughly based on the stacking technique, and as such, it uses a single meta-classifier for classifier fusion. However, contrary to the stacking technique, where a meta-classifier is directly trained on the outputs of base classifiers in an ensemble, TUPSO trains its meta-classifier on a series of aggregations obtained from the outputs of the base classifiers. More clearly, it evaluates the base classifiers using some performance measures and translates the resulting estimates into static weights, which are then used to train the meta-classifier.

Fig. 5. Results of the Nemenyi test for the pairwise comparison of the fusion rules (EIOWA, majority voting, max, mean, and product) in Table 4.

Table 6
Comparison of the average AUC of BeeOWA with that of RSM and PRSM [63].

Dataset BeeOWA RSM-Gauss RSM-NN RSM-KM PRSM-Gauss PRSM-NN PRSM-KM

Arrhythmia 79.7 (1) 78.0 (2) 75.5 (4) 75.2 (7) 77.8 (3) 75.3 (5.5) 75.3 (5.5)
Cancer non-ret 59.2 (1) 53.6 (2) 53.2 (4) 51.7 (7) 53.4 (3) 53.1 (5) 51.9 (6)
Cancer ret 59.4 (3) 60.9 (1) 57.6 (6.5) 58.6 (4) 59.6 (2) 58.0 (5) 57.6 (6.5)
Concordia 2 90.6 (1) 87.0 (2) 80.3 (5) 69.8 (6) 86.3 (3) 80.4 (4) 69.5 (7)
Concordia 3 93.8 (1.5) 93.8 (1.5) 87.2 (4) 81.9 (6) 93.2 (3) 86.6 (5) 81.5 (7)
Glass 98.4 (1) 74.7 (2) 73.7 (4) 73.9 (3) 73.3 (5) 72.5 (6.5) 72.5 (6.5)
Imports 87.1 (1) 68.5 (7) 86.5 (2) 72.3 (5) 71.0 (6) 86.3 (3) 73.1 (4)
Ionosphere 97.0 (3.5) 96.9 (5) 96.1 (6.5) 97.3 (1.5) 97.0 (3.5) 96.1 (6.5) 97.3 (1.5)
Pump 2×2 noisy 90.1 (1) 83.1 (3) 76.0 (5) 72.6 (6) 83.8 (2) 76.3 (4) 72.3 (7)
Pump 1×3 noisy 94.1 (1) 84.0 (3) 77.2 (4) 69.5 (6) 86.2 (2) 76.9 (5) 69.4 (7)
Sonar mines 84.5 (1) 64.1 (4) 68.8 (2) 63.6 (6) 63.5 (7) 68.2 (3) 63.8 (5)
Sonar rocks 69.1 (3) 64.7 (7) 72.3 (2) 67.4 (4.5) 65.0 (6) 72.4 (1) 67.4 (4.5)
Spectf normal 96.3 (1) 88.3 (5) 95.6 (3) 87.2 (7) 88.4 (4) 95.8 (2) 87.4 (6)
Vowel 4 99.2 (1.5) 95.6 (6) 99.2 (1.5) 98.5 (4) 94.5 (7) 99.1 (3) 98.2 (5)
Wine 2 98.4 (1) 92.9 (2) 91.7 (4.5) 92.5 (3) 90.4 (7) 90.5 (6) 91.7 (4.5)

Overall average AUC 86.5 (1.50) 79.1 (3.50) 79.4 (3.87) 75.5 (5.07) 78.9 (4.23) 79.2 (4.30) 75.3 (5.53)

Table 7
Results of the Friedman/Nemenyi test for ranking the ensemble approaches in Table 6.

F-statistic: 8.11   Critical value: 2.21   Significant differences: Yes   Critical difference: 2.33
Average ranks: BeeOWA 1.50, RSM-Gauss 3.50, RSM-NN 3.87, RSM-KM 5.07, PRSM-Gauss 4.23, PRSM-NN 4.30, PRSM-KM 5.53

Table 8
Comparison of the average AUC of BeeOWA with that of TUPSO, ESBE, and RC [68].

Dataset BeeOWA TUPSO ESBE RC

Audiology 86.9 (2) 88.8 (1) 85.7 (3) 76.5 (4)


Breast-cancer 69.2 (1) 50.4 (3) 50.6 (2) 49.5 (4)
C.H. disease 74.4 (1) 65.4 (2) 49.0 (4) 56.0 (3)
Credit-rating 80.0 (1) 78.8 (2) 69.1 (3) 63.8 (4)
E-coli 94.7 (2) 96.3 (1) 93.4 (3) 91.7 (4)
H. statlog 74.5 (1) 69.2 (2) 50.0 (4) 56.9 (3)
Hepatitis 72.8 (2) 74.5 (1) 66.9 (3) 61.5 (4)
Horse-colic 72.4 (1) 63.5 (2) 55.2 (3) 55.1 (4)
Hypothyroid 83.8 (1) 56.8 (3) 60.7 (2) 50.7 (4)
Ionosphere 97.0 (1) 96.4 (2) 88.4 (3) 84.6 (4)
Letter 98.5 (1) 95.8 (2) 87.6 (3) 76.2 (4)
M-feature 99.8 (1) 97.2 (2) 92.4 (3) 81.5 (4)
Opt-digits 99.8 (1) 97.9 (2) 92.7 (3) 84.2 (4)
Pen digits 100.0 (1) 99.7 (2) 97.6 (3) 93.6 (4)
Diabetes 59.0 (1) 55.5 (2) 49.3 (4) 52.7 (3)
Segment 100.0 (1) 71.7 (2) 66.2 (3) 61.5 (4)
Sonar 69.1 (1) 57.5 (2) 47.3 (4) 52.2 (3)
Soybean 95.1 (1) 80.1 (2) 52.4 (4) 58.9 (3)
Spam-base 71.0 (1) 63.0 (3) 67.6 (2) 58.4 (4)
Splice 98.6 (2) 98.9 (1) 97.6 (3) 73.8 (4)
Vehicle 90.0 (1) 78.7 (2) 59.5 (4) 60.8 (3)
Vowel 99.2 (1) 84.9 (2) 65.1 (4) 67.1 (3)
Waveform 85.1 (2) 86.5 (1) 75.2 (4) 76.9 (3)
WB-cancer 96.2 (3) 97.9 (1) 96.7 (2) 76.8 (4)
Zoo 100.0 (1) 94.3 (2) 87.9 (3) 73.2 (4)

Overall average AUC 86.7 (1.28) 80.0 (1.88) 72.2 (3.16) 67.8 (3.68)

As mentioned above, most of the rules used for one-class classifier fusion are simple fixed rules, such as majority voting, mean, max, min, and product. All of these rules suffer from an intrinsic limitation: they do not take into account the properties of the base classifiers in the ensemble.
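For reference, these fixed rules can be written in a few lines of MATLAB. The sketch below assumes an n-by-L matrix S of target-class support values (one column per base classifier) and a common acceptance threshold theta; both are placeholders, not part of the original experiments.

% Fixed fusion rules applied to an n-by-L score matrix S (rows = objects,
% columns = base one-class classifiers); theta is an acceptance threshold.
f_mean = mean(S, 2);                    % mean rule
f_max  = max(S, [], 2);                 % max rule
f_min  = min(S, [], 2);                 % min rule
f_prod = prod(S, 2);                    % product rule
f_vote = mean(S >= theta, 2) >= 0.5;    % majority voting on thresholded outputs
% None of these rules uses any property of the individual base classifiers,
% which is exactly the limitation discussed above.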
5. Experiments

In this section, we evaluate the performance of BeeOWA using several benchmark datasets and compare it with that of state-of-the-art one-class classifier ensemble approaches in the literature. All experiments were implemented in MATLAB using PRTools [69] and the Data Description toolbox [70]. All of the reported results were averaged over five runs of 10-fold cross-validation. For the comparisons, we used the Friedman test [71] with the Nemenyi post hoc test [72], as recommended in [73]. The confidence level was 95%.
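This evaluation protocol can be sketched as follows. Here trainAndScore is a hypothetical placeholder for training the ensemble on the target-class training folds and scoring the held-out target and outlier objects (Xtarget and Xoutlier are likewise placeholders), and the AUC is estimated with the usual rank-based formulation; none of these names come from the paper itself.

% Sketch of the 5 x 10-fold cross-validation protocol.
nRep = 5;  nFold = 10;  auc = zeros(nRep, nFold);
for r = 1:nRep
    n = size(Xtarget, 1);
    folds = ceil(nFold * randperm(n) / n);           % random, balanced fold labels
    for f = 1:nFold
        [sT, sO] = trainAndScore(Xtarget(folds ~= f, :), Xtarget(folds == f, :), Xoutlier);
        % Rank-based AUC: probability that a target object scores above an outlier.
        auc(r, f) = (sum(sum(bsxfun(@gt, sT(:), sO(:)'))) + ...
                     0.5 * sum(sum(bsxfun(@eq, sT(:), sO(:)')))) / (numel(sT) * numel(sO));
    end
end
avgAUC = 100 * mean(auc(:));    % average AUC over the 5 x 10 folds, in percent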
5.1. Datasets and experimental setup

The benchmark datasets used in the experiments were taken from the UCI Machine Learning Repository [74]. The datasets contain multiple classes and are widely used in multi-class classification problems. However, to evaluate the classification performance of one-class classifier ensembles, we need datasets containing only target and outlier objects. For this reason, we followed two different strategies, as reported in [63,68], to convert a multi-class dataset into a one-class dataset. For the datasets used in [63], we treated one of the classes as the target class and joined the other classes into the outlier class. For the other datasets [68], we selected the two most prominent classes and filtered out the remaining classes. A brief description of the obtained one-class datasets is given in Table 1.
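A minimal sketch of the first conversion strategy is shown below; X, labels, and targetClass are placeholders for a UCI dataset loaded into MATLAB.

% Sketch: convert a multi-class dataset into a one-class dataset by treating
% one class as the target class and joining all other classes into outliers.
% X: n-by-d data matrix, labels: n-by-1 class labels, targetClass: chosen label.
isTarget = (labels == targetClass);
Xtarget  = X(isTarget, :);      % target objects (used to train the one-class classifiers)
Xoutlier = X(~isTarget, :);     % outlier objects (used only for evaluation)
% For the second strategy, one would instead keep the two most prominent
% classes and discard the remaining ones before the target/outlier split.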
To construct an initial ensemble of base one-class classifiers, we used RSM [8] with heterogeneous classifiers. To do so, we selected six typical one-class classifiers, namely Gaussian (Gauss), mixture of Gaussians (MoG), Parzen window (PW), support vector data description (SVDD), nearest neighbor (NN), and K-means (KM), and built 15 models from each of them using random feature subsets of the datasets, resulting in an initial ensemble of 90 base one-class classifiers. We assumed that each feature subset consisted of 40% of the original features and that overlap among feature subsets was allowed. Subsequently, we used BeePruner for one-class classifier pruning and EIOWA for one-class classifier fusion. For EIOWA, we used the exponential consistency measure in Section 3.1.1 to compute the order-inducing values of the base one-class classifiers.
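The construction of the random feature subsets can be sketched as follows; the variable names are illustrative, and in practice each base classifier would be trained with a PRTools/dd_tools mapping on its own subset.

% Sketch: random feature subsets for the initial RSM ensemble. Each of the
% six one-class classifier types gets 15 subsets, each containing 40% of the
% original d features; overlap between subsets is allowed.
nTypes   = 6;  nPerType = 15;  d = size(Xtarget, 2);
k        = round(0.4 * d);                     % subset size (40% of the features)
subsets  = cell(nTypes, nPerType);
for t = 1:nTypes
    for m = 1:nPerType
        idx = randperm(d);
        subsets{t, m} = idx(1:k);              % feature indices for base classifier (t, m)
    end
end
% Each base classifier is then trained on Xtarget(:, subsets{t, m}),
% giving the initial ensemble of 6 x 15 = 90 base one-class classifiers.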
BeePruner employs four control parameters: the number of food sources (SN), which is equal to the number of employed or onlooker bees; the value of the limit (LM); the maximum cycle number (CN); and the consistency coefficient (α). For all experiments, we set SN to 25, LM to 100, CN to 1000, and α to 10. However, it has been shown that the ABC algorithm does not need fine tuning of its control parameters in order to obtain satisfactorily good results [75]. Moreover, for EIOWA, we set the control parameter λ to 5.

5.2. Experimental results

In order to analyze how various parameter settings affect the classification performance of BeeOWA, we conducted several experiments and reported the results for four randomly selected datasets, namely C.H. disease, Cancer non-ret, Imports, and Vehicle. Figs. 1 and 2 compare the average AUC of BeeOWA for different values of the parameters SN and λ. From the results in Fig. 1, we conclude that BeeOWA preserves its robustness for different values of SN and therefore does not need fine tuning of this parameter. Furthermore, from Fig. 2, we find that an increase in λ leads to a corresponding decrease in the average AUC. As we proved in Lemma 2, the orness degree of EIOWA monotonically increases as λ increases. As a result, for large values of λ, the orness degree approaches 1, and EIOWA only considers the output of the base one-class classifier with the highest exponential consistency measure (i.e., it becomes equivalent to the max operation), ignoring the outputs of the other base one-class classifiers; this leads to a further decrease in the average AUC. From the results shown in Fig. 2, we find that λ = 5 gives a reasonable degree of orness.
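The exact EIOWA weighting scheme is defined earlier in the paper and is not reproduced here; the MATLAB sketch below only illustrates, under an assumed exponential weight form, why the orness degree climbs towards 1 as λ grows. Both the weight formula and the ensemble size L are illustrative assumptions; the orness expression is the standard one for OWA operators [22].

% Illustrative sketch: exponential-style OWA weights and their orness degree.
% The weight form is an assumption for illustration, not the EIOWA definition.
L      = 10;                                  % number of base classifiers (assumed)
lambda = 5;
w = exp(-lambda * (0:L-1) / (L - 1));         % decreasing weights over the ordered outputs
w = w / sum(w);                               % normalize so the weights sum to one
orness = sum((L - (1:L)) .* w) / (L - 1);     % standard orness measure for OWA operators
% For lambda -> 0 the weights become uniform (orness = 0.5); for large lambda
% almost all weight falls on the first (highest-ranked) output and orness -> 1,
% which matches the behaviour discussed above.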
Next, we conducted experiments to quantitatively evaluate the impact of BeePruner and EIOWA on the performance of BeeOWA. To do so, we implemented three metaheuristic search algorithms, namely the standard genetic algorithm (SGA), tabu search (TS), and simulated annealing (SA), to prune the initial ensemble, in the same way as we did with BeePruner. We also implemented four well-known fixed fusion rules, namely majority voting, max, mean, and product, to combine the outputs of the base one-class classifiers in the sub-ensembles found by BeePruner. For each experiment, we assigned ranks to the pruning algorithms or fusion rules; the best pruning algorithm or fusion rule received rank 1.
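A minimal MATLAB sketch of this per-dataset ranking scheme is given below; the AUC matrix A is a placeholder, but ties receive the average of the tied ranks, which is how fractional ranks such as 2.5 arise in Table 4.

% Sketch: assign per-dataset ranks to k methods from their AUC values
% (rank 1 = best, ties receive the average of the tied ranks), then average
% the ranks over all datasets.
% A: N-by-k matrix of AUC values (rows = datasets, columns = methods).
[N, k] = size(A);
R = zeros(N, k);
for i = 1:N
    [~, order] = sort(A(i, :), 'descend');     % best AUC first
    r(order) = 1:k;                            % provisional ranks
    for v = unique(A(i, :))                    % average the ranks of tied AUCs
        tied = (A(i, :) == v);
        r(tied) = mean(r(tied));
    end
    R(i, :) = r;
end
avgRank = mean(R, 1);                          % average rank of each method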

Table 9
Results of the Friedman/Nemenyi test for ranking the ensemble approaches in Table 8.

F-statistic: 68.36   Critical value: 2.73   Significant differences: Yes   Critical difference: 0.94
Average ranks: BeeOWA 1.28, TUPSO 1.88, ESBE 3.16, RC 3.68

Fig. 7. Results of the Nemenyi test for the pairwise comparison of the ensemble approaches in Table 8.

Fig. 6. Results of the Nemenyi test for the pairwise comparison of the ensemble approaches in Table 6.

Table 2 compares the average AUC and the ranks for all the pruning algorithms. Clearly, BeePruner has the best average rank among all the pruning algorithms. Moreover, in comparison with SA, SGA, and TS, the average AUC of BeePruner improves the performance by 1.6%, 5.0%, and 3.1%, respectively. The statistical significance of the ranking differences was determined by the Friedman test, followed by the Nemenyi post hoc test for pairwise comparisons. The results are summarized in Table 3. The F-statistic indicates whether or not the ranking differences are statistically significant. More specifically, significant differences are found if the F-statistic is greater than a critical value, 2.83 in this case. In such a situation, the Nemenyi test is used to compare any two pruning algorithms. A significant difference between two pruning algorithms occurs when the difference between their average ranks is greater than a critical difference, 1.21 in this case. Fig. 3 illustrates the results of the pairwise comparisons performed using the Nemenyi test. The horizontal axis indicates the average ranks of the pruning algorithms, and the colored lines on top of the axis connect the pruning algorithms that are not significantly different.
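The critical difference follows the formulation in [73], CD = q_alpha * sqrt(k*(k+1)/(6*N)), where k is the number of compared methods, N the number of datasets, and q_alpha the critical value of the Studentized range statistic divided by sqrt(2). The short MATLAB sketch below reproduces the value of 1.21 used here; the q_alpha value for four methods at the 0.05 level is taken from the table in [73].

% Nemenyi critical difference, as in [73].
q_alpha = 2.569;                  % critical value for k = 4 methods at the 0.05 level
k = 4;  N = 15;                   % four pruning algorithms compared over 15 datasets
CD = q_alpha * sqrt(k * (k + 1) / (6 * N));   % approximately 1.21, as reported above
% Two methods differ significantly whenever the difference between their
% average ranks exceeds CD.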
Fig. 4 shows the average number of base one-class classifiers in the sub-ensembles found by the pruning algorithms. SA finds larger sub-ensembles than BeePruner, incurring more computational cost in subsequent steps. In contrast, SGA is able to find smaller sub-ensembles; however, its average AUC is significantly lower than that of BeePruner. Moreover, on the basis of the obtained results, we conclude that BeePruner helps to reach a better trade-off between ensemble complexity and classification performance.

Table 4 shows the average AUC and the ranks for all the fusion rules. Obviously, EIOWA has the best average rank among all the fusion rules. The next best fusion rule, by a large margin, is mean. Moreover, in comparison with majority voting, max, mean, and product, the average AUC of EIOWA improves the performance by 4.8%, 6.9%, 3.1%, and 7.7%, respectively. Table 5 and Fig. 5 summarize the results of the Friedman/Nemenyi test for ranking the fusion rules. Based on the obtained results, we find that EIOWA has the best average rank, 1.17, which is significantly better than those of majority voting, max, and product.

Accordingly, we conclude that BeePruner and EIOWA can be considered good candidates for one-class classifier pruning and one-class classifier fusion, respectively.
Subsequently, we conducted experiments to compare the classification performance of BeeOWA with that of RSM and PRSM, as reported in [63], and with that of TUPSO, ESBE, and RC, as reported in [68]. It should be mentioned that PRSM is a pruned version of RSM in which a simple consistency measure is used to find the best subspaces, and that RC is a randomly selected classifier. Moreover, similar to the previous experiments, for each dataset we assigned ranks to the approaches. The results of the comparisons are shown in Tables 6 and 8. The average AUCs of RSM and PRSM are reported for 100 base classifiers (subspaces) and for a subset of 50 out of the total of 100 base classifiers, respectively. Tables 7 and 9 show the results of the statistical tests, and Figs. 6 and 7 illustrate them in more detail. For most datasets, BeeOWA has the highest rank and outperforms the other approaches in terms of the average AUC. The improvement is not always very large, but for some datasets, such as Glass, Pump, Sonar mines, and Wine, it is quite significant. Fig. 8 shows the absolute improvements in the overall average AUC when comparing BeeOWA to the other approaches. The minimum and maximum improvements are 6.7 and 18.9, respectively. As a result, we conclude that BeeOWA offers clear advantages for constructing one-class classifier ensembles.

Fig. 8. Absolute improvements in the overall average AUC when comparing BeeOWA to the other approaches.

6. Conclusions

It has been shown that using the best classifier and discarding the classifiers with poorer performance might waste valuable information [45]. For this reason, classifier ensembles are commonly used to improve the classification accuracy. In general, the process of constructing a classifier ensemble consists of three main steps: classifier generation, classifier pruning, and classifier fusion. Although classifier pruning and classifier fusion have been studied extensively in multi-class classifier ensembles, so far little work has been reported in the literature to address them in one-class classifier ensembles.

In this paper, we presented BeeOWA, a novel approach to construct highly accurate one-class classifier ensembles. It uses a novel binary artificial bee colony algorithm, called BeePruner, for one-class classifier pruning and a novel exponential induced OWA operator, called EIOWA, for one-class classifier fusion. We evaluated the performance of BeeOWA using a wide range of benchmark datasets and compared it with that of several state-of-the-art approaches in the literature, such as RSM, PRSM [63], and TUPSO [68]. The results of the experiments showed that BeeOWA can significantly improve the average AUC as compared to RSM, PRSM, and TUPSO.

References

[1] J.J. Rodriguez, L.I. Kuncheva, C.J. Alonso, Rotation forest: a new classifier ensemble method, IEEE Trans. Pattern Anal. Mach. Intell. 28 (10) (2006) 1619–1630. http://dx.doi.org/10.1109/TPAMI.2006.211.
[2] C. Zhang, Q. Cai, Y. Song, Boosting with pairwise constraints, Neurocomputing 73 (4–6) (2010) 908–919. http://dx.doi.org/10.1016/j.neucom.2009.09.013.
[3] L. Li, B. Zou, Q. Hu, X. Wu, D. Yu, Dynamic classifier ensemble using classification confidence, Neurocomputing 99 (2013) 581–591. http://dx.doi.org/10.1016/j.neucom.2012.07.026.
[4] Z.-H. Zhou, Ensemble Methods: Foundations and Algorithms, CRC Press, Boca Raton, FL, USA, 2012.
[5] C. Lin, W. Chen, C. Qiu, Y. Wu, S. Krishnan, Q. Zou, LibD3C: ensemble classifiers with a clustering and dynamic selection strategy, Neurocomputing 123 (2014) 424–435. http://dx.doi.org/10.1016/j.neucom.2013.08.004.
[6] L. Breiman, Bagging predictors, Mach. Learn. 24 (2) (1996) 123–140. http://dx.doi.org/10.1023/A:1018054314350.
[7] R.E. Schapire, The boosting approach to machine learning: an overview, in: D.D. Denison, M.H. Hansen, C.C. Holmes, B. Mallick, B. Yu (Eds.), Nonlinear Estimation and Classification, Lecture Notes in Statistics, vol. 171, Springer, New York, NY, USA, 2003, pp. 149–171. http://dx.doi.org/10.1007/978-0-387-21579-2_9.
[8] T.K. Ho, The random subspace method for constructing decision forests, IEEE Trans. Pattern Anal. Mach. Intell. 20 (8) (1998) 832–844. http://dx.doi.org/10.1109/34.709601.
[9] Z.-H. Zhou, J. Wu, W. Tang, Ensembling neural networks: many could be better than all, Artif. Intell. 137 (1–2) (2002) 239–263. http://dx.doi.org/10.1016/S0004-3702(02)00190-X.

[10] R. Lysiak, M. Kurzynski, T. Woloszynski, Optimal selection of ensemble classifiers using measures of competence and diversity of base classifiers, Neurocomputing 126 (2014) 29–35. http://dx.doi.org/10.1016/j.neucom.2013.01.052.
[11] B. Krawczyk, M. Woźniak, Optimization algorithms for one-class classification ensemble pruning, in: N.T. Nguyen, B. Attachoo, B. Trawiński, K. Somboonviwat (Eds.), Intelligent Information and Database Systems, Lecture Notes in Computer Science, vol. 8398, Springer International Publishing, Switzerland, 2014, pp. 127–136. http://dx.doi.org/10.1007/978-3-319-05458-2_14.
[12] D. Karaboga, B. Basturk, A powerful and efficient algorithm for numerical function optimization: artificial bee colony (ABC) algorithm, J. Global Optim. 39 (3) (2007) 459–471. http://dx.doi.org/10.1007/s10898-007-9149-x.
[13] J. Kennedy, Particle swarm optimization, in: C. Sammut, G.I. Webb (Eds.), Encyclopedia of Machine Learning, Springer, New York, NY, USA, 2010, pp. 760–766. http://dx.doi.org/10.1007/978-0-387-30164-8_630.
[14] R. Storn, K. Price, Differential evolution: a simple and efficient heuristic for global optimization over continuous spaces, J. Global Optim. 11 (4) (1997) 341–359. http://dx.doi.org/10.1023/A:1008202821328.
[15] F. Barani, M. Abadi, An ABC-AIS hybrid approach to dynamic anomaly detection in AODV-based MANETs, in: Proceedings of the 10th IEEE International Conference on Trust, Security and Privacy in Computing and Communications, IEEE, Washington, DC, USA, 2011, pp. 714–720. http://dx.doi.org/10.1109/TrustCom.2011.92.
[16] P. Shunmugapriya, S. Kanmani, Optimization of stacking ensemble configurations through artificial bee colony algorithm, Swarm Evol. Comput. 12 (2013) 24–32. http://dx.doi.org/10.1016/j.swevo.2013.04.004.
[17] C. Brodley, T. Lane, Creating and exploiting coverage and diversity, in: Proceedings of the AAAI-96 Workshop on Integrating Multiple Learned Models, Citeseer, 1996, pp. 8–14.
[18] A. Tsymbal, M. Pechenizkiy, P. Cunningham, Diversity in search strategies for ensemble feature selection, Inf. Fusion 6 (1) (2005) 83–98. http://dx.doi.org/10.1016/j.inffus.2004.04.003.
[19] L. Xu, A. Krzyzak, C.Y. Suen, Methods of combining multiple classifiers and their applications to handwriting recognition, IEEE Trans. Syst. Man Cybern. 22 (3) (1992) 418–435. http://dx.doi.org/10.1109/21.155943.
[20] C.Y. Suen, L. Lam, Multiple classifier combination methodologies for different output levels, in: Multiple Classifier Systems, Lecture Notes in Computer Science, vol. 1857, Springer, Berlin, Heidelberg, Germany, 2000, pp. 52–66. http://dx.doi.org/10.1007/3-540-45014-9_5.
[21] R.P.W. Duin, D.M.J. Tax, Experiments with classifier combining rules, in: Multiple Classifier Systems, Lecture Notes in Computer Science, vol. 1857, Springer, Berlin, Heidelberg, Germany, 2000, pp. 16–29. http://dx.doi.org/10.1007/3-540-45014-9_2.
[22] R.R. Yager, On ordered weighted averaging aggregation operators in multicriteria decision making, IEEE Trans. Syst. Man Cybern. 18 (1) (1988) 183–190. http://dx.doi.org/10.1109/21.87068.
[23] L.I. Kuncheva, Fuzzy Classifier Design, Studies in Fuzziness and Soft Computing, vol. 49, Physica-Verlag, Heidelberg, Germany, 2000. http://dx.doi.org/10.1007/978-3-7908-1850-5.
[24] B. Schölkopf, A.J. Smola, Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond, MIT Press, Cambridge, MA, USA, 2001.
[25] D.M.J. Tax, R.P.W. Duin, Support vector data description, Mach. Learn. 54 (1) (2004) 45–66. http://dx.doi.org/10.1023/B:MACH.0000008084.60811.49.
[26] D.M.J. Tax, One-class classification (Ph.D. thesis), Delft University of Technology, 2001.
[27] W. Khreich, E. Granger, A. Miri, R. Sabourin, Iterative boolean combination of classifiers in the ROC space: an application to anomaly detection with HMMs, Pattern Recognit. 43 (8) (2010) 2732–2752. http://dx.doi.org/10.1016/j.patcog.2010.03.006.
[28] M. Ghannad-Rezaie, H. Soltanian-Zadeh, H. Ying, M. Dong, Selection-fusion approach for classification of datasets with missing values, Pattern Recognit. 43 (6) (2010) 2340–2350. http://dx.doi.org/10.1016/j.patcog.2009.12.003.
[29] T. Woloszynski, M. Kurzynski, A probabilistic model of classifier competence for dynamic ensemble selection, Pattern Recognit. 44 (10–11) (2011) 2656–2668. http://dx.doi.org/10.1016/j.patcog.2011.03.020.
[30] R.E. Schapire, Y. Freund, Boosting: Foundations and Algorithms, MIT Press, Cambridge, MA, USA, 2012.
[31] Q.L. Zhao, Y.H. Jiang, M. Xu, Incremental learning by heterogeneous bagging ensemble, in: L. Cao, J. Zhong, Y. Feng (Eds.), Advanced Data Mining and Applications, Lecture Notes in Computer Science, vol. 6441, Springer, Berlin, Heidelberg, Germany, 2010, pp. 1–12. http://dx.doi.org/10.1007/978-3-642-17313-4_1.
[32] A.L.V. Coelho, D.S.C. Nascimento, On the evolutionary design of heterogeneous bagging models, Neurocomputing 73 (16–18) (2010) 3319–3322. http://dx.doi.org/10.1016/j.neucom.2010.07.008.
[33] S. Sheen, S.V. Aishwarya, R. Anitha, S.V. Raghavan, S.M. Bhaskar, Ensemble pruning using harmony search, in: E. Corchado, V. Snášel, A. Abraham, M. Woźniak, M. Graña, S.-B. Cho (Eds.), Hybrid Artificial Intelligent Systems, Lecture Notes in Computer Science, vol. 7209, Springer, Berlin, Heidelberg, Germany, 2012, pp. 13–24. http://dx.doi.org/10.1007/978-3-642-28931-6_2.
[34] L. Abdi, S. Hashemi, GAB-EPA: a GA based ensemble pruning approach to tackle multiclass imbalanced problems, in: A. Selamat, N.T. Nguyen, H. Haron (Eds.), Intelligent Information and Database Systems, Lecture Notes in Computer Science, vol. 7802, Springer, Berlin, Heidelberg, Germany, 2013, pp. 246–254. http://dx.doi.org/10.1007/978-3-642-36546-1_26.
[35] J. Kittler, M. Hatef, R.P.W. Duin, J. Matas, On combining classifiers, IEEE Trans. Pattern Anal. Mach. Intell. 20 (3) (1998) 226–239. http://dx.doi.org/10.1109/34.667881.
[36] R.P.W. Duin, The combining classifier: to train or not to train?, in: Proceedings of the 16th International Conference on Pattern Recognition, IEEE, Washington, DC, USA, 2002, pp. 765–770. http://dx.doi.org/10.1109/ICPR.2002.1048415.
[37] M. Grabisch, J.-M. Nicolas, Classification by fuzzy integral: performance and tests, Fuzzy Sets Syst. 65 (2–3) (1994) 255–271. http://dx.doi.org/10.1016/0165-0114(94)90023-X.
[38] S.-B. Cho, J.H. Kim, Combining multiple neural networks by fuzzy integral for robust classification, IEEE Trans. Syst. Man Cybern. 25 (2) (1995) 380–384. http://dx.doi.org/10.1109/21.364825.
[39] M. Reformat, R.R. Yager, Building ensemble classifiers using belief functions and OWA operators, Soft Comput. 12 (6) (2008) 543–558. http://dx.doi.org/10.1007/s00500-007-0227-2.
[40] D.M.J. Tax, R.P.W. Duin, Uniform object generation for optimizing one-class classifiers, J. Mach. Learn. Res. 2 (2002) 155–173.
[41] R.O. Duda, P.E. Hart, D.G. Stork, Pattern Classification, John Wiley & Sons, Hoboken, NJ, USA, 2012.
[42] S.-W. Lee, J. Park, S.-W. Lee, Low resolution face recognition based on support vector data description, Pattern Recognit. 39 (9) (2006) 1809–1812. http://dx.doi.org/10.1016/j.patcog.2006.04.033.
[43] A. Banerjee, P. Burlina, C. Diehl, A support vector method for anomaly detection in hyperspectral imagery, IEEE Trans. Geosci. Remote Sens. 44 (8) (2006) 2282–2291. http://dx.doi.org/10.1109/TGRS.2006.873019.
[44] A.K. Jain, R.C. Dubes, Algorithms for Clustering Data, Prentice Hall, Upper Saddle River, NJ, USA, 1988.
[45] D.M.J. Tax, R.P.W. Duin, Combining one-class classifiers, in: J. Kittler, F. Roli (Eds.), Multiple Classifier Systems, Lecture Notes in Computer Science, vol. 2096, Springer, Berlin, Heidelberg, Germany, 2001, pp. 299–308. http://dx.doi.org/10.1007/3-540-48219-9_30.
[46] R.R. Yager, A note on weighted queries in information retrieval systems, J. Am. Soc. Inf. Sci. 38 (1) (1987) 23–24.
[47] R.R. Yager, D.P. Filev, Fuzzy logic controllers with flexible structures, in: Proceedings of the 2nd International Conference on Fuzzy Sets and Neural Networks, 1992, pp. 317–320.
[48] R. Fuller, On obtaining OWA operator weights: a short survey of recent developments, in: Proceedings of the 5th IEEE International Conference on Computational Cybernetics, IEEE, Washington, DC, USA, 2007, pp. 241–244. http://dx.doi.org/10.1109/ICCCYB.2007.4402042.
[49] R.R. Yager, D.P. Filev, Induced ordered weighted averaging operators, IEEE Trans. Syst. Man Cybern. 29 (2) (1999) 141–150. http://dx.doi.org/10.1109/3477.752789.
[50] M. Talebi, M. Abadi, BeeMiner: a novel artificial bee colony algorithm for classification rule discovery, in: Proceedings of the 2014 Iranian Conference on Intelligent Systems, IEEE, Washington, DC, USA, 2014, pp. 1–5. http://dx.doi.org/10.1109/IranianCIS.2014.6802576.
[51] D.M.J. Tax, K. Muller, A consistency-based model selection for one-class classification, in: Proceedings of the 17th International Conference on Pattern Recognition, IEEE, Washington, DC, USA, 2004, pp. 363–366. http://dx.doi.org/10.1109/ICPR.2004.1334542.
[52] A.J.C. Sharkey, N.E. Sharkey, Combining diverse neural nets, Knowl. Eng. Rev. 12 (3) (1997) 231–247.
[53] D. Ruta, B. Gabrys, Classifier selection for majority voting, Inf. Fusion 6 (1) (2005) 63–81. http://dx.doi.org/10.1016/j.inffus.2004.04.008.
[54] L.I. Kuncheva, C.J. Whitaker, Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy, Mach. Learn. 51 (2) (2003) 181–207. http://dx.doi.org/10.1023/A:1022859003006.
[55] B. Krawczyk, M. Woźniak, Diversity measures for one-class classifier ensembles, Neurocomputing 126 (2014) 36–44. http://dx.doi.org/10.1016/j.neucom.2013.01.053.
[56] J.L. Fleiss, B. Levin, M.C. Paik, Statistical Methods for Rates and Proportions, John Wiley & Sons, Hoboken, NJ, USA, 2003. http://dx.doi.org/10.1002/0471445428.
[57] D. Filev, R.R. Yager, Analytic properties of maximum entropy OWA operators, Inf. Sci. 85 (1–3) (1995) 11–27. http://dx.doi.org/10.1016/0020-0255(94)00109-O.
[58] R. Perdisci, G. Gu, W. Lee, Using an ensemble of one-class SVM classifiers to harden payload-based anomaly detection systems, in: Proceedings of the 6th IEEE International Conference on Data Mining, IEEE, Washington, DC, USA, 2006, pp. 488–498. http://dx.doi.org/10.1109/ICDM.2006.165.
[59] G. Giacinto, R. Perdisci, M. Del Rio, F. Roli, Intrusion detection in computer networks by a modular ensemble of one-class classifiers, Inf. Fusion 9 (1) (2008) 69–82. http://dx.doi.org/10.1016/j.inffus.2006.10.002.
[60] L. Nanni, Experimental comparison of one-class classifiers for online signature verification, Neurocomputing 69 (7–9) (2006) 869–873. http://dx.doi.org/10.1016/j.neucom.2005.06.007.
[61] M.R.P. Souza, G.D.C. Cavalcanti, T.I. Ren, Off-line signature verification: an approach based on combining distances and one-class classifiers, in: Proceedings of the 22nd IEEE International Conference on Tools with Artificial Intelligence, IEEE, Washington, DC, USA, 2010, pp. 7–11. http://dx.doi.org/10.1109/ICTAI.2010.10.
[62] R.-S. Wu, W.-H. Chung, Ensemble one-class support vector machines for content-based image retrieval, Expert Syst. Appl. 36 (3) (2009) 4451–4459. http://dx.doi.org/10.1016/j.eswa.2008.05.037.

[63] V. Cheplygina, D.M.J. Tax, Pruned random subspace method for one-class classifiers, in: C. Sansone, J. Kittler, F. Roli (Eds.), Multiple Classifier Systems, Lecture Notes in Computer Science, vol. 6713, Springer, Berlin, Heidelberg, Germany, 2011, pp. 96–105. http://dx.doi.org/10.1007/978-3-642-21557-5_12.
[64] A.P. Bradley, The use of the area under the ROC curve in the evaluation of machine learning algorithms, Pattern Recognit. 30 (7) (1997) 1145–1159. http://dx.doi.org/10.1016/S0031-3203(96)00142-2.
[65] J. Liu, J. Song, Q. Miao, Y. Cao, FENOC: an ensemble one-class learning framework for malware detection, in: Proceedings of the 9th International Conference on Computational Intelligence and Security, IEEE, Washington, DC, USA, 2013, pp. 523–527. http://dx.doi.org/10.1109/CIS.2013.116.
[66] L. Nanni, A. Lumini, Random bands: a novel ensemble for fingerprint matching, Neurocomputing 69 (13–15) (2006) 1702–1705. http://dx.doi.org/10.1016/j.neucom.2006.01.011.
[67] B. Cyganek, Image segmentation with a hybrid ensemble of one-class support vector machines, in: M. Graña Romay, E. Corchado, M.T. Garcia Sebastian (Eds.), Hybrid Artificial Intelligence Systems, Lecture Notes in Computer Science, vol. 6076, Springer, Berlin, Heidelberg, Germany, 2010, pp. 254–261. http://dx.doi.org/10.1007/978-3-642-13769-3_31.
[68] E. Menahem, L. Rokach, Y. Elovici, Combining one-class classifiers via meta learning, in: Proceedings of the 22nd ACM International Conference on Information & Knowledge Management, ACM, New York, NY, USA, 2013, pp. 2435–2440. http://dx.doi.org/10.1145/2505515.2505619.
[69] F. van der Heijden, R.P.W. Duin, D. de Ridder, D.M.J. Tax, Classification, Parameter Estimation and State Estimation: An Engineering Approach Using MATLAB, John Wiley & Sons, Hoboken, NJ, USA, 2004.
[70] D.M.J. Tax, DDtools, the Data Description toolbox for MATLAB, version 2.1.1, 2014. URL: 〈http://prlab.tudelft.nl/david-tax/dd_tools.html〉.
[71] M. Friedman, The use of ranks to avoid the assumption of normality implicit in the analysis of variance, J. Am. Stat. Assoc. 32 (200) (1937) 675–701. http://dx.doi.org/10.1080/01621459.1937.10503522.
[72] P. Nemenyi, Distribution-free multiple comparisons (Ph.D. thesis), Princeton University, 1963.
[73] J. Demšar, Statistical comparisons of classifiers over multiple data sets, J. Mach. Learn. Res. 7 (2006) 1–30.
[74] M. Lichman, UCI Machine Learning Repository, 2013. URL: 〈http://archive.ics.uci.edu/ml〉.
[75] B. Akay, D. Karaboga, Parameter tuning for the artificial bee colony algorithm, in: N.T. Nguyen, R. Kowalczyk, S.-M. Chen (Eds.), Computational Collective Intelligence, Semantic Web, Social Networks and Multiagent Systems, Lecture Notes in Computer Science, vol. 5796, Springer, Berlin, Heidelberg, Germany, 2009, pp. 608–619. http://dx.doi.org/10.1007/978-3-642-04441-0_53.

Elham Parhizkar received the M.Sc. degree in computer engineering, with first class honors, from Tarbiat Modares University in 2014, where she worked on anomaly detection in web traffic as her master thesis. Her main research interests are in the field of machine learning, particularly in the areas of one-class classification and outlier detection.

Mahdi Abadi received the B.Sc. degree in computer engineering from Ferdowsi University of Mashhad in 1998. He also received the M.Sc. and Ph.D. degrees from Tarbiat Modares University in 2001 and 2008, respectively. Since 2009, he has been an assistant professor in the Faculty of Electrical and Computer Engineering at Tarbiat Modares University. His main research interests are network security, evolutionary computation, and data mining.
