
Scientia Iranica D (2011) 18 (6), 1460–1468

Sharif University of Technology

Scientia Iranica
Transactions D: Computer Science & Engineering and Electrical Engineering
www.sciencedirect.com

Research note

Anomaly detection using a self-organizing map and particle swarm optimization

M. Lotfi Shahreza a,∗, D. Moazzami a,b,c, B. Moshiri d, M.R. Delavar e

a Department of Algorithms and Computations, Faculty of Engineering, University of Tehran, P.O. Box 14395-195, Tehran, Iran
b School of Mathematics, Institute for Research in Fundamental Sciences (IPM), P.O. Box 19395-5746, Tehran, Iran
c Center of Excellence in Geomatic Engineering and Disaster Management, Tehran, Iran
d Control & Intelligent Processing Center of Excellence, School of ECE, Department of Electronic and Computer Engineering, Faculty of Engineering, University of Tehran, P.O. Box 14539-515, Tehran, Iran
e Center of Excellence in Geomatic Engineering and Disaster Management, Department of Surveying and Geomatic Engineering, Faculty of Engineering, University of Tehran, P.O. Box 11155-4563, Tehran, Iran

Received 30 March 2010; revised 12 July 2010; accepted 18 January 2011

KEYWORDS: Anomaly detection; Data fusion; Neural network; PSO; Forest fire.

Abstract. Self-Organizing Maps (SOMs) are among the best-known unsupervised neural network approaches to clustering, and are very efficient in handling large and high-dimensional datasets. The original Particle Swarm Optimization (PSO) is another algorithm, discovered through simplified social model simulation, which is effective in nonlinear optimization problems and easy to implement. In the present study, we combine these two methods and introduce a new method for anomaly detection. A discussion of our method is presented, its results are compared with those of other methods, and its advantages over them are demonstrated. To apply our method, we also performed a case study on forest fire detection. Our algorithm was shown to be simple and to perform better than previous ones, and it can be applied to different domains of anomaly detection. In fact, we observed our method to be a generic algorithm for anomaly detection that may need few changes for implementation in different domains.

© 2012 Sharif University of Technology. Production and hosting by Elsevier B.V.
Open access under CC BY-NC-ND license.

1. Introduction

Anomaly detection refers to detecting patterns in a given data set that do not conform to an established, normal behavior. The patterns thus detected are called anomalies, and often translate to critical and actionable information in several application domains. Anomalies are also referred to as outliers, surprises, aberrations, deviations, peculiarities, etc.

Anomaly detection methods can be used in a variety of domains, such as intrusion detection, fraud detection, system health monitoring, event detection in sensor networks, and detecting ecosystem disturbances.

Three broad categories of anomaly detection techniques exist:

• Supervised anomaly detection techniques learn a classifier, using labeled instances belonging to normal and anomaly classes, and then assign a normal or anomalous label to a test instance [1,2].
• Semi-supervised anomaly detection techniques construct a model representing normal behavior from a given normal training data set, and then test the likelihood of a test instance being generated by the learnt model [1–3].
• Unsupervised anomaly detection techniques detect anomalies in an unlabeled test data set, under the assumption that the majority of instances in the data set are normal [1,2].

Our method for anomaly detection is an unsupervised method based on clustering, so we first explain clustering briefly.

∗ Corresponding author. E-mail address: Maryam.lotfi@gmail.com (M. Lotfi Shahreza).
1026-3098 © 2012 Sharif University of Technology. Production and hosting by Elsevier B.V. Open access under CC BY-NC-ND license. Peer review under responsibility of Sharif University of Technology. doi:10.1016/j.scient.2011.08.025
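The clustering-based idea can be illustrated with a minimal sketch (this is not the method of this paper, which is developed in Section 3, but a generic illustration of unsupervised, clustering-based anomaly flagging; the `flag_anomalies` helper and the fixed threshold are illustrative assumptions):

```python
import math

def euclidean(a, b):
    """Plain Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def flag_anomalies(points, centroids, threshold):
    """Assign each point to its nearest centroid, then flag points whose
    distance to that centroid exceeds `threshold` as anomalous."""
    flags = []
    for p in points:
        nearest = min(centroids, key=lambda c: euclidean(p, c))
        flags.append(euclidean(p, nearest) > threshold)
    return flags

# Mostly "normal" points near (0, 0), plus one far-away outlier.
data = [(0.1, 0.0), (-0.2, 0.1), (0.0, -0.1), (5.0, 5.0)]
print(flag_anomalies(data, centroids=[(0.0, 0.0)], threshold=1.0))
# [False, False, False, True]
```

The interesting questions — how the cluster centers are found, and which similarity measure replaces the plain Euclidean distance — are exactly what the rest of the paper addresses.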
Clustering means the act of partitioning an unlabeled dataset into groups of similar objects. Each group, called a 'cluster', consists of objects that are similar to one another and dissimilar to objects of other groups [4]. There are several methods for measuring this similarity, some typical types of which we will describe.

First, we give a brief survey of anomaly detection systems, and then an introduction to the generic SOM and PSO in Sections 1.2 and 1.3, respectively. In Section 1.4, we describe different attempts to combine SOM and PSO. Section 2 contains some definitions: the main similarity measures are presented in Section 2.1 and, since we need criteria for comparing the performance of any approach with other methods, some basic approaches for measuring the performance of clustering methods are described in Section 2.2. In Section 3, we present our method. Section 4 is a case study of forest fire detection based on our suggested method. We discuss our method and compare its results with those of other methods in Section 5. Finally, conclusions are presented in Section 6.

1.1. A survey on anomaly detection

Anomaly detection systems work by trying to identify anomalies in an environment [5].

At the early stage, the research focus lay in rule-based expert systems and statistical approaches. But when encountering larger datasets, the results of rule-based expert systems and statistical approaches become worse. Thus, many data mining techniques have been introduced to solve the problem. Among these techniques, the Artificial Neural Network (ANN) is widely used and has been successful in solving many complex practical problems [6].

Some works discuss a generic method for anomaly detection that could be used in different areas, while others address an exclusive subject. Here, we try to introduce their significance.

Some believe unsupervised methods are the best choice for anomaly detection, since they do not need any previous knowledge and only try to find anomalous patterns and cases. While supervised methods can detect only pre-known abnormal cases, unsupervised methods can recognize new and unknown objects. Rouil et al. [7], Eskin et al. [8] and Zakia and Akira [9] presented unsupervised methods for anomaly detection. Also, Guthrie et al. [10] developed an anomaly detection method for finding anomalous segments in a document. Their method is unsupervised; they assumed that there is no data with which to characterize ''normal'' language. This method is not a classification or clustering method; it returns a list of all segments ranked by how anomalous they are with respect to the whole document.

Meanwhile, results show that data fusion methods perform well in this area [5,11–18]. Chen and Aickelin [5] constructed a Dempster–Shafer based anomaly detection system using the Java 2 platform. First, they used the Wisconsin Breast Cancer Dataset (WBCD), and then the Iris plant dataset, for their experiments. Thirdly, they experimented using an e-mail dataset, which had been created using a week's worth of e-mails (90 e-mails) from a user's sent box, together with outgoing e-mails (42 e-mails) sent by a computer infected with the netsky-d worm. The aim of the experiment was to detect the 42 infected e-mails. They used D–S theory to combine features of the e-mails to detect the worm-infected e-mails.

Various intelligent approaches have also been used for anomaly detection, one of which is the artificial immune system. Greensmith et al. [19] presented a new algorithm for anomaly detection, based on simulation of the human immune system. According to the authors' claim, the algorithm performs well on the task of detecting a ping-based port scan, and may also be applied to other detection or data correlation problems, such as the analysis of radio signal data from space, sensor networks, internet worm detection and other security and defense applications. Another experiment in this field has been undertaken by Twycross and Aickelin [20].

Artificial neural networks are another intelligent method used for anomaly detection. Brause et al. [21] used a compound method based on rule-based systems and an artificial neural network for credit card fraud detection. Other neural network-based credit card fraud detection has been undertaken by Hassibi [22], Dorronsoro et al. [23] and Syeda et al. [24]. Wang et al. [6] proposed a new approach called FC-ANN, based on ANN and fuzzy clustering, to solve the problem and help IDS achieve a higher detection rate.

An important part of anomaly detection research is focused on computer intrusion detection. The task of an intrusion detection system is to protect a computer system by detecting and diagnosing attempted breaches of the integrity of the system [17].

The process of automatically constructing models from data is not trivial, especially for intrusion detection problems. This is because intrusion detection faces problems such as huge network traffic volumes, highly imbalanced data distribution, the difficulty of realizing decision boundaries between normal and abnormal behavior, and a requirement for continuous adaptation to a constantly changing environment. Artificial intelligence and machine learning have shown limitations in achieving high detection accuracy and fast processing times when confronted with these requirements [2].

Intrusion detection techniques can be categorized into signature detection and anomaly detection. Signature detection systems use patterns of well-known attacks or weak spots in the system to match and identify known intrusions. They perform pattern matching between captured network traffic and the attack signature; if the matching succeeds, the system generates an alarm. The main advantage of the signature detection paradigm is that it can accurately and efficiently detect instances of known attacks; the main disadvantage is that it lacks the ability to detect newly invented attacks. Anomaly detection systems flag observed activities as anomalies when they deviate significantly from established normal usage profiles. The main advantage of anomaly detection is that it does not require prior knowledge of intrusions and can thus detect new intrusions; the main disadvantage is that it may be unable to describe what the attack is, and may have a high false positive rate [18].

Of course, we are able to obtain good ideas from these intrusion detection methods or, with some slight changes, use them for other anomaly detection fields.

Some main attempts towards computer intrusion detection have been made by Anderson et al. [25], who used an outlier detection method; Jake et al. [26] and Ghosh and Schwartzbard [27], who both examined neural network-based methods; Gravey and Lunt [28], who used an evidential reasoning approach; Kumar and Spafford [29], who used a misuse detection method based on rule-based analysis; Lee and Stolfo [30], who tried a classification method by association rules; and so on.

A complete survey of fraud detection can be found in [31–33]. Also, Wu and Banzhaf [2] provide an overview of
the research progress in applying computational intelligence methods to the problem of intrusion detection. The scope of their review encompasses core methods of CI, including artificial neural networks, fuzzy systems, evolutionary computation, artificial immune systems, swarm intelligence and soft computing.

1.2. Self-organizing maps

Self-Organizing Maps (SOMs) are the best-known unsupervised neural network approach to clustering.

The architecture of the SOM is a feed-forward neural network with a single layer of neurons arranged into a rectangular array. When an input pattern is presented to the SOM, each neuron calculates how similar the input is to its weights. The neuron whose weights are most similar (minimal distance, d, in input space) is declared the winner of the competition for the input pattern, and the weights of the winning neuron are strengthened to reflect the outcome. The winning neuron receives the most learning at any stage, with neighbors receiving less the further away they are from the winning neuron [34].

1.2.1. Advantages of SOM

• Working with high-dimensional data sets is difficult; the SOM reduces information while preserving the most important topological relationships of the data elements on the two-dimensional plane [1], so that information from different sources can be efficiently fused.
• SOMs are trained using unsupervised learning, i.e. no prior knowledge is available and no assumptions are made about the class membership of data [1].
• The SOM algorithm is very efficient in handling large datasets. The SOM algorithm is also robust even when the data set is noisy [35].

1.2.2. Disadvantages of SOM

• The number of clusters needs to be specified. Clustering is a two-phase process: determining the number of clusters and clustering the data. Determining the number of clusters is not trivial, since the characteristics of the data set are usually not known a priori. This can be overcome by running the algorithm with varying numbers of clusters and selecting the most appropriate clustering result according to a figure of merit [35].
• A user has to either do manual inspection or apply traditional algorithms, like hierarchical or partitive ones, to find the cluster boundaries [1].

Recently, there have been significant research efforts to apply Evolutionary Computation (EC) techniques for the purpose of evolving one or more aspects of artificial neural networks. Evolutionary computation methodologies have been applied to three main attributes of neural networks: network connection weights, network architecture (network topology, transfer function) and network learning algorithms. Most work involving the evolution of ANNs has focused on network weights and topological structure.

Over the past several years, several papers have reported using PSO to replace the back-propagation learning algorithm in ANNs. They showed PSO to be a promising method for training ANNs: it is faster and obtains better results in most cases, and it also avoids some problems met by other methods.

1.3. Particle swarm optimization

The original Particle Swarm Optimization (PSO) algorithm was discovered through simplified social model simulation. In PSO, physical position is not an important factor. Each member, called a particle, is initialized by assigning random positions and velocities. During each iteration, every particle is accelerated towards its own personal best, as well as in the direction of the global best position. This is achieved by calculating a new velocity term for each particle, based on its distance from its personal best, as well as its distance from the global best position, which will in turn affect the next position of the particle during the next epoch [36].

1.3.1. Advantages of PSO

• PSO is effective in nonlinear optimization problems.
• It is easy to implement.
• Only a few input parameters need to be adjusted in PSO.
• Because the update process in PSO is based on simple equations, PSO can be efficiently used on large data sets.
• PSO has been successfully applied to many areas: function optimization, artificial neural network training, fuzzy system control, and other areas where GAs can be applied [35].

1.3.2. Disadvantages of PSO

A disadvantage of global PSO is that it tends to be trapped in a local optimum under some initialization conditions [35].

1.4. A survey on integration of SOM & PSO

Our proposed method is, in essence, a combination of a self-organizing map and particle swarm optimization. First, we present a survey of different works that have been done in this field; our suggested method is then presented in Section 3.

Xiao-Feng et al. [37] used mass extinction to increase the efficiency of PSO. They stated that PSO performs well in the early iterations, but has problems reaching a near-optimal solution in several real-valued function optimization problems. So, they reinitialized the velocities of all particles at a predefined extinction interval, Ie, after the determined step. Of course, in this method, determining Ie and the required steps for reinitializing the velocities is important, and could increase or decrease the performance of the algorithm.

Xiao-Feng et al. [38] added a replacing criterion, based on the diversity of fitness between the current particle and the best historical experience. Indeed, they remove inactive particles and create new particles instead. They called their algorithm APSO. They believed that some particles may become inactive during iterations, and declared that an inactive particle is one that will only be flying within quite a small space, which occurs when its position and its local best are close to the global best (if the global best has no significant change) and its velocity is close to zero (for all dimensions) [38]. In this way, they could prevent the swarm from accepting a local optimum instead of the global one. However, it is hard to identify the inactive particles for different problems. We extract some part of our method from APSO.

Xiang et al. [35] proposed a SOM/PSO algorithm that uses PSO to evolve the weights of a SOM. In this algorithm, at the first stage, weights are trained by SOM and, at the second stage, they are optimized by PSO. In their method, each particle consists of a complete set of weights for the SOM. The dimension of each particle is the number of input neurons of the SOM times the number of output neurons of the SOM. This increases the time and space complexity of the algorithm. In fact, their algorithm clusters the input dataset by standalone SOM and then applies PSO to refine these clusters.

O'Neill and Brabazon [39] introduced a hybrid method of SOM and PSO called SOSwarm. In fact, they used PSO for updating
the weights of the neural network. In this method, the components of the mapping layer represent particles, which move according to an adapted version of the particle swarm algorithm. Instead of adjusting vector values in the map space with respect to the training input vectors alone, the particles (vectors) in the mapping layer adjust their locations using a PSO update function.

They then applied their method to four benchmark classification problems from the UCI Machine Learning repository. Their results were satisfactory and, indeed, the basis of our method is SOSwarm.

Using PSO for updating the weights of neurons is a good idea. However, they did not really use PSO. In fact, SOSwarm is just a new version of the simple SOM which, instead of working with the single parameter 'weight', updates a 'velocity' parameter and then updates a 'position' parameter from it, doing the same for the next input; we would argue it is not really a PSO. Another problem is that SOSwarm uses the Euclidean distance for measuring distance, but the Euclidean distance is not a good choice for complex datasets [4].

Swagatam et al. [4] created another method based on SOM and PSO for clustering, the Multi-Elitist PSO (MEPSO) algorithm. One of the best things about this method is that it uses a kernel-induced similarity measure instead of the conventional sum-of-squares distance. Kernel functions make it possible to cluster data that are linearly non-separable. MEPSO prevents accepting a local optimum instead of a global one but, because of the kind of particle representation, this method has high complexity in time and space. In MEPSO, clusters may or may not be active in some particles; this reduces the performance of the method.

Anurag and Christian [1] proposed another hybrid algorithm. They proposed to use the PSO algorithm for finding cluster boundaries directly from the code vectors obtained from SOM. In fact, they clustered their input data set by generic SOM, and then found the cluster boundaries automatically from the output code vectors, using generic PSO. Other methods that have been presented for this purpose are sensitive to the number of created clusters but, since PSO works individually with a particular cluster, it is insensitive to the number of clusters in the data set. One thing more is that PSO is not sensitive to noise and outliers; however, the choice of cluster centers affects its final result.

2. Definitions

2.1. Similarity measure in clustering

There are some basic methods for measuring the distance between two data points in clustering algorithms, some of which are:

• Euclidean distance (L2):
  D(x, y) = \sqrt{\sum_i (x_i - y_i)^2}.  (1)
• Manhattan distance (L1):
  D(x, y) = \sum_i |x_i - y_i|.  (2)
  It is also known as the city-block distance [40].
• Chebychev distance (L∞):
  D(x, y) = \max_i |x_i - y_i|.  (3)
  It is also known as the sup distance [40].
• Categorical data distance:
  D(x, y) = (\text{number of attributes in which } x_i \neq y_i)/N,  (4)
  with N the total number of categorical attributes.
• Minkowski distance: the Minkowski distance is a metric on Euclidean space, which can be considered a generalization of both the Euclidean and Manhattan distances:
  D(x, y) = \left( \sum_{i=1}^{n} |x_i - y_i|^p \right)^{1/p}, \quad p \ge 1.  (5)
  The Minkowski distance is typically used with p being 1 or 2. The latter is the Euclidean distance, while the former is sometimes known as the Manhattan distance [40]. In the limiting case of p reaching infinity, we obtain the Chebyshev distance:
  \lim_{p \to \infty} \left( \sum_{i=1}^{n} |x_i - y_i|^p \right)^{1/p} = \max_i |x_i - y_i|.  (6)
• Mahalanobis distance:
  D(x, y) = (x - y)^T S^{-1} (x - y),  (7)
  where S is the within-group covariance matrix [40].
• Kernel-based functions.

Here, we explain further the Euclidean distance and kernel-based functions.

2.1.1. Euclidean distance

The Euclidean distance metric, employed by most existing partitional clustering algorithms, works well with datasets in which the natural clusters are nearly hyperspherical and linearly separable, but it causes severe misclassifications when the dataset is complex, with linearly non-separable patterns [4]. The most popular way to evaluate the similarity between two patterns amounts to the use of the Euclidean distance which, between any two d-dimensional patterns, \vec{x}_i and \vec{x}_j, is given by:
  d(\vec{x}_i, \vec{x}_j) = \sqrt{\sum_{p=1}^{d} (x_{i,p} - x_{j,p})^2} = \|\vec{x}_i - \vec{x}_j\|.  (8)

2.1.2. The kernel-based similarity measure

A kernel function measures the distance between two data points by implicitly mapping them into a high-dimensional feature space, where the data are linearly separable [4]. Given a dataset, X, in the d-dimensional real space, R^d, let us consider a non-linear mapping function from the input space to a high-dimensional feature space, H:
  \phi : R^d \to H, \quad \vec{x}_i \mapsto \phi(\vec{x}_i),  (9)
where:
  \vec{x}_i = [x_{i,1}, x_{i,2}, \ldots, x_{i,d}]^T,
and:
  \phi(\vec{x}_i) = [\phi_1(\vec{x}_i), \phi_2(\vec{x}_i), \ldots, \phi_H(\vec{x}_i)]^T.

By applying the mapping, a dot product, \vec{x}_i^T \vec{x}_j, is transformed into \phi^T(\vec{x}_i)\phi(\vec{x}_j). Now, the central idea in kernel-based learning is that the mapping function, \phi, need not be explicitly specified. Hence, the kernelized distance measure between two patterns, \vec{x}_i and \vec{x}_j, is given by:
\|\phi(\vec{x}_i) - \phi(\vec{x}_j)\|^2 = (\phi(\vec{x}_i) - \phi(\vec{x}_j))^T (\phi(\vec{x}_i) - \phi(\vec{x}_j))
  = \phi^T(\vec{x}_i)\phi(\vec{x}_i) - 2\phi^T(\vec{x}_i)\phi(\vec{x}_j) + \phi^T(\vec{x}_j)\phi(\vec{x}_j)
  = K(\vec{x}_i, \vec{x}_i) - 2K(\vec{x}_i, \vec{x}_j) + K(\vec{x}_j, \vec{x}_j).  (10)

The Gaussian kernel (also referred to as the Radial Basis Function) is well known, owing to its better classification accuracy over linear and polynomial kernels on many test problems. The Gaussian kernel may be represented as:
  K(\vec{x}_i, \vec{x}_j) = \exp\left( -\frac{\|\vec{x}_i - \vec{x}_j\|^2}{2\sigma^2} \right), \quad \sigma > 0.  (11)

Clearly, for the Gaussian kernel, K(\vec{x}_i, \vec{x}_i) = 1 and, thus, Relation (10) reduces to:
  \|\phi(\vec{x}_i) - \phi(\vec{x}_j)\|^2 = 2(1 - K(\vec{x}_i, \vec{x}_j)).  (12)

In fact, we used this measurement in our algorithm.

2.2. Performance measuring

One major issue in using a clustering algorithm to cluster new and unknown expression data is measuring the robustness of the clustering result. Here, we will see some methods for measuring the performance of clustering methods:

• The general criterion of good partitioning is that objects in the same cluster are ''close'' or related to each other, whereas objects of different clusters are ''far apart'' or very different. Some popular algorithms are k-means and k-medoids [1].
• One method for this is the resampling technique. This technique is based on the simple idea that stipulates that, if the algorithm is applied to a randomly selected subset of the original set, then patterns that are in the same cluster in the original clustering should also be in the same cluster in the clustering result obtained for the subset, if the result is robust. Multiple subsets can be selected randomly, and the results of clustering these subsets can be compared to the original clustering result in order to measure the robustness of the clusters obtained. The difference between the clustering based on the randomly selected subset and the original clustering result is measured using a merit function, which is expressed as follows:
  \text{merit} = \frac{\sqrt{\sum_j \sum_i \left( T_{ij} - T_{ij}^{(\mu)} \right)^2}}{\text{number of patterns in the selected subset}}.  (13)
  T_{ij} is an element of the original similarity matrix and T_{ij}^{(\mu)} is an element of the resampled similarity matrix. A similarity matrix is constructed as follows:
  T_{ij} = 1 if patterns i and j are in the same cluster; T_{ij} = 0 if patterns i and j are not in the same cluster.  (14)
  The smaller the value of the merit, the more robust the algorithm. The method can also be used to estimate the number of clusters needed for a given dataset. Given an unknown data set, several runs of a given clustering algorithm, under varying input parameters, can be performed. If resampling is used with each run, the clustering result of choice is the one with the lowest merit value. This can be used to choose an adequate number of clusters when running a clustering algorithm on an unknown data set. One major drawback of the resampling technique is that it is computationally expensive [35].
• One criterion for the quality of the clustering involves measuring the degree of difference between clusters. Inspecting the average values of variables across different clusters is one simple method for measuring the differentiation ability between clusters. It is preferable to have clusters whose profiles are statistically different from each other [34].
• Ward's minimum-variance method, for computation of the distance between two clusters:
  D_w(A, B) = \frac{N_A N_B D_C(A, B)}{N_A + N_B},  (15)
  where N_A is the number of objects in A, N_B is the number of objects in B, and D_C(A, B) is the centroid distance between the two clusters, computed as the squared Euclidean distance between the centroids.
• The clustering results can be judged using Huang's accuracy measure:
  r = \frac{\sum_{i=1}^{k} n_i}{n},  (16)
  where n_i is the number of data points occurring in both the ith cluster and its corresponding true cluster, and n is the total number of data points in the data set. According to this measure, a higher value of r indicates a better clustering result, with perfect clustering yielding a value of r = 1.

3. Suggested method

In this method, we have a layer of neurons, but we treat each neuron just like a particle. In fact, we have a network of particles whose topology is based on the idea of a self-organizing map, but each particle of which works according to the general PSO algorithm. (It is like using PSO for updating the weights of a SOM, although with some differences.) The pseudo-code of our method is given in Figure 1.

We define a group of particles. The position of particle i is represented as x_i = (x_{i1}, x_{i2}, \ldots, x_{iD}), and the position of each particle is equal to the weights of the corresponding neuron. In other words, the dimension of each particle is equal to the dimension of each neuron, and each dimension holds the value of the corresponding weight. At first, we set the positions of all particles randomly.

Each particle also maintains a memory of its previous best position, represented as P_i = (p_{i1}, p_{i2}, \ldots, p_{iD}). The global best is represented by P_g = (p_{g1}, p_{g2}, \ldots, p_{gD}). Each particle has a velocity, which can be represented as V_i = (v_{i1}, v_{i2}, \ldots, v_{iD}).

At first, we set the velocity vector of each particle to 0.0001; then, in each iteration, we update the velocity and position of the ith particle by Eqs. (17) and (18), respectively:
  V_i(t + 1) = V_i(t) + c_1 (P_i - X_i(t)) + c_2 (P_g - X_i(t)),  (17)
  X_i(t + 1) = X_i(t) + V_i(t + 1).  (18)

In each iteration, P_g for each dimension is the global best position for that dimension found so far. For each input, we first set the P vector of each particle to its current position, and set P_g to the value of the winning particle. To find the winning particle, we need to measure the similarity of each particle to the input vector.
M. Lotfi Shahreza et al. / Scientia Iranica, Transactions D: Computer Science & Engineering and Electrical Engineering 18 (2011) 1460–1468 1465

Figure 1: Pseudo-code of the suggested method.

For similarity measuring, we use the kernel function introduced in the section 'The kernel-based similarity measure'. For each particle vector, we compute its similarity with the input vector, as given by Eq. (5):

    ∥φ(x_i) − φ(x_j)∥² = 2(1 − K(x_i, x_j)),    (19)

where:

    K(x_i, x_j) = exp(−∥x_i − x_j∥² / (2σ²)),    σ > 0,

and:

    ∥x_i − x_j∥² = Σ_{p=1}^{d} (x_{i,p} − x_{j,p})².

Smaller values of this function indicate greater similarity, so in each iteration the winning particle is the one with the smallest value of this function. Because it is difficult to compare two very small values, we multiply the value of the similarity function by a power of ten; this yields more accurate answers.

In contrast with generic PSO, we update the velocity and position vectors of the neighbors of the winning particle, in addition to those of the winning particle itself. For this implementation, according to the learning radius, we use parameters that express the distance between particles in their topology.

Finally, when all inputs have been assigned to particles, we have as many clusters as there are particles, and these clusters must be categorized. Because our purpose is to find anomalies in the input data, we define three categories: 'anomalous' data, 'probable anomalous' data and 'normal' data. One major method for categorizing clusters is to investigate their members: clusters with few members are good candidates for 'anomalous' clusters, and we use the mean of the members of each cluster for this categorization. For clusters with many members, we also use a second method, based on the distance between the mean of the investigated cluster and its members: members that lie far from the mean of their cluster are not normal.

In order to measure the accuracy of our method, in addition to the 'false positive' rate and the 'false negative' rate, we define two new parameters, 'correct probable' and 'false probable'.

• False positive: the rate of 'normal' cases incorrectly identified as 'anomalous' by the algorithm.
• False negative: the rate of 'anomalous' cases incorrectly identified as 'normal' by the algorithm.
• Correct probable: the total number of cases that are anomalous, but which the method announces as 'probable anomalous' rather than 'anomalous'.
• False probable: the total number of cases that are normal, but which the method announces as 'probable anomalous' rather than 'normal'.

To measure the performance of our algorithms, we also use two other parameters, Huang's accuracy measure and Ward's minimum variance, which were introduced in previous sections. As mentioned before, the algorithm whose Huang's accuracy measure is nearest to one is the best, and the greatest Ward's minimum variance is desirable.

3.1. Advantages of the suggested method

• One advantage of the proposed method is its low time and space complexity: it runs in a few minutes, using a few megabytes of memory, on tens of thousands of data records.
• The algorithm is simple; it involves no complex or difficult computation.
• The method can be applied to many different domains of anomaly detection; in fact, it is a generic anomaly detection method and needs few changes for implementation in a new domain.
• It is an unsupervised method and, given the general lack of labeled data, unsupervised methods are more useful. Besides, supervised methods can only detect anomalies with recognized patterns.
• There is no need to be familiar with the fields of the dataset used: we want to find anomalous data in a set of related data without knowing anything about them.
• Sources of information are often linked by some sort of dependence in real life [14–16]. Most methods either process one source or, when they handle several sources, assume that the sources are statistically independent of each other, and this assumption does not always hold true. Our method assumes a dependency between the fields of a record and processes them en bloc.

4. Case study: forest fire detection

Forest fire is a major environmental issue, creating economic and ecological damage while endangering human lives, and fast detection is a key element in controlling it. Each year, millions of forest hectares (ha) are destroyed around the world. Portugal, in particular, is highly affected by forest fires: from 1980 to 2005, over 2.7 million ha of forest area (equivalent to the area of Albania) were destroyed. The 2003 and 2005 fire seasons were especially dramatic, affecting 4.6% and 3.1% of the territory, with 21 and 18 human deaths, respectively [41].
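As a concrete illustration of the winner-selection step described above, the kernel-induced distance of Eq. (19) can be written in a few lines. This is an illustrative Python sketch, not the authors' C# implementation; the function names `kernel_similarity` and `winning_particle` are our own.

```python
import math

def kernel_similarity(x_i, x_j, sigma=1.0):
    # Eq. (19): ||phi(x_i) - phi(x_j)||^2 = 2 * (1 - K(x_i, x_j)),
    # with the Gaussian kernel K(x_i, x_j) = exp(-||x_i - x_j||^2 / (2 sigma^2)).
    sq_dist = sum((a - b) ** 2 for a, b in zip(x_i, x_j))
    k = math.exp(-sq_dist / (2.0 * sigma ** 2))
    return 2.0 * (1.0 - k)

def winning_particle(particles, x, sigma=1.0):
    # Smaller values of Eq. (19) indicate more similarity, so the winner
    # in each iteration is the particle with the smallest value.
    return min(range(len(particles)),
               key=lambda i: kernel_similarity(particles[i], x, sigma))
```

Identical vectors give a distance of 0 and very dissimilar vectors approach the upper bound of 2, so the measure stays bounded regardless of the input scale.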
Fast detection is a key element in controlling such fires, and traditional methods are no longer adequate, so there has been an emphasis on developing automatic solutions. The new solutions can be categorized into three groups:

• Satellite-based.
• Infrared/smoke scanners.
• Local sensors (e.g. meteorological).

Satellites have acquisition costs and localization delays, and their resolution is not adequate for all cases [41]. Moreover, scanners have high equipment and maintenance costs. Local sensors seem to be the best option here because of their low cost; furthermore, the large number of meteorological stations all over the world makes it simple to measure parameters, such as weather conditions (like temperature and air humidity), which are known to affect fire occurrence.

4.1. Data

The forest Fire Weather Index (FWI) is the Canadian system for rating fire danger, and it includes six components (Figure 2) [41]: the Fine Fuel Moisture Code (FFMC), the Duff Moisture Code (DMC), the Drought Code (DC), the Initial Spread Index (ISI), the Buildup Index (BUI) and the FWI. The first three are fuel codes: the FFMC denotes the moisture content of surface litter and influences ignition and fire spread, while the DMC and DC represent the moisture content of shallow and deep organic layers, which affect fire intensity. The ISI is a score that correlates with fire spread velocity, while the BUI represents the amount of available fuel. The FWI index is an indicator of fire intensity, and it combines the two previous components. Although different scales are used for each of the FWI elements, high values suggest more severe burning conditions. Also, the fuel moisture codes require a memory (time lag) of past weather conditions: 16 h for the FFMC, 12 days for the DMC and 52 days for the DC.

Figure 2: The fire weather index structure.

This study considers forest fire data from the Montesinho natural park in the Trás-os-Montes region in the northeast of Portugal. The data were collected from January 2000 to December 2003 and were built from two sources: the inspector responsible for the Montesinho fire occurrences, and the Bragança Polytechnic Institute, which holds several weather observations from a station located in the center of the Montesinho park. They were integrated into a single dataset with a total of 517 entries, available at [42]: http://www.dsi.uminho.pt/~pcortez/forestfires/.

Attribute information:

1. X: x-axis spatial coordinate within the Montesinho park map: 1–9;
2. Y: y-axis spatial coordinate within the Montesinho park map: 2–9;
3. Month: month of the year: ''Jan'' to ''Dec'';
4. Day: day of the week: ''Mon'' to ''Sun'';
5. FFMC: FFMC index from the FWI system: 18.7–96.20;
6. DMC: DMC index from the FWI system: 1.1–291.3;
7. DC: DC index from the FWI system: 7.9–860.6;
8. ISI: ISI index from the FWI system: 0.0–56.10;
9. Temp: temperature in Celsius degrees: 2.2–33.30;
10. RH: relative humidity in %: 15.0–100;
11. Wind: wind speed in km/h: 0.40–9.40;
12. Rain: outside rain in mm/m²: 0.0–6.4;
13. Area: the burned area of the forest (in ha).

4.2. Implementation

Our anomaly detection has two phases. First, we use test data to find suitable input parameters (suitable input parameters are those that produce the best results); we then apply the program to unlabeled data using these parameters. So, our anomaly detection system uses a training process to derive input parameters from the test data, and then labels each entry as normal or abnormal.

Our method has been implemented entirely in Visual Studio .NET 2005 (C#).

5. Discussion

The results of comparing the different methods on simulated data are presented in Table 1, and on real data in Table 2. Note that these results are the best achieved by each method, and that all methods were implemented in the same language and run under the same conditions.

Table 1: Results of comparing different methods on simulated data. Total number of normal cases is 14,978 and total number of abnormal cases is 104.

Algorithm                      False negative  False positive  Huang's accuracy measure  Ward's minimum variance
SOM                            49              2484            0.832                     12.336
Bayesian estimation            30              890             0.939                     90.309
Dempster–Shafer (statistical)  80              4601            0.689                     68.901
Dempster–Shafer (Chen)         65              4368            0.706                     98.931
Our suggested method           48              249             0.99                      342.12

Table 2: Results of comparing different methods on real data. Total number of normal cases is 422 and total number of abnormal cases is 95.

Algorithm                      False negative  False positive  Huang's accuracy measure  Ward's minimum variance
SOM                            78              66              0.721                     11.702
Bayesian estimation            39              212             0.514                     146.873
Dempster–Shafer (statistical)  63              158             0.572                     129.695
Dempster–Shafer (Chen)         77              67              0.721                     254.265
Our suggested method           55              87              0.832                     360.579

Our simulated dataset consists of 14,987 normal cases and 104 abnormal cases, and our real dataset consists of 422 normal cases and 95 abnormal cases. In each row, we show the result parameters of each compared method (these parameters were explained above).

The Self-Organizing Map results are shown in the first row. This SOM is constructed with a hundred neurons in its hidden layer.
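For reference, one training step of the generic SOM baseline (Euclidean best-matching-unit search followed by a neighborhood update controlled by a learning rate and radius) can be sketched as follows. This is an illustrative Python sketch with hypothetical names, not the benchmarked C# implementation, and it assumes a simple one-dimensional neuron grid:

```python
import math

def som_step(weights, x, lr=0.5, radius=1):
    # Find the best-matching unit (BMU) by Euclidean distance, the
    # standard choice in generic SOM.
    dists = [math.dist(w, x) for w in weights]
    bmu = dists.index(min(dists))
    # Pull the BMU and its grid neighbors (within `radius` positions on
    # a 1-D grid) toward the input, scaled by the learning rate `lr`.
    for i, w in enumerate(weights):
        if abs(i - bmu) <= radius:
            weights[i] = [wi + lr * (xi - wi) for wi, xi in zip(w, x)]
    return bmu
```

In practice both `lr` and `radius` are decayed over training; the parameter search described in the text varies exactly these two quantities.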
We try different input parameters for it (like the learning rate and learning radius) and keep its best results. In Table 1, its False Negative is comparable with our method's, but its False Positive is very poor; thus, its Huang's accuracy measure is much lower than our method's, and its Ward's minimum variance is also very low.

We think that one reason for this poor result is the use of the Euclidean distance in generic SOM. As mentioned before, the Euclidean distance is only suitable for linearly separable data, whereas in our domain the data are complex; so we do not use the Euclidean distance in our method, and in our opinion kernel-based functions are the best choice for measuring distance here. The other reason is the insufficient approach to updating the weights of neurons in generic SOM; in our method, we use PSO to solve this problem.

Bayesian estimation (second row) is the method most comparable with our approach. We use a statistical method on part of the labeled data to find the probabilities it requires. In Table 1, its False Negative is lower than our method's, but because of its relatively high False Positive, the overall Huang's accuracy measure of our method is higher than that of Bayesian estimation. Its Ward's minimum variance is not too bad.

For Dempster–Shafer, we need mass values for each hypothesis. To find them, we use two approaches and implement them as two Dempster–Shafer methods. One is Dempster–Shafer (statistical), which, like Bayesian estimation, uses a statistical method on part of the labeled data to find the mass values; the second is Dempster–Shafer (Chen), which assigns mass values according to the approach of Chen and Aickelin [5] (based on a set of thresholds).

In both methods, the mass values for each hypothesis are generated and sent to the Dempster–Shafer combination component. This component uses Dempster's rule of combination to combine all mass values and generate the overall mass value for each hypothesis. If the mass value of the 'abnormal' hypothesis is greater than the mass value of the 'normal' hypothesis, the entry is classified as abnormal; otherwise it is classified as normal.

In Table 1, Dempster–Shafer (statistical) has the worst result on three of the parameters; only its Ward's minimum variance is higher than SOM's. The results of Dempster–Shafer (Chen) are a little better than those of Dempster–Shafer (statistical), but they are still not sufficient.

The results of both the Dempster–Shafer and Bayesian approaches are unacceptable because:

• Both theories have an initial requirement: Dempster–Shafer requires masses to be assigned and Bayesian estimation requires prior probabilities, and both are highly sensitive to this assignment. These values must be assigned by an expert or by a computational or intelligent method, which can be time consuming and unfruitful.
• Both have considerable computational complexity.
• Dempster–Shafer assumes that the pieces of evidence are statistically independent of each other. Since sources of information are often linked by some sort of dependence in real-life situations, this assumption does not always hold true [14–16].

Our method has none of these problems. It is an unsupervised method that does not need an initial assignment, unlike Dempster–Shafer and Bayesian estimation. It is a simple method without any complex computation. Finally, our method assumes a dependency between the fields of a record and processes them en bloc.

The same can be seen in Table 2: our suggested method has the best Huang's accuracy measure and Ward's minimum variance (although its false positive and false negative counts are higher than some of the others' in some cases, this never happens for both at the same time, so its Huang's accuracy measure and Ward's minimum variance remain the best). Of course, these results are not as good as those of Table 1. The major reason is that our method supposes anomalous cases to be rare. (Anomaly detection is orthogonal to misuse detection: it hypothesizes that abnormal behavior is rare and different from normal behavior; hence, it builds models of normal behavior and detects anomalies in observed data by noticing deviations from these models [2].) The proportion of abnormal cases to the total number of cases in this dataset is 0.184, which is large (it seems that we did not choose a suitable dataset, although we did not have much choice). We believe this is not a shortcoming of our method, since normal cases usually far outnumber abnormal cases in the datasets of interest; almost all anomaly detection systems are based on this assumption, which means that these methods can be used to detect abnormal cases in nature-monitoring datasets.

6. Conclusions

Today, anomaly detection methods are of major interest and are used in many different domains, such as computer intrusion detection, credit card and telephone fraud detection, and spam detection. Here, we have introduced a new unsupervised method for anomaly detection, based on a combination of a Self-Organizing Map and Particle Swarm Optimization, which fuses information from various sources. It is a simple method, with low time and space requirements, that can be used in different domains. In this paper, we wished to apply it to crisis management, so we chose forest fires and their detection. In comparison with some other methods, such as the Self-Organizing Map, Dempster–Shafer and Bayesian estimation, we obtained good results. Like other anomaly detection methods, our suggested method gives better results when abnormal cases are rare than when they are not.

We have implemented this method in various domains and wish to investigate its results in others.

Acknowledgments

This work was supported by Tehran University. Our special thanks go to the University of Tehran, Faculty of Engineering, Department of Algorithms and Computations, for providing all the necessary facilities for successfully conducting this research. Also, we would like to thank the Center of Excellence in Geomatic Engineering and Disaster Management for its partial support of this research.

This paper was prepared while the first two authors were visiting the Institute for Studies in Theoretical Physics and Mathematics (IPM). It is a pleasure to thank IPM for its hospitality and facilities.

References

[1] Anurag, S. and Christian, W.O. ''Performance comparison of particle swarm optimization with traditional clustering algorithms used in self-organizing map'', International Journal of Computational Intelligence, 5(1), pp. 32–41 (2009).
[2] Wu, S.X. and Banzhaf, W. ''The use of computational intelligence in intrusion detection systems: a review'', Applied Soft Computing, 10(1), pp. 1–35 (2010).
[3] Zhu, X. ''Semi-supervised learning literature survey'', Computer Sciences TR 1530, University of Wisconsin–Madison, last modified July 19 (2008).
[4] Das, S., Abraham, A. and Konar, A. ''Automatic kernel clustering with a multi-elitist particle swarm optimization algorithm'', Pattern Recognition Letters, 29(5), pp. 688–699 (2008).
[5] Chen, Q. and Aickelin, U. ''Dempster–Shafer for anomaly detection'', Proceedings of the International Conference on Data Mining DMIN 2006, Las Vegas, USA, pp. 232–238 (2006).
[6] Wang, G., Hao, J., Ma, J. and Huang, L. ''A new approach to intrusion detection using artificial neural networks and fuzzy clustering'', Expert Systems with Applications, 37(9), pp. 6225–6232 (2010).
[7] Rouil, R., Chevrollier, N. and Golmie, N. ''Unsupervised anomaly detection system using next-generation router architecture'', Military Communications Conference MILCOM, USA (2005).
[8] Eskin, E., Arnold, A., Prerau, M., Portnoy, L. and Stolfo, S. ''A geometric framework for unsupervised anomaly detection: detecting intrusions in unlabeled data'', in Data Mining for Security Applications, Kluwer (2002).
[9] Zakia, F. and Akira, M. ''Unsupervised outlier detection in time series data'', Proceedings of the Second International Special Workshop on Databases for Next-Generation Researchers SWOD2006, Atlanta, GA, pp. 51–56 (April 2006).
[10] Guthrie, D., Guthrie, L., Allison, B. and Wilks, Y. ''Unsupervised anomaly detection'', IJCAI 2007, pp. 1624–2162 (2007).
[11] Chatzigiannakis, V., Androulidakis, G., Pelechrinis, K., Papavassiliou, S. and Maglaris, V. ''Data fusion algorithms for network anomaly detection: classification and evaluation'', Proceedings of the Third International Conference on Networking and Services, pp. 50–51 (2007).
[12] Yu, D. and Frincke, D. ''Alert confidence fusion in intrusion detection systems with extended Dempster–Shafer theory'', ACM-SE 43: Proceedings of the 43rd Annual Southeast Regional Conference, 2, pp. 142–147 (2005).
[13] Te-Shun, C., Sharon, F., Wei, Z., Jeffrey, F. and Asad, D. ''Intrusion aware system-on-a-chip design with uncertainty classification'', The 2008 International Conference on Embedded Software and Systems ICESS (2008).
[14] Siaterlis, C., Maglaris, B. and Roris, P. ''A novel approach for a distributed denial of service detection engine'', National Technical University of Athens, Athens, Greece (2003).
[15] Siaterlis, C. and Maglaris, B. ''Towards multisensor data fusion for DoS detection'', Proceedings of the 2004 ACM Symposium on Applied Computing, pp. 439–446 (2004).
[16] Siaterlis, C. and Maglaris, V. ''One step ahead to multisensor data fusion for DDoS detection'', Journal of Computer Security, 13, pp. 779–806 (2005).
[17] Siraj, A., Bridges, S.M. and Vaughn, R.B. ''Fuzzy cognitive maps for decision support in an intelligent intrusion detection system'', National Science Foundation Grant # CCR-9988524 and Army Research Laboratory Grant # DAAD17-01-C-0011.
[18] Maselli, G., Deri, L. and Suin, S. ''Design and implementation of an anomaly detection system: an empirical approach'', Proceedings of the Terena Networking Conference TNC 03, Zagreb, Croatia (May 2003).
[19] Greensmith, J., Aickelin, U. and Tedesco, G. ''Information fusion for anomaly detection with the dendritic cell algorithm'', Information Fusion, pp. 21–34 (2010).
[20] Twycross, J. and Aickelin, U. ''An immune-inspired approach to anomaly detection'', in Handbook of Research on Information Assurance and Security, J.N.D. Gupta and S.K. Sharma, Eds., pp. 109–121, IGI Global, New York (Chapter 10) (2009).
[21] Brause, R., Langsdorf, T. and Hepp, M. ''Credit card fraud detection by adaptive neural data mining'', Proceedings of the 11th IEEE International Conference on Tools with Artificial Intelligence, pp. 103–106 (1999).
[22] Hassibi, K. ''Detecting payment card fraud with neural networks'', in Business Applications of Neural Networks, P.J.G. Lisboa, A. Vellido and B. Edisbury, Eds., World Scientific, Singapore (2000).
[23] Dorronsoro, J.R., Ginel, F., Sanchez, C. and Cruz, C.S. ''Neural fraud detection in credit card operations'', IEEE Transactions on Neural Networks, 8, pp. 827–834 (1997).
[24] Syeda, M., Zhang, Y.Q. and Pan, Y. ''Parallel granular neural networks for fast credit card fraud detection'', Proceedings of the 2002 IEEE International Conference, 1, pp. 572–577 (2002).
[25] Anderson, D., Frivold, T., Tamaru, A. and Valdes, A. ''Intrusion detection expert system (NIDES)'', Next Generation Software Users Manual, Beta-Update Release, Technical Report SRI-CSL-95-07, Computer Science Laboratory, SRI International, Menlo Park, CA (May 1994).
[26] Ryan, J., Lin, M.-J. and Miikkulainen, R. ''Intrusion detection with neural networks'', in Advances in Neural Information Processing Systems 10, M.I. Jordan, M.J. Kearns and S.A. Solla, Eds., MIT Press (1998).
[27] Ghosh, A.K. and Schwartzbard, A. ''A study in using neural networks for anomaly and misuse detection'', Proceedings of the 8th USENIX Security Symposium, Washington, D.C. (1999).
[28] Garvey, T.D. and Lunt, T.F. ''Model based intrusion detection'', Proceedings of the 14th National Computer Security Conference (October 1991).
[29] Kumar, S. and Spafford, E.H. ''A pattern matching model for misuse intrusion detection'', Proceedings of the 17th National Computer Security Conference, pp. 11–21 (1994).
[30] Lee, W. and Stolfo, S. ''Data mining approaches for intrusion detection'', Proceedings of the 7th USENIX Security Symposium, San Antonio, TX (1998).
[31] Chun, W.C.P. ''Investigative data mining in fraud detection'', Honours thesis, School of Business Systems, Monash University (November 2003).
[32] Clifton, P., Vincent, L., Smith, K. and Gayler, R. ''A comprehensive survey of data mining-based fraud detection research'', final version 2 (9 February 2005).
[33] Yufeng, K., Chang-Tien, L., Sirirat, S. and Yo-Ping, H. ''Survey of fraud detection techniques'', Proceedings of the 2004 International Conference on Networking, Sensing and Control, pp. 749–754 (March 2004).
[34] Smith, K.A., Introduction to Neural Networks and Data Mining for Business Applications, Eruditions Publishing, ISBN 1864910046 (1999).
[35] Xiao, X., Dow, E.R., Eberhart, R., Ben Miled, Z. and Oppelt, R.J. ''Gene clustering using self-organizing maps and particle swarm optimization'', International Parallel and Distributed Processing Symposium IPDPS'03, p. 154b (2003).
[36] Abdull Hamed, H.N. ''Particle swarm optimization for neural network learning enhancement'', Master's thesis, Universiti Teknologi Malaysia (2006).
[37] Xie, X.-F., Zhang, W.-J. and Yang, Z.-L. ''Hybrid particle swarm optimizer with mass extinction'', International Conference on Communications, Circuits and Systems ICCCAS, Chengdu, China, pp. 1170–1173 (2002).
[38] Xie, X.-F., Zhang, W.-J. and Yang, Z.-L. ''Adaptive particle swarm optimization on individual level'', International Conference on Signal Processing ICSP, Beijing, China, pp. 1215–1218 (2002).
[39] O'Neill, M. and Brabazon, A. ''Self-organising swarm (SoSwarm)'', Soft Computing, 12(11), pp. 1073–1080 (2008).
[40] Xu, R. and Wunsch, D. ''Survey of clustering algorithms'', IEEE Transactions on Neural Networks, 16(3), pp. 645–678 (2005).
[41] Cortez, P. and Morais, A. ''A data mining approach to predict forest fires using meteorological data'', Proceedings of the 13th EPIA 2007 – Portuguese Conference on Artificial Intelligence, Guimarães, Portugal, pp. 512–523 (December 2007).
[42] Asuncion, A. and Newman, D.J. ''UCI Machine Learning Repository'', http://www.ics.uci.edu/~mlearn/MLRepository.html, University of California, Irvine, School of Information and Computer Science (2007).

Maryam Lotfi Shahreza received a B.S. degree in Computer Engineering from Isfahan University of Technology and an M.S. in Computer Engineering, Algorithms and Computation, from Tehran University in 2009. Her research interests are crisis management, data fusion, bioinformatics and data mining. She is currently teaching at Isfahan University of Technology.

Dara Moazzami received B.S. and M.S. degrees in Mathematics from Quebec University, Montreal, Canada, and a Ph.D. in Mathematics from Boston University, USA. He is Associate Professor of Graph Theory and Discrete Mathematics. He is a member of the Editorial Board of the Tehran University Engineering Journal ''Fani'', Editor-in-Chief of the Journal of Algorithms and Computation, and a member of the Board of the Center of Excellence in Geomatic Engineering and Disaster Management. His research interests include vulnerability and reliability in graphs, stability of communication networks, the success tree method and fuzzy set theory for the evaluation of uranium resources, and the analysis of vulnerability measures in networks.

Behzad Moshiri was born in Tehran, Iran, in 1959. He received a B.S. degree in Mechanical Engineering from Iran University of Science and Technology (IUST) in 1984, and M.S. and Ph.D. degrees in Control Systems Engineering from the University of Manchester Institute of Science and Technology (UMIST), UK, in 1987 and 1991, respectively. He was a member of ISA from 1991 to 1992. He joined the Department of Electrical and Computer Engineering at the University of Tehran in 1992, where he is currently Professor of Control Engineering and has been head of the Machine Intelligence & Robotics Division at the School of ECE. He received the Distinguished Researcher Award from Tehran University in 2003 and the Distinguished Alumnus Award from IUST in 2004. He has published more than 260 papers in journals and conferences, and was elevated to the grade of Senior Member of the IEEE in June 2006. His research interests include advanced industrial control design, advanced instrumentation design, sensor data fusion, mechatronics and bioinformatics.

Mahmoud Reza Delavar obtained his Ph.D. in Geomatic Engineering-GIS from the University of New South Wales, Sydney, Australia, in 1997, and is now Assistant Professor in the Department of Surveying and Geomatic Engineering, College of Engineering, at Tehran University. He is a founder of the Iranian Society of Surveying and Geomatic Engineering, Director of GIS at the Center of Excellence in Geomatic Engineering and Disaster Management at Tehran University, national representative of the Urban Data Management Society/Symposium (UDMS), and has been the Scientific Secretary of ISPRS WG II/4 (uncertainty modeling and quality control for spatial data) from 2008 to 2012. His research interests are GIS spatial data quality, spatio-temporal GIS and land administration.
