
Anomaly Detection Techniques

Computer Networks 51 (2007) 3448–3470

www.elsevier.com/locate/comnet

An overview of anomaly detection techniques: Existing solutions and latest technological trends
Animesh Patcha *, Jung-Min Park
Bradley Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University,
Blacksburg, VA 24061, United States

Received 8 June 2006; received in revised form 8 February 2007; accepted 9 February 2007
Available online 16 February 2007

Responsible Editor: Christos Douligeris

Abstract

As advances in networking technology help to connect the distant corners of the globe and as the Internet continues to
expand its influence as a medium for communications and commerce, the threat from spammers, attackers and criminal
enterprises has also grown accordingly. It is the prevalence of such threats that has made intrusion detection systems—the
cyberspace’s equivalent to the burglar alarm—join ranks with firewalls as one of the fundamental technologies for network
security. However, today’s commercially available intrusion detection systems are predominantly signature-based intrusion
detection systems that are designed to detect known attacks by utilizing the signatures of those attacks. Such systems
require frequent rule-base updates and signature updates, and are not capable of detecting unknown attacks. In contrast,
anomaly detection systems, a subset of intrusion detection systems, model the normal system/network behavior which
enables them to be extremely effective in finding and foiling both known as well as unknown or “zero day” attacks. While
anomaly detection systems are attractive conceptually, a host of technological problems need to be overcome before they
can be widely adopted. These problems include: high false alarm rate, failure to scale to gigabit speeds, etc. In this paper,
we provide a comprehensive survey of anomaly detection systems and hybrid intrusion detection systems of the recent past
and present. We also discuss recent technological trends in anomaly detection and identify open problems and challenges
in this area.
© 2007 Elsevier B.V. All rights reserved.

Keywords: Survey; Anomaly detection; Machine learning; Statistical anomaly detection; Data mining

1. Introduction

Today, the Internet along with the corporate network plays a major role in creating and advancing new business avenues. Business needs have motivated enterprises and governments across the globe to develop sophisticated, complex information networks. Such networks incorporate a diverse array of technologies, including distributed data storage systems, encryption and authentication techniques, voice and video over IP, remote and wireless access, and web services. Moreover, corporate networks

* Corresponding author. Tel.: +1 540 239 0574.
E-mail addresses: apatcha@vt.edu (A. Patcha), jungmin@vt.edu (J.-M. Park).

1389-1286/$ - see front matter © 2007 Elsevier B.V. All rights reserved.
doi:10.1016/j.comnet.2007.02.001
A. Patcha, J.-M. Park / Computer Networks 51 (2007) 3448–3470 3449

have become more accessible; for instance, most businesses allow access to their services on their internal networks via extranets to their partners, enable customers to interact with the network through e-commerce transactions, and allow employees to tap into company systems through virtual private networks.

The aforementioned access points make today’s networks more vulnerable to intrusions and attacks. Cyber-crime is no longer the prerogative of the stereotypical hacker. Joining ranks with the hackers are disgruntled employees, unethical corporations, and even terrorist organizations. With the vulnerability of present-day software and protocols combined with the increasing sophistication of attacks, it comes as no surprise that network-based attacks are on the rise [1–4]. The 2005 annual computer crime and security survey [5], jointly conducted by the Computer Security Institute and the FBI, indicated that the financial losses incurred by the respondent companies due to network attacks/intrusions were US $130 million. In another survey commissioned by VanDyke Software in 2003, some 66% of the companies stated that they perceived system penetration to be the largest threat to their enterprises. Although 86% of the respondents used firewalls, their consensus was that firewalls by themselves are not sufficient to provide adequate protection. Moreover, according to recent studies, an average of twenty to forty new vulnerabilities in commonly used networking and computer products are discovered every month. Such wide-spread vulnerabilities in software add to today’s insecure computing/networking environment. This insecure environment has given rise to the ever evolving field of intrusion detection and prevention. The cyberspace’s equivalent to the burglar alarm, intrusion detection systems complement the beleaguered firewall.

An intrusion detection system gathers and analyzes information from various areas within a computer or a network to identify possible security breaches. In other words, intrusion detection is the act of detecting actions that attempt to compromise the confidentiality, integrity or availability of a system/network. Traditionally, intrusion detection systems have been classified as a signature detection system, an anomaly detection system or a hybrid/compound detection system. A signature detection system identifies patterns of traffic or application data presumed to be malicious while anomaly detection systems compare activities against a “normal” baseline. On the other hand, a hybrid intrusion detection system combines the techniques of the two approaches. Both signature detection and anomaly detection systems have their share of advantages and drawbacks. The primary advantage of signature detection is that known attacks can be detected fairly reliably with a low false positive rate. The major drawback of the signature detection approach is that such systems typically require a signature to be defined for all of the possible attacks that an attacker may launch against a network. Anomaly detection systems have two major advantages over signature based intrusion detection systems. The first advantage that differentiates anomaly detection systems from signature detection systems is their ability to detect unknown attacks as well as “zero day” attacks. This advantage stems from the ability of anomaly detection systems to model the normal operation of a system/network and detect deviations from it. A second advantage of anomaly detection systems is that the aforementioned profiles of normal activity are customized for every system, application and/or network, thereby making it very difficult for an attacker to know with certainty what activities it can carry out without getting detected. However, the anomaly detection approach has its share of drawbacks as well. For example, the intrinsic complexity of the system, the high percentage of false alarms and the associated difficulty of determining which specific event triggered those alarms are some of the many technical challenges that need to be addressed before anomaly detection systems can be widely adopted.

The aim of this paper is twofold. The first is to present a comprehensive survey of recent literature in the domain of anomaly detection. In doing so, we attempt to assess the ongoing work in this area as well as consolidate the existing results. While the emphasis of this paper is to survey anomaly detection techniques proposed in the last six years, we have also described some of the earlier work that is seminal to this area. For a detailed exposition of techniques proposed before 2000, the reader is directed to the survey by Axelsson [6]. Our second aim is to identify the open problems and the research challenges.

The remainder of this article is organized as follows. In Section 2, we define intrusion detection, put forth the generic architectural design of an intrusion detection system, and highlight the three main techniques for detecting intrusions/attacks in

computer networks and systems. In Section 3, we describe the premise of anomaly detection and provide detailed discussions on the various techniques used in anomaly detection. We highlight recently proposed hybrid systems in Section 4, and conclude the paper by discussing open problems and research challenges in Section 5.

2. Intrusion detection

An intrusion detection system is a software tool used to detect unauthorized access to a computer system or network. An intrusion detection system is capable of detecting all types of malicious network traffic and computer usage. This includes network attacks against vulnerable services, data driven attacks on applications, host-based attacks (such as privilege escalation, unauthorized logins and access to sensitive files), and malware. An intrusion detection system is a dynamic monitoring entity that complements the static monitoring abilities of a firewall. An intrusion detection system monitors traffic in a network in promiscuous mode, very much like a network sniffer. The network packets that are collected are analyzed for rule violations by a pattern recognition algorithm. When rule violations are detected, the intrusion detection system alerts the administrator.

One of the earliest works to propose intrusion detection by identifying abnormal behavior can be attributed to Anderson [7]. In his report, Anderson presents a threat model that classifies threats as external penetrations, internal penetrations, and misfeasance, and uses this classification to develop a security monitoring surveillance system based on detecting anomalies in user behavior. External penetrations are defined as intrusions that are carried out by unauthorized computer system users; internal penetrations are those that are carried out by authorized users who are not authorized for the data that is compromised; and misfeasance is defined as the misuse of authorized access both to the system and to its data.

In a seminal paper, Denning [8] put forth the idea that intrusions to computers and networks could be detected by assuming that users of a computer/network would behave in a manner that enables automatic profiling. In other words, a model of the behavior of the entity being monitored could be constructed by an intrusion detection system, and subsequent behavior of the entity could be verified against the entity’s model. In this model, behavior that deviates sufficiently from the norm is considered anomalous. In the paper, Denning mentioned several models that are based on statistics, Markov chains, time-series, etc.

In a much cited survey on intrusion detection systems, Axelsson [9] put forth a generalized model of a typical intrusion detection system. Fig. 1 depicts such a system where solid arrows indicate data/control flow while dotted arrows indicate a response to intrusive activity. According to Axelsson, the generic architectural model of an intrusion detection system contains the following modules:

[Figure 1: block diagram with the components Monitored Entity, Audit Collection, Audit Storage, Analysis and Detection, Alarm, Configuration Data, Reference Data, Active/Processing Data, Entity Security Authority, Security Officer's Response to Intrusion, and Active Intrusion Response.]

Fig. 1. Organization of a generalized intrusion detection system [9].



• Audit data collection: This module is used in the data collection phase. The data collected in this phase are analyzed by the intrusion detection algorithm to find traces of suspicious activity. The source of the data can be host/network activity logs, command-based logs, application-based logs, etc.
• Audit data storage: Typical intrusion detection systems store the audit data either indefinitely or for a sufficiently long time for later reference. The volume of data is often exceedingly large. Hence, the problem of audit data reduction is a major research issue in the design of intrusion detection systems.
• Analysis and detection: The processing block is the heart of an intrusion detection system. It is here that the algorithms to detect suspicious activities are implemented. Algorithms for the analysis and detection of intrusions have been traditionally classified into three broad categories: signature (or misuse) detection, anomaly detection and hybrid (or compound) detection.
• Configuration data: The configuration data are the most sensitive part of an intrusion detection system. It contains information that is pertinent to the operation of the intrusion detection system itself such as information on how and when to collect audit data, how to respond to intrusions, etc.
• Reference data: The reference data storage module stores information about known intrusion signatures (in the case of signature detection) or profiles of normal behavior (in the case of anomaly detection). In the latter case, the profiles are updated when new knowledge about system behavior is available.
• Active/processing data: The processing element must frequently store intermediate results such as information about partially fulfilled intrusion signatures.
• Alarm: This part of the system handles all output from the intrusion detection system. The output may be either an automated response to an intrusion or a suspicious activity alert for a system security officer.

Historically, intrusion detection research has concentrated on the analysis and detection stage of the architectural model shown in Fig. 1. As mentioned above, algorithms for the analysis and detection of intrusions/attacks are traditionally classified into the following three broad categories:

• Signature or misuse detection is a technique for intrusion detection that relies on a predefined set of attack signatures. By looking for specific patterns, the signature detection-based intrusion detection systems match incoming packets and/or command sequences to the signatures of known attacks. In other words, decisions are made based on the knowledge acquired from the model of the intrusive process and the observed trace that it has left in the system. Legal or illegal behavior can be defined and compared with observed behavior. Such a system tries to collect evidence of intrusive activity irrespective of the normal behavior of the system. One of the chief benefits of using signature detection is that known attacks can be detected reliably with a low false positive rate. The existence of specific attack sequences ensures that it is easy for the system administrator to determine exactly which attacks the system is currently experiencing. If the audit data in the log files do not contain the attack signature, no alarm is raised. Another benefit is that the signature detection system begins protecting the computer/network immediately upon installation. One of the biggest problems with signature detection systems is maintaining state information of signatures in which an intrusive activity spans multiple discrete events; that is, the complete attack signature spans multiple packets. Another drawback is that the signature detection system must have a signature defined for all of the possible attacks that an attacker may launch. This requires frequent signature updates to keep the signature database up-to-date.
• An anomaly detection system first creates a baseline profile of the normal system, network, or program activity. Thereafter, any activity that deviates from the baseline is treated as a possible intrusion. Anomaly detection systems offer several benefits. First, they have the capability to detect insider attacks. For instance, if a user or someone using a stolen account starts performing actions that are outside the normal user-profile, an anomaly detection system generates an alarm. Second, because the system is based on customized profiles, it is very difficult for an attacker to know with certainty what activity it can carry out without setting off an alarm. Third, an anomaly detection system has the ability to detect previously unknown attacks. This is due to the fact that a profile of intrusive activity is not based

on specific signatures representing known intrusive activity. An intrusive activity generates an alarm because it deviates from normal activity, not because someone configured the system to look for a specific attack signature. Anomaly detection systems, however, also suffer from several drawbacks. The first obvious drawback is that the system must go through a training period in which appropriate user profiles are created by defining “normal” traffic profiles. Moreover, creating a normal traffic profile is a challenging task. The creation of an inappropriate normal traffic profile can lead to poor performance. Maintenance of the profiles can also be time-consuming. Since anomaly detection systems are looking for anomalous events rather than attacks, they are prone to be affected by time-consuming false alarms. False alarms are classified as either being false positive or false negative. A false positive occurs when an IDS reports as an intrusion an event that is in fact legitimate network activity. A side effect of false positives is that an attack or malicious activity on the network/system could go undetected because of all the previous false positives. This failure to detect an attack is termed a false negative in the intrusion detection jargon. A key element of modern anomaly detection systems is the alert correlation module. However, the high percentage of false alarms that are typically generated in anomaly detection systems makes it very difficult to associate specific alarms with the events that triggered them. Lastly, a pitfall of anomaly detection systems is that a malicious user can train an anomaly detection system gradually to accept malicious behavior as normal.
• A hybrid or compound detection system combines both approaches. In essence, a hybrid detection system is a signature inspired intrusion detection system that makes a decision using a “hybrid model” that is based on both the normal behavior of the system and the intrusive behavior of the intruders.

3. Anomaly detection techniques

An anomaly detection approach usually consists of two phases: a training phase and a testing phase. In the former, the normal traffic profile is defined; in the latter, the learned profile is applied to new data.

3.1. Premise of anomaly detection

The central premise of anomaly detection is that intrusive activity is a subset of anomalous activity [10]. If we consider an intruder, who has no idea of the legitimate user’s activity patterns, intruding into a host system, there is a strong probability that the intruder’s activity will be detected as anomalous. In the ideal case, the set of anomalous activities will be the same as the set of intrusive activities. In such a case, flagging all anomalous activities as intrusive activities results in no false positives and no false negatives. However, intrusive activity does not always coincide with anomalous activity. Kumar and Spafford [10] suggested that there are four possibilities, each with a non-zero probability:

• Intrusive but not anomalous: These are false negatives. An intrusion detection system fails to detect this type of activity as the activity is not anomalous. These are called false negatives because the intrusion detection system falsely reports the absence of intrusions.
• Not intrusive but anomalous: These are false positives. In other words, the activity is not intrusive, but because it is anomalous, an intrusion detection system reports it as intrusive. These are called false positives because an intrusion detection system falsely reports intrusions.
• Not intrusive and not anomalous: These are true negatives; the activity is not intrusive and is not reported as intrusive.
• Intrusive and anomalous: These are true positives; the activity is intrusive and is reported as such.

When false negatives need to be minimized, thresholds that define an anomaly are set low. This results in many false positives and reduces the efficacy of automated mechanisms for intrusion detection. It creates additional burdens for the security administrator as well, who must investigate each incident and discard false positive instances.

3.2. Techniques used in anomaly detection

In this subsection, we review a number of different architectures and methods that have been proposed for anomaly detection. These include statistical anomaly detection, data-mining based methods, and machine learning based techniques.
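
As an illustration of the four outcomes listed in Section 3.1 and of the threshold trade-off described there, the following minimal Python sketch labels each event by comparing an anomaly score against a threshold. The event scores, the ground-truth labels, the two thresholds and the function names are all hypothetical; they are chosen only to show the effect and are not taken from any of the surveyed systems.

```python
from collections import Counter


def classify(is_intrusive: bool, is_anomalous: bool) -> str:
    """Map one event to one of the four outcomes of Section 3.1."""
    if is_intrusive and is_anomalous:
        return "true positive"
    if is_intrusive:
        return "false negative"   # intrusive but not anomalous
    if is_anomalous:
        return "false positive"   # anomalous but not intrusive
    return "true negative"


def outcome_counts(events, threshold):
    """Count outcomes when every score above `threshold` is flagged as anomalous.

    `events` is an iterable of (anomaly_score, is_intrusive) pairs.
    """
    return Counter(
        classify(is_intrusive, score > threshold)
        for score, is_intrusive in events
    )


# Hypothetical (anomaly_score, is_intrusive) pairs, for illustration only.
EVENTS = [
    (0.05, False), (0.10, False), (0.20, False), (0.35, False),
    (0.40, True), (0.55, False), (0.70, True), (0.90, True),
]

low = outcome_counts(EVENTS, threshold=0.3)    # low threshold
high = outcome_counts(EVENTS, threshold=0.6)   # high threshold

print("low threshold :", dict(low))
print("high threshold:", dict(high))
```

On this toy data the low threshold misses no intrusion but raises two false alarms, whereas the high threshold raises no false alarm but misses one intrusion, which is the burden on the security administrator that Section 3.1 describes.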

3.2.1. Statistical anomaly detection

In statistical methods for anomaly detection, the system observes the activity of subjects and generates profiles to represent their behavior. The profile typically includes such measures as activity intensity measure, audit record distribution measure, categorical measures (the distribution of an activity over categories) and ordinal measure (such as CPU usage). Typically, two profiles are maintained for each subject: the current profile and the stored profile. As the system/network events (viz. audit log records, incoming packets, etc.) are processed, the intrusion detection system updates the current profile and periodically calculates an anomaly score (indicating the degree of irregularity for the specific event) by comparing the current profile with the stored profile using a function of abnormality of all measures within the profile. If the anomaly score is higher than a certain threshold, the intrusion detection system generates an alert.

Statistical approaches to anomaly detection have a number of advantages. Firstly, these systems, like most anomaly detection systems, do not require prior knowledge of security flaws and/or the attacks themselves. As a result, such systems have the capability of detecting “zero day” or the very latest attacks. In addition, statistical approaches can provide accurate notification of malicious activities that typically occur over extended periods of time and are good indicators of impending denial-of-service (DoS) attacks. A very common example of such an activity is a portscan. Typically, the distribution of portscans is highly anomalous in comparison to the usual traffic distribution. This is particularly true when a packet has unusual features (e.g., a crafted packet). With this in mind, even portscans that are distributed over a lengthy time frame will be recorded because they will be inherently anomalous.

However, statistical anomaly detection schemes also have drawbacks. Skilled attackers can train a statistical anomaly detection system to accept abnormal behavior as normal. It can also be difficult to determine thresholds that balance the likelihood of false positives with the likelihood of false negatives. In addition, statistical methods need accurate statistical distributions, but not all behaviors can be modeled using purely statistical methods. In fact, a majority of the proposed statistical anomaly detection techniques require the assumption of a quasi-stationary process, which cannot be assumed for most data processed by anomaly detection systems.

Haystack [11] is one of the earliest examples of a statistical anomaly-based intrusion detection system. It used both user and group-based anomaly detection strategies, and modeled system parameters as independent, Gaussian random variables. Haystack defined a range of values that were considered normal for each feature. If during a session, a feature fell outside the normal range, the score for the subject was raised. Assuming the features were independent, the probability distribution of the scores was calculated. An alarm was raised if the score was too large. Haystack also maintained a database of user groups and individual profiles. If a user had not previously been detected, a new user profile with minimal capabilities was created using restrictions based on the user’s group membership. It was designed to detect six types of intrusions: attempted break-ins by unauthorized users, masquerade attacks, penetration of the security control system, leakage, DoS attacks and malicious use. One drawback of Haystack was that it was designed to work offline. The attempt to use statistical analyses for real-time intrusion detection systems failed, since doing so required high-performance systems. Secondly, because of its dependence on maintaining profiles, a common problem for system administrators was the determination of what attributes were good indicators of intrusive activity.

One of the earliest intrusion detection systems was developed at the Stanford Research Institute (SRI) in the early 1980s and was called the Intrusion Detection Expert System (IDES) [13,14]. IDES was a system that continuously monitored user behavior and detected suspicious events as they occurred. In IDES, intrusions could be flagged by detecting departures from established normal behavior patterns for individual users. As the analysis methodologies developed for IDES matured, scientists at SRI developed an improved version of IDES called the Next-Generation Intrusion Detection Expert System (NIDES) [15,16]. NIDES was one of the few intrusion detection systems of its generation that could operate in real time for continuous monitoring of user activity or could run in a batch mode for periodic analysis of the audit data. However, the primary mode of operation of NIDES was to run in real-time. A flow chart describing the real time operation of NIDES is shown in Fig. 2. Unlike IDES, which is an anomaly detection system, NIDES is a hybrid system that has an upgraded statistical analysis engine. In both IDES and NIDES, a profile of normal behavior based

on a selected set of variables is maintained by the statistical analysis unit. This enables the system to compare the current activity of the user/system/network with the expected values of the audited intrusion detection variables stored in the profile and then flag an anomaly if the audited activity is sufficiently far from the expected behavior. Each variable in the stored profile reflects the extent to which a particular type of behavior is similar to the profile built for it under “normal conditions”. This is computed by associating each measure/variable with a corresponding random variable. The frequency distribution is built and updated over time, as more audit records are analyzed. It is computed as an exponential weighted sum with a half-life of 30 days. That is, audit records gathered 30 days in the past contribute half as much weight as recent records; those gathered 60 days in the past contribute one-quarter as much weight, and so on. The frequency distribution is kept in the form of a histogram with probabilities associated with each one of the possible ranges that the variable can take. The cumulative frequency distribution is then built by using the ordered set of bin probabilities. Using this frequency distribution, and the value of the corresponding measure for the current audit record, it is possible to compute a value that reflects how far away from the “normal” value of the measure the current value is. The actual computation in NIDES [16] renders a value that is correlated with how abnormal this measure is. Combining the values obtained for each measure and taking into consideration the correlation between measures, the unit computes an index of how far the current audit record is from the normal state. Records beyond a threshold are flagged as possible intrusions.

However, the techniques used in [13–16] have several drawbacks. Firstly, the techniques are sensitive to the normality assumption. If data on a measure are not normally distributed, the techniques would yield a high false alarm rate. Secondly, the techniques are predominantly univariate in that a statistical norm profile is built for only one measure of the activities in a system. However, intrusions often affect multiple measures of activities collectively.

Statistical Packet Anomaly Detection Engine (SPADE) [17] is a statistical anomaly detection system that is available as a plug-in for SNORT [18], and can be used for automatic detection of stealthy port scans. SPADE was one of the first proposals to use the concept of an anomaly score to detect port scans, instead of using the traditional approach of looking at p attempts over q seconds. In [17], the authors used a simple frequency based approach to calculate the “anomaly score” of a packet. The fewer times a given packet was seen, the higher was its anomaly score. In other words, the authors define an anomaly score

[Figure 2: flow chart with the blocks Target System (two), Data Coalescing Center, Statistical Analysis, Rulebased Analysis, Statistical Analysis Results, Rulebase Analysis Results, Resolver, Resolved Analysis Results, and User Interface.]

Fig. 2. Flow chart of real time operation in NIDES [12].



as the degree of strangeness based on recent past activity. Once the anomaly score crossed a threshold, the packets were forwarded to a correlation engine that was designed to detect port scans. However, the one major drawback of SPADE is that it has a very high false alarm rate. This is due to the fact that SPADE classifies all unseen packets as attacks regardless of whether they are actually intrusions or not.

Anomalies resulting from intrusions may cause deviations on multiple measures in a collective manner rather than through separate manifestations on individual measures. To detect such collective deviations, Ye et al. [19] presented a technique that used the Hotelling’s T² test¹ to analyze the audit trails of activities in an information system and detect host based intrusions. The assumption is that host based intrusions leave trails in the audit data. The advantage of using the Hotelling’s T² test is that it aids in the detection of both counter relationship anomalies as well as mean-shift anomalies. In another paper, Kruegel et al. [20] show that it is possible to find the description of a system that computes a payload byte distribution and combines this information with extracted packet header features. In this approach, the resultant ASCII characters are sorted by frequency and then aggregated into six groups. However, this approach leads to a very coarse classification of the payload.

¹ The Hotelling’s T² test statistic for an observation $x_i$ is determined as $T^2 = n(x_i - \bar{x})' W^{-1} (x_i - \bar{x})$, where $\bar{x}$ is the mean, $x_i = (x_{i1}, x_{i2}, \ldots, x_{ip})$ denotes an observation of p measures on a processor system at time i, and W is the sample variance. A large value of $T^2$ indicates a large deviation of the observation $x_i$ from the in-control population.

A problem that many network/system administrators face is the problem of defining, on a global scale, what network/system/user activity can be termed as “normal”. Maxion and Feather [21] characterized the normal behavior in a network by using different templates that were derived by taking the standard deviations of Ethernet load and packet count at various periods in time. An observation was declared anomalous if it exceeded the upper bound of a predefined threshold. However, Maxion et al. did not consider the non-stationary nature of network traffic, which would have caused minor deviations in network traffic to go unnoticed.

More recently, analytical studies on anomaly detection systems were conducted. Lee and Xiang [22] used several information-theoretic measures, such as entropy and information gain, to evaluate the quality of anomaly detection methods, determine system parameters, and build models. These metrics help one to understand the fundamental properties of audit data. The highlighting features of some of the schemes surveyed in this section are presented in Table 1.

3.2.2. Machine learning based anomaly detection

Machine learning can be defined as the ability of a program and/or a system to learn and improve their performance on a certain task or group of tasks over time. Machine learning aims to answer many of the same questions as statistics or data mining. However, unlike statistical approaches which tend to focus on understanding the process that generated the data, machine learning techniques focus on building a system that improves its performance based on previous results. In other words, systems that are based on the machine learning paradigm have the ability to change their execution strategy on the basis of newly acquired information.

3.2.2.1. System call based sequence analysis. One of the widely used machine learning techniques for anomaly detection involves learning the behavior of a program and recognizing significant deviations from the normal. In a seminal paper, Forrest et al. [23] established an analogy between the human immune system and intrusion detection. They did this by proposing a methodology that involved analyzing a program’s system call sequences to build a normal profile. In their paper, they analyzed several UNIX based programs like sendmail, lpr, etc., and showed that correlations in fixed length sequences of system calls could be used to build a normal profile of a program. Therefore, programs that showed sequences that deviated from the normal sequence profile could then be considered to be victims of an attack. The system they developed was only used off-line using previously collected data and used a quite simple table-lookup algorithm to learn the profiles of programs. Their work was extended by Hofmeyr et al. [24], where they collected a database of normal behavior for each program of interest. Once a stable database was constructed for a given program in a particular environment, the database was then used to monitor the program’s behavior. The sequences of system calls formed the set of normal patterns for the database, and sequences not found in the database indicated anomalies.
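The database lookup described above can be illustrated with a minimal sketch. It assumes that system calls are available as strings and that a fixed window length k is used; the function names (`windows`, `build_profile`, `count_mismatches`) and the sample traces are hypothetical and are not taken from Forrest et al. [23] or Hofmeyr et al. [24].

```python
from typing import Iterable, List, Set, Tuple

Window = Tuple[str, ...]

def windows(trace: List[str], k: int) -> Iterable[Window]:
    """Yield every contiguous length-k subsequence of a system call trace."""
    for i in range(len(trace) - k + 1):
        yield tuple(trace[i:i + k])

def build_profile(normal_traces: List[List[str]], k: int) -> Set[Window]:
    """Collect the set of length-k windows observed in normal traces."""
    profile: Set[Window] = set()
    for trace in normal_traces:
        profile.update(windows(trace, k))
    return profile

def count_mismatches(trace: List[str], profile: Set[Window], k: int) -> int:
    """Count windows of a monitored trace that are absent from the profile."""
    return sum(1 for w in windows(trace, k) if w not in profile)

# Hypothetical traces, for illustration only.
normal = [["open", "read", "mmap", "mmap", "close"],
          ["open", "read", "read", "close"]]
profile = build_profile(normal, k=3)

suspect = ["open", "read", "execve", "close"]
mismatches = count_mismatches(suspect, profile, k=3)
```

In a sketch like this, a monitored trace would be flagged when its mismatch count (or mismatch rate) exceeds some chosen threshold; how that threshold is set is left open here.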

Table 1
A summary of statistical anomaly detection systems

| Reference | Highlighting feature | Methodology |
|---|---|---|
| Haystack [11] | It uses descriptive statistics to model user behavior. Also modeled acceptable behavior for a generic user within a particular user group | Host based statistical anomaly detection |
| NIDES [15] | A distributed intrusion detection system that had both anomaly as well as signature detection modules | Network based statistical anomaly detection |
| Staniford et al. [17] | Statistical anomaly detection technique that calculates an anomaly score for each packet that it sees; it forwards the packets to a correlation engine for intrusion detection purposes when a predefined threshold was crossed | Network based statistical anomaly detection |
| Ye et al. [19] | It uses the Hotelling’s T² test to analyze the audit trails of activities in a computer system and detect host-based intrusions | Host based multivariate statistical anomaly detection |
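
As an illustration of the Hotelling’s T² entry in Table 1, the following sketch computes the quadratic form from the footnote on the T² test for observations of p measures. Several things here are assumptions: W is taken to be the sample covariance matrix estimated from in-control data, the footnote's leading factor n is not defined in the text and is therefore exposed as a `scale` parameter defaulting to 1, and all function names and data values are hypothetical.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]

def mean_vector(data: List[Vector]) -> Vector:
    """Column-wise mean of the in-control observations."""
    n = len(data)
    return [sum(row[j] for row in data) / n for j in range(len(data[0]))]

def covariance(data: List[Vector], mean: Vector) -> Matrix:
    """Sample covariance matrix (assumed meaning of W)."""
    n, p = len(data), len(mean)
    return [[sum((r[a] - mean[a]) * (r[b] - mean[b]) for r in data) / (n - 1)
             for b in range(p)] for a in range(p)]

def solve(a: Matrix, b: Vector) -> Vector:
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    p = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(p):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][p] / m[i][i] for i in range(p)]

def t_squared(x: Vector, mean: Vector, cov: Matrix, scale: float = 1.0) -> float:
    """scale * (x - mean)' W^{-1} (x - mean); scale stands in for the factor n."""
    d = [xi - mi for xi, mi in zip(x, mean)]
    return scale * sum(di * si for di, si in zip(d, solve(cov, d)))

# Hypothetical in-control data: two measures that normally rise together.
in_control = [[1.0, 1.1], [2.0, 1.9], [3.0, 3.2], [4.0, 3.8], [5.0, 5.0]]
mu = mean_vector(in_control)
w = covariance(in_control, mu)

typical = t_squared([4.0, 4.0], mu, w)  # follows the usual relationship
broken = t_squared([4.0, 2.0], mu, w)   # each value is in range, relationship is not
```

With these made-up numbers, the second observation stays within the range of each individual measure yet receives a far larger score than the first, which is the kind of counter relationship anomaly that a per-measure threshold would not flag.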

Another technique that has been frequently used in the domain of machine learning is the sliding window method. The sliding window method, a sequential learning methodology, converts the sequential learning problem into the classical learning problem. It constructs a window classifier $h_w$ that maps an input window of width w into an individual output value y. Specifically, let $d = (w - 1)/2$ be the “half-width” of the window. Then $h_w$ predicts $y_{i,t}$ using the window $\langle x_{i,t-d}, x_{i,t-d+1}, \ldots, x_{i,t}, \ldots, x_{i,t+d-1}, x_{i,t+d} \rangle$. The window classifier $h_w$ is trained by converting each sequential training example $(x_i, y_i)$ into windows and then applying a standard machine learning algorithm. A new sequence x is classified by converting it to windows, applying $h_w$ to predict each $y_t$ and then concatenating the $y_t$’s to form the predicted sequence y. The obvious advantage of this sliding window method is that it permits any classical supervised learning algorithm to be applied. While the sliding window method gives adequate performance in many applications, it does not take advantage of correlations between nearby $y_t$ values. Specifically, the only relationships between nearby $y_t$ values that are captured are those that are predictable from nearby $x_t$ values. If there are correlations among the $y_t$ values that are independent of the $x_t$ values, then these are not captured. The sliding window method has been successfully used in a number of machine learning based anomaly detection techniques [25–27]. Warrender et al. [27] proposed a method that utilized sliding windows to create a database of normal sequences for testing against test instances. Eskin et al. [26] improved the traditional sliding window method by proposing a modeling methodology in which the length of the sliding window is dynamic and depends on the context of the system-call sequence.

However, system call based approaches for host based intrusion detection systems suffer from two drawbacks. Firstly, the computational overhead that is involved in monitoring every system call is very high. This high overhead leads to a performance degradation of the monitored system. The second problem is that system calls themselves are irregular by nature. This irregularity leads to a high false positive rate as it becomes difficult to differentiate between normal and anomalous system calls.

3.2.2.2. Bayesian networks. A Bayesian network is a graphical model that encodes probabilistic relationships among variables of interest. When used in conjunction with statistical techniques, Bayesian networks have several advantages for data analysis [28]. Firstly, because Bayesian networks encode the interdependencies between variables, they can handle situations where data is missing. Secondly, Bayesian networks have the ability to represent causal relationships. Therefore, they can be used to predict the consequences of an action. Lastly, because Bayesian networks have both causal and probabilistic relationships, they can be used to model problems where there is a need to combine prior knowledge with data. Several researchers have adapted ideas from Bayesian statistics to create models for anomaly detection [29–31]. Valdes et al. [30] developed an anomaly detection system that employed naïve Bayesian networks² to perform intrusion detection on traffic bursts. Their model, which is a part of EMERALD [32], has the capability to potentially detect distributed attacks in which each individual attack session is not suspicious enough to generate an alert. However, this scheme also has a few disadvantages. First, as pointed out in [29], the classification capability of naïve Bayesian networks is identical to a threshold based system that computes the sum of the outputs obtained from the child nodes. Secondly, because the child nodes do not interact between themselves and their output only influences the probability of the root node, incorporating additional information becomes difficult as the variables that contain the information cannot directly interact with the child nodes.

² A naïve Bayesian network is a restricted network that has only two layers and assumes complete independence between the information nodes (i.e., the random variables that can be observed and measured). These limitations result in a tree-shaped network with a single hypothesis node (root node) that has arrows pointing to a number of information nodes (child nodes). All child nodes have exactly one parent node, that is, the root node, and no other causal relationships between nodes are permitted.

Another area, within the domain of anomaly detection, where Bayesian techniques have been frequently used is the classification and suppression of false alarms. Kruegel et al. [29] proposed a multi-sensor fusion approach where the outputs of different IDS sensors were aggregated to produce a single alarm. This approach is based on the assumption that no individual anomaly detection technique can classify a set of events as an intrusion with sufficient confidence. Although using Bayesian networks for intrusion detection or intruder behavior prediction can be effective in certain applications, their limitations should be considered in the actual implementation. Since the accuracy of this method is dependent on certain assumptions that are typically based on the behavioral model of the target system, deviating from those assumptions will decrease its accuracy. Selecting an inaccurate model will lead to an inaccurate detection system. Therefore, selecting an accurate model is the first step towards solving the problem. Unfortunately, selecting an accurate behavioral model is not an easy task as typical systems and/or networks are complex.

3.2.2.3. Principal components analysis. Datasets for intrusion detection are typically very large and multidimensional. With the growth of high speed networks and distributed network based data intensive applications, storing, processing, transmitting, visualizing and understanding the data is becoming more complex and expensive. To tackle the problem of high dimensional datasets, researchers have developed a dimensionality reduction technique known as principal component analysis (PCA) [33–35]. In mathematical terms, PCA is a technique where n correlated random variables are transformed into $d \le n$ uncorrelated variables. The uncorrelated variables are linear combinations of the original variables and can be used to express the data in a reduced form. Typically, the first principal component of the transformation is the linear combination of the original variables with the largest variance. In other words, the first principal component is the projection on the direction in which the variance of the projection is maximized. The second principal component is the linear combination of the original variables with the second largest variance and orthogonal to the first principal component, and so on. In many data sets, the first several principal components contribute most of the variance in the original data set, so that the rest can be disregarded with minimal loss of the variance for dimension reduction of the dataset.

PCA has been widely used in the domain of image compression, pattern recognition and intrusion detection. Shyu et al. [36] proposed an anomaly detection scheme, where PCA was used as an outlier detection scheme and was applied to reduce the dimensionality of the audit data and arrive at a classifier that is a function of the principal components. They measured the Mahalanobis distance of each observation from the center of the data for anomaly detection. The Mahalanobis distance is computed based on the sum of squares of the standardized principal component scores. Shyu et al. evaluated their method over the KDD CUP99 data and demonstrated that it exhibits a better detection rate than other well known outlier based anomaly detection algorithms such as the Local Outlier Factor (LOF) approach, the Nearest Neighbor approach and the kth Nearest Neighbor approach. Other notable techniques that employ the principal component analysis methodology include the work done by Wang et al. [35], Bouzida et al. [37] and Wang et al. [38].

3.2.2.4. Markov models. Markov chains have also been employed extensively for anomaly detection. Ye et al. [39] present an anomaly detection technique that is based on Markov chains. In their paper, system call event sequences from the recent past were studied by opening an observation window of size N. The type of audit events, $E_{t-N+1}, \ldots, E_t$, in the window at time t was examined and the sequence of states $X_{t-N+1}, \ldots, X_t$ obtained. Subsequently, the probability that the sequence of states $X_{t-N+1}, \ldots, X_t$ is normal was obtained. The larger the probability, the more likely the sequence of states results from normal activities. A sequence of states from attack activities is presumed to receive a low probability of support from the Markov chain model of the normal profile.
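
The window probability described above can be sketched as follows. This is a minimal first-order Markov chain under stated assumptions rather than the procedure of Ye et al. [39]: the probability of the first state in a window is estimated from overall state frequencies in the normal data, transition probabilities are plain relative frequencies with no smoothing (so an unseen transition gives probability zero), and the function names and event sequences are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

def train_markov_chain(
    sequences: List[List[str]],
) -> Tuple[Dict[str, float], Dict[str, Dict[str, float]]]:
    """Estimate state frequencies and transition probabilities from normal data."""
    state_counts: Counter = Counter()
    transition_counts: Dict[str, Counter] = defaultdict(Counter)
    for seq in sequences:
        state_counts.update(seq)
        for prev, cur in zip(seq, seq[1:]):
            transition_counts[prev][cur] += 1
    total = sum(state_counts.values())
    state_p = {s: c / total for s, c in state_counts.items()}
    trans_p = {
        s: {t: c / sum(row.values()) for t, c in row.items()}
        for s, row in transition_counts.items()
    }
    return state_p, trans_p

def window_probability(
    window: List[str],
    state_p: Dict[str, float],
    trans_p: Dict[str, Dict[str, float]],
) -> float:
    """P(X_{t-N+1}) times the product of P(X_i | X_{i-1}) over the window."""
    prob = state_p.get(window[0], 0.0)
    for prev, cur in zip(window, window[1:]):
        prob *= trans_p.get(prev, {}).get(cur, 0.0)
    return prob

# Hypothetical normal event sequences, for illustration only.
normal = [["open", "read", "read", "close"], ["open", "read", "close"]]
state_p, trans_p = train_markov_chain(normal)

p_normal = window_probability(["open", "read", "close"], state_p, trans_p)
p_attack = window_probability(["open", "close", "read"], state_p, trans_p)
```

A window would then be reported as anomalous when its probability falls below a chosen threshold. For long windows, summing log-probabilities instead of multiplying probabilities avoids numerical underflow.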

A hidden Markov model, another popular Markov technique, like the one shown in Fig. 3, is a statistical model where the system being modeled is assumed to be a Markov process with unknown parameters. The challenge is to determine the hidden parameters from the observable parameters. Unlike a regular Markov model, where the state transition probabilities are the only parameters and the state of the system is directly observable, in a hidden Markov model, the only visible elements are the variables of the system that are influenced by the state of the system, and the state of the system itself is hidden. A hidden Markov model’s states represent some unobservable condition of the system being modeled. In each state, there is a certain probability of producing any of the observable system outputs and a separate probability indicating the likely next states. By having different output probability distributions in each of the states, and allowing the system to change states over time, the model is capable of representing non-stationary sequences.

Fig. 3. Example of a hidden Markov model. (Figure omitted; it shows hidden states x1, x2, x3, ..., xn and observable states y1, y2, y3, ..., yn.)

To estimate the parameters of a hidden Markov model for modeling normal system behavior, sequences of normal events collected from normal system operation are used as training data. An expectation-maximization (EM) algorithm is used to estimate the parameters. Once a hidden Markov model has been trained, when confronted with test data, probability measures can be used as thresholds for anomaly detection. In order to use hidden Markov models for anomaly detection, three key problems need to be addressed. The first problem, also known as the evaluation problem, is to determine, given a sequence of observations, the probability that the observed sequence was generated by the model. The second is the learning problem, which involves building from the audit data a model, or a set of models, that correctly describes the observed behavior. Given a hidden Markov model and the associated observations, the third problem, also known as the decoding problem, involves determining the most likely set of hidden states that have led to those observations.

Warrender et al. [27] compare the performance of four methods, viz., simple enumeration of observed sequences, comparison of relative frequencies of different sequences, a rule induction technique, and hidden Markov models, at representing normal behavior accurately and recognizing intrusions in system call datasets. The authors show that while hidden Markov models outperform the other three methods, the higher performance comes at a greater computational cost. In the proposed model, the authors use a hidden Markov model with fully connected states, i.e., transitions were allowed from any state to any other state. Therefore, a process that issues S system calls will have S states. This implies that we will roughly have $2S^2$ values in the state transition matrix. In a computer system/network, a process typically issues a very large number of system calls. Modeling all of the processes in a computer system/network would therefore be computationally infeasible.

In another paper, Yeung et al. [40] describe the use of hidden Markov models for anomaly detection based on profiling system call sequences and shell command sequences. After training, their model computes the sample likelihood of an observed sequence using the forward or backward algorithm. A threshold on the probability, based on the minimum likelihood among all training sequences, was used to discriminate between normal and anomalous behavior. One major problem with this approach is that it lacks generalization and/or support for users who are not uniquely identified by the system under consideration.

Mahoney et al. [41–43] presented several methods that address the problem of detecting anomalies in the usage of network protocols by inspecting packet headers. The common denominator of all of them is the systematic application of learning techniques to automatically obtain profiles of normal behavior for protocols at different layers. Mahoney et al. experimented with anomaly detection over the DARPA network data [44] by range matching network packet header fields. Packet Header Anomaly Detector (PHAD) [41], LEarning Rules for Anomaly Detection (LERAD) [42] and Application Layer Anomaly Detector (ALAD) [43] use time-based models in which the probability

of an event depends on the time since it last occurred. For each attribute, they collect a set of allowed values and flag novel values as anomalous. PHAD, ALAD, and LERAD differ in the attributes that they monitor. PHAD monitors 33 attributes from the Ethernet, IP and transport layer packet headers. ALAD models incoming server TCP requests: source and destination IP addresses and ports, opening and closing TCP flags, and the list of commands (the first word on each line) in the application payload. Depending on the attribute, it builds separate models for each target host, port number (service), or host/port combination. LERAD also models TCP connections. Even though the data set is multivariate network traffic data containing fields extracted out of the packet headers, the authors break down the multivariate problem into a set of univariate problems and sum the weighted results from range matching along each dimension. While the advantage of this approach is that it makes the technique more computationally efficient and effective at detecting network intrusions, breaking multivariate data into univariate data has significant drawbacks, especially at detecting attacks. For example, in a typical SYN flood attack, an indicator of the attack is having more SYN requests than usual while observing a lower than normal ACK rate. Because a higher SYN rate or a lower ACK rate alone can each happen in normal usage (when the network is busy or idle), it is the combination of higher SYN rate and lower ACK rate that signals the attack.

The one major drawback of many of the machine learning techniques, like the system call based sequence analysis approach and the hidden Markov model approach mentioned above, is that they are resource expensive. For example, an anomaly detection technique that is based on the Markov chain model is computationally expensive because it uses parametric estimation techniques based on the Bayes’ algorithm for learning the normal profile of the host/network under consideration. If we consider the large amount of audit data and the relatively high frequency of events that occur in computers and networks of today, such a technique for anomaly detection is not scalable for real time operation. The highlighting features of some of the schemes surveyed in this section are presented in Table 2.

3.2.3. Data mining based anomaly detection

To eliminate the manual and ad hoc elements from the process of building an intrusion detection system, researchers are increasingly looking at using data mining techniques for anomaly detection [45–47]. Grossman [48] defines data mining as being “concerned with uncovering patterns, associations, changes, anomalies, and statistically significant structures and events in data”. Simply put, data mining is the ability to take data as input, and pull from it patterns or deviations which may not be easily visible to the naked eye. Another term sometimes used is knowledge discovery. Data mining can help improve the process of intrusion detection by adding a level of focus to anomaly detection. By identifying bounds for valid network activity, data mining will aid an analyst in his/her ability to distinguish attack activity from common everyday traffic on the network.

3.2.3.1. Classification-based intrusion detection. An intrusion detection system that classifies audit data as normal or anomalous based on a set of rules, patterns or other affiliated techniques can be broadly defined as a classification-based intrusion detection system. The classification process typically involves the following steps:

1. Identify class attributes and classes from training data.
2. Identify attributes for classification.
3. Learn a model using the training data.
4. Use the learned model to classify the unknown data samples.

A variety of classification techniques have been proposed in the literature. These include inductive rule generation techniques, fuzzy logic, genetic algorithms and neural networks-based techniques.

Inductive rule generation algorithms typically involve the application of a set of association rules and frequent episode patterns to classify the audit data. In this context, if a rule states that “if event X occurs, then event Y is likely to occur”, then events X and Y can be described as sets of (variable, value)-pairs where the aim is to find the sets X and Y such that X “implies” Y. In the domain of classification, we fix Y and attempt to find sets of X which are good predictors for the right classification. While supervised classification typically only derives rules relating to a single attribute, general rule induction techniques, which are typically unsupervised in nature, derive rules relating to any or all the attributes. The advantage of using rules is that they tend to be simple and intuitive, unstructured and less rigid. As

Table 2
A summary of machine learning based anomaly detection systems

| Reference | Highlighting features | Methodology |
|---|---|---|
| Forrest et al. [23] | It finds correlations in fixed length sequences of system calls in the UNIX operating system, and uses them to build a normal profile for anomaly detection | Sequence analysis based statistical anomaly detection |
| Eskin et al. [26] | It employs dynamic window sizes to improve system call modeling | System call modeling and sequence analysis |
| Valdes et al. [30] | It employs Bayesian inference techniques to determine if traffic bursts contain attacks. It also provides distributed attack detection capability | Bayesian networks based anomaly detection |
| Shyu et al. [36] | It uses principal component analysis as an outlier detection scheme to detect intrusions | Principal component analysis based anomaly detection |
| Yeung et al. [40] | It presents a dynamic modeling approach based on hidden Markov models and a static modeling approach based on event occurrence frequency distribution for modeling system calls. It showed that the dynamic modeling approach is better for system call datasets | Host based anomaly detection system |
| PHAD [41] | Examines IP headers, connections to various ports as well as packet headers of Ethernet, IP and transport layer and other transport layer packet headers | Network based anomaly detection |
| ALAD [43] | It detects anomalies in inbound TCP connections to well known ports on the server | Network based anomaly detection |
| LERAD [42] | It detects TCP stream anomalies like ALAD, but uses a learning algorithm to pick good rules from the training set, rather than using a fixed set of rules | Network based anomaly detection |

for the drawbacks, they are difficult to maintain, and in some cases, are inadequate to represent many types of information. A number of inductive rule generation algorithms have been proposed in the literature. Some of them first construct a decision tree and then extract a set of classification rules from the decision tree.³ Other algorithms (e.g., RIPPER [25], C4.5 [49]) directly induce rules from the data by employing a divide-and-conquer approach. A post learning stage involving either discarding (C4.5 [49]) or pruning (RIPPER [25]) some of the learnt rules is carried out to increase the classifier accuracy. RIPPER has been successfully used in a number of data mining based anomaly detection algorithms to classify incoming audit data and detect intrusions. One of the primary advantages of using RIPPER is that the generated rules are easy to use and verify. Lee et al. [45,46,50] used RIPPER to characterize sequences occurring in normal data by a smaller set of rules that capture the common elements in those sequences. During monitoring, sequences violating those rules are treated as anomalies.

³ Decision trees are powerful and popular tools for classification and prediction. The attractiveness of tree-based methods is due in large part to the fact that, in contrast to neural networks, decision trees represent rules. A decision tree is a tree that has three main components: nodes, arcs, and leaves. Each node is labeled with a feature attribute which is most informative among the attributes not yet considered in the path from the root, each arc out of a node is labeled with a feature value for the node’s feature and each leaf is labeled with a category or class. A decision tree can then be used to classify a data point by starting at the root of the tree and moving through it until a leaf node is reached. The leaf node would then provide the classification of the data point.

Fuzzy logic techniques have been in use in the area of computer and network security since the late 1990s [51]. Fuzzy logic has been used for intrusion detection for two primary reasons [52]. Firstly, several quantitative parameters that are used in the context of intrusion detection, e.g., CPU usage time, connection interval, etc., can potentially be viewed as fuzzy variables. Secondly, as stated by Bridges et al. [52], the concept of security itself is fuzzy. In other words, the concept of fuzziness helps to smooth out the abrupt separation of normal behavior from abnormal behavior. Without fuzziness, a given data point falling outside/inside a defined “normal interval” is considered anomalous/normal to the same degree regardless of its distance from/within the interval. Dickerson et al. [53] developed the Fuzzy Intrusion Recognition Engine (FIRE) using fuzzy sets and fuzzy rules. FIRE uses simple data mining techniques to process the network input data and generate fuzzy sets for every observed feature. The fuzzy sets are then used to define fuzzy rules

to detect individual attacks. FIRE does not estab- ral network based solutions have several drawbacks.
lish any sort of model representing the current state Firstly, they may fail to find a satisfactory solution
of the system, but instead relies on attack specific either because of lack of sufficient data or because
rules for detection. Instead, FIRE creates and there is no learnable function. Secondly, neural net-
applies fuzzy logic rules to the audit data to classify works can be slow and expensive to train. The lack
it as normal or anomalous. Dickerson et al. found of speed is partly because of the need to collect and
that the approach is particularly effective against analyze the training data and partly because the
port scans and probes. The primary disadvantage neural network has to manipulate the weights of
to this approach is the labor intensive rule genera- the individual neurons to arrive at the correct solu-
tion process. tion. There are a few different groups advocating
Genetic algorithms, a search technique used to various approaches to using neural networks for
find approximate solutions to optimization and intrusion detection. Ghosh et al. [58–60] used the
search problems, have also been extensively feed-forward back propagation and the Elman
employed in the domain of intrusion detection to recurrent network [61] for classifying system-call
differentiate normal network traffic from anomalous sequences. Their experimental results with the
connections. The major advantage of genetic algo- 1998 and 1999 DARPA intrusion detection evalua-
rithms is their flexibility and robustness as a global tion dataset verified that the application of Elman
search method. In addition, a genetic algorithm networks in the domain of program-based intrusion
search converges to a solution from multiple direc- detection provided superior results as compared to
tions and is based on probabilistic rules instead of using the standard multilayer perceptron based neu-
deterministic ones. In the domain of network intru- ral network. However, training the Elman network
sion detection, genetic algorithms have been used in was expensive and the number of neural networks
a number of ways. Some approaches [54,55] have required was large. In another paper, Ramadas
used genetic algorithms directly to derive classifica- et al. [62] present the Anomalous Network-Traffic
tion rules, while others [52,56] use genetic algorithms to select appropriate features or determine optimal parameters of related functions, while different data mining techniques are then used to acquire the rules. The earliest attempt to apply genetic algorithms to the problem of intrusion detection was made by Crosbie and Spafford [57] in 1995, when they applied multiple agent technology to detect network based anomalies. While the advantage of the approach was that it used numerous agents to monitor a variety of network based parameters, the lack of intra-agent communication and a lengthy training process were some issues that were not addressed.

Neural network based intrusion detection systems have traditionally been host based systems that focus on detecting deviations in program behavior as a sign of an anomaly. In the neural network approach to intrusion detection, the neural network learns to predict the behavior of the various users and daemons in the system. The main advantage of neural networks is their tolerance to imprecise data and uncertain information and their ability to infer solutions from data without having prior knowledge of the regularities in the data. This, in combination with their ability to generalize from learned data, has made them an appropriate approach to intrusion detection. However, the neu-

Detection with Self Organizing Maps (ANDSOM). ANDSOM is the anomaly detection module for the network based intrusion detection system, called INBOUNDS, being developed at Ohio University. The ANDSOM module creates a two dimensional Self Organizing Map or SOM (footnote 4) for each network service that is being monitored. In the paper, the authors test the proposed methodology using the DNS and HTTP services. Neurons are trained with normal network traffic during the training phase to capture characteristic patterns. When real time data is fed to the trained neurons, an anomaly is detected if the distance of the incoming traffic is more than a preset threshold.

Anomaly detection schemes also involve other data mining techniques such as support vector machines (SVM) and other types of neural network models [63,64]. Because data mining techniques are data driven and do not depend on previously observed patterns of network/system activity, some of these techniques have been very successful at detecting new kinds of attacks. However, these techniques often have a very high false positive rate. For example, as pointed out in [65], the approach

Footnote 4: A self organizing map is a method for unsupervised learning based on a grid of artificial neurons whose weights are adapted to match input vectors in a training set.
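
As a rough illustration of the thresholding step described for ANDSOM, the following Python sketch trains a very small self organizing map on normal samples and flags an input whose distance to its best-matching neuron exceeds a threshold. It is not the INBOUNDS implementation: the grid size, the learning schedule, the two-feature sample vectors, and the choice of the largest training distance as the threshold are all assumptions made for this example.

```python
import math
import random


def train_som(samples, rows=3, cols=3, epochs=50, seed=0):
    """Train a small 2-D SOM; returns {(row, col): weight vector}."""
    rng = random.Random(seed)
    # Assumption: initialise every neuron from a randomly chosen training sample.
    weights = {(r, c): list(rng.choice(samples))
               for r in range(rows) for c in range(cols)}
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs            # decays from 1 towards 0
        lr = 0.5 * frac                        # learning rate
        radius = max(rows, cols) / 2.0 * frac + 0.5
        for x in samples:
            # Best-matching unit: the neuron whose weights are closest to x.
            bmu = min(weights, key=lambda pos: math.dist(weights[pos], x))
            for pos, w in weights.items():
                grid_d = math.dist(pos, bmu)   # distance on the neuron grid
                h = math.exp(-(grid_d ** 2) / (2 * radius ** 2))
                for i in range(len(w)):
                    w[i] += lr * h * (x[i] - w[i])
    return weights


def bmu_distance(weights, x):
    """Distance from x to its best-matching neuron."""
    return min(math.dist(w, x) for w in weights.values())


# Hypothetical "normal" traffic samples: (packets per connection, duration in s).
normal = [(10.0, 1.0), (12.0, 1.2), (11.0, 0.9),
          (9.0, 1.1), (10.0, 1.3), (13.0, 1.0)]
som = train_som(normal)

# Assumed threshold: the largest distance seen on the training data.
threshold = max(bmu_distance(som, x) for x in normal)


def is_anomalous(x):
    return bmu_distance(som, x) > threshold
```

With this threshold no training sample is flagged, while an input far from all neurons, such as `(200.0, 30.0)`, is. In practice the threshold would be tuned to trade detection rate against false alarms.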

adopted by Sung and Mukkamala [66], who use an SVM technique to realize an intrusion detection system for class-specific detection, is flawed because it totally ignores the relationships and dependencies between the features.

3.2.3.2. Clustering and outlier detection. Clustering is a technique for finding patterns in unlabeled data with many dimensions (footnote 5). Clustering has attracted interest from researchers in the context of intrusion detection [67–69]. The main advantage that clustering provides is the ability to learn from and detect intrusions in the audit data, while not requiring the system administrator to provide explicit descriptions of various attack classes/types. As a result, the amount of training data that needs to be provided to the anomaly detection system is also reduced. Clustering and outlier detection are closely related. From the viewpoint of a clustering algorithm, outliers are objects not located in the clusters of a data set, and in the context of anomaly detection, they may represent intrusions/attacks. The statistics community has studied the concept of outliers quite extensively [70]. In these studies, data points are modeled using a stochastic distribution and points are determined to be outliers based on their relationship with this model. However, with increasing dimensionality, it becomes increasingly difficult to accurately estimate the multidimensional distributions of the data points [71]. Recent outlier detection algorithms [68,72,73] are based on the full dimensional distances between the points as well as the densities of local neighborhoods.

There exist at least two approaches to clustering based anomaly detection. In the first approach, the anomaly detection model is trained using unlabelled data that consists of both normal as well as attack traffic. In the second approach, the model is trained using only normal data and a profile of normal activity is created. The idea behind the first approach is that anomalous or attack data forms a small percentage of the total data. If this assumption holds, anomalies and attacks can be detected based on cluster sizes: large clusters correspond to normal data, and the rest of the data points, which are outliers, correspond to attacks.

The distances between points play an important role in clustering. The most popular distance metric is the Euclidean distance. Since each feature contributes equally to the calculation of the Euclidean distance, this distance is undesirable in many applications. This is especially true when features have very different variability or different features are measured on different scales. The effect of the features that have large scales of measurement or high variability would dominate others that have smaller scales or less variability. A distance-based approach which incorporates this measure of variability is the Mahalanobis distance (footnote 6) [74,75] based outlier detection scheme. In this scheme, the threshold is computed according to the most distant points from the mean of the "normal" data, and it is set to be a user defined percentage, say m%, of the total number of points. In this scheme, all data points in the audit data that have distances to the mean (calculated during the training phase) greater than the threshold are detected as outliers. The Mahalanobis distance utilizes group means and variances for each variable, and the correlations and covariances between measures.

The notion of distance-based outliers was recently introduced in the study of databases [68]. According to this notion, a point, P, in a multidimensional data set is an outlier if there are fewer than p points from the data in the ε-neighborhood of P, where p is a user-specified constant. Ramaswamy et al. [68] described an approach that is based on computing the Euclidean distance of the kth nearest neighbor from a point O. In other words, the k-nearest neighbor algorithm classifies points by assigning them to the class that appears most frequently amongst the k nearest neighbors. Therefore, for a given point O, Dk(O) denotes the Euclidean distance from the point O to its kth nearest neighbor and can be considered as the "degree of outlierness" of O. If one is interested in the top n outliers, this approach defines an outlier as follows: Given values for k and n, a point O is an outlier if the distance to its kth nearest neighbor is smaller than the corresponding value for no more than (n − 1) other points. In other words, the top n outliers with the

Footnote 5: The number of dimensions is equivalent to the number of attributes.

Footnote 6: The basic Euclidean distance treats each variable as equally important in calculating the distance. An alternative approach is to scale the contribution of individual variables to the distance value according to the variability of each variable. This approach is illustrated by the Mahalanobis distance, which is a measure of the distance between each observation in a multidimensional cloud of points and the centroid of the cloud. In other words, the Mahalanobis distance between a particular point $x$ and the mean $\mu$ of the "normal data" is computed as $D_M(x) = \sqrt{(x - \mu)^T \Sigma^{-1} (x - \mu)}$, where $\Sigma$ is the covariance matrix.
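
The two distance-based schemes on this page can be sketched in a few lines of Python. The block below computes the Mahalanobis distance of footnote 6 for two-dimensional data, derives a threshold from the m% most distant training points, and ranks points by the distance Dk(O) to their kth nearest neighbor. It is a minimal sketch rather than the method of [68] or [74,75]: the restriction to two features, the sample data, and the exact way the m% threshold is read off the training distances are assumptions.

```python
import math


def mean_and_cov_2d(points):
    """Sample mean and 2x2 sample covariance matrix of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), ((sxx, sxy), (sxy, syy))


def mahalanobis_2d(x, mean, cov):
    """D_M(x) = sqrt((x - mu)^T Sigma^{-1} (x - mu)) for the 2-D case."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det == 0:
        raise ValueError("covariance matrix is singular")
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # Uses Sigma^{-1} = (1 / det) * [[d, -b], [-c, a]].
    q = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return math.sqrt(q)


def threshold_from_training(distances, m_percent):
    """Assumed reading of the m% rule: distance of the k-th most distant
    training point, where k is m% of the number of training points."""
    ranked = sorted(distances, reverse=True)
    k = max(1, int(len(ranked) * m_percent / 100.0))
    return ranked[k - 1]


def kth_nn_distance(points, i, k):
    """Dk(O): Euclidean distance from points[i] to its k-th nearest neighbor."""
    dists = sorted(math.dist(points[i], q)
                   for j, q in enumerate(points) if j != i)
    return dists[k - 1]


def top_n_knn_outliers(points, k, n):
    """Indices of the n points with the largest Dk values."""
    scored = sorted(((kth_nn_distance(points, i, k), i)
                     for i in range(len(points))), reverse=True)
    return [i for _, i in scored[:n]]


# Hypothetical "normal" data in which the two features are strongly correlated.
normal = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8),
          (5.0, 10.1), (6.0, 11.9), (7.0, 14.2), (8.0, 15.8)]
mu, sigma = mean_and_cov_2d(normal)
train_d = [mahalanobis_2d(p, mu, sigma) for p in normal]
threshold = threshold_from_training(train_d, 25)

# Closer to the mean than normal[-1] in Euclidean terms, but off the trend.
off_trend = (6.0, 6.0)
is_outlier = mahalanobis_2d(off_trend, mu, sigma) > threshold
```

The point `off_trend` illustrates why the covariance matters: it lies nearer to the mean than the last training point in Euclidean distance, yet its Mahalanobis distance is far larger because it breaks the correlation between the two features.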

maximum Dk(O) values are considered as outliers. While the advantage of the k-nearest neighbors approach is that it is robust to noisy data, the approach suffers from the drawback that it is very difficult to choose an optimal value for k in practice. Several papers [76,77] have proposed using the k-nearest neighbor outlier detection algorithms for the purpose of anomaly detection.

The Minnesota Intrusion Detection System (MINDS) [78] is another network-based anomaly detection approach that utilizes data mining techniques. The MINDS anomaly detection module assigns a degree of outlierness to each data point, which is called the local outlier factor (LOF) [72]. The LOF takes into consideration the density of the neighborhood around the observation point to determine its outlierness. In this scheme, outliers are objects that tend to have high LOF values. The advantage of the LOF algorithm is its ability to detect all forms of outliers, including those that cannot be detected by the distance-based algorithms.

3.2.3.3. Association rule discovery. Association rules [79,80] are one of many data mining techniques that describe events that tend to occur together. The concept of association rules can be understood as follows: Given a database D of transactions where each transaction $T \in D$ denotes a set of items in the database, an association rule is an implication of the form $X \Rightarrow Y$, where $X \subset D$, $Y \subset D$ and $X \cap Y = \emptyset$. The rule $X \Rightarrow Y$ holds in the transaction set D with confidence c if c% of the transactions in D that contain X also contain Y. Two important concepts when dealing with association rules are rule confidence and rule support. Rule confidence is defined as the conditional probability $P(Y \subseteq T \mid X \subseteq T)$. The rule $X \Rightarrow Y$ has support s in the transaction database D if s% of transactions in D contain $X \cup Y$. Association rules have been successfully used to mine audit data to find normal patterns for anomaly detection [46,50,81]. They are particularly important in the domain of anomaly detection because association rules can be used to construct a summary of anomalous connections detected by the intrusion detection system. There is evidence that suggests program executions and user activities exhibit frequent correlations among system features. These consistent behaviors can be captured in association rules.

Lee et al. [46,50] proposed an association rule-based data mining approach for anomaly detection where raw data was converted into ASCII network packet information, which in turn was converted into connection-level information. These connection level records contained connection features like service, duration, etc. Association rules were then applied to this data to create models to detect intrusions. In another paper, Barbará et al. describe Audit Data Analysis and Mining (ADAM), a real-time anomaly detection system that uses a module to classify the suspicious events into false alarms or real attacks. ADAM's training and intrusion detection phases are illustrated in Figs. 4a and 4b, respectively. ADAM was one of seven systems tested in the 1999 DARPA evaluation [82]. It uses data mining to build a customizable profile of rules of normal behavior and then classifies attacks (by name) or declares false alarms. To discover attacks in a TCPdump audit trail, ADAM uses a

[Figure omitted: block diagram with the labels Attack Free Training Data; Offline Single & Domain Level Mining; Profile; Training Data; Online Single & Domain Level Mining; Suspicious Item Sets; Feature Selection; Features; Classify as Attacks or False Alarms; Training; Classifier Builder.]

Fig. 4a. The training phase of ADAM [81].
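
The definitions of rule support and rule confidence given in Section 3.2.3.3 can be made concrete with a short Python sketch. The connection records below, encoded as sets of "feature=value" strings, are hypothetical and are not taken from any of the surveyed systems.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset.issubset(t))
    return hits / len(transactions)


def confidence(transactions, x, y):
    """Estimate of P(Y in T | X in T): among the transactions that
    contain X, the fraction that also contain Y."""
    x, y = frozenset(x), frozenset(y)
    covered = [t for t in transactions if x.issubset(t)]
    if not covered:
        return 0.0
    return sum(1 for t in covered if y.issubset(t)) / len(covered)


# Hypothetical connection-level records.
transactions = [
    {"service=http", "flag=SF", "dst_port=80"},
    {"service=http", "flag=SF", "dst_port=80"},
    {"service=http", "flag=REJ", "dst_port=80"},
    {"service=dns", "flag=SF", "dst_port=53"},
]

x = {"service=http"}
y = {"flag=SF"}
rule_support = support(transactions, x | y)      # share of records with X and Y
rule_confidence = confidence(transactions, x, y)  # share of X records with Y
```

For the rule service=http ⇒ flag=SF, two of the four records contain both items and two of the three http records have the SF flag, so the rule has 50% support and roughly 67% confidence on this toy data.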



[Figure omitted: block diagram with the labels Test Data; Offline Single & Domain Level Mining; Profile; Suspicious Item Sets; Feature Selection; Classify as Attacks or False Alarms; False Alarms; Known/Unknown Attacks; Security Officer.]

Fig. 4b. The intrusion detection phase of ADAM [81].
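
The profile comparison at the heart of the detection phase in Fig. 4b can be approximated as follows. The sketch builds a profile of frequent item sets from attack-free records, then slides a window over a stream of connections and reports item sets that are frequent in the window but absent from the profile. It is an illustration only, not ADAM's algorithm: item sets are enumerated by brute force up to a fixed size, the classifier stage is omitted, and the records, window size, and support thresholds are assumptions.

```python
from collections import Counter, deque
from itertools import combinations


def itemsets(record, max_size=2):
    """All non-empty item sets of up to max_size items from one record."""
    items = sorted(record)
    for size in range(1, max_size + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


def frequent_itemsets(records, min_support, max_size=2):
    """Item sets contained in at least min_support of the records."""
    counts = Counter()
    for record in records:
        counts.update(itemsets(record, max_size))
    n = len(records)
    return {s for s, c in counts.items() if c / n >= min_support}


def suspicious_itemsets(stream, profile, window, min_support, max_size=2):
    """Item sets that are frequent in some full window of the last
    `window` connections but are not part of the normal profile."""
    recent = deque(maxlen=window)
    flagged = set()
    for record in stream:
        recent.append(record)
        if len(recent) == window:
            hot = frequent_itemsets(list(recent), min_support, max_size)
            flagged |= hot - profile
    return flagged


# Hypothetical attack-free training records.
http = {"service=http", "flag=SF"}
dns = {"service=dns", "flag=SF"}
telnet_probe = {"service=telnet", "flag=S0"}

profile = frequent_itemsets([http] * 6 + [dns] * 4, min_support=0.3)

# Hypothetical online stream containing a short burst of telnet probes.
stream = [http] * 3 + [telnet_probe] * 3 + [http] * 2
flagged = suspicious_itemsets(stream, profile, window=4, min_support=0.5)
```

On this toy stream only the telnet-related item sets are flagged. In ADAM, such suspicious item sets would then be passed, together with their feature vectors, to the trained classifier.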

Table 3
A summary of data mining-based anomaly detection systems

Reference | Key features | Methodology
Lee et al. [46] | It uses inductive rule generation to generate rules for important, yet infrequent events | Classification based anomaly detection
FIRE [53] | It generates fuzzy sets for every observed feature which are in turn used to define fuzzy rules to detect individual attacks | Classification based anomaly detection
ANDSOM [62] | It uses a pre-processor to summarize certain connection parameters (source and destination host and port) and then adds several values to track and classify the connection's behavior | Classification based anomaly detection
MINDS [78] | It clusters data and uses the density-based local outliers to detect intrusions | Clustering based anomaly detection
ADAM [81] | It performs anomaly detection to filter out most of the normal traffic, then it uses a classification technique to determine the exact nature of the remaining activity | Association rules and classification based anomaly detection
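
Table 3 lists MINDS as relying on density-based local outliers. The idea behind the local outlier factor (LOF) [72] can be sketched in Python as below. This is a simplified version for illustration and not the MINDS implementation: it uses exactly k neighbors per point and ignores ties, and the sample points are made up.

```python
import math


def k_nearest(points, i, k):
    """(distance, index) pairs for the k nearest neighbors of points[i]."""
    ranked = sorted((math.dist(points[i], q), j)
                    for j, q in enumerate(points) if j != i)
    return ranked[:k]


def lof_scores(points, k):
    """Simplified LOF: ratio of the neighbors' local density to the
    point's own local density. Values well above 1 suggest an outlier."""
    n = len(points)
    neigh = [k_nearest(points, i, k) for i in range(n)]
    k_dist = [nb[-1][0] for nb in neigh]   # distance to the k-th neighbor

    # Local reachability density: inverse of the mean reachability distance,
    # where reach(i, j) = max(k_dist[j], dist(i, j)).
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], d) for d, j in neigh[i]]
        lrd.append(len(reach) / sum(reach))

    return [sum(lrd[j] for _, j in neigh[i]) / (len(neigh[i]) * lrd[i])
            for i in range(n)]


# Four points forming a tight cluster plus one isolated point (hypothetical).
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
scores = lof_scores(points, k=2)
most_outlying = max(range(len(points)), key=lambda i: scores[i])
```

The clustered points receive a score of 1, meaning their density matches that of their neighbors, while the isolated point receives a score far above 1.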

combination of association rule mining and classification. During the training phase, ADAM builds a database of "normal" frequent itemsets using attack-free data. Then it runs a sliding window online algorithm that finds frequent item sets in the last D connections and compares them with those stored in the normal item set repository. With the remaining item sets that have been deemed suspicious, ADAM uses a classifier which has previously been trained to classify the suspicious connections as a known attack, unknown attack, or a false alarm. Association rules are used to gather necessary knowledge about the nature of the audit data. If the item set's support surpasses a threshold, then that item set is reported as suspicious. The system annotates suspicious item sets with a vector of parameters. Since the system knows where the attacks are in the training set, the corresponding suspicious item sets along with their feature vectors are used to train a classifier. The trained classifier will be able to, given a suspicious item set and a vector of features, classify the item set as a known attack (and label it with the name of attack), an unknown attack, or a false alarm. The key features of some of the schemes surveyed in this section are presented in Table 3.

4. Hybrid systems

It has been suggested in the literature [14,15,32,83,84] that the monitoring capability of current intrusion detection systems can be improved by taking a hybrid approach that consists of both anomaly as well as signature detection strategies. In such a hybrid system, the anomaly detection technique aids in the detection of new or unknown attacks while the signature detection technique detects known attacks. The signature detection technique will also be able to detect attacks launched by a patient attacker who attempts to change the behavior

patterns with the objective of retraining the anomaly detection module so that it will accept attack behavior as normal. Tombini et al. [83] used an approach wherein the anomaly detection technique is used to produce a list of suspicious items. The classifier module, which uses a signature detection technique, then classifies the suspicious items into false alarms, attacks, and unknown attacks. This approach works on the premise that the anomaly detection component would have a high detection rate, since missed intrusions cannot be detected by the follow-up signature detection component. In addition, it is also assumed that the signature detection component will be able to identify false alarms. While the hybrid system can still miss certain types of attacks, its reduced false alarm rate increases the likelihood of examining most of the alerts.

EMERALD [32] was developed in the late 1990s at SRI. It is an extension of the seminal work done in [8,13–15]. EMERALD is a hierarchical intrusion detection system that monitors systems at a variety of levels, viz. individual host machines, domains and enterprises, to form an analysis hierarchy. EMERALD uses a subscription-based communication scheme both within and between monitors. However, the inter-monitor subscription methodology is hierarchical and therefore limits the access to events and/or results from the layer immediately below. The system has a built-in feedback system that enables the higher layers to request more information about particular anomalies from the lower layers. To achieve a high rate of detection, the architects of EMERALD employed an ensemble of techniques like statistical analysis engines and expert systems. The single most defining feature of EMERALD is its ability to analyze system-wide, domain-wide and enterprise-wide attacks like Internet worms, DDoS attacks, etc. at the top level.

In another paper [84], Zhang et al. [85] employed the random forests algorithm in the signature detection module to detect known intrusions. Thereafter, the outlier detection provided by the random forests algorithm is utilized to detect unknown intrusions. Approaches that use signature detection and anomaly detection in parallel have also been proposed. In such systems, two sets of reports of possible intrusive activity are produced and a correlation component analyzes both sets to detect intrusions. An example of such a system is NIDES [15,16].

Lee et al. [45,86] extended the work done by them in [50] and proposed a hybrid detection scheme that utilized the Common Intrusion Detection Framework (CIDF) to automatically get audit data, build models, and distribute signatures for novel attacks to ensure that the time required to detect them is reduced. The advantage of using CIDF was that it enabled different intrusion detection and response components to interoperate and share the information and resources in a distributed environment.

Although it is true that combining multiple intrusion detection technologies into a single system can theoretically produce a much stronger intrusion detection system, the resulting hybrid systems are not always better. Different intrusion detection technologies examine system and/or network traffic and look for intrusive activity in different ways. Therefore, the major challenge to building an operational hybrid intrusion detection system is getting these different technologies to interoperate effectively and efficiently.

5. The road ahead: open challenges

In the last twenty years, intrusion detection systems have slowly evolved from host- and operating system-specific applications to distributed systems that involve a wide array of operating systems. The challenges that lie ahead for the next generation of intrusion detection systems and, more specifically, for anomaly detection systems are many. First and foremost, traditional intrusion detection systems have not adapted adequately to new networking paradigms like wireless and mobile networks, nor have they scaled to meet the requirements posed by high-speed (gigabit and terabit) networks (an analysis of intrusion detection techniques for high-speed networks can be found in [87,88]). Factors like noise in the audit data, constantly changing traffic profiles, and the large amount of network traffic make it difficult to build a normal traffic profile of a network for the purpose of intrusion detection. The implication is that, short of some fundamental re-design, today's intrusion detection approaches will not be able to adequately protect tomorrow's networks against intrusions and attacks. Therefore, the design methodology of intrusion detection systems needs to closely follow the changes in system and networking technologies.

A perennial problem that prevents widespread deployment of intrusion detection systems is their inability to suppress false alarms. It has been shown in lab testing [89] that state of the art intrusion detection systems often crash under the burden of

the false alarms that they generate. When actual attacks do occur, either they are missed completely by the intrusion detection system or the traces left by the intruder and/or the traces indicating the actual attack are lost among the large number of false alarms in the event logs. To make matters worse, Axelsson [90] showed that for an intrusion detection system to be effective, the number of false alarms has to be very low. In his paper, Axelsson suggested that 1 false alarm in 100,000 events was the minimum requirement for an intrusion detection system to be effective. Therefore, the primary and probably the most important challenge that needs to be met is the development of effective strategies to reduce the high rate of false alarms.

Over the years, numerous techniques, models, and full-fledged intrusion detection systems have been proposed and built in the commercial and research sectors. However, there is no globally acceptable standard/metric for evaluating an intrusion detection system. Although the Receiver Operating Characteristic (ROC) curve has been widely used to evaluate the accuracy of intrusion detection systems and analyze the tradeoff between the false positive rate and the detection rate, evaluations based on the ROC curve are often misleading and/or incomplete [91,92]. Recently, several methods have been proposed to address this issue [90–92]. However, most, if not all, of the proposed solutions rely on parameter values (such as the cost associated with each false alarm or missed attack instance) that are difficult to obtain and are specific to a particular network or system. As a result, such metrics may lack the objectivity required to conduct a fair evaluation of a given system. Therefore, one of the open challenges is the development of a general systematic methodology and/or a set of metrics that can be used to fairly evaluate intrusion detection systems.

There is a lack of a standard evaluation dataset that can simulate realistic network environments. While the 1998 and 1999 intrusion detection evaluations from DARPA/MIT Lincoln Labs have been used to evaluate a large number of intrusion detection systems, the methodology used to generate the data as well as the data itself have been shown to be inappropriate for simulating actual network environments [93]. Therefore, there is a critical need to build a more appropriate evaluation dataset. The methodology for generating the evaluation dataset should not only simulate realistic network conditions but also be able to generate datasets that have normal traffic interlaced with anomalous traffic.

An important aspect of intrusion detection, which has also been proposed as an evaluation metric [94–96], is the ability of an intrusion detection system to defend itself from attacks. Attacks on intrusion detection systems can take several forms. As a specific example, consider an attacker sending a large volume of non-attack packets that are specially crafted to trigger many alarms within an intrusion detection system, thereby overwhelming the human operator with false positives or crashing the alert processing or display tools. Axelsson [9], in his 1998 survey of intrusion detection systems, found that a majority of the available intrusion detection systems at that time performed very poorly when it came to defending themselves from attacks. Since then, the ability of intrusion detection systems to defend themselves from attacks has improved only marginally.

Another problem that still poses a major challenge is trying to define what is "normal" in a network. As mentioned in [97], there is a need for the discovery of "attack invariant" features. An attack invariant would be a feature of the network/system that can always be verified except in the presence of an attack. Examples include traffic volume, number of connections to non-standard ports, etc. In order to define such attack invariants, it is imperative that a better understanding of the nature of anomalies in the network and/or host is gained.

Traditionally, encryption has been a preferred methodology for securing data and preventing malicious users from getting access to privileged/private information. However, the widespread use of encryption implies that network administrators have a limited view of the network, as traditional intrusion detection systems do not have the ability to decrypt the encrypted packets that they intercept. When an intrusion detection system intercepts an encrypted packet, it typically discards it, which greatly limits the amount of traffic that it is capable of inspecting. Therefore, the challenge for security researchers is the development of security mechanisms that provide data security while not limiting the functions of intrusion detection systems.

An increasing problem in today's corporate networks is the threat posed by insiders, viz., disgruntled employees. In a survey [98] conducted by the United States Secret Service and CERT of Carnegie Mellon University, 71% of respondents out of 500 participants reported that 29% of the attacks that they experienced were caused by insiders. Respon-

dents identified current or former employees and [14] T.F. Lunt, A. Tamaru, F. Gilham, R. Jagannathm, C. Jalali,
contractors as the second greatest cyber security P.G. Neumann, H.S. Javitz, A. Valdes, T.D. Garvey, A
Real-time Intrusion Detection Expert System (IDES),
threat, preceded only by hackers. Configuring an Computer Science Laboratory, SRI International,
intrusion detection system to detect internal attacks Menlo Park, CA, USA, Final Technical Report, February
is very difficult. The greatest challenge lies in creat- 1992.
ing a good rule set for detecting ‘‘internal’’ attacks [15] D. Anderson, T. Frivold, A. Tamaru, A. Valdes, Next-
or anomalies. Different network users require differ- generation intrusion detection expert system (NIDES),
Software Users Manual, Beta-Update release, Computer
ent degrees of access to different services, servers, Science Laboratory, SRI International, Menlo Park, CA,
and systems for their work, thus making it extre- USA, Technical Report SRI-CSL-95-0, May 1994.
mely difficult to define and create user- or system- [16] D. Anderson, T.F. Lunt, H. Javitz, A. Tamaru, A. Valdes,
specific usage profiles. Although there is some Detecting Unusual Program Behavior Using the Statistical
existing work in this area (e.g., [99,100]), more Component of the Next-generation Intrusion Detection
Expert System (NIDES), Computer Science Laboratory,
research is needed to find practical solutions. SRI International, Menlo Park, CA, USA SRI-CSL-95-06,
May 1995.
References [17] S. Staniford, J.A. Hoagland, J.M. McAlerney, Practical
automated detection of stealthy portscans, Journal of
[1] E. Millard, Internet attacks increase in number, severity, in: Computer Security 10 (2002) 105–136.
Top Tech News, 2005. [18] M. Roesch, Snort – lightweight intrusion detection for
[2] J. Phillips, Hackers’ invasion of OU data raises blizzard of networks, in: Proceedings of the 13th USENIX Conference
questions, in: The Athens News Athens, OH, 2006. on System Administration Seattle, Washington, 1999, pp.
[3] C. Staff, Hackers: companies encounter rise of cyber 229–238.
extortion, vol. 2006, Computer Crime Research Center, [19] N. Ye, S.M. Emran, Q. Chen, S. Vilbert, Multivariate
2005. statistical analysis of audit trails for host-based intrusion
[4] M. Williams, Immense network assault takes down Yahoo, detection, IEEE Transactions on Computers 51 (2002) 810–
in: CNN.COM, 2000. 820.
[5] C.S. Institute, F.B.o. Investigation, in: Proceedings of the [20] C. Krügel, T. Toth, E. Kirda, Service specific anomaly
10th Annual Computer Crime and Security Survey 10, detection for network intrusion detection, in: Proceedings
2005, pp. 1–23. of the 2002 ACM symposium on Applied computing
[6] S. Axelsson, Intrusion Detection Systems: A Survey and Madrid, Spain 2002, pp. 201–208.
Taxonomy, Chalmers University, Technical Report 99-15, [21] R.A. Maxion, F.E. Feather, A case study of Ethernet
March 2000. anomalies in a distributed computing environment, IEEE
[7] J.P. Anderson, Computer security threat monitoring and Transactions on Reliability 39 (1990) 433–443.
surveillance, James P Anderson Co., Fort, Washington, [22] W. Lee, D. Xiang, Information theoretic measures for
PA, USA, Technical Report 98-17, April 1980. anomaly detection, in: Proceedings of the 2001 IEEE
[8] D.E. Denning, An intrusion-detection model, IEEE Trans- Symposium on Security and Privacy, Washington, DC,
actions in Software Engineering 13 (1987) 222–232. USA, 2001, pp. 130–143.
[9] S. Axelsson, Research in intrusion-detection systems: a [23] S. Forrest, S.A. Hofmeyr, A. Somayaji, T.A. Longstaff, A
survey, Department of Computer Engineering, Chalmers sense of self for unix processes, in: Proceedings of the IEEE
University of Technology, Goteborg, Sweden, Technical Symposium on Research in Security and Privacy, Oakland,
Report 98-17, December 1998. CA, USA, 1996, pp. 120–128.
[10] S. Kumar, E.H. Spafford, An application of pattern [24] S.A. Hofmeyr, S. Forrest, A. Somayaji, Intrusion detection
matching in intrusion detection, The COAST Project, using sequences of system calls, Journal of Computer
Department of Computer Sciences, Purdue University, Security 6 (1998) 151–180.
West Lafayette, IN, USA, Technical Report CSD-TR-94- [25] W.W. Cohen, Fast effective rule induction, in: Proceedings
013, June 17, 1994. of the 12th International Conference on Machine Learning,
[11] S.E. Smaha, Haystack: An intrusion detection system, in: Tahoe City, CA, 1995, pp. 115–123.
Proceedings of the IEEE Fourth Aerospace Computer [26] E. Eskin, S.J. Stolfo, W. Lee, Modeling system calls for
Security Applications Conference, Orlando, FL, 1988, pp. intrusion detection with dynamic window sizes, in: Pro-
37–44. ceedings of the DARPA Information Survivability Confer-
[12] D. Anderson, T. Frivold, A. Valdes, Next-generation ence & Exposition II, Anaheim, CA 2001, pp. 165–175.
Intrusion Detection Expert System (NIDES): A Summary, [27] C. Warrender, S. Forrest, B. Pearlmutter, Detecting intru-
Computer Science Laboratory, SRI International, Menlo sions using system calls: alternative data models, in:
Park, CA 94025, Technical Report SRI-CSL-95-07, May Proceedings of the IEEE Symposium on Security and
1995. Privacy, Oakland, CA, USA, 1999, pp. 133–145.
[13] D.E. Denning, P.G. Neumann, Requirements and Model [28] D. Heckerman, A Tutorial on Learning With Bayesian
for IDES—A Real-time Intrusion Detection System, Networks, Microsoft Research, Technical Report MSR-
Computer Science Laboratory, SRI International, Menlo TR-95-06, March 1995.
Park, CA 94025-3493, Technical Report # 83F83-01-00, [29] C. Kruegel, D. Mutz, W. Robertson, F. Valeur, Bayesian
1985. event classification for intrusion detection, in: Proceedings
3468 A. Patcha, J.-M. Park / Computer Networks 51 (2007) 3448–3470

of the 19th Annual Computer Security Applications Con- [45] W. Lee, R.A. Nimbalkar, K.K. Yee, S.B. Patil, P.H. Desai,
ference, Las Vegas, NV, 2003. T.T. Tran, S.J. Stolfo, A data mining and CIDF based
[30] A. Valdes, K. Skinner, Adaptive model-based monitoring approach for detecting novel and distributed intrusions, in:
for cyber attack detection, in: Recent Advances in Intrusion Proceedings of the 3rd International Workshop on Recent
Detection Toulouse, France, 2000, pp. 80–92. Advances in Intrusion Detection (RAID 2000), Toulouse,
[31] N. Ye, M. Xu, S.M. Emran, Probabilistic networks with France, 2000, pp. 49–65.
undirected links for anomaly detection, in: Proceedings of [46] W. Lee, S.J. Stolfo, Data mining approaches for intrusion
the IEEE Systems, Man, and Cybernetics Information detection, in: Proceedings of the 7th USENIX Security
Assurance and Security Workshop, West Point, NY, 2000. Symposium (SECURITY-98), Berkeley, CA, USA, 1998,
pp. 79–94.
[32] P.A. Porras, P.G. Neumann, EMERALD: event monitoring enabling responses to anomalous live disturbances, in: Proceedings of the 20th NIST-NCSC National Information Systems Security Conference, Baltimore, MD, USA, 1997, pp. 353–365.
[33] R.A. Calvo, M. Partridge, M.A. Jabri, A comparative study of principal component analysis techniques, in: Proceedings of the Ninth Australian Conference on Neural Networks, Brisbane, Qld, Australia, 1998.
[34] H. Hotelling, Analysis of a complex of statistical variables into principal components, Journal of Educational Psychology 24 (1933) 417–441, 498–520.
[35] W. Wang, R. Battiti, Identifying intrusions in computer networks with principal component analysis, in: The First International Conference on Availability, Reliability and Security, Vienna, Austria, 2006, pp. 270–279.
[36] M.-L. Shyu, S.-C. Chen, K. Sarinnapakorn, L. Chang, A novel anomaly detection scheme based on principal component classifier, in: Proceedings of the IEEE Foundations and New Directions of Data Mining Workshop, Melbourne, FL, USA, 2003, pp. 172–179.
[37] Y. Bouzida, F. Cuppens, N. Cuppens-Boulahia, S. Gombault, Efficient intrusion detection using principal component analysis, in: Proceedings of the 3ème Conférence sur la Sécurité et Architectures Réseaux (SAR), Orlando, FL, USA, 2004.
[38] W. Wang, X. Guan, X. Zhang, A novel intrusion detection method based on principle component analysis in computer security, in: Proceedings of the International Symposium on Neural Networks, Dalian, China, 2004, pp. 657–662.
[39] N. Ye, Y. Zhang, C.M. Borror, Robustness of the Markov-chain model for cyber-attack detection, IEEE Transactions on Reliability 53 (2004) 116–123.
[40] D.-Y. Yeung, Y. Ding, Host-based intrusion detection using dynamic and static behavioral models, Pattern Recognition 36 (2003) 229–243.
[41] M.V. Mahoney, P.K. Chan, PHAD: Packet Header Anomaly Detection for Identifying Hostile Network Traffic, Department of Computer Sciences, Florida Institute of Technology, Melbourne, FL, USA, Technical Report CS-2001-4, April 2001.
[42] M.V. Mahoney, P.K. Chan, Learning Models of Network Traffic for Detecting Novel Attacks, Computer Science Department, Florida Institute of Technology, CS-2002-8, August 2002.
[43] M.V. Mahoney, P.K. Chan, Learning nonstationary models of normal network traffic for detecting novel attacks, in: Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Edmonton, Canada, 2002, pp. 376–385.
[44] R. Lippmann, J.W. Haines, D.J. Fried, J. Korba, K. Das, The 1999 DARPA off-line intrusion detection evaluation, Computer Networks 34 (2000) 579–595.
[47] W. Lee, S.J. Stolfo, K.W. Mok, Adaptive intrusion detection: a data mining approach, Artificial Intelligence Review 14 (2000) 533–567.
[48] R. Grossman, Data Mining: Challenges and Opportunities for Data Mining During the Next Decade, 1997.
[49] J.R. Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann, Los Altos, CA, 1993.
[50] W. Lee, S.J. Stolfo, K.W. Mok, A data mining framework for building intrusion detection models, in: Proceedings of the IEEE Symposium on Security and Privacy, Oakland, CA, 1999, pp. 120–132.
[51] H.H. Hosmer, Security is fuzzy!: applying the fuzzy logic paradigm to the multipolicy paradigm, in: Proceedings of the 1992–1993 Workshop on New Security Paradigms, Little Compton, RI, United States, 1993.
[52] S.M. Bridges, R.B. Vaughn, Fuzzy data mining and genetic algorithms applied to intrusion detection, in: Proceedings of the National Information Systems Security Conference, Baltimore, MD, 2000.
[53] J.E. Dickerson, J.A. Dickerson, Fuzzy network profiling for intrusion detection, in: Proceedings of the 19th International Conference of the North American Fuzzy Information Processing Society (NAFIPS), Atlanta, GA, 2000, pp. 301–306.
[54] W. Li, Using Genetic Algorithm for Network Intrusion Detection, C.S.G. Department of Energy, 2004, pp. 1–8.
[55] M.M. Pillai, J.H.P. Eloff, H.S. Venter, An approach to implement a network intrusion detection system using genetic algorithms, in: Proceedings of the 2004 Annual Research Conference of the South African Institute of Computer Scientists and Information Technologists on IT Research in Developing Countries, Stellenbosch, Western Cape, South Africa, 2004, pp. 221–228.
[56] J. Gomez, D. Dasgupta, Evolving fuzzy classifiers for intrusion detection, in: IEEE Workshop on Information Assurance, United States Military Academy, NY, 2001.
[57] M. Crosbie, G. Spafford, Applying genetic programming to intrusion detection, in: Working Notes for the AAAI Symposium on Genetic Programming, Cambridge, MA, 1995, pp. 1–8.
[58] A.K. Ghosh, C. Michael, M. Schatz, A real-time intrusion detection system based on learning program behavior, in: Proceedings of the Third International Workshop on Recent Advances in Intrusion Detection, Toulouse, France, 2000, pp. 93–109.
[59] A.K. Ghosh, A. Schwartzbard, A study in using neural networks for anomaly and misuse detection, in: Proceedings of the Eighth USENIX Security Symposium, Washington, DC, 1999, pp. 141–151.
[60] A.K. Ghosh, A. Schwartzbard, M. Schatz, Learning program behavior profiles for intrusion detection, in: Proceedings of the 1st USENIX Workshop on Intrusion
Detection and Network Monitoring, Santa Clara, CA, USA, 1999.
[61] J.L. Elman, Finding structure in time, Cognitive Science 14 (1990) 179–211.
[62] M. Ramadas, S. Ostermann, B. Tjaden, Detecting anomalous network traffic with self-organizing maps, in: Proceedings of the 6th International Symposium on Recent Advances in Intrusion Detection, Pittsburgh, PA, USA, 2003, pp. 36–54.
[63] W. Lee, S.J. Stolfo, P.K. Chan, E. Eskin, W. Fan, M. Miller, S. Hershkop, J. Zhang, Real time data mining-based intrusion detection, in: Proceedings of the Second DARPA Information Survivability Conference and Exposition, Anaheim, CA, 2001, pp. 85–100.
[64] K.M.C. Tan, R.A. Maxion, Determining the operational limits of an anomaly-based intrusion detector, IEEE Journal on Selected Areas in Communication 2 (2003) 96–110.
[65] S.T. Sarasamma, Q.A. Zhu, J. Huff, Hierarchical Kohonenen net for anomaly detection in network security, IEEE Transactions on Systems, Man and Cybernetics, Part B: Cybernetics 35 (2005) 302–312.
[66] A.H. Sung, S. Mukkamala, Identifying important features for intrusion detection using support vector machines and neural networks, in: Proceedings of the 2003 Symposium on Applications and the Internet, 2003, pp. 209–216.
[67] L. Portnoy, E. Eskin, S.J. Stolfo, Intrusion detection with unlabeled data using clustering, in: Proceedings of the ACM Workshop on Data Mining Applied to Security, Philadelphia, PA, 2001.
[68] S. Ramaswamy, R. Rastogi, K. Shim, Efficient algorithms for mining outliers from large data sets, in: Proceedings of the ACM SIGMOD International Conference on Management of Data, Dallas, TX, USA, 2000, pp. 427–438.
[69] K. Sequeira, M. Zaki, ADMIT: Anomaly-based data mining for intrusions, in: Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Edmonton, Alberta, Canada, 2002, pp. 386–395.
[70] V. Barnett, T. Lewis, Outliers in Statistical Data, Wiley, 1994.
[71] C.C. Aggarwal, P.S. Yu, Outlier detection for high dimensional data, in: Proceedings of the ACM SIGMOD International Conference on Management of Data, 2001, pp. 37–46.
[72] M. Breunig, H.-P. Kriegel, R.T. Ng, J. Sander, LOF: identifying density-based local outliers, in: Proceedings of the ACM SIGMOD International Conference on Management of Data, Dallas, TX, 2000, pp. 93–104.
[73] E.M. Knorr, R.T. Ng, Algorithms for mining distance-based outliers in large datasets, in: Proceedings of the 24th International Conference on Very Large Data Bases, New York, NY, USA, 1998, pp. 392–403.
[74] P.C. Mahalanobis, On tests and measures of group divergence, Journal of the Asiatic Society of Bengal 26 (1930) 541.
[75] Wikipedia, Mahalanobis Distance, vol. 2006, 2006.
[76] V. Hautamaki, I. Karkkainen, P. Franti, Outlier detection using k-nearest neighbour graph, in: Proceedings of the 17th International Conference on Pattern Recognition, Los Alamitos, CA, USA, 2004, pp. 430–433.
[77] Y. Liao, V.R. Vemuri, Use of K-nearest neighbor classifier for intrusion detection, Computers & Security 21 (2002) 439–448.
[78] L. Ertöz, E. Eilertson, A. Lazarevic, P.-N. Tan, V. Kumar, J. Srivastava, P. Dokas, The MINDS - Minnesota intrusion detection system, in: Next Generation Data Mining, MIT Press, Boston, 2004.
[79] R. Agrawal, T. Imielinski, A. Swami, Mining association rules between sets of items in large databases, in: Proceedings of the ACM SIGMOD Conference on Management of Data, Washington, DC, 1993, pp. 207–216.
[80] J. Hipp, U. Güntzer, G. Nakhaeizadeh, Algorithms for association rule mining - a general survey and comparison, in: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Boston, MA, USA, 2000, pp. 58–64.
[81] D. Barbará, J. Couto, S. Jajodia, N. Wu, ADAM: a testbed for exploring the use of data mining in intrusion detection, ACM SIGMOD Record: SPECIAL ISSUE: Special section on data mining for intrusion detection and threat analysis 30 (2001) 15–24.
[82] R. Lippmann, J.W. Haines, D.J. Fried, J. Korba, K. Das, The 1999 DARPA off-line intrusion detection evaluation, Computer Networks: The International Journal of Computer and Telecommunications Networking 34 (2000) 579–595.
[83] E. Tombini, H. Debar, L. Mé, M. Ducassé, A serial combination of anomaly and misuse IDSes applied to HTTP traffic, in: Proceedings of the 20th Annual Computer Security Applications Conference, Tucson, AZ, USA, 2004.
[84] J. Zhang, M. Zulkernine, A hybrid network intrusion detection technique using random forests, in: Proceedings of the First International Conference on Availability, Reliability and Security, Vienna University of Technology, 2006, pp. 262–269.
[85] L. Breiman, Random forests, Machine Learning 45 (2001) 5–32.
[86] W. Lee, S.J. Stolfo, P.K. Chan, E. Eskin, W. Fan, M. Miller, S. Hershkop, J. Zhang, Real time data mining-based intrusion detection, in: Proceedings of the Second DARPA Information Survivability Conference and Exposition, Anaheim, CA, USA, 2001, pp. 85–100.
[87] C. Kruegel, F. Valeur, G. Vigna, R. Kemmerer, Stateful intrusion detection for high-speed networks, in: Proceedings of the IEEE Symposium on Security and Privacy, 2002, pp. 285–294.
[88] A. Patcha, J.-M. Park, Detecting denial-of-service attacks with incomplete audit data, in: Proceedings of the 14th International Conference on Computer Communications and Networks, San Diego, CA, USA, 2005, pp. 263–268.
[89] D. Newman, J. Snyder, R. Thayer, Crying wolf: False alarms hide attacks, Network World, 2002.
[90] S. Axelsson, The base-rate fallacy and its implications for the difficulty of intrusion detection, ACM Transactions on Information and System Security 3 (2000) 186–205.
[91] J.E. Gaffney, J.W. Ulvila, Evaluation of intrusion detectors: a decision theory approach, in: Proceedings of the 2001 IEEE Symposium on Security and Privacy, Oakland, CA, USA, 2001, pp. 50–61.
[92] S.J. Stolfo, W. Fan, W. Lee, Cost-based modeling for fraud and intrusion detection: results from the JAM Project, in: Proceedings of the DARPA Information Survivability Conference & Exposition, 2000, pp. 130–144.
[93] J. McHugh, Testing Intrusion detection systems: a critique of the 1998 and 1999 DARPA intrusion detection, ACM Transactions on Information and System Security 3 (2000) 262–294.
[94] T. Ptacek, T. Newsham, Insertion, Evasion, and Denial of Service: Eluding Network Intrusion Detection, Secure Networks Inc, 1998.
[95] U. Shankar, V. Paxson, Active mapping: resisting NIDS evasion without altering traffic, in: Proceedings of the IEEE Symposium on Research in Security and Privacy, Oakland, CA, 2003.
[96] K.M.C. Tan, K.S. Killourhy, R.A. Maxion, Undermining an anomaly-based intrusion detection system using common exploits, in: Proceedings of the Fifth International Symposium on Recent Advances in Intrusion Detection, Zurich, Switzerland, 2002, pp. 54–73.
[97] J.M. Estevez-Tapiador, P. Garcia-Teodoro, J.E. Diaz-Verdejo, Anomaly detection methods in wired networks: a survey and taxonomy, Computer Communications 27 (2004) 1569–1584.
[98] M. Keeney, E. Kowalski, D. Cappelli, A. Moore, T. Shimeall, S. Rogers, Insider threat study: computer system sabotage in critical infrastructure sectors, U.S.S. Service and C.M.U. Software Engineering Institute, Software Engineering Institute, Carnegie Mellon University, 2005, pp. 1–45.
[99] A. Liu, C. Martin, T. Hetherington, S. Matzner, A comparison of system call feature representations for insider threat detection, in: Proceedings of the 6th Annual IEEE Systems, Man and Cybernetics (SMC) Information Assurance Workshop, West Point, NY, 2005, pp. 340–347.
[100] J.S. Park, J. Giordano, Role-based profile analysis for scalable and accurate insider-anomaly detection, in: Proceedings of the 25th IEEE International Performance, Computing, and Communications Conference, Phoenix, AZ, 2006, pp. 463–470.

Animesh Patcha has been a doctoral candidate in the Bradley Department of Electrical and Computer Engineering at Virginia Tech since August 2002. His area of research is computer and network security in wired and wireless networks. Currently, he is working on stochastic intrusion detection techniques under the expert guidance of Dr. Jung-Min Park in the Laboratory for Advanced Research in Information Assurance and Security. Prior to coming to Virginia Tech, he received his M.S. degree in Computer Engineering from Illinois Institute of Technology, Chicago, IL, in May 2002 and his B.E. in Electrical and Electronics Engineering from Birla Institute of Technology Mesra, Ranchi, India, in December 1998. From January 1999 to December 2000, he was a software engineer at Zensar Technologies in Pune, India. He is currently a student member of the IEEE, SIAM, and ASEE, and a global member of the Internet Society.

Jung-Min Park received the B.S. and M.S. degrees, both in electronic engineering, from Yonsei University, Seoul, South Korea, in 1995 and 1997, respectively, and the Ph.D. degree in electrical and computer engineering from Purdue University, West Lafayette, IN, in 2003. He is currently an Assistant Professor in the Department of Electrical and Computer Engineering at Virginia Polytechnic Institute and State University (Virginia Tech), Blacksburg, VA. From 1997 to 1998, he worked as a cellular systems engineer at Motorola Korea Inc. His current interests are in network security, applied cryptography, and cognitive radio networks. More details about his research interests and publications can be found at http://www.ece.vt.edu/faculty/park.html. He is a member of the Institute of Electrical and Electronics Engineers (IEEE), the Association for Computing Machinery (ACM), and the Korean-American Scientists and Engineers Association (KSEA). He was a recipient of a 1998 AT&T Leadership Award.
