Payload-based Anomaly Detection in HTTP Traffic
Doctor of Philosophy
By
Aruna Jamdagni
In
UNIVERSITY OF TECHNOLOGY, SYDNEY
SCHOOL OF COMPUTER AND COMMUNICATIONS
The undersigned hereby certify that they have read this thesis entitled “Payload-based
Anomaly Detection in HTTP Traffic” by Aruna Jamdagni and that in
their opinion it is fully adequate, in scope and in quality, as a thesis for the degree
of Doctor of Philosophy.
CERTIFICATE OF AUTHORSHIP/ORIGINALITY
I certify that the work in this thesis has not been submitted for a degree nor has it been
submitted as part of requirements for a degree except as fully acknowledged within the
text.
I also certify that the thesis has been written by me. Any help that I have received in my
research work and the preparation of the thesis itself has been acknowledged. In
addition, I certify that all information sources and literature used are indicated in the
thesis.
------------------------------------------------------
Signature of Author
Abstract
The Internet brings quality and convenience to human life, but at the same time it provides
a platform for network hackers and criminals. Intrusion Detection Systems (IDSs) have
proven to be powerful tools for detecting anomalies in networks. Traditional
signature-based IDSs are unable to detect new (zero-day) attacks. Anomaly-based
systems are an alternative to signature-based systems. However, present anomaly detection
systems suffer from three major setbacks:
In this thesis, we address the above issues and develop efficient intrusion detection
frameworks and models that can be used to detect a wide variety of attacks,
including web-based attacks. Our proposed methods are designed to produce very few false
alarms. We also treat intrusion detection as a pattern recognition problem and
discuss all aspects that are important in realizing an anomaly-based IDS.
Detection speed is a key factor in the efficient operation of IDSs. Current IDSs examine
a large number of data features to detect intrusions and misuse patterns. Hence,
feature reduction is necessary to identify anomalies in Internet traffic quickly and
accurately. We propose two models to address this issue, namely a two-tier
intrusion detection model and RePIDS.
The two-tier intrusion detection model uses Linear Discriminant Analysis (LDA) for
feature reduction and optimal feature selection. It uses the Mahalanobis Distance Map
(MDM) technique to create a model of normal network payload from the extracted feature set.
We test the proposed IDSs on two publicly available datasets of attacks and normal
traffic. Experimental results confirm the effectiveness and validity of our proposed
solutions in terms of detection rate, false alarm rate and computational complexity.
Acknowledgement
This research would not have been possible without the guidance and the help of many
people. First and foremost, my utmost gratitude to my supervisor, Prof. Xiangjian He,
for his excellent guidance, support and steadfast encouragement that I will never forget.
His comments and suggestions during preparation of this thesis have been extremely
valuable. Without his support and supervision, I could not have come this far. I
thank him for his helpfulness and for being far more than a simple supervisor.
Many thanks to my employer, the University of Western Sydney, and to Prof. Simeon Simoff,
Dean, School of Computing and Engineering, who gave me time off from work. I will
always be grateful for that.
I also appreciate Dr. Qiang Wu and Dr. Wenjing Jia for providing helpful suggestions.
My special thanks to my collaborator and good friend Thomas Tan for brainstorming
discussions. My friends Thomas Tan and Sheng Wang were always helpful
whenever I had questions, not only on research but also on other matters. It would have
been a lonely lab without them.
Last but not least, I would like to express my love and gratitude to my family
members, especially my daughter Divya, my husband Rishi, my sister Meera and
brother-in-law Satya, for their endless love, understanding and encouragement to work on this
thesis.
“What we are is God's gift to us. What we become is our gift to God.”
Eleanor Powell
Table of Contents
Table of Contents ......................................................................................................... viii
List of Tables ................................................................................................................. xii
List of Figures…………………………………………………………………………xiii
List of Acronyms ……………………………………………………………...…….....xv
2.2.3 Hybrid Intrusion Detection System ................................................. 24
2.2.4 Data Audit Time ................................................................................ 25
2.2.5 System Structure ............................................................................... 26
2.2.6 Action after Intrusion Detection ...................................................... 27
2.3 Pattern Recognition Approach .................................................................. 27
2.4 Performance Evaluation of Intrusion Detection System ......................... 32
2.5 Related Research Works ............................................................................ 34
2.5.1 Review from the Perspectives of Intrusion Detection
Techniques……………………………………………………………...….35
2.5.2 Review from the Perspective of Payload-based Intrusion Detection
System…….. ................................................................................................. 44
2.6 Conclusions .................................................................................................. 49
Chapter 3 GSAD: Geometrical Structure Anomaly Detection System.................... 51
Introduction ........................................................................................................ 51
3.1 GSAD-Geometrical Structure Anomaly Detection System ..................... 54
3.1.1 Framework of the Proposed Intrusion Detection System.............. 54
3.1.2 Framework Modules ......................................................................... 56
3.1.3 Base-line Profile Generation ............................................................ 62
3.1.4 Model Testing .................................................................................... 62
3.2 GSAD Evaluation ........................................................................................ 63
3.2.1 Experimental Setup ........................................................................... 63
3.2.2 DARPA 1999 Dataset ........................................................................ 63
3.2.3 Experimental Results and Analysis ................................................. 64
3.3 HTTP and Examples on Attacks ............................................................... 70
3.3.1 HTTP .................................................................................................. 70
3.3.2 HTTP Attack Examples .................................................................... 72
3.4 Implementation of GSAD in HTTP Environment ................................... 73
3.5 Evaluation in HTTP Environment ............................................................ 74
3.5.1 Experimental Setup ........................................................................... 74
3.5.2 Datasets .............................................................................................. 75
3.5.3 Experimental Results and Analysis………………………………76
3.6 Analysis of Results……………………………………………………...…88
3.7 Conclusion.................................................................................................... 90
Chapter 4 Feature Selection and Two Tier Based Intrusion Detection using
LDA………. …………………... ..................................................................................... 91
Introduction ........................................................................................................ 91
4.1 Feature Selection Algorithms ..................................................................... 93
4.2 Linear Discriminant Analysis .................................................................... 95
4.3 LDA-based Intrusion Detection System…………………………………96
4.3.1 Framework of LDA-based Intrusion Detection System………….97
4.3.2 Framework Modules……………………………………………….98
4.4 Experimental Results and Analysis ......................................................... 104
4.4.1 Experimental Results……………………………………………..104
4.4.2 Analysis of Results......................................................................... 109
4.5 Two-Tier Intrusion Detection System…………………………………..109
4.5.1 Framework of Two-Tier System ................................................... 110
4.6 Experimental Results and Analysis ......................................................... 114
4.6.1 Experimental Results ...................................................................... 114
4.6.2 Analysis of Results........................................................................... 120
4.7 Common Profile (Signature) for Integrated Feature Set ...................... 123
4.8 Conclusion.................................................................................................. 123
Chapter 5 RePIDS: a Multi Tier Real Time Payload Based Intrusion Detection
System…………………………………………………………………………………125
5.1 Introduction ............................................................................................... 126
5.2 State-of-Art Systems ................................................................................. 129
5.3 RePIDS: Real-time Payload Based Network Intrusion Detection System
………………………………………………………………………………….130
5.3.1 Framework of Real-Time Intrusion Detection System ................ 131
5.3.2 Framework Modules ....................................................................... 133
5.4 Experimental Results and Analysis ......................................................... 140
5.4.1 Experimental Setup ......................................................................... 140
5.4.2 Datasets ............................................................................................ 140
5.4.3 Model Training and Testing Process ............................................. 141
5.4.4 Results and Analysis ....................................................................... 145
5.5 Comparison of RePIDS ............................................................................. 149
5.5.1 Detection Performance ................................................................... 150
5.5.2 Complexity Analysis........................................................................ 150
5.6 Conclusions ................................................................................................ 154
Chapter 6 Conclusion and Future work ................................................................... 155
6.1 Summary .................................................................................................... 156
6.1.1 Geometrical Structure Anomaly Detection Detector ................... 157
6.1.2 Two-tier LDA-Based Detector ....................................................... 158
6.1.3 Real-time Payload Based Intrusion Detection System ................. 158
6.1.4 Single Profile (Signature) for a Group of Similar Types of
Attacks…....................................................................................................159
6.2 Thesis Contributions ................................................................................. 160
6.3 Future Work .............................................................................................. 161
References...…………………………………………………………………………..163
List of Tables
2.1 Mitigation of attack strategies…………………………………………………….16
3.2 Comparison of GSAD, McPAD and PAYL on GATECH attack dataset ……....89
3.3 Summary of experimental results for Generic attacks on various dataset ……....89
4.2 Confusion matrix for LDA-based IDS using integrated feature set ………….....108
4.4 Performance of two-tier system using features from 3-types of attacks ……….119
List of Figures
2.1 Taxonomy of intrusion detection system …………………………………….......19
3.2 Average relative frequency of each byte, (a) Normal Http payload,
3.3 Average MDM Images, (a) Normal Http payload, (b) Crashiis attack payload,
3.4 Weight factor scores, (a) Normal Http request, (b) Back attack packets ………...68
3.9 Average relative frequency of characters for normal HTTP GET request
3.10 Average MDM images of normal HTTP GET request, (a) marx, (b) hume …......80
3.11 MDM images of attack packets, (a) Apache2 attack, (b) Phf attack ……………..82
3.12 Weight factor scores of attack, (a) Apache2, (b) Phf ……………………....83-84
4.2 Flow model for feature selection process …………………………………….......101
4.3 Average MDMs, (a) normal HTTP request, (b) Phf attack packets ………….…..107
4.4 Difference distance map between normal HTTP and Phf attack packets .............107
4.8 Average MDM (a) Phf attack packets, (b) difference distance map between normal
4.9 Average MDM (a) Apache2 attack packets, (b) difference distance map
between normal HTTP and Apache2 attack packets …………………………….116
5.1 Framework for real-time payload based intrusion detection system …………......131
5.2 Scree test plot, (a) Full scree plot, (b) Enlarged scree plot with first 25
eigenvectors……………………………………………………………….…........143
5.5 MDMs of (a) Apache2 attack, (b) Phf attack payloads ………………….…. 147-48
Acronyms and Abbreviations
SBS Signature Based System
SRI Stanford Research International
SVM Support Vector Machines
TC Text Categorization
U2R User to Root
Author's Publications for the Ph.D.
Published papers
Journal Papers
Conference papers
based on LDA for payload feature selection,” IEEE Globecom 2010 Workshop
on Web and Pervasive Security (WPS 2010), Miami, USA, 2010, pp.1590-1594.
6. A. Jamdagni, Z. Tan, P. Nanda, X. He, R. Liu, “Intrusion detection using GSAD
model for HTTP traffic on web services,” in IWCMC ’10 Proceedings of the 6th
Science and Technology (FCST 2009), Shanghai, China, December 17-19, 2009,
pp. 327-333.
Anomaly Detection Model,” in the Sixth Annual CSIRO ICT Centre Science and
CHAPTER 1
INTRODUCTION
Securing a computer system has traditionally been a battle of wits: the penetrator
tries to find the holes, and the designer tries to close them.
Gasser
The growth of the Internet and local area networks brings quality and convenience to
human life. According to the latest statistical analysis [1-3], it is estimated that the Internet
connects over 1.1 billion users worldwide, and thousands of sub-networks. The Internet has
adopted a large number of new applications, such as online banking, online gaming and
Internet telephony, and social network platforms such as Facebook, Twitter and
LinkedIn. It is evident that the Internet has had an enormous impact on the everyday life of
Internet technologies, on one hand, provide a large number of online services to
end users. On the other hand, they attract the attention of hackers and provide a platform
for them to attack systems in the network. The trust in the Internet and its services is
Response Team (CERT) at Carnegie Mellon University reported 3,734 security incidents
worldwide. In the latest version of the Cyber Security Risks Report [3], the number of
vulnerabilities increased approximately 10% from 7,260 in 2009 to over 7,900 in 2010.
Thus, maintaining information security and securing computer systems and networks
are essential. To prevent these security compromises, layers of defense, such as proxies,
filters, anti-virus scanners and firewalls, are used. Since these traditional prevention
mechanisms are imperfect, Intrusion Detection Systems (IDSs) are used to monitor local
area networks and computer systems for security compromises. The role of IDSs is to
In general, IDSs are classified into two broad categories. Anomaly-based systems
compare attack-free data to network traffic where anomalous events are identified as
deviations from the normal. Misuse-based systems match signatures or unique character
strings to known attacks. Present anomaly intrusion detection systems suffer from a
large number of false alarms and poor efficiency in operation, and cannot be deployed in
high speed networks and applications. In this thesis, we address these issues and develop
legacy network based attacks have largely been replaced by more sophisticated web
application attacks. According to HP DVLab 2010 Top Cyber Security Risk Report
released on April 4 [4-6], “Web-based attacks jumped from only a tenth of all attacks at
the beginning of 2010 to more than 70 percent of all attacks by the end of the year”.
Although other protocols are being used for attacks on networks and computers,
attacks against the HTTP protocol have become the most dominant. In the first quarter
of 2010, HTTP attacks accounted for about half the total number of attacks [4-6].
Our review of the research shows that the increase in network threats has several causes.
We have identified three key reasons for the increase in threats and attacks on networks. In
web-based attacks.
“We’re seeing a huge explosion in Web application attacks, and the attackers are not
just using one or two vulnerabilities, they’re sending a barrage of malicious requests,
While early computer attacks were manually crafted for specific targets,
sophisticated and inexpensive attack toolkits (estimated at $2,400) are now widely available
on the market. These toolkits offer a wide range of functionality, including network
surveillance, polymorphic shell-codes and distributed propagation. For example, the
“Slammer worm” can infect tens of thousands of hosts in a couple of minutes [8], rendering
make malicious software and network attacks attractive for illegal business.
Application developers either have very little knowledge of security or are unaware
Due to the pressure of business competition, software developers put their networks
at high risk, because the applications they develop are seldom tested for vulnerabilities. In
addition, when new applications are implemented on the network, the native network
“Web application vulnerabilities now comprise about half the total number of newly
discovered security vulnerabilities and the organization web-sites are constantly at risk
Having said this, security of computer systems has become a major concern with
critical public infrastructures relying on computers and the Internet. Classic security
significantly strengthen security, they cannot generally stop the possibility of network
attacks. As new attacks appear every day, intrusion prevention measures like firewalls
and cryptographic protocols are not sufficient to ensure the security of
networked systems. Thus, intrusion detection systems are needed to detect new attacks
and defend networks from all kinds of attacks launched against either stand-alone
An IDS can detect scanning and probing attacks by analyzing network packet headers
viruses and worms that propagate at high speed on the Internet can be detected by
analyzing the rate of scanning and probing methods. Both viruses and worms exploit
drivers and services. Furthermore, there are malicious activities which do not show
abnormality in network connections and protocol behaviors but carry malicious contents
and cannot be detected by using packet header analysis and traffic flow statistic
approaches [9, 10]. In response, effective techniques based on payload analysis are
[11].
focus on design and learning approach to efficiently train payload-based detector models
on the normal and attack free data for any application, service, network or host. This
trained model can then be used to identify “abnormal” or suspicious traffic. Although
the proposed model is useful in detecting a wide range of exploits against many, if not
all services and ports, our focus is mainly to identify web based attacks based on HTTP
protocol.
Although anomaly detection has emerged as a promising technology and appears to hold
a great future, it is extremely difficult to achieve. There are still a great many challenges
are:
Traffic profile is changing constantly and new applications are emerging every day,
The nature of anomalies keeps changing over time. It is not possible to have “a
priori” knowledge of an attack to efficiently identify new attacks and how the attacker
The dilemma between detection rate and false alarms is the major problem [12].
Improvement in the detection rate results in increase in the false positive rate.
Encryption and tunnelling hide access to data contained into application layer header
and payloads.
Attacks present in the encrypted payload data are considered normal from the
A very high volume of network traffic due to high data rates (Gbps) and accuracy
High sensitivity to packet loss data and fragmentation and segmentation issues are
Most of the IDSs perform poorly in defending themselves from attacks [12].
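The dilemma between detection rate and false alarms listed above can be illustrated with a small sketch. It assumes a hypothetical detector that assigns each packet a numeric anomaly score and raises an alarm when the score exceeds a threshold; the function names and the scores are illustrative only and are not taken from the systems proposed in this thesis.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    true_positives: int   # attack packets flagged as attacks
    false_negatives: int  # attack packets that were missed
    false_positives: int  # normal packets flagged as attacks
    true_negatives: int   # normal packets passed as normal


def detection_rate(c):
    """Fraction of attack packets that raised an alarm."""
    attacks = c.true_positives + c.false_negatives
    return c.true_positives / attacks if attacks else 0.0


def false_positive_rate(c):
    """Fraction of normal packets that raised an alarm."""
    normals = c.false_positives + c.true_negatives
    return c.false_positives / normals if normals else 0.0


def sweep_thresholds(normal_scores, attack_scores, thresholds):
    """Return (threshold, detection rate, false positive rate) per threshold.

    A packet is flagged when its anomaly score is strictly above the threshold.
    """
    rows = []
    for t in thresholds:
        counts = ConfusionCounts(
            true_positives=sum(s > t for s in attack_scores),
            false_negatives=sum(s <= t for s in attack_scores),
            false_positives=sum(s > t for s in normal_scores),
            true_negatives=sum(s <= t for s in normal_scores),
        )
        rows.append((t, detection_rate(counts), false_positive_rate(counts)))
    return rows


if __name__ == "__main__":
    # Hypothetical anomaly scores, chosen only to show the trade-off.
    normal = [0.1, 0.2, 0.3, 0.6]
    attack = [0.4, 0.7, 0.9]
    for t, dr, fpr in sweep_thresholds(normal, attack, [0.25, 0.5]):
        print(f"threshold={t:.2f} detection_rate={dr:.2f} false_positive_rate={fpr:.2f}")
```

Lowering the threshold in this sketch catches more of the attack scores but also flags more of the normal scores, which is the trade-off described in [12].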
Most detection systems in use today are network- and signature-based IDSs. Due to the
addition, new types of attacks are appearing continuously and attacks against network
services can cause great harm. No signatures have been produced and deployed for these
approaches is a severe challenge. Our work addresses challenges associated with next
Develop an efficient algorithm to detect zero-day attacks and the variants of existing
Evaluate payload features using feature selection techniques which provide ability
Achieve good accuracy in detecting truly anomalous events, with low false positive
rates.
correlation between payloads or groups of payloads during a training phase, and then use
this learned “normal” model to detect abnormal, never-seen before content. These
suspicious contents coming through network connections may or may not be attacks.
For efficient operation of intrusion detection system, optimal feature set is computed.
Then, this reduced feature set is used to discriminate normal and abnormal contents
We model payload content to satisfy various challenges and research objectives of our
human intervention,
• accuracy in detecting truly anomalous events, with low false positive rates,
An ideal intrusion detection system would correctly classify 100% of attack
and normal network packets, with 0% false positive and false negative rates,
detect attacks in real time, and adapt to the dynamic profile of network traffic.
In this work, we aim to achieve ideal or near-ideal performance by
devising an efficient IDS. We conduct several experiments using the DARPA 1999 dataset and the
Georgia Institute of Technology attack dataset (real network traffic) to demonstrate how
To meet all objectives in Subsection 1.4.1 is a very challenging task. Modeling the
network traffic profile using payload anomaly detection approach, these objectives can
between the payload features and also among the network packets, which provides some
information about the payload structure. A geometrical structure model is created for
application layer packet payload (n-gram window size for normal traffic content) using
MDM. We consider only “clear text” content and do not address the issue of encrypted
network traffic. We believe that our technique can be applied to encrypted
content at the point where it is decrypted and delivered to the targeted application
software.
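As a rough illustration of payload modeling, the sketch below builds a profile of normal traffic from relative byte frequencies (a 1-gram window) and scores a new payload by a simplified Mahalanobis distance. It is an assumption-laden stand-in and not the MDM-based geometrical structure model of this thesis: it treats the 256 byte features as independent (diagonal covariance), so it ignores the correlations between features that the MDM is designed to capture. The function names, the smoothing constant and the sample payloads are all hypothetical.

```python
import math


def byte_frequencies(payload):
    """Relative frequency of each of the 256 byte values in a payload."""
    counts = [0] * 256
    for b in payload:
        counts[b] += 1
    total = len(payload)
    return [c / total for c in counts] if total else [0.0] * 256


def train_profile(payloads):
    """Per-byte mean and variance of relative frequency over normal payloads."""
    if not payloads:
        raise ValueError("at least one training payload is required")
    vectors = [byte_frequencies(p) for p in payloads]
    n = len(vectors)
    means = [sum(v[i] for v in vectors) / n for i in range(256)]
    variances = [
        sum((v[i] - means[i]) ** 2 for v in vectors) / n for i in range(256)
    ]
    return means, variances


def anomaly_score(payload, profile, smoothing=1e-3):
    """Simplified Mahalanobis distance with a diagonal covariance matrix.

    The smoothing term (an assumed value) avoids division by zero for byte
    values that never vary in the training data.
    """
    means, variances = profile
    freqs = byte_frequencies(payload)
    return math.sqrt(
        sum((f - m) ** 2 / (v + smoothing)
            for f, m, v in zip(freqs, means, variances))
    )


if __name__ == "__main__":
    # Hypothetical clear-text HTTP requests used as the normal training set.
    normal_requests = [
        b"GET /index.html HTTP/1.0",
        b"GET /about.html HTTP/1.0",
        b"GET /news.html HTTP/1.0",
    ]
    profile = train_profile(normal_requests)
    suspicious = b"GET /" + b"\x90" * 200  # long run of a non-text byte
    print(anomaly_score(normal_requests[0], profile))
    print(anomaly_score(suspicious, profile))
```

A payload dominated by byte values that never occur in the training requests scores far higher than a training request, which is the behaviour an anomaly detector relies on; a threshold on the score would then separate normal from suspicious content.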
We consider HTTP traffic for analysis and experimental evaluations of our model.
Since most web traffic contents are usually public and pose fewer privacy restrictions, it
is easy to obtain web traffic data. We believe that the algorithms and technology
presented in this thesis can be applied to other content based traffic too. We have also
chosen to limit our study to web traffic since the number of attacks against the known
vulnerabilities is rising continuously in respect of severity and frequency [13], and also
historically web services have been a common target of previous worm attacks [14-16].
1.5 Contributions to Thesis
An attacker often follows a sequence of events, which are highly correlated and the
sequences have dependencies among them. Furthermore, the attacker can also hide
individual events within a large number of normal events such that the events cannot be
recognised as harmful events. Additionally, events consist of multiple features which are
monitored continuously. These features are also highly correlated and must not be
events individually, thereby, discarding any correlation between features and also
sequential events, which results in a poor model. Hence, we introduce efficient intrusion
detection frameworks and methods which consider a group of events and analyze
In Chapter 3 of this thesis, we will introduce our novel framework for building network
In Chapter 3 of this thesis, we will experimentally demonstrate that the GSAD model
can successfully detect inbound attack and worm packets with high accuracy rate and a
low false positive rate, and will compare performance of GSAD against the state-of-the-
the payload by correlation between the payload features in order to decrease the number
Network monitoring is one of the common and widely applied methods for detecting
malicious activities in an entire network. However, GSAD model uses a large number of
features to discriminate normal and malicious (attack) packets that are flowing in the
the feature selection algorithm using the Linear Discriminant Analysis (LDA) technique.
The selected features provide strong correlations between anomalous behavior and
malicious activity. These features are used to develop normal traffic sensor profiles to
detect anomaly in the network traffic. This simplifies computational complexity and also
such a situation, it is often very difficult to keep the signature database up to date with all
possible signatures. Moreover, the size of the signature database will also increase. In
Chapter 4, we propose to generate one common signature for a group of similar types of
attacks. This will help reduce the number of signatures for similar types of attacks.
Distance Map to detect normal and malicious behavior of network traffic. PCA is used
to reduce the dimensionality (number of features) of the dataset while retaining the original
variability in the dataset. Mathematical and graphical methods are used independently
experimentally demonstrate that the RePIDS can successfully detect inbound attack and
worm packets with high accuracy rate and a low false positive rate. We also compare its
This dissertation presents evidence to show the validity of the following hypotheses:
It is possible and feasible to process the values associated with identified (i.e., noisy)
attributes to extract useful features. Models built using these features exhibit reduced
The newly extracted features enable behavioral modeling of application traffic that is
Models built using dominant features exhibit real-time data processing characteristic.
detection sensor, the framework and detection techniques employed in GSAD, and
demonstrate how well it can detect attacks. Furthermore, we present the importance of
model in HTTP environment, and confirm high detection rate and low false alarm rate
Chapter 4 discusses Linear Discriminant Analysis (LDA) and its use in feature
selection and evaluation of LDA based intrusion detectors for detection of attacks. We
also present our discussion on the generation of one common signature for a group of
Component Analysis (PCA) technique, and mathematical and graphical techniques for
real-time anomaly detection. This chapter also presents a novel real-time intrusion
detection system and metrics to evaluate computational complexity and time complexity.
Chapter 6 concludes the dissertation and presents future research work that extends our
research.
CHAPTER 2
Introduction
Intrusion detection has been at the centre of research in the last decade due to rapid
powerful attacks which can bring down an entire network [5]. Hence, detecting intrusion
in networks and applications has become one of the most critical tasks to prevent their
detecting attacks in the form of malicious and unauthorized activities. To identify the
used for dealing with security vulnerabilities and review related research in the intrusion
our proposed model. Our model uses Mahalanobis Distance Map, a pattern
strategies used to alleviate security problems. Then, we present intrusion detection as a
techniques.
This chapter is organised as follows. In Section 2.1, we list common strategies used
detection systems outlining their role and requirements. Then, we discuss the intrusion
evaluation metrics for intrusion detection methods are discussed. Section 2.5 covers
literature review on the related research work. We conclude this chapter in Section 2.6.
computer system or network. In 1972, Anderson [17] identified the need for intrusion
detection and proposed a threat model. He identified various types of threats and the
for attack detection and prevention. For example, the use of proper policies and physical
access restriction (traditional locks and other physical security) can prevent attacks at the
physical layer. Different strategies are used to deal with security policies. These
strategies are classified into six categories [18]. A summary of threat mitigation
Table 2.1: Summary of attack mitigation techniques
1. Attack Deterrence Several technical and legal measures have been undertaken to
hosts. They refer as a fear of tracing the true source of an attack. Attackers are
sources IP addresses to launch attacks (IP Address Spoofing allows people to log
it before the attack can reach the target. Attack prevention mechanism can be viewed
suspected attacks without any intervention required by an operator. IDPS has the
traffic at the network level based on certain rules and policies. Such traffic filtering
3. Attack Deflection This refers to tricking an attacker by making the attacker believe
that the attack was successful. Though, in reality, the attacker was trapped by the
system and deliberately made to reveal the attack. For example, a honeypot is a trap
that lures attackers away from production systems [20]. A honeypot runs special
from software before they are deployed in a security-critical environment. The aim
though the attacker is able to illegitimately access that resource. For example,
5. Attack Detection Attack detection refers to detecting an attack while the attack is
still in progress or to detect an attack which has already occurred in the past.
Detecting an attack is significant for two reasons. Firstly, the system must recover
from the damage caused by the attack. Secondly, it allows the system to take
measures to prevent similar attacks in the future. Research in this area focuses on
analyses network traffic activity and alerts an operator to potential vulnerabilities
and attacks.
6. Attack Reaction and Recovery Once an attack is detected, the system must react to
such attack and perform the recovery mechanisms as defined in the security policy.
Intrusion detection is the act of detecting actions that attempt to compromise the
detection systems help to better understand their capabilities and limitations. Debar et al.
[22] were the first to introduce a systematic and taxonomic approach of intrusion
survey and taxonomy of intrusion detection systems, which addressed some aspects in
introduce the basic definitions and discuss typical advantages and disadvantages by
characteristics: data sources (protected system type) and the detection methods. Based
on the sources of data being audited and used to design its detection model, an intrusion
According to Time of Audit, a system can be classified into real-time and off-time,
whereas according to System Structure, an IDS can be classified as distributed system
and centralised system. Based on the behavior of a system, an IDS can be classified as
active IDS and passive IDS. In this section, we discuss taxonomy of intrusion detection
[Figure: taxonomy of intrusion detection systems. Data Source: HBIS, NIDS. Detection Method: SBS, ABS, Hybrid. Time of Audit: Real-time, Off-time. System Structure: Distributed IDS, Centralised IDS. Behavior after Detection: Active IDS, Passive IDS.]
2.2.1 Intrusion Detection Systems Based on Data Sources
Based on the sources of data being audited and used to design its detection model,
Denning [24] classifies intrusion detection systems into host based and network based
A Host-based Intrusion Detection System (HIDS) monitors a single host (or a single
application) and analyses the audit pattern generated at the operating system or that of a
collects individual log produced by the host [25]. The audit pattern contains more
specific information than the network level audit patterns, which can be used to detect an
attack more reliably. However, the main drawback of HIDS is that it is difficult to
manage a large number of host-based systems. HIDS themselves can be the victims of
collects audit patterns flowing on this segment. These collected audit patterns at the
network level are analysed for attack patterns, and these attack patterns are used to protect
a single host or an entire network [27, 28]. NIDS can analyse different types of data,
namely packet header data, packet payload data, or both. The main advantages of the NIDS
approach are that data and events can be monitored without affecting host
performance, and that a single NIDS can monitor an entire network without the
need to install a dedicated IDS on each host. It can detect attacks that are not visible
from a single host and can correlate attacks against multiple hosts.
However, the attack detection capability of NIDS is limited. This is because it is hard
to infer the contextual information directly from the network audit patterns.
Furthermore, the audit patterns may be encrypted rendering them unusable by the
intrusion detector at the network level. In addition, the large volume of audit patterns at the
network level may reduce attack detection accuracy, for two reasons.
Firstly, a significant portion of the total incoming patterns may be allowed to pass into
the network without any analysis. Secondly, in high speed networks, it may be practical
Due to the increasing severity of attacks from the Internet, NIDSs are employed in almost
This classification is based on the model it uses for intrusion detection. Based on the
where the strengths of one model are exploited to cover the weaknesses of another. An
Signature-based Systems
attack signatures. It looks for specific patterns, and signatures, present in the incoming
packets and/or command sequences, and uses pattern matching approaches to detect
this approach can detect known attacks fairly accurately with a low false positive rate.
They can protect a computer or network immediately upon installation and are usually fast.
The major drawback of signature detection approaches is their limited
attack detection capability: they cannot detect new (i.e., zero-day) or polymorphic
attacks, i.e., variants of known attacks. They have a high false-negative rate [30,
defined for all the possible attacks that an attacker may launch against a network.
Human interaction is required to keep the signature database up to date and to analyse each
attack to develop the signature. The response time for new attacks is limited to a
worms) can appear and spread in seconds [32]. Hence, maintaining state information of
application under attack. This makes a signature-based detection system less suitable for
protecting a web-based service, because of the ad hoc and dynamic nature of web traffic.
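
The pattern-matching idea behind signature-based detection, and its blind spot for attack variants, can be illustrated with a minimal sketch. The signature names and regular expressions below are hypothetical and chosen only for illustration; they are not taken from any real rule set.

```python
import re

# Hypothetical signatures for illustration only; real rule sets are far larger.
SIGNATURES = {
    "path-traversal": re.compile(rb"\.\./\.\./"),
    "script-injection": re.compile(rb"<script\b", re.IGNORECASE),
    "sql-injection": re.compile(rb"union\s+select", re.IGNORECASE),
}


def match_signatures(payload: bytes) -> list[str]:
    """Return the names of all signatures found in the payload."""
    return [name for name, pattern in SIGNATURES.items() if pattern.search(payload)]


# A known attack, a variant of it that hides the keyword gap with a comment,
# and a benign request.
known = b"GET /index.php?id=1 UNION SELECT password FROM users HTTP/1.1"
variant = b"GET /index.php?id=1 UNION/**/SELECT password FROM users HTTP/1.1"
benign = b"GET /index.html HTTP/1.1"

print(match_signatures(known))
print(match_signatures(variant))
print(match_signatures(benign))
```

In this toy setting the known request is flagged, while the variant passes unnoticed because no signature was written for it, which is the false-negative behaviour described above.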
Anomaly-based Systems
system. The approach used by the anomaly-based system is entirely different from the
network that are not normal. An anomaly-based system first creates a base-line profile of
incoming event against the base-line profile. A significant deviation from the known
assumes that the intrusion attempts are rare and they have different characteristics from
normal behavior. A statistical model of normal behavior is created from the training
data. When an instance that does not match the created model (learned from the training
The main strength of an anomaly detection system is that it has the capability to
detect new (zero-day) and polymorphic attacks. Novel attacks can be detected as soon as
they take place. An anomaly-based system does not need a priori knowledge of the
attacker to know with certainty what activity it can carry out without triggering an alarm.
However, anomaly detection system also suffers from several weaknesses. For anomaly-
system/network is required. Creating a normal traffic profile is also very challenging for
the reason that finding a proper representation of training data, which shows the normal
data, maintenance of a normal profile is difficult and time consuming. Because
user and network behavior are not always known beforehand, and because an anomaly
detection system looks for anomalous events rather than attacks, it produces many false
alarms. Not all anomalous events are malicious. Furthermore, an attacker can train
review of anomaly intrusion detection systems can be found in [33] and [34].
maintain, despite the occurrence of high false negatives. Most intrusion detection
systems in use today are network- and signature-based systems. However, with the popularity
of the Internet and the increasing risk of breaches of network security, anomaly-based network
A hybrid system uses partial knowledge of both normal and attack behavior
to detect attacks. Such systems therefore perform better, with fewer false alarms
and improved attack detection rates. Hybrid systems generally use machine learning
approaches.
The hybrid system proposed in [35] combines misuse and anomaly detection to find
attacks in logged HTTP requests. The authors resolved the conflicts between signature-based
systems and anomaly-based systems to provide the best accuracy. They used manual
methods to identify normal or anomalous web requests. This heavy reliance on humans
limits the usefulness of their system. Because of many weaknesses, this system remains
Conditional Random Fields (CRFs). They integrated a layered framework with
conditional random fields to build an anomaly hybrid intrusion detection system. Normal
and abnormal traffic features are used for the training of the system and conditional
Intrusion detection systems can be divided into real-time and off-line intrusion detection
systems, based on whether the data analysis is done in real time or afterward.
commenced. However, in practice, it is very difficult to build such a system under the
constraints of a low false alarm rate and high detection rate. Snort, an anomaly
Off-time Intrusion Detection System
An off-time intrusion detection system works differently from a real-time intrusion detection
system. The audit data logs are collected in a central repository and patterns are analysed
for intrusions at a predefined time interval. Such systems cannot provide any immediate
response to intrusion and can only perform the recovery task once an attack is detected.
Based on the system structure, intrusion detection systems can be classified into two
system. We present a brief description on them below. Both centralized and distributed
intrusion detection systems may use host- or network-based data collection methods, or
a combination of both.
A centralized intrusion detection system determines the global state of the network.
It collects data from either a single source or multiple
sources and processes and analyses the data centrally. The location where the actual
A distributed intrusion detection system is one where data is collected and analysed in
multiple hosts and decisions are made locally. The advantage of a distributed system for
intrusion detection is that an immediate response mechanism can be activated based upon
the number of monitored components. Furthermore, distributed intrusion detection
Based on the actions that an intrusion detection system takes after it detects an attack,
intrusion detection systems can be classified as active intrusion detection systems and
An active Intrusion Detection System (IDS) is also known as an Intrusion Detection and
Prevention System (IDPS). An IDPS blocks
Intrusion Detection and Prevention System (IDPS) has the advantage of providing real-
problem. An anomaly detection approach usually consists of two phases: a training phase
and a testing phase. In pattern recognition studies, the learning phase constructs a
classifier from example data, which corresponds to training the model on the training
dataset. During the recognition phase, the classifier assigns new data patterns to
pattern classes, which corresponds to the testing phase, where the learned profile is applied to
new data. Therefore, the anomaly detection problem can be seen as a pattern recognition
problem.
The pattern recognition task can be subdivided into the following four steps. Step one
performs data acquisition. Here, data is collected from various sources and sent to the
first stage for pre-processing, where cleaning of data is performed and data is separated
into groups. In the second step, the features are extracted and selected. These selected
features represent the pattern of data. In the third step, selected features are used to
choose the model type. Finally, the classification and analysis of the results are
performed in the fourth step. Figure 2.2 shows a generic pattern recognition process.
[Figure 2.2: generic pattern recognition process: Input data -> Data pre-processing -> Feature extraction -> Model selection -> Classification -> Results]
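
The four steps of the generic pattern recognition process can be sketched as a tiny pipeline. This is only an illustration under strong assumptions: the two features (record length and digit ratio), the centroid "model" and the threshold value are hypothetical choices, not the techniques used later in this thesis.

```python
from statistics import mean


def preprocess(records):
    """Step 1: clean the acquired data (drop empty records, normalise case)."""
    return [r.strip().lower() for r in records if r.strip()]


def extract_features(record):
    """Step 2: map a record to a small feature vector (length, digit ratio)."""
    digits = sum(ch.isdigit() for ch in record)
    return (len(record), digits / len(record))


def fit_model(feature_vectors):
    """Step 3: the 'model' here is simply the per-feature mean of normal data."""
    return tuple(mean(column) for column in zip(*feature_vectors))


def classify(model, features, threshold):
    """Step 4: label a pattern by its L1 distance from the normal centroid."""
    distance = sum(abs(f - m) for f, m in zip(features, model))
    return "anomalous" if distance > threshold else "normal"


training = ["GET /index.html", "GET /about.html", "  ", "GET /news.html"]
clean = preprocess(training)
model = fit_model([extract_features(r) for r in clean])

THRESHOLD = 5.0  # hypothetical value, would be tuned on validation data
print(classify(model, extract_features("get /index.html"), THRESHOLD))
print(classify(model, extract_features("get /search?q=" + "a" * 200), THRESHOLD))
```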
The following section describes the process for intrusion detection using the pattern
recognition technique.
Network intrusion detection aims at distinguishing the attacks on the network and
Internet from normal use of the Internet. This is a typical classification problem, and so
distinguish normal behaviors and anomalies. Intrusion detection task can be formulated
Figure 2.3 illustrates the design process for network intrusion detection using the
pattern recognition technique. Traffic data is first processed in order to identify network
connections between hosts. In the network, the term “connection” refers to a sequence of
data packets related to a particular service between a pair of hosts, e.g., the transfer of a
web page via the HTTP protocol. Each network connection represents network data and
can be defined as a “pattern” to be classified. Features are extracted from the collected
data. These features are used by a pattern recognition technique to describe the patterns.
[Figure 2.3: design process for network intrusion detection using the pattern recognition technique: Feature Selection & Feature Extraction -> Classification of Incoming Packets (patterns) -> Results: Normal or attack class]
We present a brief explanation of each step and then discuss the techniques used to
a) N-grams Text Categorization Method
techniques are adopted to convert each process to a vector and calculate the similarity
between two process activities. Since there is no need to learn individual process profiles
Initially, Forrest et al. [38] introduced the concept of n-gram text categorization to
build a program profile. They used the sequence of system calls to characterize the
Liao and Vemuri [40] used text categorization techniques in anomaly intrusion detection.
program's behavior. Text categorization techniques are adopted to convert each process
to a vector. They use the k-nearest neighbor classifier to classify new program behavior
into either normal or intrusive class. They draw an analogy between a text document and
the sequence of all system calls issued by a process, i.e., program execution. Table 2.2
illustrates the similarity in some respects between text categorization and intrusion
detection.
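
The analogy between documents and system-call sequences can be made concrete with a minimal sketch: each process becomes a frequency vector over its system calls, and a new process is labelled by its k nearest neighbours. The call traces, labels, plain frequency weighting and majority vote below are simplifying assumptions for illustration, not the exact procedure of [40].

```python
from collections import Counter
from math import sqrt


def ngram_vector(calls, n=1):
    """Count the n-grams of a system-call sequence (n=1 gives call frequencies)."""
    return Counter(tuple(calls[i:i + n]) for i in range(len(calls) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def knn_label(sample, labelled, k=3, n=1):
    """Majority vote among the k training processes most similar to the sample."""
    vec = ngram_vector(sample, n)
    ranked = sorted(
        labelled,
        key=lambda item: cosine(vec, ngram_vector(item[0], n)),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical labelled traces: (system-call sequence, class).
TRAINING = [
    (["open", "read", "close"], "normal"),
    (["open", "read", "read", "close"], "normal"),
    (["open", "write", "close"], "normal"),
    (["execve", "setuid", "chmod"], "intrusive"),
    (["setuid", "execve", "execve"], "intrusive"),
]

print(knn_label(["open", "read", "read", "read", "close"], TRAINING))
print(knn_label(["execve", "setuid", "execve"], TRAINING))
```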
Table 2.2: Analogy between text categorization and intrusion detection [39]
We use a pattern recognition technique to detect network intrusions and the n-gram
normal and malicious patterns accurately. To improve the performance of the detection
process, correlation between the features and correlation among the packets is used in
approach using geometrical structures concepts, which is used for the detection of a
human face in the image. Detailed discussion of the framework and its mathematical
c) Classifier Selection
The design goal of an anomaly detection system is to generate a model that accurately
describes normal behavior of the system. The network intrusion detection system should
be able to classify network connections (network data) between two hosts for a required
technique for the training needs data labeled as normal and abnormal, and uses a
two-class classifier model. Since the number of normal samples is much larger than the
number of anomalous samples, the network data (training data) is not balanced.
Hence, it is difficult to generate a classifier that can represent the true normal behavior and
true anomalous behavior of network data. Moreover, a classifier trained on few
samples will show weak classification ability. Under these constraints, it is not
appropriate to select two- (or multi-) class classification model approaches and it may be
In anomaly intrusion detection, we assume that the number of normal events is much
for the classification of samples that are mostly represented in a class. The rest of the
The main aims of an intrusion detection system are to maximize the true-positive rate
performance of the intrusion detection method depends on two basic measures: the
number of attacks detected (i.e., the true-positive rate) and the number of normal events
classified as attacks (i.e., the false-positive rate). The confusion matrix is given in Table 2.3,
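The true positive rate (TPR) and false positive rate (FPR) for intrusion detection
are defined as follows, where TP, FN, FP and TN denote the numbers of true positives,
false negatives, false positives and true negatives in the confusion matrix:
$$ TPR = \frac{TP}{TP + FN} \tag{2.1} $$
$$ FPR = \frac{FP}{FP + TN} \tag{2.2} $$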
are not equal in number in the training and the testing datasets, and can bias the
performance of the system [41]. Therefore, we use precision, recall and F-Value as
measures for testing the performance of a system, which do not depend on the size of the
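test dataset and thus give an unbiased estimate of system performance. Precision, Recall and
F-Value are defined as follows, where TP, FP and FN denote the numbers of true positives,
false positives and false negatives:
$$ Precision = \frac{TP}{TP + FP} \tag{2.3} $$
$$ Recall = \frac{TP}{TP + FN} \tag{2.4} $$
$$ F\text{-}Value = \frac{(1 + \beta^2) \cdot Precision \cdot Recall}{\beta^2 \cdot Precision + Recall} \tag{2.5} $$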
Here, β corresponds to the relative importance of precision versus recall and is usually
set to 1. The Detection Rate (DR) is equivalent to the “recall” rate in
information retrieval systems, while the False-Positive (FP) rate is loosely the inverse
of “precision”.
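
As a worked illustration of these measures, the sketch below computes them from the four confusion-matrix counts. It assumes the standard textbook definitions (in particular the usual F-beta form of the F-Value); the counts in the example are hypothetical and only meant to show the effect of class imbalance.

```python
def detection_metrics(tp, fp, tn, fn, beta=1.0):
    """Compute TPR, FPR, precision, recall and F-Value from confusion-matrix counts.

    Assumes the standard definitions; beta weights recall against precision.
    """
    tpr = tp / (tp + fn)            # fraction of attacks that were detected
    fpr = fp / (fp + tn)            # fraction of normal events flagged as attacks
    precision = tp / (tp + fp)      # fraction of alarms that were real attacks
    recall = tpr                    # recall is the detection rate
    f_value = (1 + beta**2) * precision * recall / (beta**2 * precision + recall)
    return {
        "tpr": tpr,
        "fpr": fpr,
        "precision": precision,
        "recall": recall,
        "f_value": f_value,
    }


# Hypothetical imbalanced test set: 100 attacks among 10,000 events.
metrics = detection_metrics(tp=80, fp=40, tn=9860, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With these counts the false-positive rate is about 0.4%, yet one alarm in three is false (precision of about 0.67), which is why precision, recall and F-Value are informative when normal events far outnumber attacks.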
A common technique for visualizing these quantities is the Receiver Operating
Characteristic (ROC) curve [42], which shows the true-positive rate on the Y-axis and the
false-positive rate on the X-axis for different threshold values. ROC curves
give a single numerical measure of the performance of an intrusion detection method:
the Area Under the ROC Curve (AUC), which integrates the true positive rate for a
The field of intrusion detection is broad, and over the last two decades research has been
devoted to the design and evaluation of effective intrusion detection methodologies. The
concept for intrusion detection starts from the seminal work of Anderson [17] and
Denning [24], which has laid the foundations for the design of numerous detection
systems. Denning proposed a general framework for detecting attacks against computer
systems by modelling normal behavior patterns generated by users of the system. Since
then, a number of intrusion detection systems have been designed and deployed, as surveyed
by Mukherjee et al. [43]. These later evolved into Host-based Intrusion Detection
Systems (HIDS).
As discussed in Section 2.2, the standard taxonomy for intrusion detection systems
identifying and understanding novel attack behaviors. An anomaly detection approach
performs detection of patterns in two steps, namely training and testing. In the training
depending upon the type of anomaly network intrusion detection considered. Whereas,
in the testing phase, once the normal model for the system is available, it is compared
with the observed traffic. If the deviation found exceeds a given threshold, an alarm will
be triggered [44].
In this section, we review the detection techniques at two different levels. At the first
level, we go over the existing intrusion detection techniques from a general perspective,
second level, we concentrate on work which utilizes the techniques similar to what we
According to Patcha and Park [45], and Garcia-Teodoro et al. [46], anomaly detection
techniques can be classified into three main categories, namely, statistical-based,
data-mining-based (or knowledge-based) and machine learning-based. Chandola et al. [33]
also reviewed anomaly detection methods and discussed several application domains
including credit card fraud, image processing and computer security. In this section, we
present a brief review on a number of architectures and methods that have been
1. Statistical-based Anomaly Network Intrusion Detection Techniques
In the statistical-based technique, the network traffic activity is captured and a profile
representing its behavior is created. This profile is typically based on metrics such as
traffic rate, number of packets for each protocol, and audit record distribution measure.
Two datasets of network traffic are considered. One corresponds to the currently observed
profile over time, and the other to the previously trained statistical profile. As
network events occur, the current profile is determined and then compared against the
normal profile. An anomaly score is generated, which represents the degree of irregularity
for a specific event. When the score exceeds a certain threshold, an alarm is raised.
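
The profile comparison described above can be sketched with a simple per-metric mean and standard deviation. The metric names, sample values, z-score scoring rule and threshold of 3 are assumptions made for this illustration; the surveyed systems use a variety of statistical models.

```python
from statistics import mean, stdev


def train_profile(history):
    """Build the trained profile: per-metric mean and standard deviation.

    Assumes every metric varies in the training data (non-zero deviation).
    """
    metrics = history[0].keys()
    return {
        m: (mean([h[m] for h in history]), stdev([h[m] for h in history]))
        for m in metrics
    }


def anomaly_score(profile, current):
    """Score the current observation by its largest absolute z-score."""
    return max(abs(current[m] - mu) / sigma for m, (mu, sigma) in profile.items())


# Hypothetical measurements from normal traffic intervals.
HISTORY = [
    {"packets_per_s": 100, "tcp_fraction": 0.80},
    {"packets_per_s": 110, "tcp_fraction": 0.82},
    {"packets_per_s": 90, "tcp_fraction": 0.78},
    {"packets_per_s": 105, "tcp_fraction": 0.81},
    {"packets_per_s": 95, "tcp_fraction": 0.79},
]
THRESHOLD = 3.0  # hypothetical alarm threshold

profile = train_profile(HISTORY)
quiet = anomaly_score(profile, {"packets_per_s": 104, "tcp_fraction": 0.80})
burst = anomaly_score(profile, {"packets_per_s": 400, "tcp_fraction": 0.80})
print(quiet > THRESHOLD, burst > THRESHOLD)
```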
Denning and Neumann [24] used univariate models to model the parameters as
Their model considered correlations between two or more metrics and showed a better
systems do not require prior knowledge of security flaws and/or the attacks themselves.
As a result, such systems have the capability of detecting “zero-day” attacks or the very
balance the likelihood of false positives with the likelihood of false negatives. In
addition, statistical methods need accurate statistical distributions but not all behaviors
can be modeled using purely statistical methods. Furthermore, most of these schemes
detection system. In the early 1980s, scientists at the Stanford Research Institute (SRI)
monitored user behaviour and detected suspicious events as they occurred. They also
developed an improved version of the intrusion detection expert system called the
Next-generation Intrusion Detection Expert System (NIDES) [50]. NIDES can operate in real
time for continuous monitoring of user activity or can run in batch mode for periodic
analysis of the audit data. Unlike the intrusion detection expert system, which is an anomaly
detection system, NIDES is a hybrid system that has an upgraded statistical analysis
engine. Despite the benefit of real-time detection, this system has a high false
alarm rate.
Kruegel et al. in [51] showed that it was possible to use payload byte distribution and
then combined this information with extracted packet header features for intrusion
detection. In this approach, the resultant ASCII characters were sorted by frequency and
then aggregated into six groups. However, this approach leads to a very coarse
The intrusion detection systems discussed in the previous section are host-based intrusion
detection systems. In other words, they cannot defend a network as a
whole. Since then, a great amount of research has been undertaken towards the
design and development of network-based intrusion detection systems. Maxion et al.
[52] proposed a network-based intrusion detection system, but they did not consider the
dynamic behavior of network traffic, and the system did not fit in an
experimental environment.
Mahoney et al. [53, 54] described several methods that addressed the problem of
detecting anomalies in the use of network protocols by inspecting packet headers. They
used the DARPA 1999 dataset for their experiments. The main aim of these methods was to
protocols at different layers. Mahoney et al. proposed PHAD (Packet Header Anomaly
Detector) [53], LERAD (LEarning Rules for Anomaly Detection) [54] and ALAD
features from the Ethernet, IP and transport layer packet headers. ALAD models
incoming server TCP requests. ALAD uses source and destination IP addresses and port
numbers, opening and closing TCP flags, and the list of commands (the first word on
each line) in the application payload. Depending on the attributes, ALAD builds separate
models for each target host, port number (service), or host/port combination. LERAD
also models TCP connections. Although the dataset is multivariate network traffic data
containing fields extracted from the packet headers, both ALAD and LERAD methods
break down the multivariate problem into a set of univariate problems and sum the
weighted results from range matching along each dimension. While the advantage of this
detecting network intrusions, breaking multivariate data into univariate data has
More recently, analytical studies on anomaly detection systems have been conducted.
Lee and Xiang [56] used several information-theoretic measures, such as entropy and
system parameters, and build models. These metrics help to understand the fundamental
Snort [29] pre-processor plug-in. In SPADE, the basic features are used to build a
normal traffic distribution model for the monitored network. Traffic distributions are
Machine learning techniques are based on establishing an explicit or implicit model that
analyses patterns and classifies them into the normal or malicious categories. A
noteworthy characteristic of these schemes is the need for labeled data to train the
In many cases, the applicability of machine learning principles coincides with the
statistical techniques, although the former focuses on building a model that improves its
performance on the basis of previous results. A machine learning ANIDS has the ability
to change its execution strategy as it acquires new information. The main drawback of
these techniques is their resource-intensive nature. Many machine learning techniques are
used in the intrusion detection research field. A brief review of these techniques is given
here.
System call and sequence analysis are widely used techniques for anomaly detection.
Forrest et al. [38] established an analogy between the human immune system and an
intrusion detection system, and proposed a methodology that involved analysing system
More recent research based on the call sequence approach is presented in [58] and [59]. In
[58], the authors proposed an anomaly detection method based on statistical inference and an
α-stable first-order model, whereas in [59] the authors used the sequences and parameters
of the system calls executed by a process to identify anomalous behavior of the system.
schemes. High computation and results depend on behavior of the target system.
Principal Component Analysis (PCA) [61] is a technique that is used to reduce the
technique where n correlated random variables are transformed into d < n uncorrelated
variables. This makes it possible to express the data in a reduced form, thus facilitating
the detection process [62]. Shyu et al. [63] proposed an anomaly detection scheme,
where PCA was used as an outlier detection scheme and was applied to reduce the
Markov model/A Hidden Markov model [64] is a statistical model where the system
Component Analysis (ICA) technique for feature extraction, and outlier detection
More recently, research on tracing attacks back to their source was presented in [66] and [67].
In [67], the author proposed a Flexible Deterministic Packet Marking (FDPM) approach to find the real
source of Internet attackers. FDPM uses a flexible flow-based marking scheme to detect
DDoS attacks launched on the Internet. In [66], two effective anomaly-based detection
Data mining takes data as input and extracts patterns or deviations
that may not be easily seen with the naked eye. It is also known as
knowledge discovery. Data mining has been used for host-based and network-based
Lee and Stolfo [68] explored the application of different data mining techniques in
the area of intrusion detection. In [69], Lee and Stolfo used multiple data mining
framework for intrusion detection. They also introduced a feature construction system
for the classification, which categorized the connection based features into low-cost and
high-cost features in terms of their computation time. Thus, different features were
algorithm such as RIPPER. Lee and Stolfo further extended their previous work in [70],
where they applied association rules and frequent episodes to network connection records
to obtain additional features. RIPPER was applied to the labeled attack traffic to learn
the intrusion patterns. Barbara [71] proposed ADAM, which applied association
rules.
Bridges and Vaughn [72] used a traditional rule-based expert system for misuse
detection. They also contributed to anomaly detection using fuzzy logic and Genetic
Algorithms (GA). They created fuzzy association rules from the normal dataset, and also
built a set of fuzzy association rules from the new unknown dataset, and then compared
the similarity between the two groups of rules. If the similarity is low, it indicates a
possible intrusion in the new dataset. As stated by Bridges et al. [72], the concept of
security is fuzzy in itself. In other words, the concept of fuzziness helps to smooth out
the abrupt separation of normal behaviour from abnormal behaviour. Dickerson et al.
[73] developed the Fuzzy Intrusion Recognition Engine (FIRE) using fuzzy sets and
fuzzy rules.
Genetic Algorithms (GA) [74] have been used to tune the membership functions of
the fuzzy sets and to select the most relevant features. Basically, a GA rewards
a high similarity between normal data and reference data, and penalizes a high
similarity between intrusion data and reference data. The major advantage of GA is its
Clustering techniques are used to find patterns in unlabeled data with many
dimensions. Clustering is gaining popularity in the context of intrusion detection [75]. The main
advantage of clustering is the ability to learn from and detect intrusions in the audit data,
without requiring explicit descriptions of the various attack classes/types. As a result, the
amount of training data that needs to be provided to the anomaly detection system is also
reduced. The outlier detection scheme has been widely used for anomaly detection. An
outlier can be identified using statistical features, distance, density and clustering
techniques.
Mahoney and Chan extracted features from the packet headers and clustered these
features to build normal profiles. They classified connections that did not fall in any
cluster as outliers. Taylor and Alves-Foss used fewer features. They extracted features from
packet headers that were used to build the clusters. Each feature was treated as a variable,
and each connection was abstracted to a point with multiple variables (features). The
nearest neighbour algorithm was used to compute the distance for the outlier detection.
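
Distance-based outlier detection of this kind can be sketched as follows. Each connection is a point in feature space and is flagged when its nearest normal neighbour is too far away. The two features, the sample points and the threshold are hypothetical and serve only to illustrate the idea; they are not the features used by the cited authors.

```python
from math import dist


def nearest_distance(point, normal_points):
    """Euclidean distance from a connection to its nearest normal neighbour."""
    return min(dist(point, p) for p in normal_points)


def is_outlier(point, normal_points, threshold):
    """Flag a connection whose nearest normal neighbour is beyond the threshold."""
    return nearest_distance(point, normal_points) > threshold


# Hypothetical header features per connection: (packet count, duration in s).
NORMAL = [(40, 2), (42, 3), (38, 2), (41, 2)]
THRESHOLD = 5.0  # hypothetical distance threshold

print(is_outlier((41, 3), NORMAL, THRESHOLD))
print(is_outlier((400, 60), NORMAL, THRESHOLD))
```

In practice the features would be scaled to comparable ranges first, since otherwise the feature with the largest numeric range dominates the distance.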
Other data mining approaches, such as neural networks [76], were also explored for
intrusion detection. In the neural network approach to intrusion detection, the neural
network learns to predict the behavior of various users and daemons in the system. The
main advantage of neural networks is their tolerance to imprecise data and their ability
to infer solutions from data without having prior knowledge of the data. However, neural
network based solutions have several drawbacks. Firstly, they may fail to find a
satisfactory solution. Secondly, neural networks can be slow and expensive to train.
Ramadas et al. [77] presented anomalous network-traffic detection with
self-organizing maps (ANDSOM). The ANDSOM module creates a two-dimensional
self-organizing map for every network service that is being monitored. Anomaly detection
schemes also involve other data mining techniques such as Support Vector Machines
(SVM). Because data mining techniques are data driven and do not depend on
been very successful in detecting new kinds of attacks. However, these techniques often
have very high false positive rates, resulting in a major challenge for the data mining
The use of anomaly detection techniques in the context of network intrusion detection
attack behaviours. Network intrusion detection system can extract information from the
packet header, packet payload or both. Header information is not helpful in detecting
attacks against vulnerable applications (since the connection that carries the attack is
established in a normal way). On the other hand, payload information is most suitable to
that 69% of vulnerabilities were caused by web services, and it was reported in [1-3] that
75% of cyber attacks occurred at the application layer. Thus, organizations rely more
networks. In this section, we concentrate on work which utilizes the techniques similar
We present a brief review on the design techniques used by PAYL, SOM,
advantage of Kohonen's Self Organising Map (SOM) is the ability to add new inputs
into patterns that it has already discovered. However, its original function was to
compress data.
Kruegel et al. [15]. They used the payload to generate a model and grouped the 256 ASCII
characters present in the payload into six segments: 0, 1-3, 4-6, 7-11, 12-15 and
In 2004, Wang and Stolfo [80] proposed PAYL (PAYLoad intrusion detection
system), a state-of-the-art system, which used the combination of type, length and
distribution to detect anomalous events. They developed the system using Byte
language parser and a method to predict the next sequence in a dataset. BFD is the total
number of n-gram occurrences, the values that are identified in sampling of payload
data. PAYL uses the BFD and standard deviation to compute an anomaly score, which
defines the similarity between attacks. A simplified Mahalanobis distance measure was
used to compare new incoming traffic to the model. The system was evaluated against
the 1999 Lincoln Lab IDEVAL dataset. The overall detection rate was close to 60% with
a false positive rate of less than 1%. PAYL uses the whole payload. However, PAYL does not
take the relative position of different bytes inside the payload into account, so that the
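
The byte-frequency model and distance described for PAYL can be illustrated with a minimal sketch. It assumes the commonly stated form of the simplified Mahalanobis distance (sum over bytes of the absolute deviation from the mean, divided by the standard deviation plus a smoothing factor); the smoothing value and the sample payloads are hypothetical, and the per-port and per-length conditioning of the real system is omitted.

```python
from statistics import mean, pstdev


def byte_frequencies(payload: bytes) -> list[float]:
    """Relative frequency of each of the 256 byte values (1-gram distribution)."""
    counts = [0] * 256
    for b in payload:
        counts[b] += 1
    total = len(payload) or 1
    return [c / total for c in counts]


def train_model(payloads):
    """Per-byte mean and standard deviation over the normal training payloads."""
    freqs = [byte_frequencies(p) for p in payloads]
    means = [mean(column) for column in zip(*freqs)]
    stds = [pstdev(column) for column in zip(*freqs)]
    return means, stds


def simplified_mahalanobis(payload, model, alpha=0.001):
    """Sum of |x_i - mean_i| / (std_i + alpha); alpha avoids division by zero."""
    means, stds = model
    x = byte_frequencies(payload)
    return sum(abs(xi - mi) / (si + alpha) for xi, mi, si in zip(x, means, stds))


# Hypothetical normal requests and a payload with a long run of 0x90 bytes.
NORMAL = [
    b"GET /index.html HTTP/1.1",
    b"GET /about.html HTTP/1.1",
    b"GET /contact.html HTTP/1.1",
]
model = train_model(NORMAL)
normal_score = simplified_mahalanobis(b"GET /index.html HTTP/1.1", model)
attack_score = simplified_mahalanobis(b"GET /" + b"\x90" * 64 + b"/bin/sh", model)
print(normal_score < attack_score)
```

Because only frequencies are counted, two payloads with the same bytes in a different order receive the same score, which is the positional limitation noted above.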
Bolzoni et al. [81] proposed POSEIDON, a two-tier payload-based anomaly intrusion
detection system. In this system, payload length and frequency distribution were
architecture, a SOM was used for pre-processing of the packet payload and PAYL was used
as a basis for intrusion detection. The SOM mapped high-dimensional data points onto
a single or multi-dimensional grid. The aim of the SOM was to identify similar payloads
for a given destination address and port. The SOM improved detection accuracy.
systems. They require manual updating from network administrators. Hence, the ideal
approach is to employ unsupervised learning, which does not require human interaction
and to have the systems initially setup and then run autonomously. Anomaly-based
To model the structure of payload, Wang and Stolfo proposed ANAGRAM [82].
ANAGRAM uses n-grams extracted from payloads using a sliding window of length n
to create unique signatures. Wang et al. [80] used a value of n ≥ 2 to extract byte sequence
memory overhead and required training set size as n increases, the authors utilized
Bloom filters to record n-grams observed from packet payloads during the training
phase. They used a supervised learning process to model normal traffic by storing n-grams
of normal packets in one Bloom filter, and modeled attack traffic by storing n-grams
from attack traffic in a separate Bloom filter. During the detection phase, packet
payloads were scored according to the proportion of n-grams observed that were not
contained in the Bloom filter. The major difference between binary n-gram analysis and
1-gram analysis is that the latter has limitations and can be easily replicated using
more difficult to construct an accurate model because of the curse of dimensionality and
possible computational problem. Perdisci et al. [83] proposed McPAD (Multi classifier
used a sliding window to cover all sets of 2 bytes, ν positions apart in a network traffic
payload. Since each byte could have values in the range 0-255, and n=2, the
dimensionality of the feature space was very high (256² = 65,536). The high
dimensionality of the feature space was then reduced using a clustering algorithm. They
combined multiple classifiers using a simple majority voting rule to make their model
Rieck and Laskov [84] also extracted language features in the form of high-order n-
grams from connection payloads. They compared high order n-grams and words in
connection payloads using vectorial similarity measures such as kernel and distance
functions.
uses String Equality (SE), Longest Common Substring (LCS), and Longest Common
Subsequence (LCSeq). These techniques correlate attacks using ingress/egress signature
matching.
technique. However, a major difference is that the system correlates attacks based on user
requests that employ higher-level applications. The system is engineered to reduce false
Pruning to Produce Error Reduction (RIPPER) and Support Vector Machine (SVM).
RIPPER uses IF-THEN rules to predict a class and SVM classifies input features.
However, all these systems have not considered correlations between the payload
features and among the payloads. In contrast, we propose a novel approach to develop
a model of the packet payload to detect anomalies in packet payloads. Each network
recognized through image processing), and each image will be viewed as a pattern to be
classified as normal or anomalous traffic class based upon the given information about
the connections. This model includes the correlation between various payload features
and increases the detection accuracy. We use Mahalanobis Distance Map (MDM)
technique to determine the hidden correlations between payload features and to calculate
the difference between normal and anomalous network traffic. For feature reduction, we
propose to use Linear Discriminant Analysis (LDA) and Principal Component Analysis
(PCA) techniques. This will reduce the computational complexity. In addition, it will
reduce the time needed to train and test the model and improve detection accuracy as well.
2.6 Conclusions
In this chapter, we have presented various security strategies used to mitigate attacks and
protect the system from inside and outside attacks as well as from misconfiguration of the
system. We have then presented the taxonomy for intrusion detection systems. We have
shown how an intrusion detection system can be classified on the basis of the data
source that it analyses and the detection model that it employs, together with their strengths and
limitations. We have discussed here the performance evaluation metrics for an intrusion
We have presented a literature review from the perspective of the techniques used for the
Anomaly-based systems have been extensively researched but there are still open
issues that limit the application of an anomaly-based intrusion detection system in real
based anomaly systems are unable to detect attacks against vulnerable applications
(since the connection that carries the attack is established in a normal way). There is a
need to design accurate payload-based intrusion detection system to protect the network
from web-based attacks. We will address this issue in the next chapter and provide a
framework for a payload-based intrusion detection system, which can detect attacks more
accurately. Chapter 3 and Chapter 4 discuss proposed solutions to these problems,
CHAPTER 3
Introduction
attacks in a network. An intrusion detection system appears to be one of the most effective
solutions for defending networks against malicious users. Sophisticated IDSs generally
fall into two categories: misuse detection (or signature detection) and anomaly detection.
Network based anomaly detection can be applied at a packet header level, packet
payload level or both to process the network traffic. An IDS that simply analyses packet
header information cannot adequately secure a network from malicious attacks. The
alternative is to perform deep-packet analysis and explore the payload of each incoming
SOM, POSEIDON, PAYL, Anagram and McPAD are examples of payload-based
anomaly intrusion detection systems. Most of these systems, except Anagram and
McPAD, use a 1-gram analysis procedure to build a statistical model for certain types
of data based on the byte frequency and do not have structural information of the
payload features. However, all of these IDSs have high false positive rates since
dependencies and correlations of the features are intrinsically neglected. The goal of this
detection rate with a relatively lower false positive rate by using an image processing
technique. We intend to use the Mahalanobis Distance Map (MDM) approach, which is
idea that the geometric structures of all human beings are similar although they wear
clothes of different colours. Our work is motivated by this idea and the MDM is used to
the hidden correlations between features and the correlations among network packet
and structural information help improve the detection performance and reduce false
positive rate.
In the previous chapter, we have reported the relevant work of anomaly detection
approach. However, this approach has many issues. We discuss these issues in the
following section.
A high number of false alarms, especially false positive alarms, is the key issue of
anomaly detection systems. This high false positive rate is due to the fact that an anomaly
systems are highly dependent on the quality of the training data. Training data, which
may contain attack samples, can leave an anomaly detection system non-functional
because the attacks will be learned as normal traffic and the IDS will never produce an alert
related to them. Additionally, the lack of sufficient training samples degrades the ability
environments. New attacks and services appear every day and an IDS must be able to cope
with the new environment. Finally, anomaly detection systems have poor descriptive
The chapter is divided into two parts. In the first part, we present detailed information
second part, we first discuss the implementation of the GSAD model in the HTTP
successful HTTP requests that use the GET method and contain the query component.
Malicious inputs can be sent to a web application using the parameter-value portion of
the query.
The detailed organisation of this chapter is as follows. Section 3.1
approach. Experimental results and discussion are presented in Section 3.2. Section 3.3
provides a brief introduction to HTTP and some examples of web attacks. In Section 3.4,
describes the experimental setting and the dataset on which GSAD has been evaluated.
Experimental results and analysis are given in Section 3.6. Finally, we summarize our
payload-based IDS, to detect intrusion in the network. This model uses an image
processing technique that has been used for human face recognition [87]. Each network
normal or anomalous traffic based upon the given information about the connections.
Similar to other anomaly detection systems, GSAD models the normal behavior of the
network traffic rather than the malicious ones. Moreover, the most significant
In this section, we elaborate on our new approach. Firstly, we present the framework of
our GSAD system. Then, we discuss the modules in the framework, namely Payload
We present the framework of the GSAD, a payload based anomaly detector derived
from an image processing technique. The complete framework of our proposed intrusion
detection system (GSAD) has three stages as shown in Figure 3.1. In this figure, solid
arrows indicate data flows inside the GSAD model. The Payload Feature Analyst
module and the Payload Geometrical Structure module together form the Geometrical
given below.
The first stage of GSAD consists of payload classification and data preparation. For
data preparation, raw data are collected from the network input, e.g., tcpdump file,
In the second stage of the framework, payload features are analyzed using n-gram
text categorization technique, which converts the network traffic packet payloads into a
series of feature vectors. These feature vectors describe the patterns of the incoming
traffic. Correlation between the payload features is calculated and an MDM is created
for normal network traffic as a normal profile, which is used for the classification of the
In the third stage, the Mahalanobis distance criterion [88] is used to measure the
dissimilarity between the pre-developed normal profile and the profile of a new
incoming network packet. A score value is calculated, which is used for classification of
normal and malicious payloads and for the activation of the alarm. Detailed description
[Figure 3.1: GSAD framework. Block labels: Network Traffic Dataset; Feature Filter (Destination Address, Services); 1-gram model; Geometrical Structural Model; Classifier; Attack Recogniser; Alarm Acknowledgement]
In this section, we provide a step-wise description and technical details of all modules
a) Payload Feature Classifier
The payload feature classifier is the first stage of the framework, where different datasets are
prepared. When a packet is received by GSAD, the payload is extracted. We group
network traffic into various categories using Wireshark [89], which is a traffic analyzer
and separates the network traffic based on type of services, destination address, payload
length and direction of network traffic flow. The source of network traffic can be real
network (for real-time operation) or collected tcpdump files. The prepared dataset is
The payload feature analyst is the first key constituent of the Geometrical Structure
Payload Model (GSPM). For feature extraction, text categorization technique [40] is
used, which is responsible for payload feature analysis and feature construction. It
extracts raw features using n-gram (i.e., the sequences of n consecutive bytes in the
payload) text categorization technique (n=1 in our case) from the packet payload and
feature vector in a 256-dimensional feature space. The mathematical model for 1-gram
The 1-gram payload model is a payload based statistical model, which does not take
network packet header features into account. In addition to average frequency of each
ASCII character (0-255), it calculates the mean value and the standard deviation of each
feature's frequency and correlations between these features. Each payload is represented
, (3.1)
where Oi is the occurrence of i-th n-gram. The overall value of the relative frequencies is
given by
. (3.2)
feature space. Here, T stands for the 'transpose' of a matrix. We assume that there is a
network traffic dataset with n network packets. The mean value and standard deviation
of each byte's frequency are described in Equations 3.3 and 3.4 respectively.
(3.3)
(3.4)
where and are the mean value and the standard deviation of each feature's frequency
and given by
(3.5)
(3.6)
The mean value and standard deviation vectors, and , are stored in a model M. The
network traffic dataset consists of traffic generated by the various network services.
Therefore, we need to filter network traffic for a selected service based on the following
features: size of payload, destination address, services and direction of traffic flow. Then,
models are developed according to types of service and extracted group of features.
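The 1-gram payload model described above can be illustrated with a minimal sketch. It assumes that each payload is truncated to its first 150 bytes (the length used in the experiments of this chapter), that relative frequencies are byte counts divided by the number of bytes examined, and that the population standard deviation is used. The function names `one_gram_features` and `train_byte_model` are hypothetical and are not taken from the original implementation.

```python
from collections import Counter
from statistics import mean, pstdev
from typing import Iterable, List, Tuple


def one_gram_features(payload: bytes, max_len: int = 150) -> List[float]:
    """Return the relative frequency of each byte value (0-255).

    Only the first max_len bytes are examined (assumed truncation length).
    """
    window = payload[:max_len]
    total = len(window)
    if total == 0:
        return [0.0] * 256
    counts = Counter(window)  # iterating bytes yields integers 0-255
    return [counts.get(value, 0) / total for value in range(256)]


def train_byte_model(payloads: Iterable[bytes]) -> Tuple[List[float], List[float]]:
    """Return per-byte mean and standard deviation over a set of payloads.

    Assumption: population standard deviation (division by n).
    """
    vectors = [one_gram_features(p) for p in payloads]
    columns = list(zip(*vectors))  # one column per byte value
    means = [mean(col) for col in columns]
    stds = [pstdev(col) for col in columns]
    return means, stds


if __name__ == "__main__":
    features = one_gram_features(b"GET /index.html HTTP/1.0")
    print(len(features), round(sum(features), 6))
```

Each payload therefore becomes a point in a 256-dimensional feature space, and the two returned vectors play the role of the stored model for one service and host.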
Payload Geometrical Structure is the second key constituent of GSPM, used for payload
analysis. It includes techniques to determine where and how to separate data input prior
approach [87] develops geometrical structure models of the payloads. The following
The Geometrical Structure Model (GSM) is a new concept in intrusion detection systems.
A network traffic profile is generated using the Mahalanobis Distance Map (MDM), which
captures complex non-linear correlations of the data. By using MDM, the hidden
correlations between the features (256 ASCII characters) present in the payload of
, (3.7)
, (3.8)
, (3.9)
Where represents the i-th feature in the feature vector denotes the average of
each feature, defines the Mahalanobis distance between the i-th feature and the j-th
feature, is the covariance value of each feature, and finally D is the MDM, the image
of a network packet (the pattern of a network packet payload). D is a 256×256 image that
represents the distance from one block to another block. Distance map D is used to
generate the network traffic profiles (normal and attack) of the training and test data.
These profiles are used for the classification of incoming network traffic.
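A minimal sketch of how a distance map of this kind could be computed for a single packet is given here. The exact expressions of Equations 3.7 to 3.9 are not restated; the code assumes one plausible form in which the average is taken over the features of the packet, the covariance value of feature i is its squared deviation from that average, and the distance between features i and j is their squared difference divided by the sum of their covariance values. The function name `distance_map` and the guard term `eps` are hypothetical.

```python
from typing import List, Sequence


def distance_map(features: Sequence[float], eps: float = 1e-12) -> List[List[float]]:
    """Build an n-by-n map of pairwise distances between features.

    Assumed form (not confirmed by the source equations):
        mu      = average of all features of the packet
        cov_i   = (x_i - mu) ** 2
        d(i, j) = (x_i - x_j) ** 2 / (cov_i + cov_j)
    eps is a hypothetical guard against division by zero.
    """
    n = len(features)
    mu = sum(features) / n
    cov = [(x - mu) ** 2 for x in features]
    return [
        [(features[i] - features[j]) ** 2 / (cov[i] + cov[j] + eps) for j in range(n)]
        for i in range(n)
    ]


if __name__ == "__main__":
    d = distance_map([0.5, 0.25, 0.25])
    for row in d:
        print([round(v, 3) for v in row])
```

With 256 relative byte frequencies as input, the result is a 256×256 symmetric map with a zero diagonal, which can be rendered as an image of the packet payload.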
The above basic formulas are used in the GSM model to process a large amount of
sample network traffic with normal behaviors. To include the variations in different
normal payloads, we consider all normal traffic as a group. The distance maps of normal
behaviors for all traffic in the groups are calculated using Equations 3.7 and 3.8 and
shown in Equation 3.9. We assume that the group has m normal packets inside. Then, the
distance maps of individual normal packets are: ,…, , and the averages and
variances for all elements of the distance maps are computed by Equations 3.10 and 3.11.
, (3.10)
(3.11)
In the attack recognition phase, an incoming packet follows the same preprocessing
procedure (as discussed in Equations 3.7 to 3.11) to construct its Mahalanobis Distance
Map:
(3.12)
f) Attack Recogniser
The purpose of the attack recognition phase is to identify the difference between the normal and
abnormal patterns. In this study, GSAD uses the Mahalanobis distance criterion to measure
the dissimilarity between the developed profile and the new incoming traffic profile and
to recognize attacks. It compares each incoming packet payload profile against the pre-
developed base-line normal payload profile to calculate a weight factor score. The weight w is
, (3.13)
If the weight factor w exceeds the threshold, the incoming packet is considered as an
intrusion. This weight factor score value is used for the payload classification and for the
While setting the threshold is entirely subjective, ultimately it should be set to capture all
used to determine the threshold value. A series of experiments are conducted for
threshold. We consider a threshold value between -3δ and +3δ in our experiments for
achieving optimal detection rates and low false positive alarm rates. Assuming the
distribution is normal, three standard deviations account for 99% of the sample
population are studied. The incoming request is considered as an attack or a threat if the
h) Alarm Acknowledgement
In this module, an attack alarm is generated if the score of a test packet payload is
larger than the threshold, and it is then reported to the administrator. Otherwise, it will
Training of the GSAD model is required to generate a base-line profile of network packet
(application) behavior and evaluate whether intrusion detection systems truly identify
known and unknown attacks. To do this, anomaly-based IDSs will use normal data to
learn the behavior of network packets during training and generate a normal profile. In
the training stage, the distance maps of all sample images are constructed using Equations
3.8 and 3.9. Let us assume that there are m normal packet images. Then, the average
distance of element (i,j) is computed using Equation 3.10 and the average
profile of network packets for an application. This profile is saved as a base-line profile
(GSAD model) and is used for the testing purpose of each new incoming packet.
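The training step and the subsequent scoring of a new packet can be sketched as follows. The element-wise average and variance over the m normal distance maps follow the description of Equations 3.10 and 3.11. The form of the weight factor is an assumption: the sum, over all elements, of the squared deviation from the average map divided by the corresponding variance. The names `build_profile`, `weight_score` and the `smoothing` term are hypothetical, and the population variance (division by m) is assumed.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def build_profile(maps: Sequence[Matrix]) -> Tuple[Matrix, Matrix]:
    """Return element-wise average and variance of m normal distance maps."""
    m = len(maps)
    n = len(maps[0])
    avg = [[sum(d[i][j] for d in maps) / m for j in range(n)] for i in range(n)]
    var = [
        [sum((d[i][j] - avg[i][j]) ** 2 for d in maps) / m for j in range(n)]
        for i in range(n)
    ]
    return avg, var


def weight_score(d_obj: Matrix, avg: Matrix, var: Matrix, smoothing: float = 1e-6) -> float:
    """Dissimilarity between one packet's map and the normal profile.

    Assumed form: sum of (d_obj - avg)^2 / (var + smoothing) over all elements.
    The smoothing term is a hypothetical guard against zero variance.
    """
    n = len(avg)
    return sum(
        (d_obj[i][j] - avg[i][j]) ** 2 / (var[i][j] + smoothing)
        for i in range(n)
        for j in range(n)
    )


if __name__ == "__main__":
    normal_maps = [[[0.0, 2.0], [2.0, 0.0]], [[0.0, 4.0], [4.0, 0.0]]]
    avg, var = build_profile(normal_maps)
    print(weight_score([[0.0, 5.0], [5.0, 0.0]], avg, var, smoothing=1.0))
```

A packet whose map equals the average normal map scores zero, and the score grows as the packet's feature correlations depart from those of normal traffic.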
Testing is a critical process necessary to evaluate the capability of an IDS and to identify
(as described in Section 3.1.3). Then, we calculate the weight factor w and compare it with the
pre-determined threshold value. If the value of w is greater than the threshold value for
In this section, we report experimental results. We present the results based on the
accuracy of GSAD first. We evaluate GSAD on DARPA 1999 dataset [90]. Although the
DARPA 1999 dataset is criticized by McHugh [7] for many of its weaknesses, it is the
In the following subsections, we first present our experimental environment and brief
information on the dataset, and then we discuss the training and testing in our model.
3.33 GHz 8MB cache Quad Core Xeon CPUs and 48GB DDR3-1333 ECC memory.
The DARPA 1999 IDS dataset was collected at MIT Lincoln Labs to evaluate intrusion
detection systems. Entire network traffic was recorded in tcpdump format. The dataset
consists of three weeks of training data and two weeks of test data. In the training data,
there are two weeks of attack-free data and one week of data with labeled attacks. These
attacks are grouped into five classes as scan or probe, DoS, R2L, U2R and data. A series
of experiments on the DARPA 1999 dataset [90] is conducted to evaluate the performance
For our experiments, we consider inbound TCP traffic, extracted from weeks 1,
2 and 3, which was captured between routers and victims. We use Wireshark
[89], a packet analyzer, to extract TCP traffic. The first 150 bytes of each packet payload are used
for experiments and to identify various attacks coming through port 80 and port 25.
features, and generate an MDM, which represents the pattern in the payload (as discussed in
Subsection 3.1.2), and generate an MDM profile as described in Subsection 3.1.3 and
save it as a base-line model. We train our model on the DARPA dataset using week 1
and week 3 attack-free data. The model is then evaluated using week 2 data which
detecting three types of attacks [90],[91] namely Crashiis attack, Back attack and
Mailbomb attack. A brief explanation of these attacks is given in the following section.
For port 80, the attacks are often malformed HTTP requests and are very different
from normal requests. For instance, Crashiis Attack is a Denial of Service attack
against the NT IIS web server. The attacker sends a malformed GET request via
telnet to port 80 on the NT victim. The command "GET ../.." crashes the web server
Back Attack is a DoS attack against the Apache web server, where a client sends HTTP
requests of the form “GET ///////////….” with more than 6000 slashes; these requests will slow
Mailbomb Attack is a simple attack where an attacker floods a user's mailbox with
Figures 3.2(a)-(c) show the average relative frequency of each byte for normal HTTP
payload, Crashiis attack payload and back attack payload respectively. In Figures 3.2(a)-
(c), payload features (ASCII characters) are plotted on X-axis and relative frequency of
each byte in the payload on Y-axis. The MDM (geometrical structure model) of normal
HTTP payload, Crashiis attack payload and Back attack payload are given in Figures 3.3(a)-
(c) respectively. MDM presents the correlations between the features. The MDMs in
Figures 3.3(a)-(c) demonstrate that the correlations between normal features are different
Figure 3.2: Average relative frequency of each byte. (a) normal HTTP payload; (b) Crashiis attack payload; (c) Back attack payload
It can be seen from Figures 3.2(a)-(c) that the average relative frequencies of bytes
appearing in the normal payload and various attack payloads are very different. For the
Crashiis attacks, the “.” character has the highest frequency whereas other characters
share equal frequencies. In comparison, the statistical nature of the Back attack is totally
different and it has a perfect match with the signature. Around 98 percent of characters
Figure 3.3: Average MDM images. (a) normal HTTP payload; (b) Crashiis attack payload; (c) Back attack payload
Results reported in Figures 3.3(a)-(c) demonstrate differences between the normal MDM
image and various attack MDM images. Therefore, we have strong evidence to
We use weight factor scores to classify normal and attack packets. Results for the
weight factor scores for normal HTTP request packet and Back attack packets are
presented in Figure 3.4. In Figure 3.4, the X-axis represents the number of test packets
and the Y-axis represents weight factor score values. It is clear from Figure 3.4 that the
GSAD model is able to detect different types of attacks without any prior knowledge of
the attacks.
Figure 3.4: Weight factor scores. (a) normal HTTP request packets; (b) Back attack packets
From Figures 3.4(a)-(b), we can conclude that the weight factor score for normal
packets is much smaller than the weight factor scores of back attack packets.
We use the Receiver Operating Characteristic (ROC) curve method to evaluate the
performance of the GSAD model on the DARPA 1999 dataset. The ROC curve shows the
relationship between false positive rate and detection rate. The ROC curve is shown in Figure
3.5. To show the results more clearly, the X axis and Y axis values are shown in the range
[0.8e-03, 1.5e-03] and [0.65, 1] respectively. Our model could achieve 100 percent
detection rate with a very low false positive rate of 0.087 percent on DARPA 1999
dataset.
Figure 3.5: ROC Curve for the accuracy of the GSAD model
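For illustration, the points of an ROC curve of this kind can be derived from weight factor scores as sketched below. The sketch assumes that a packet is flagged as an attack when its score is strictly greater than the threshold and that every distinct observed score is tried as a threshold. The function name `roc_points` and the sample scores are hypothetical and are not experimental data.

```python
from typing import List, Sequence, Tuple


def roc_points(
    normal_scores: Sequence[float], attack_scores: Sequence[float]
) -> List[Tuple[float, float, float]]:
    """Return (threshold, false positive rate, detection rate) triples.

    A packet is flagged when its score is strictly greater than the threshold.
    """
    thresholds = sorted(set(normal_scores) | set(attack_scores))
    points = []
    for t in thresholds:
        fpr = sum(s > t for s in normal_scores) / len(normal_scores)
        dr = sum(s > t for s in attack_scores) / len(attack_scores)
        points.append((t, fpr, dr))
    return points


if __name__ == "__main__":
    # Hypothetical scores, for illustration only.
    for point in roc_points([1.0, 2.0, 3.0, 4.0], [5.0, 6.0]):
        print(point)
```

Plotting the detection rate against the false positive rate for all thresholds gives the curve, and the operating point is then chosen where the detection rate is high and the false positive rate is acceptably low.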
various types of attacks and discriminating normal and attack packets. It is clear from
the geometrical structure models which explain the correlation among 256 ASCII
To further investigate the detection accuracy for a variety of web-based
attacks, we implement our framework in an HTTP environment. In the next section,
With the popularity of the internet, more and more systems are subject to attack by
intruders. 23 million attacks have been reported in HPDV Lab survey report [5].
Specifically, web applications continue to pose one of the biggest risks to company
networks, often due to vulnerabilities present in the systems. Thus, securing a network
server is an important but difficult task. This has motivated us to extend our proposed
GSAD model for anomaly detection on HTTP requests sent to web servers. In this
HTTP traffic and investigate detection accuracy of our GSAD model in HTTP
environment.
3.3.1 HTTP
2616 [92]. The protocol operates in a client-server mode. While an HTTP request is a
string, it is also structured. HTTP has become the universal transport protocol for almost
all kinds of web-server applications. HTTP passes through most of these firewalls, with
little or no trouble at all. Hence, web application developers started using HTTP as a
transport protocol for their new software. Some of the examples for this include
tunneling secure shell connections, Microsoft RPC for accessing Exchange (email)
servers, etc. The creation of all these new services over the HTTP protocol creates
additional opportunities for the intruders [93]. They also create a lot of variations in the
characteristics of the HTTP requests. The vast majority of requests use GET, but other
methods, such as HEAD and POST, exist, and extensions to the HTTP standard define
more.
containing a number of parameter names and their respective parameter values. RFC
2616 [92] defines the structure and the syntax of a request with parameters (Figure 3.6).
The method used in this example is GET. The fields of our interest are: the presence
of a path, a number of parameter names and their respective values (in Figure 3.6, the
parameter names are 'name', 'file' and 'sid', and their respective values
are “New”, “Article” and “25”). The set of parameters is finite. A value can be any
string, though not all of the strings will be accepted. Since no type is defined, the
semantics of each parameter is implicitly defined within the context of the web
application and such parameters are usually used in a consistent manner (i.e., their
syntax is fixed).
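The fields of interest in such a request can be separated with a few lines of code. The sketch below is illustrative only: the request line, in particular the path `/modules.php`, is a hypothetical example built around the parameter names and values mentioned above, and the function name `parse_get_request_line` is an assumption. It uses `urlsplit` and `parse_qsl` from the Python standard library.

```python
from typing import List, Tuple
from urllib.parse import parse_qsl, urlsplit


def parse_get_request_line(request_line: str) -> Tuple[str, str, List[Tuple[str, str]]]:
    """Split an HTTP request line into method, path and query parameters."""
    method, target, _version = request_line.split(" ", 2)
    parts = urlsplit(target)
    # keep_blank_values retains parameters that were sent with an empty value
    params = parse_qsl(parts.query, keep_blank_values=True)
    return method, parts.path, params


if __name__ == "__main__":
    # Hypothetical request line; only the parameter names and values come from the text.
    line = "GET /modules.php?name=New&file=Article&sid=25 HTTP/1.1"
    print(parse_get_request_line(line))
```

The list of (name, value) pairs corresponds to the parameter-value portion of the query, which is where malicious input is typically placed.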
3.3.2 HTTP Attack Examples
Because attackers exploit many different bugs in code, the diversity of attacks against
HTTP is high. This diversity implies that knowing one attack provides no assistance or
information about the structure of the next one. Here, we present some examples
of HTTP attacks. Indeed, most existing attacks are only one request long. HTTP is a
stateless protocol that contains structure useful in separating high- and low-variability
One famous attack is “Nimda worm”, as shown in Figure 3.7, which is present in the
resource path [94]. It is capable of affecting both the clients that use any version of
Windows as the host operating system and also the servers running Windows NT or
2000.
This attack targets a collection of bugs in Microsoft systems, and uses the fact that the
default configuration enables the attacker to exploit the vulnerability. This attack has
several known variants, all of them exploiting the vulnerability in the Windows
operating system.
Another popular attack type is the Cross-site scripting attack [95]. This vulnerability
enables the malicious user to use a web application program to inject code, most likely
as client (browser) side scripts, into the web pages viewed by a lot of other users, who
then become the victims of this attack. The malicious user causes a legitimate web-
server to send a response page to a client's browser that contains malicious script or
GET
////////////////////////////////////////////////////////////////////////////////////////////////////////
////////////////////////////////////////////////////////////////////////////////////////////////////////
////////////////////////////////////////////////////////////////////////////////////////////////////////
Figure 3.8: Back attack, 300 /s, extra /s have been deleted to save space
The GSAD framework proposed in Section 3.2 is very general and can be easily
customized by adding domain specific knowledge as per the specific requirements of the
The analysis technique uses the particular structure of HTTP queries that contain
parameters. GSAD uses knowledge related to the application layer protocol by mapping
HTTP payloads into a 256-dimensional feature space. Each feature represents the occurrence
frequency, in the HTTP request payload, of one of the 256 possible
byte values. A simple model of normal HTTP traffic is then constructed using the MDM
In our experiments, we parse 150 bytes of HTTP GET request payload by using a
sliding window of 1 byte length and count the occurrence frequency of each feature in
the payload. The HTTP GET request payload is represented by a pattern vector in a
256-dimensional feature space. A profile is created for HTTP GET request payload
using Equations 3.1 to 3.12 as explained in Section 3.2. Then, we design a number of
environment.
In this section, we report the results of our experiments. We first present the results based
on the accuracy of GSAD in HTTP environment and then compare it to PAYL [80] and
McPAD [83]. We evaluate GSAD on DARPA 99 dataset [96] and Georgia Institute of
Technology Attack Dataset (GATECH) [83, 97]. In the following subsections, we first
The experimental setup used in this section is similar to that explained in Section 3.2.1.
Code for the implementation of GSAD in HTTP environment is written using Matlab
2009b.
Assumptions
The following assumptions are made to evaluate the robustness of the GSAD model in the HTTP
environment.
The attackers may have some information about the IDS deployed in the
organization, but do not have access to legitimate traffic from the same network
Polymorphic attacks used by the attacker are not specific to normal behavior of the
attacked network.
within a network.
3.5.2 Datasets
The following subsection describes characteristics of two datasets that we use in our
experiments. These two datasets are used by the state-of-the-art payload-based IDSs that
DARPA 1999 dataset is the only publicly available, large and well labeled dataset, and
is still the most widely used public benchmark for testing intrusion detection systems. In
The GATECH attack dataset is also publicly available and contains traces of real attack
traffic. This dataset is a collection of 63 attack requests. Attacks are collected from
various online security forums and other online sources. GATECH is a labeled dataset
and has several non-polymorphic HTTP attacks provided by Ingham and Inoue [97] and
several polymorphic HTTP attacks generated by Perdisci et al. [83] using both the
polymorphic engine CLET and Polymorphic Attacks. The GATECH attack data set is
divided into four groups of attacks used as test data, namely Generic attacks, Shell-code
attacks, CLET attacks known as Polymorphic attacks and Polymorphic blending attacks.
The total number of attacks and attack packets in the dataset are 6,512 and 72,539
respectively. All 66 HTTP generic attacks and 205 total HTTP request attack packets
In this section, we present a detailed description of the experiments conducted using our
GSAD model to detect various attacks coming through HTTP services. The DARPA
1999 dataset is used for constructing a normal profile of HTTP protocol. For attack
traffic, the HTTP-related attacks contained in DARPA 1999 dataset and GATECH
A series of experiments are conducted for the training and testing of the GSAD model
During the training phase, an average normal profile is generated using a geometrical
distance algorithm for HTTP GET request, and then the weight factor score and
threshold are calculated. During the test phase, similarities between the new incoming
HTTP request and the average profile of the HTTP requests are calculated using MDM
and a weight factor. The weight factor is used to determine whether the incoming packet
is an attack packet or not. We configure several model settings to optimize the
weight factor becoming infinite due to the possibility of the variance $\delta^2_{nor}(i,j)$ being equal to
zero. The smoothing factor γ reflects the statistical confidence of the sampled training
data. The larger the value of γ, the lower the confidence that the samples truly represent
the actual data. Analysis of samples during the training phase for different values of γ
Selection of threshold value is very important for the evaluation of IDS as this
directly impacts the performance of the IDS. According to Bolzoni & Etalle [81], a
lower threshold yields more alarms, significantly raising the false positive rate. In
contrast, a higher threshold yields fewer alarms and thus lowers the false
positives. While setting the threshold is entirely subjective, ultimately it should be set to
appropriate criterion used to determine the threshold value. A series of experiments are
value of threshold. We consider a threshold value between -3δ and +3δ in our
experiments for achieving optimal detection rates and low false positive alarm rates.
Assuming the distribution is normal, three standard deviations account for 99% of the
threat if the weight factor is more than +3δ or less than -3δ.
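The three-standard-deviation criterion can be sketched as follows. The sketch assumes that the bounds are centred on the mean of the weight factor scores observed for normal training traffic and that the population standard deviation is used; for a normal distribution, about 99.7% of values fall within these bounds. The function names `three_sigma_bounds` and `is_attack` and the sample scores are hypothetical.

```python
from statistics import mean, pstdev
from typing import Sequence, Tuple


def three_sigma_bounds(training_scores: Sequence[float]) -> Tuple[float, float]:
    """Return (lower, upper) bounds at three standard deviations from the mean.

    Assumption: bounds are centred on the mean of the normal training scores.
    """
    mu = mean(training_scores)
    sigma = pstdev(training_scores)
    return mu - 3 * sigma, mu + 3 * sigma


def is_attack(score: float, bounds: Tuple[float, float]) -> bool:
    """Flag a request whose score falls outside the three-sigma bounds."""
    lower, upper = bounds
    return score < lower or score > upper


if __name__ == "__main__":
    # Hypothetical training scores, for illustration only.
    bounds = three_sigma_bounds([10.0, 12.0, 14.0])
    print(bounds, is_attack(12.0, bounds), is_attack(20.0, bounds))
```

Tightening or widening the multiplier trades detection rate against false positives, which is why the threshold is tuned experimentally.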
Experimental Results
We conduct experiments on the training and test datasets. In the first part of our experiments,
we present the model generation for normal HTTP traffic to hosts marx and hume. Then,
we evaluate the accuracy of our GSAD model in detecting various attacks, namely Back
attack, Phf attack, Crashiis attack, Generic attacks, Shell-code attacks, Polymorphic
attacks and the Polymorphic Blending attacks coming through HTTP services.
We extract inbound HTTP request traffic of week 1 and week 3 from the DARPA 1999
dataset [90] for the training of the GSAD model. The extracted HTTP traffic packets
correspond to normal HTTP requests destined to two different HTTP servers existing in
the DARPA 1999 dataset: marx (Linux Server) and hume (NT Server). The total numbers
of packets used for training of the model after filtering are 13,933 and 10,464 for hosts
For a specific host, HTTP GET traffic has very similar behavior. In the experiments,
we train the GSAD models on the training dataset (10 days of normal HTTP GET request
traffic) for hosts marx and hume respectively, and generate an average normal profile for
the HTTP GET request. For the normal profile, we generate a character relative frequency
model. Figures 3.9(a)-(b) show the relative frequencies of each character (0-255) in the
extracted normal packet payload for the hosts marx and hume, respectively.
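The character relative frequency model can be illustrated with a short sketch. It assumes that each payload is available as a Python bytes object and that the average profile is the element-wise mean over all training payloads; the function names and the two sample payloads are hypothetical.

```python
from collections import Counter


def relative_frequencies(payload: bytes) -> list:
    """Return a 256-element vector of byte counts divided by payload length."""
    if not payload:
        return [0.0] * 256
    counts = Counter(payload)  # iterating over bytes yields integers 0-255
    n = len(payload)
    return [counts.get(b, 0) / n for b in range(256)]


def average_profile(payloads) -> list:
    """Element-wise mean of the relative frequency vectors of all payloads."""
    if not payloads:
        raise ValueError("at least one payload is required")
    vectors = [relative_frequencies(p) for p in payloads]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


# Hypothetical normal HTTP GET request payloads.
payloads = [b"GET /index.html HTTP/1.0\r\n", b"GET /a HTTP/1.0\r\n"]
profile = average_profile(payloads)
```

Each entry of profile corresponds to one point on the X axis (character value 0-255) of a plot such as Figure 3.9.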
Figure 3.9: Average relative frequency of characters for normal HTTP GET request
payloads. (a) marx; (b) hume
developed. Figures 3.10(a)-(b) show the MDM images of normal HTTP traffic behavior
for hosts marx and hume. MDM computes correlation between the packets and also
Figure 3.10: Average MDM images of Normal HTTP GET request, (a) marx, (b) hume
In Figure 3.10, the X axis and Y axis show the 256 possible features present in a packet
payload. Each cross point in the figure represents the correlation between two features.
b) Model Testing Results on DARPA 1999 Test Dataset
In the DARPA 1999 dataset, the total numbers of packets after filtering for testing are 783,443
for host marx and 8,431 for host hume. In the evaluation dataset of weeks 4 and 5, 23
We conducted experiments with the extracted test data from the DARPA 1999 dataset
and evaluated the accuracy of our GSAD model in detecting various attacks coming
through HTTP services including Apache2 attacks, Back attacks, Crashiis attacks,
NTInfoscan attacks and Phf attacks. MDM test results for Apache2 attacks and Phf
The behavior of Apache2 attacks in Figure 3.11(a), the behavior of Phf attacks in
Figure 3.11(b) and the normal profile in Figure 3.10 show clear differences in these
profiles. Moreover, the correlations between the features of Apache2 attack and Phf
attack packets are different from the correlations between the features of normal GET
From the MDM images, we can infer visually the differences between the suspected
incoming packets and the normal network traffic involving high link speed and large
Figure 3.11: MDM images of attack packets. (a) apache2 attack; (b) phf attack
factor score and threshold value are used. We use more than 90,000 Apache2 attack
packets and more than 2,500 Phf packets to evaluate our model. For our model, we
choose a threshold value between -3δ and +3δ as this gives the best results. According to
our experiments, the threshold values for hosts marx and hume are [-6.6759e03,
Figure 3.12 shows the weight factor score results for Apache2 attacks and Phf
attacks. Here, X-axis denotes the number of packets and Y-axis represents the weight
score values. In Figure 3.12, the green plot shows a higher value of weight factor score
The red and blue lines in Figure 3.12 are the thresholds of hosts marx and hume.
They indicate that any packets assigned a weight factor score beyond them are
classified as intrusions.
In Figure 3.12(b), the red line and the blue line overlap because of the scale
of the Y axis. In our definition of attack detection, an attack is detected as long as one of its
attack packets is identified as abnormal. Based on the experiments, we find that all of the
Figure 3.12: Weight factor scores of attack. (a) apache2; (b) phf
In this section of experiments, we use the GATECH attack dataset. All HTTP request
attack packets from the attack dataset are used in our experiments. We conduct several
experiments for various types of attacks using the GATECH attack dataset. We evaluate the
similarity between the MDM of the attack profile and the MDM of the normal profile. The
incoming request is considered as an attack or a threat if the weight factor is more than
+3δ or less than -3δ. The results are encouraging. Results of some attacks are discussed in
Generic attacks: This dataset consists of 66 HTTP attacks, plus shell-code attacks (these
attacks carry executable code in the payload that exploits a vulnerability (MS03-022) in
Windows Media Services (WMS)). Other attacks cause Information Leakage and Denial of
Service (DoS). Our model could detect around 90% of these attacks. Figure 3.13(a)-(d)
Figure 3.13: (a) Remote GET buffer overflow vulnerability; (d) Input validation error attack - NT index server directory traversal
Shell-code Attacks: Shell-code attacks are particularly harmful as they inject
executable code and hijack the normal execution of the target application. The data set
contains 11 shell-code attacks from the Generic Attack data set, such as Code-Red Worm
and Buffer Overflow attacks. Figure 3.14(a)-(b) show MDM results for Code-Red and
Figure 3.14: (b) GET buffer overflow
CLET Attacks: These attacks are generated from 8 shell-code attacks using the
polymorphic engine CLET. A polymorphic version of each attack was created using the payload
statistics computed on each distinct day of traffic from the DARPA and GATECH
datasets, which were used to train the CLET polymorphic engine. Overall, 96 polymorphic attacks are
present in the data set. Figure 3.15 shows the MDM result for CLET attacks.
Comparing the results for the various attacks shown in Figures 3.13-3.15, the MDM attack
profiles of generic, shell-code and polymorphic attacks and the MDM normal HTTP
request profiles show clear differences in their behaviors. Also, the correlations between
the features in these attacks are different from the correlations between the features of
normal HTTP requests on the hosts marx and hume. The X axis and Y axis show the 256
possible features (ASCII characters) present in a packet payload. The cross points in the
The results reported in the previous section show that the Geometrical Structure
Anomaly Detector (Mahalanobis Distance Map and Weight factor score) can detect new
attacks without prior knowledge of the attacks with high accuracy and low false positive
rate.
The results obtained for our model are very encouraging. We have achieved nearly
100% detection rates for DARPA 1999 data set with 0.087% false positive rates. For the
comparison of GSAD model with PAYL, we use the data set used by Wang and Stolfo
[80] for the evaluation of PAYL. The differences in the detection rates are not big but
we have achieved very low false positive rates. Table 3.1 shows a comparison in terms
of detection rate and false positive rate for PAYL and GSAD on DARPA 1999 dataset.
Table 3.2 shows a comparison of detection rates and false positive rates for PAYL,
McPAD and GSAD on the GATECH attack dataset. Researchers of these models have also
used the DARPA 1999 dataset for the training and testing of their models. These datasets have
similar types of attacks, since the attacks are coming from the same sources (web sites).
Table 3.3 gives a summary of experimental results obtained for different algorithms.
Table 3.2: Comparison of GSAD, McPAD and PAYL on GATECH attacks dataset
Algorithm    Detection Rate (Generic attack, Shell-code attack, CLET attack)    False Positive Rates
Table 3.3: Summary of experimental results for Generic attacks on various datasets
Modeling Approaches    Detection Rate    False Positives Rate    Dataset
3.7 Conclusion
In this chapter, we have presented the framework of our geometrical structure anomaly
detection (GSAD) scheme for building an effective intrusion detection system. We have
techniques: 1-gram text categorization technique and the Mahalanobis Distance Map
technique. This approach is based on geometrical structures of the payload features. An
important characteristic of our model is that it considers the correlation between the
features and some structural information of the payload to build the behavior profile of
the network traffic for intrusion detection. We have compared the performance of our
attacks, we have further implemented GSAD model in the HTTP environment to detect
Our experiments on the DARPA dataset and the GATECH attack dataset show that our model
achieves a low false positive rate and a high detection rate. Experimental results show better
The GSAD model uses 256×256 features for profile generation and for discriminating normal
and anomalous packets, and thus incurs a very heavy computational cost. In Chapter 4
and Chapter 5, we will propose solutions to reduce the computational complexity of the
GSAD model. We believe that our model can be used in real-time applications.
CHAPTER 4
Introduction
structure and demonstrates its ability to detect network attacks effectively. The detector
to any network service. Various experiments have demonstrated that GSAD can achieve
a high detection rate and low false positive rate for worms and exploits.
However, GSAD uses a large number of features to analyse the hidden patterns of
packet payload and uses these features to discriminate normal and attack patterns present
in the network traffic. This increases computational complexity and requires large storage and
a long time for training and testing. Furthermore, the intrusion detection system has to
deal with a huge amount of network traffic, increasing computational complexity and
overhead. This large feature set and voluminous dataset limit GSAD to off-line
applications only. Hence, feature reduction becomes mandatory for efficient operation of
the system when taking into account the computational complexity and the classification
challenge. There are several methods used by researchers for the selection of header
features and relevance analysis in intrusion detection research [98-103]. However, there
are very few papers that have considered feature selection according to application-layer
payload. GSAD model uses packet payload to detect normal and attack patterns.
features to identify a suspicious attack pattern in the packet payload. This is time
consuming.
In this thesis, we propose to use Linear Discriminant Analysis (LDA) and Principal
Component Analysis (PCA) techniques, for payload feature selection and classification
of normal and attack patterns. LDA [104] is a supervised technique which is used for
selecting important features in a large set of features, whereas PCA [105, 106] is an
unsupervised technique used for analysis of the data, which examines and weights the
two techniques that are used in intrusion detection domain for feature selection of a
packet header have achieved good results. However, these techniques have not been used for
payload feature selection. We propose to use these two techniques for payload
feature selection and classification of normal and attack patterns. In this chapter, we
describe the implementation of the LDA technique for the selection of payload features and
for discriminating normal and malicious patterns in network traffic. Here, we propose our
solutions for feature reduction and identify a group of features which can enable an
efficient use of the GSAD system for intrusion detection in the network. A detailed
description of the PCA technique for packet payload feature selection is given in
Chapter 5.
In this chapter, a new method is proposed, which uses the LDA technique and the difference
distance map (DDM) [87] to order the potential features for payload feature selection
and to distinguish normal and attack patterns in the network traffic. The LDA-based
approach reduces the computational complexity dramatically while retaining the high
detection rates and provides an elegant solution. The rest of this chapter is organised as
follows. Section 4.1 gives brief information on feature selection algorithms. In Section
4.2, we discuss the basic concepts of LDA. In Section 4.3, we propose an LDA-based intrusion
detection system and give a detailed explanation of the LDA-based feature selection approach for
intrusion detection. The experimental results and analysis are given in Section 4.4. In
Section 4.5, we present a two-tier intrusion detection system and discuss experimental
results and analysis in Section 4.6. We give brief information on integrated signature
In machine learning and statistics, feature selection, also known as feature reduction or
variable subset selection, is used to select a subset of relevant features for building robust
learning models. Intrusion detection systems employ feature selection techniques for data
noisy data from the dataset. There are two types of feature selection algorithms, namely,
filter and wrapper. Filter method is fast and uses correlation based approach for the
error. On the other hand, wrapper method uses classification algorithm and performs
cross validation to identify important features. Every time a feature is added, induction
very expensive and does not scale well to large datasets that contain many features and
approach is used, that incorporates some of the features of wrapper method into a fast
In recent years, linear methods [98-99] and [104-105] have played a major role in
finding features, that describe classes, build classifiers and assign classes to new feature
which can be solved by many existing algorithms [104-105]. Many linear feature
Discriminant Analysis (GDA) and Linear Discriminant Analysis (LDA), are proposed in
[100], [101], [102], [103], [105] and [109] to reduce the header features of packets.
However, there are very few papers that have considered feature reduction according to
application-layer payload.
The early feature reduction approach [51] on payload, developed by Krugel et al.,
grouped the byte frequency distributions of 256 ASCII characters into six bins, namely
0, 1-3, 4-6, 7-11, 12-15 and 16-255. Wang et al. [82] proposed an Anagram detector, in
which Bloom Filter (BF) was used to reduce memory overhead. Nwanze and
pattern hash functions were employed to map the bytes at the packet payload onto a set
of counters which were the selected features used for intrusion detection.
In this chapter, we propose to use the LDA technique for payload feature selection. It
attempts to select the discriminating features from Difference Distance Map (DDM)
between a normal Mahalanobis Distance Map (MDM) and the MDM of a particular type
Discriminant Analysis is a statistical method for obtaining a reduced feature set. LDA is
one of the commonly used dimensionality reduction and data classification techniques
and has been applied in human detection [87], face recognition, speech recognition,
Unlike PCA, which extracts features that are the most efficient for
representation but may not be useful for discrimination, LDA selects an optimal
dimensional feature space while preserving the significant information for data
classification. We assume that there are n d-dimensional samples {x1, …, xn} assigned
to k different classes. Each class Ci, where i = 1, …, k, has ni samples. Projection matrix
(4.1)
(4.2)
where μ is the sample mean vector of the whole sample set denoted by
(4.3)
(4.4)
The Fisher criterion is defined as the ratio of the variance between classes to the
variance within the classes. Thus, the ratio, J, between the between-class scatter matrix
SB and the within-class scatter matrix SW can be easily maximized by the projection
matrix Ar.
(4.5)
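For reference, the textbook form of the quantities named in this section is given below. The symbols S_B, S_W, μ, n_i, C_i and A_r follow the text; the class mean μ_i, the determinant form of the criterion and the correspondence with Equations 4.1-4.5 are assumptions based on the standard formulation of LDA.

```latex
\begin{align*}
S_B &= \sum_{i=1}^{k} n_i \,(\mu_i - \mu)(\mu_i - \mu)^{T}
  && \text{(between-class scatter)} \\
S_W &= \sum_{i=1}^{k} \sum_{x \in C_i} (x - \mu_i)(x - \mu_i)^{T}
  && \text{(within-class scatter)} \\
\mu &= \frac{1}{n} \sum_{j=1}^{n} x_j
  && \text{(mean of the whole sample set)} \\
\mu_i &= \frac{1}{n_i} \sum_{x \in C_i} x
  && \text{(mean of class } C_i\text{)} \\
J(A) &= \frac{\left| A^{T} S_B A \right|}{\left| A^{T} S_W A \right|},
\qquad A_r = \arg\max_{A} J(A)
  && \text{(Fisher criterion)}
\end{align*}
```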
Once the above optimization problem is solved, the classification decision can be
easily made on the low dimensional feature space by projecting the original feature
In this section, we elaborate our new approach. We first discuss the framework of an
LDA-based intrusion detection system. Then, detailed discussion of each block is given
4.3.1 Framework of LDA-based Intrusion Detection System
Figure 4.1 presents the framework of proposed LDA-based intrusion detection system. It
has four stages, namely Difference Distance Map (DDM) Generation, LDA-based
involves large computation power and time for classification process. The LDA technique is
used to reduce the computational complexity of the GSAD model and visualizes the
used to select significant features from a Mahalanobis Distance Map (MDM), which is
generated by the Geometrical Structure Model (GSM), a key component of the GSAD
model, described in Chapter 3. Each MDM is used to explore the correlations among
features (ASCII characters) in the packet payload for each single network packet. Then,
selected significant features are used in the detection process, which is conducted on a
(DDMs) must be generated to measure the difference between normal traffic and
particular types of attack traffic, such as the difference between each pair of <Normal,
Phf attack>, <Normal, Back attack> and <Normal, Apache2 attack>. Then, LDA is
employed to select the most significant features for each normal and attack pair based on
the pre-generated difference distance maps. Finally, all the selected features are
integrated into a new significant feature set used for normal profile generation and
It sends an alert signal to administrator if the new incoming packet is identified as an
attack packet.
Figure 4.1: Framework of the LDA-based intrusion detection system (Attack MDM and Normal MDM -> DDM Generation -> LDA-based Feature Selection -> Profile Generation -> Attack Recognition)
In order to discover the difference between the normal and attack samples, distance
maps of all samples images are constructed as discussed in Chapter 3. Let us assume that
there are m normal sample images and n attack sample images. Then, the average
distance of element (i,j) is computed using Equations 4.6 and 4.7, where is the
(i,j).
(4.6)
(4.7)
of the (i, j)-th elements of attack samples are computed using Equations 4.8
and 4.9. The difference at each element (i, j), where i, j = 1, …, 256, between the MDMs
of the normal samples and the attack samples is computed using Equation 4.10.
(4.8)
(4.9)
(4.10)
In Equations 4.6-4.9, represents the (i, j)-th element of MDM of the k-th
normal sample, and stands for the (i, j)-th element of MDM of the k-th attack
sample. The difference between the normal samples and the attack samples is denoted
by . The difference distance map between the normal samples and the attack
for each pair of normal traffic and a particular type of attack traffic. Because the
dimension of the difference distance map is large (256×256), it is very time consuming
if the difference map is directly used to differentiate the normal traffic and the attack
traffic. Therefore, we use Linear Discriminant method [98] to reduce the number of
difference distance map elements (i.e., to reduce the dimension of the difference map).
difference distance map. In the following section, we discuss iterative feature selection
process in detail.
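The generation of a difference distance map can be sketched as follows. This is a minimal illustration, assuming that the MDMs of the normal and attack samples are stacked into NumPy arrays, that the per-element statistics are the mean and the variance, and that the difference is a Fisher-style ratio of the squared mean difference to the summed variances. The exact form of Equation 4.10, the function name and the small constant eps (added to avoid division by zero) are assumptions of this sketch.

```python
import numpy as np


def difference_distance_map(normal_mdms, attack_mdms, eps=1e-12):
    """Compute an element-wise difference map between two sets of MDMs.

    normal_mdms: array-like of shape (m, d, d), one MDM per normal sample
    attack_mdms: array-like of shape (n, d, d), one MDM per attack sample
    """
    normal_mdms = np.asarray(normal_mdms, dtype=float)
    attack_mdms = np.asarray(attack_mdms, dtype=float)

    # Per-element mean and variance over the normal samples.
    mean_nor = normal_mdms.mean(axis=0)
    var_nor = normal_mdms.var(axis=0)

    # Per-element mean and variance over the attack samples.
    mean_att = attack_mdms.mean(axis=0)
    var_att = attack_mdms.var(axis=0)

    # Assumed difference measure: large where the class means differ
    # strongly relative to the spread within each class.
    return (mean_nor - mean_att) ** 2 / (var_nor + var_att + eps)


# Tiny hypothetical 2x2 "MDMs" instead of the real 256x256 maps.
normal = [[[1, 2], [2, 1]], [[3, 2], [2, 3]]]
attack = [[[5, 2], [2, 5]], [[7, 2], [2, 7]]]
ddm = difference_distance_map(normal, attack)
```

For real traffic the arrays would have shape (m, 256, 256) and (n, 256, 256), which gives the 256×256 map discussed in the text.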
A difference distance map generated in the first stage, for each pair of normal traffic
and particular type of attack traffic, is used for the selection of significant features.
In the difference distance map, the larger a feature value (i.e., a matrix element)
is, the more important that feature is for discriminating attack traffic from normal
traffic. Figure 4.2 shows a flow model for the iterative feature selection process.
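The ranking idea stated above, that larger elements of the difference distance map are more discriminative, can be sketched as a selection of the r largest elements and their locations. This is an illustrative sketch only; the function names are hypothetical and the iterative refinement with LDA is not reproduced here.

```python
import numpy as np


def top_r_locations(ddm, r):
    """Return the (row, column) locations of the r largest DDM elements."""
    ddm = np.asarray(ddm, dtype=float)
    # Sort the flattened map in ascending order, reverse it, keep r indices.
    flat_order = np.argsort(ddm, axis=None)[::-1][:r]
    rows, cols = np.unravel_index(flat_order, ddm.shape)
    return list(zip(rows.tolist(), cols.tolist()))


def distance_vector(mdm, locations):
    """Collect the MDM elements of one sample at the selected locations."""
    mdm = np.asarray(mdm, dtype=float)
    return np.array([mdm[u, v] for u, v in locations])


# Hypothetical 2x2 difference map and sample MDM.
locations = top_r_locations([[0.1, 5.0], [3.0, 0.2]], r=2)
vector = distance_vector([[1.0, 2.0], [3.0, 4.0]], locations)
```

The resulting vector plays the role of the r-dimensional distance vector of one sample.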
We first select the most significant r features from the difference distance map. The
element locations of these features in the difference distance map determine the element
dimensional distance vector represented by D_{r,k} = [d_k(U_{r,1}, V_{r,1}), d_k(U_{r,2}, V_{r,2}), …, d_k(U_{r,r}, V_{r,r})]^T,
where (U_{r,1}, V_{r,1}), (U_{r,2}, V_{r,2}), …, (U_{r,r}, V_{r,r}) indicate the element locations of the largest r
features in the difference distance map, r ranges from 1 to 256^2 and k indicates the
k-th sample. Let and represent the of the k-th normal sample and the
(4.11)
determining the projection matrix and removing the lower contribution elements. The
positions of the elements are retained. The whole process is carried out iteratively until
the number of significant features reaches the pre-set value. Then, the final projection
matrix is determined. Once the projection vector is finalized, the corresponding final
set of features is considered as the most significant features. The matrix is employed
to transform the multi-dimensional feature vector as a one dimensional score value for
each sample.
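The transformation of a multi-dimensional feature vector into a one-dimensional score can be illustrated with a two-class Fisher discriminant. This is a minimal sketch, assuming one normal class and one attack class, with a small ridge term added to the within-class scatter so that it stays invertible. The ridge term, the function names and the sample vectors are assumptions of this sketch, and the iterative removal of low-contribution elements is not shown.

```python
import numpy as np


def fisher_projection(normal_vectors, attack_vectors, ridge=1e-6):
    """Return the projection vector w maximizing the two-class Fisher ratio."""
    x0 = np.asarray(normal_vectors, dtype=float)
    x1 = np.asarray(attack_vectors, dtype=float)
    mu0, mu1 = x0.mean(axis=0), x1.mean(axis=0)

    # Within-class scatter: sum of the scatter matrices of both classes.
    s_w = (x0 - mu0).T @ (x0 - mu0) + (x1 - mu1).T @ (x1 - mu1)
    s_w += ridge * np.eye(s_w.shape[0])  # assumed regularization

    # For two classes the optimum is proportional to S_W^{-1} (mu1 - mu0).
    return np.linalg.solve(s_w, mu1 - mu0)


def project(vectors, w):
    """Map each r-dimensional feature vector to a one-dimensional score."""
    return np.asarray(vectors, dtype=float) @ w


# Hypothetical 2-dimensional feature vectors.
normal = [[0, 0], [1, 0], [0, 1], [1, 1]]
attack = [[4, 4], [5, 4], [4, 5], [5, 5]]
w = fisher_projection(normal, attack)
normal_scores = project(normal, w)
attack_scores = project(attack, w)
```

In this toy example every attack score is larger than every normal score, so a single threshold on the score separates the two classes.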
Figure 4.2: Flow model for the iterative feature selection process
c) Normal Profile Generation
The normal profile is used to measure the similarity between normal behavior and a
new incoming packet. It is developed by using the normal training samples and the
selected significant feature set. In this section, we explain how to perform the
Mean values of the significant r features of all normal training samples and a
detection threshold are the basic components of the normal profile. Given a set of
r features and k indicates the k-th sample. The mean values are denoted by
, (4.12)
and they are stored in the normal profile used for comparing with any new incoming
threshold value is, the fewer false positive alarms are generated. On the other hand, a smaller
distance between each normal training sample and the mean value . The Euclidean
distance from the k-th normal training sample to the mean value is computed by
, (4.13)
where is the (Ui,Vi)-th element of . The standard deviation of the Euclidean
distance from the k-th normal training sample to the mean value of the normal
training samples is
, (4.14)
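The components of the normal profile described above (the mean of the selected features, the Euclidean distance of each training sample to that mean, and the standard deviation of those distances) can be sketched as follows. The sketch assumes that the selected features of the normal training samples are stacked into a matrix with one row per sample, and that the standard deviation is the population standard deviation of the distances; the function name and the sample data are hypothetical.

```python
import numpy as np


def build_normal_profile(normal_vectors):
    """Return (mean, distances, delta) for the normal training samples.

    normal_vectors: array-like of shape (g, r), one row of r selected
    feature values per normal training sample.
    """
    x = np.asarray(normal_vectors, dtype=float)
    mean = x.mean(axis=0)                         # mean of each selected feature
    distances = np.linalg.norm(x - mean, axis=1)  # Euclidean distance per sample
    delta = distances.std()                       # spread of the distances
    return mean, distances, delta


# Hypothetical training samples with r = 2 selected features.
mean, distances, delta = build_normal_profile([[0, 0], [2, 0], [1, 0]])
```

The values mean and delta are what would be stored in the profile and compared against new incoming packets.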
Similar to normal profile development process, for any new incoming packet, the GSM
is applied to generate the MDM of the packet. Then, the most significant r features are
collected from the MDM. Projection matrix , and the r selected significant features
are used to calculate the score value for each input network packet. The score value of
(4.15)
feature vector with r elements of the distance map of the input network packet. When
the score is larger than a pre-calculated threshold, the input network packet is identified
To calculate the classification threshold, all training data consisting of normal and attack
data are fed into the testing procedure respectively. Corresponding scores obtained based
on Equation 4.15 are clustered into two classes on a d dimensional data domain. A
threshold is selected using the LDA optimizing criterion using Equation 4.5 which finds
experimental results is given. We evaluate the LDA feature selection approach on the
DARPA 1999 IDS dataset [90], which contains tcpdump network traffic recorded for
five weeks. Week 1 and week 3 data are attack-free data, and the other three-week data
contain both normal and attack traffic, as explained in Chapter 3. This dataset consists of
scan or probe, DoS, R2L, U2R and data. The dataset is divided into subsets. Each subset
comprises data records of a specific attack type and normal traffic. LDA is used on the
training dataset to obtain the important features for the classification process. The
Iterative feature selection is the first step of our experiments. We use heuristic
search method to select the number of elements as a starting point. Then, LDA
In the second phase of our experiments, the obtained optimal feature set is used
To verify the performance of the proposed algorithm, HTTP traffic is used. LDA-based
IDS is trained and tested with the inbound HTTP GET request traffic carrying payload
extracted from week 4 and week 5. We use the same dataset as in Chapter 3,
Section 3.6. The total numbers of packets after filtering are 783,443 packets for marx
(Linux server with IP address 172.16.114.50) and 8,431 packets for hume (NT server
with IP address 172.16.114.100). Then, we further filter the normal
and attack HTTP GET request packets, and divide them into normal and attack datasets
respectively. We randomly choose 300 normal packets and 900 attack packets for
feature selection, and choose 4000 packets for normal model training. Finally, we
randomly select 1000 normal packets and 3000 attack packets for testing. The types of
To select the most significant features, we first conduct experiments to choose a step
size for iterative feature selection. We randomly choose 2600 normal HTTP request
packets and 2600 Phf attack packets and calculate MDM using Equations 4.6 to 4.9. The
MDM represents the correlations between features and is symmetric along the diagonal.
neglect features that have small value in the difference distance map. We assume that
these features have little influence on traffic classification. We manually select 2700
features as initial starting point and use LDA-based iterative feature selection technique,
discussed in Section 4.3.2 to determine optimal feature set, detection rate, false positive
rate and threshold values for two step sizes, namely 10 and 20 respectively. Results for
Table 4.1: Performance of Phf attack for various selected features
We randomly select 300 normal packets and 900 attack packets for each type of
attack to select significant features using the feature selection process, discussed in
Section 4.3.2. The normal model of LDA-based IDS is trained on the 4000 normal
training packets. It can be observed from the results in Table 4.1 that the detection rate is
high, reaching 99.76% for a step size of 20 when 100 features are selected.
Figures 4.3(a)-(b) show the average MDMs of normal HTTP request packet and Phf
attack packet used in the experiments. The average difference distance map (DDM)
between normal packets and Phf attack packets is represented in Figure 4.4. The
difference distance map (DDM) shows a clear difference between normal packets and
Figure 4.3: Average MDMs. (a) normal HTTP request; (b) phf attack packets
Figure 4.4: Difference distance map between Normal HTTP and Phf attack packets
To validate our results, we conduct several experiments to extract the optimal number
of significant features to best separate normal packets from attack packets for Back
attacks, Apache2 attacks, Phf attacks and Crashiis attacks. We use a step size of 20
for our experiments. The optimal results are achieved with 100 features selected using
the iterative feature selection process for each of the three other types of attacks, but not for the Crashiis
attack. This is because the Crashiis attack carries a small packet payload and has very few
features. Thus, we can conclude that the LDA-based feature selection technique successfully
transforms the original 256×256 dimensional feature domain to a much lower
dimensional feature space and preserves the most significant information for the final
classification. We integrate all selected features into one feature vector, which is used
In the test stage, we evaluate the performance of LDA-based IDS on the testing
dataset containing both the normal packets and the attack packets. The test results are
shown in Table 4.2, which shows the performance of our one-tier model. A detailed
Table 4.2: Confusion Matrix for LDA-based IDS using integrated feature set
Normal 987 13
4.4.2 Analysis of Results
We evaluate the performance of our one-tier LDA-based model. From Table 4.2, we
further calculate detection rate and false positive rate. Our model could detect 95.3% of
attacks correctly with low false positive rate of 1.3%. The results in Table 4.2 reveal that
the 300 optimally selected significant features can well differentiate various attack
packets from the normal packets except Crashiis attack. The poor detection rate for
Crashiis attack packets is related to the size of the attack payload. The Crashiis attack carries a
small packet payload and has very few features, and some relevant features are removed
To overcome the problem of small packet payload and to further improve the
section.
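The false positive rate reported in Section 4.4.2 can be checked directly from the Normal row of Table 4.2 (987 and 13). The short sketch below reproduces that arithmetic. Reading the two counts as normal packets classified as normal and normal packets raised as alarms is our interpretation of the table row, and the function names are illustrative.

```python
def false_positive_rate(true_negatives: int, false_positives: int) -> float:
    """Fraction of normal packets that are wrongly flagged as attacks."""
    return false_positives / (true_negatives + false_positives)


def detection_rate(true_positives: int, false_negatives: int) -> float:
    """Fraction of attack packets that are correctly flagged as attacks."""
    return true_positives / (true_positives + false_negatives)


# Normal row of Table 4.2: 987 classified as normal, 13 flagged as attack.
fpr = false_positive_rate(true_negatives=987, false_positives=13)
print(f"False positive rate: {fpr:.1%}")
```

With 13 alarms out of 1000 normal test packets, the rate evaluates to 1.3%, which agrees with the value quoted in the analysis.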
The LDA-based feature selection approach discussed in Section 4.3 has proven efficient
in reducing the computational complexity while retaining the high detection rates.
However, we have considered only three types of attacks, namely Apache2, Back and
Phf, in the experiments. We have excluded the Crashiis attack due to its small packet
payload size, which biases the overall detection performance and increases the false
We propose a two-tier IDS, which uses a payload length criterion to separate the small
size payloads from the normal size payloads. In the two-tier model, tier one is a
statistical based detector responsible for the detection of the small size payload attacks,
and tier two is an LDM-based detector applied to identify the other attacks.
We first describe the framework of the two-tier IDS. Then, we evaluate the IDS on the
DARPA 1999 attack dataset to detect attacks of various payload sizes. Finally, we discuss the
The framework of two-tier intrusion detection system is given in Figure 4.5. The system
consists of four key components, namely Filter, Statistical Signature Based Detector,
LDM Based Detector and Alert Generator. The solid arrow indicates the incoming
network traffic, and the dotted arrow stands for the analysis decisions made by the
detectors.
Under the HTTP environment, we make use of the length of the packet payload as the
filtering criterion because a normal HTTP packet has a very low probability of carrying a
very short payload. Therefore, the Filter component pre-processes the incoming
HTTP GET request packets with non-zero payload. Then, the pre-processed request packets are grouped
based on the length criterion. If the length of a payload is less than the length
criterion, the packet is forwarded to the Statistical Signature Based Detector on the
first tier. Otherwise, the packet is passed to the second tier detector, i.e., the LDM
Based Detector.
The detector analyzes the received packet and makes the final decision. Then, the
Alert Generator will decide to raise an alarm or not based on the detection result given
by the detector.
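The dispatch logic of the two-tier framework can be sketched as follows. This is a minimal illustration, assuming that each detector is a function that takes a payload and returns True when it considers the packet an attack. The function name, the numeric value of the length criterion and the stub detectors are hypothetical, and treating a zero-length payload as "no alert" is an assumption based on the filter handling only non-zero payloads.

```python
from typing import Callable

Detector = Callable[[bytes], bool]


def two_tier_detect(payload: bytes, length_criterion: int,
                    tier_one: Detector, tier_two: Detector) -> bool:
    """Return True if an alert should be raised for this payload."""
    if len(payload) == 0:
        # Zero-length payloads are not processed by the filter.
        return False
    if len(payload) < length_criterion:
        # Short payloads go to the statistical signature based detector.
        return tier_one(payload)
    # All other payloads go to the second tier detector.
    return tier_two(payload)


# Stub detectors used only to show the routing.
def always_alert(payload: bytes) -> bool:
    return True


def never_alert(payload: bytes) -> bool:
    return False


short_result = two_tier_detect(b"GET /", 10, always_alert, never_alert)
long_result = two_tier_detect(b"GET /index.html HTTP/1.0", 10,
                              always_alert, never_alert)
```

Keeping the routing separate from the detectors means either tier can be replaced without changing the filter.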
Figure 4.5: Framework of LDM based two-tier intrusion detection system
As the first tier detector, Statistical Signature Based Detector only processes the small
packet payloads. In this case, the observed HTTP Get request packets are highly
suspicious, and the anomaly patterns carried by the attacks are easy to learn from the
character relative frequencies. This is because these attacks have very high frequencies
on some particular ASCII characters in the payloads, which is unusual and is not going
to happen in the normal cases. Thus, we can develop the statistical signatures for these
types of attacks.
To develop the attack signatures, the techniques discussed in Chapter 3 are used to
parse and to extract the character relative frequencies from the labeled training attack
packet payloads. The patterns of the character relative frequencies are stored as the
signatures and are applied to identify the corresponding attacks in the future.
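The construction of a statistical signature from character relative frequencies can be sketched as below. The sketch assumes that a signature is the average relative frequency vector of the labeled attack payloads and that a packet matches a signature when the sum of absolute differences between the two vectors is within a tolerance. The matching rule, the tolerance value, the function names and the sample payloads are all assumptions made for illustration.

```python
from collections import Counter


def relative_frequencies(payload: bytes) -> list:
    """Return a 256-element vector of byte counts divided by payload length."""
    if not payload:
        return [0.0] * 256
    counts = Counter(payload)
    return [counts.get(b, 0) / len(payload) for b in range(256)]


def build_signature(attack_payloads) -> list:
    """Average the relative frequency vectors of labeled attack payloads."""
    if not attack_payloads:
        raise ValueError("at least one attack payload is required")
    vectors = [relative_frequencies(p) for p in attack_payloads]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def matches(payload: bytes, signature: list, tolerance: float) -> bool:
    """Assumed rule: match when the L1 distance to the signature is small."""
    profile = relative_frequencies(payload)
    distance = sum(abs(p - s) for p, s in zip(profile, signature))
    return distance <= tolerance


# Hypothetical short attack payloads dominated by a single character.
signature = build_signature([b"AAAAAAAA", b"AAAAAAAB"])
hit = matches(b"AAAAAAAA", signature, tolerance=0.5)
miss = matches(b"GET /index.html", signature, tolerance=0.5)
```

A payload dominated by the same character as the training attacks matches the signature, while an ordinary request line does not.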
In the attack recognition phase, any new incoming packet is processed using the same
techniques mentioned above to generate character relative frequency profile. The profile
is compared with each known statistical signature, and the attack is identified as long as
If the length of HTTP Get request packet payload is larger than the pre-set length
criterion, the packet will be forwarded to the LDA-based Detector. The proposed LDA-
based feature selection approach is used to extract a low-dimensional feature space for
profile development and attack detection. The processes of normal profile development
The methodology used in two tier IDS to extract significant features is similar to the
between normal traffic and particular types of attack traffic, such as the difference
between each pair of <Normal, Phf attack>, <Normal, Back attack> and <Normal,
Apache2 attack>. The difference distance map between the normal samples and the
Then, LDA is employed to select the most significant features for each normal and
attack pair based on the pre-generated difference distance maps. For the selection of the
most significant features, we randomly choose normal training samples and various
attack training samples from the labeled samples set. A generated difference distance
map is used for the significant feature selection. We first select the most significant r
features from the difference distance map. Then, the optimal value of projection vector
is computed. Once the projection vector is finalized, the corresponding final set of
To measure the similarity between any new incoming packet and normal packets, the
which has been discussed in Section 4.3.2. In this section, we briefly explain the
Mean values of the significant r features of all normal training samples and a
detection threshold are the basic components of the normal profile. The mean values of
the significant r features of all normal training samples are calculated by Equation 4.12.
Each feature is represented by an index and location pair, such as (U1, V1), (U2, V2), …,
distribution analysis of the Euclidean distance between each normal training sample and
the mean value of the significant features. The Euclidean distance from the kth normal
The standard deviation of the Euclidean distances from the kth normal training sample
to the mean value of the normal training samples is calculated using Equation 4.14.
We assume that the distance follows a normal distribution, so three standard deviations
c) Attack Recognition
In the attack recognition process, the values of the most significant r features are
attack or a threat if and only if the Euclidean distance from F to is greater than +3δ or
smaller than -3δ, where δ is the standard deviation computed by Equation 4.14.
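The profile development and attack recognition steps can be sketched as follows. This is a minimal illustration on synthetic data, not the thesis implementation: the function names are hypothetical, and because the reference point of the ±3δ band is not legible in the text, the sketch assumes the band is centred on the mean distance observed for the normal training samples.

```python
import numpy as np

def build_normal_profile(train: np.ndarray):
    """train: (n_samples, r) array holding the r significant features of normal packets."""
    mean = train.mean(axis=0)                      # per-feature mean of the normal samples
    dists = np.linalg.norm(train - mean, axis=1)   # Euclidean distance of every sample to the mean
    return mean, dists.mean(), dists.std()

def is_attack(features: np.ndarray, profile, n_sigma: float = 3.0) -> bool:
    mean, mean_dist, std_dist = profile
    dist = np.linalg.norm(features - mean)
    # Assumption: the +/-3 standard deviation band is centred on the mean training distance.
    return bool(abs(dist - mean_dist) > n_sigma * std_dist)

# Synthetic stand-in for the significant features (not DARPA data).
rng = np.random.default_rng(0)
normal_train = rng.normal(0.0, 1.0, size=(500, 10))
profile = build_normal_profile(normal_train)

print(is_attack(np.ones(10), profile))         # a sample at a typical distance
print(is_attack(np.full(10, 10.0), profile))   # a sample far from the normal mean
```

With a two-sided band, a sample that lies unusually close to the mean is flagged as well, which is one way to read the "smaller than -3δ" condition.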
are conducted on the DARPA 1999 IDS dataset and compared with the outcomes of
LDA-based IDS. In the following subsections, we present the experimental results and
the analysis.
DARPA 1999 IDS dataset is used for evaluation of our proposed system. We focus on
In the experiments, we use the same conditions as discussed in Chapter 3 to filter the
HTTP Get request traffic of interest from the week 4 and week 5 data of the DARPA
1999 dataset, and the extracted packets are grouped into normal and attack sample sets
attack packets from the sample sets for the training of the model, and the rest of sets are
used for testing. The attack packets contain Crashiis attack, Phf attack, Apache2 attack
and Back attack. The proposed two-tier system is trained and tested with the selected
inbound HTTP Get request traffic carrying non-zero payload as discussed in Section 4.3.
All four types of attacks are used for the LDA-based IDS to obtain the significant
feature set. The proposed two-tier system, however, uses Phf attack, Apache2 attack and
Back attack only, and we exclude the Crashiis attack. This is because Crashiis attack is
the only attack carrying a small packet payload with respect to the length criterion used
in our experiments.
Figure 4.7: Average Mahalanobis distance map of normal HTTP Get request packets
Thus, in the proposed two-tier system, the pattern of the character relative frequencies
of Crashiis is used as the statistical signature for the tier-one detector. Figure 4.6 shows
To obtain the optimal feature set for Phf and Apache2 attack, we use Figure 4.7, and
Figures 4.8(a) and 4.9(a) to generate the difference distance maps as shown in Figures
4.8(b) and 4.9(b) respectively. The same method is used to obtain the optimal feature
Figure 4.8: Average Mahalanobis distance map. (a) Phf attack packets; (b) difference
distance map between normal HTTP and Phf attack packets
Figure 4.9: Average Mahalanobis distance map. (a) Apache2 attack packets; (b) difference
distance map between normal HTTP and Apache2 attack packets
Experiments are conducted to extract the optimal number of significant features to
best separate normal packets from attack packets. The optimal result is found to be 100
features selected by LDA for each of four types of attacks. Then, the normal profiles of
the LDA-based IDS and the proposed two-tier system are developed based on the
In the test stage, the LDA-based IDS and the proposed two-tier system are evaluated
on the testing sample sets containing both the normal packets and the attack packets. All
the test samples are used for the testing of LDA-based IDS. However, in the proposed
two-tier system, the test samples are assigned to the detectors on different tiers
according to the length. In tier-one, the detector uses the character relative frequencies of
any assigned new incoming packet payload to compare with the pre-generated signatures
in order to identify the suspicious intrusive activity. In tier-two, the detector evaluates
the similarity between any new incoming packet and the normal profile using Euclidean
distance and the decision is made by comparing the distance with the pre-set threshold
(i.e. ±3δ).
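The routing of test samples between the two tiers can be sketched as below. The sketch is illustrative only: the function names, the example payloads and the length criterion of 20 bytes are made up, and the tier-one matching rule (sum of absolute differences within a tolerance) is an assumption, since the exact matching condition is not legible in the text. The tier-two detector is passed in as a callable.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional, Tuple

def relative_frequencies(payload: bytes) -> List[float]:
    """Relative frequency of each of the 256 byte values in the payload."""
    counts = Counter(payload)
    total = max(len(payload), 1)  # guard against an empty payload
    return [counts.get(value, 0) / total for value in range(256)]

def tier_one(payload: bytes, signatures: Dict[str, List[float]],
             tolerance: float) -> Optional[str]:
    """Return the name of the first statistical signature the payload matches."""
    profile = relative_frequencies(payload)
    for name, signature in signatures.items():
        # Assumed matching rule: L1 distance between the two frequency profiles.
        distance = sum(abs(p - s) for p, s in zip(profile, signature))
        if distance <= tolerance:
            return name
    return None

def classify(payload: bytes, length_criterion: int,
             signatures: Dict[str, List[float]],
             tier_two: Callable[[bytes], str],
             tolerance: float = 0.1) -> Tuple[str, Optional[str]]:
    """Long payloads go to tier two, short payloads to the tier-one signature detector."""
    if len(payload) > length_criterion:
        return "tier-two", tier_two(payload)
    return "tier-one", tier_one(payload, signatures, tolerance)

# Placeholder signature and detector, for illustration only.
signatures = {"small-payload attack": relative_frequencies(b"GET /x")}
anomaly_stub = lambda payload: "anomalous"

print(classify(b"GET /x", 20, signatures, anomaly_stub))
print(classify(b"A" * 50, 20, signatures, anomaly_stub))
```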
The experimental results of the LDA-based IDS and the proposed two-tier system are
shown in Tables 4.3 and 4.4 respectively. Table 4.3 presents the performance of LDA-
based IDS using features extracted from four types of attacks. Table 4.3 gives a
comparison between the results obtained for the normal profiles developed using
different numbers of training samples, i.e., 300, 700 and 4000 samples.
Table 4.3: Performance of LDA-based IDS for four types of attacks
Test samples 300 training samples 700 training samples 4000 training samples
As can be seen from the table, the percentage of correct classification of normal
samples is improved as the number of training samples increases. Back attack and Phf
attack remain constant in all cases and have 100% correct classification rates. In
contrast, the correct classification rates of Apache2 attack and Crashiis attack show the
opposite trend. In the case of 4000 training samples, the classification rate of Apache2
attack drops to 0%. This behaviour suggests that this set of training samples contains
some discrepancy and that the integrated iterative feature set has removed some important
features.
The results in Table 4.3 show the LDA-based IDS is unable to classify Crashiis attack
correctly, and has misclassification rates higher than 93% consistently in all three
training scenarios.
In Table 4.4, the performance of two-tier system using features extracted from three
types of attacks is given. It compares the results obtained for the normal profiles
developed using the same numbers of training samples as Table 4.3. The difference is
that the normal profiles for tier-two detector are built up on three types of attacks
(Apache2 attack, Back attack and Phf attack) instead of all the four types.
Table 4.4: Performance of two-tier system using features from three types of attacks
Detector                                   Test samples     300 training samples  700 training samples  4000 training samples
                                                            Classify   Mis-       Classify   Mis-       Classify   Mis-
                                                            correctly  classify   correctly  classify   correctly  classify
Tier-two (LDM-based detector)              Apache2 Attack   100%       0%         100%       0%         86.94%     13.06%
                                           Back Attack      100%       0%         100%       0%         100%       0%
                                           Phf Attack       100%       0%         100%       0%         100%       0%
Tier-one (Statistical signature detector)  Crashiis Attack  100%       0%         100%       0%         100%       0%
As can be seen from Table 4.4, the proposed two-tier system achieves encouraging
performance in all the cases except the detection of Apache2 attack using the normal
profile developed by 4000 training samples. The DARPA 1999 dataset used for the
generation of the normal profile has considerable variability, which over-generalizes the
trained model. In addition, the number of Apache2 attack samples in the DARPA 1999 dataset
is very large, which could be one of the reasons why the performance for Apache2 attack is
poorer in comparison to other types of attacks. However, compared with the LDA-based IDS,
the proposed two-tier system proves more promising. It outperforms the LDA-based IDS
in detecting Crashiis attack. Benefiting from the two-tier architecture, we are able to classify
all the Crashiis attack samples. The detailed analysis is given in the next subsection.
The results in Tables 4.3 and 4.4 reveal that the 300 training samples can provide
sufficient knowledge for both the LDA-based IDS and the proposed two-tier system to
achieve good overall detection performance. In this section, the information contained in
these two tables is further analyzed using Detection Rate (DR) and False Positive Rate
(FPR).
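For reference, the two metrics are written out below in their standard form. The thesis defines them in an earlier chapter that is not reproduced here, so the notation is an assumption: TP and FN count attack packets that are flagged and missed respectively, while FP and TN count normal packets that are flagged and passed respectively.

```latex
\[
\mathrm{DR} = \frac{TP}{TP + FN}, \qquad \mathrm{FPR} = \frac{FP}{FP + TN}
\]
```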
Table 4.5 shows the comparison of the number of features, the detection rates and the
false positive rates for LDA-based IDS, two-tier IDS and GSAD model.
Systems Number of features Detection rate (%) False positive rate (%)
The results show that the proposed two-tier system has a 100% detection rate and a
3.38% false positive rate, which is slightly higher than that of the LDA-based IDS. The two-tier
system successfully classifies the Crashiis attack, and it uses fewer features in comparison to
Compared with the GSAD model, the two-tier system achieves 100% detection rate.
Although it has a higher false positive rate, the system successfully transforms the
original 65536 dimensional feature space in GSAD model to a relatively very low
dimensional feature space. It integrates various attack signatures while preserving the
most significant information for the final detection. It not only significantly reduces the
In the following, we give two Receiver Operating Characteristic (ROC) curves for the
LDA-based IDS and the proposed two-tier system in Figures 4.10 and 4.11, which show
the relationships between detection rates and false positive rates to the corresponding
systems.
Figure 4.11: ROC curve of a two-tier IDS
As shown in Figure 4.10, the detection rate of LDA-based IDS increases significantly
from 13.7% to 99.82% when the false positive rate is set to be around 3.38%. Then, the
detection rate keeps going up slowly to 99.8%. In contrast, the ROC curve of the two-tier
system in Figure 4.11 is more stable, and it always stays at 100%.
Although the ROC curve of the LDA-based IDS finally reaches nearly 100% detection
rate, the detection performance of the LDA-based IDS is in fact significantly influenced
by the number of small-payload attacks (i.e. Crashiis attack) appearing in our test sample set.
The test sample set used in this paper is heavily dominated by the Apache2 attack
(97576 test samples), and the small-payload attack (i.e. Crashiis attack) only contributes
a very small portion (195 test samples) to the test sample set.
Therefore, even though around 93.33% of the Crashiis attack packets are classified
incorrectly by the LDA-based IDS, as shown in Table 4.3, its overall detection rate does not
drop dramatically. Hence, the ratio of the attacks in a test sample set biases the detection
performance of the LDA-based IDS. However, our two-tier system does not have this issue.
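A short calculation makes the effect of this imbalance concrete. It uses only the two sample counts quoted above and makes two simplifying assumptions that are not taken from the thesis: every Apache2 sample is detected, and the Back and Phf samples (whose counts are not given here) are ignored.

```python
apache2_samples = 97576      # quoted in the text
crashiis_samples = 195       # quoted in the text
crashiis_miss_rate = 0.9333  # share of Crashiis packets misclassified by the LDA-based IDS

# Assumption: only Crashiis packets are missed; all Apache2 packets are detected.
missed = round(crashiis_samples * crashiis_miss_rate)
total = apache2_samples + crashiis_samples
overall_detection_rate = (total - missed) / total

print(missed)
print(f"{overall_detection_rate:.2%}")
```

Under these assumptions, missing almost every Crashiis packet lowers the overall detection rate by less than 0.2 percentage points, which is why the aggregate figure hides the failure on the small-payload class.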
4.7 Common Profile (Signature) for Integrated Feature Set
attack has been detected by an anomaly-based system. Chung and Mok [113]
demonstrated that it was possible to generate signatures that match normal traffic. We
use the same idea and develop one common signature for the normal traffic that can
classify three different types of attack, namely, Phf, Apache2 and Back attacks. We have
combined optimal features extracted for three attacks, namely Phf, Apache2 and Back
attacks. Then, we have removed all common features from the combined feature set.
This profile is used for the traffic classification. We have evaluated the new integrated
signature in Section 4.4 and Section 4.6 on the DARPA 1999 dataset and calculated
4.8 Conclusion
reduce the computational time and number of discriminating features of payload based
anomaly IDS. The approach not only extracts a set of low-dimensional features but also
We have proposed a two-tier IDS to detect various attacks. The system processes the
incoming packets based on the payload length of the packet. Tier-one uses the statistical
signature approach for the classification of small payload attack packets, and tier-two
uses LDA-based approach for the classification of the other attack packets.
The proposed two-tier model has been evaluated using DARPA 1999 IDS dataset for
the HTTP traffic. It has achieved encouraging results with 100% detection rate and
3.38% false positive rate, and it can classify the Crashiis attack successfully, which is
transforms a high dimensional feature space to a very low dimensional feature space,
Note that the number of selected significant features may grow large
when more types of attacks are considered. This is because more sets of significant
features will be selected as the number of attack types increases.
However, this approach is able to generate one common signature for three different
attack types, and reduces the number of signatures required to classify patterns. The optimal
feature set can be used to generate a single signature for a group of attacks. This will
reduce the computation time for signature comparison for those selected attacks.
CHAPTER 5
provided over a network. In order to operate in high speed networks, intrusion detection
pattern matching and are limited to detect attacks with known signatures. On the other
hand, anomaly intrusion detection systems can detect new attacks. Unfortunately, they
are prone to false positives. In addition, network intrusion detection systems operate at
the periphery of the networks, and are overloaded with a large amount of network traffic.
Thus, network intrusion detection systems have problems with handling heavy traffic
In addition, 75% of cyber attacks occur at the application layer and 69% of
vulnerabilities are caused by web services [78, 114]. This shows that attackers are
trying to exploit vulnerabilities at the application level, where sensitive data is stored.
Header-based systems are not suitable for detecting attacks aimed at the application level.
On the other hand, payload-based systems can identify attacks trying to exploit
vulnerabilities at the application level. Thus, organisations rely heavily on payload-
based intrusion detection for the protection of their networks. This poses significant
challenges to build efficient network intrusion detection systems to detect a wide variety
5.1 Introduction
in Chapter 2, indicates that most previous research works in anomaly detection do not
mention the data preprocessing techniques and traffic feature selection criteria used in
NIDS. Intrusion detection algorithms are applied directly to the raw network data.
For practical applications, data preprocessing is one of the most important stages in the
development of a detection algorithm, and it directly impacts the accuracy and capability
detect intrusions or misused patterns. Some of the features may be redundant or have
little contribution to the detection process. In Chapter 4, we also proposed to use the
experiments that the LDA-based system is able to reduce a large feature set to a small
feature set [115], and discriminate normal and malicious network traffic. The LDA-
LDA is a supervised technique for constructing a network IDS and needs a dataset labeled
as normal traffic and malicious (attack) traffic. Moreover, it is very hard and expensive
to create and analyse a labeled dataset of traffic from a real network. Furthermore,
payload attacks are computationally expensive to detect because they require deeper
searches into network sessions and a large number of payload features to
discriminate normal packets from anomalous packets in the network traffic. Such
features, which characterize behavioral patterns of network traffic and build a real-time
Component Analysis (PCA) [105] approach is used to construct important and suitable
features from network traffic data which can distinguish normal and abnormal activities
real-time applications.
achieve sensible feature reduction, no work has been done on the data pre-processing
using PCA for payload feature selection. Nwanze et al. [119] discussed modelling of
packet payload using a data mining technique based on PCA. However, they ignored the
main idea of PCA and did not use the projection of original data on a new lower
dimensional feature space. Furthermore, they did not consider correlations between
features.
In this chapter, we propose a 3-tier Iterative Feature Selection Engine (IFSEng) for
feature subset selection, which addresses the issues related to the quality of feature set.
(RePIDS), which aims to detect payload-based attacks on a network in real-time. 3-tier
IFSEng and Mahalanobis Distance Map (MDM) are the key components of RePIDS,
which facilitate effective and efficient detection of attack packets in the network traffic.
promising in extracting the hidden correlations between features and the correlations
among network packet payloads. MDM also partially captures structural information of
payload which helps improve the detection performance and reduces false positive rate.
We evaluate our model on two datasets, namely DARPA 1999 [96] and Georgia
payload-based IDS with two state-of-the-art payload-based IDSs, namely, PAYL [80]
and McPAD [83]. Furthermore, we compute the processing speed of our proposed model
and compare it with the processing speed of a real scenario of a medium-sized enterprise
network.
the framework of RePIDS and its mathematical model. Experimental results and their
analysis are given in Section 5.4. Section 5.5 demonstrates the evaluation results of
RePIDS in terms of computational complexity, and compares RePIDS with the state-of-the-art
PAYL and McPAD intrusion detection systems. We also compare the processing
speed of RePIDS with the processing speed of a real scenario of a medium-sized enterprise
network with a gateway speed of 1GB. We conclude this chapter in Section 5.6.
5.2 State-of-the-Art Systems
In the following section, we review the main characteristics of PAYL and McPAD,
which are the most relevant to our work in payload-based anomaly detection.
a) PAYL
PAYL is a payload-based anomaly detector proposed by Wang and Stolfo [74], which
packets based on payload data length. The concept of n-gram text categorization can be
found in Subsection 5.3.2. In PAYL, intrusions are detected by analysing the distribution
of bytes inside the HTTP payload. The pre-processing of packet payload using a 1-byte
sliding window creates a feature vector containing the relative frequency count of each
of the 256 possible 1-grams (bytes) in the payload. A simplified Mahalanobis distance
measure was used to compare new incoming traffic against the model. The relative
position of different bytes inside the payload is not taken into account, so that the
structure of the payload is not modeled. To model the structure of the payload, a value of
n ≥ 2 should be considered.
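The 1-gram model described above can be sketched as follows. The distance is written in the form commonly attributed to PAYL, a sum over bytes of |x_i - mean_i| / (std_i + alpha), where alpha is a small smoothing factor; the value of alpha, the function names and the three training payloads are assumptions made for illustration.

```python
import numpy as np

def byte_frequencies(payload: bytes) -> np.ndarray:
    """256-dimensional vector of relative 1-gram (byte) frequencies."""
    counts = np.bincount(np.frombuffer(payload, dtype=np.uint8), minlength=256)
    return counts / max(len(payload), 1)

def simplified_mahalanobis(x: np.ndarray, mean: np.ndarray, std: np.ndarray,
                           alpha: float = 0.001) -> float:
    """Per-byte absolute deviation scaled by the standard deviation plus a smoothing term."""
    return float(np.sum(np.abs(x - mean) / (std + alpha)))

# Made-up normal requests used to build the byte-frequency model.
train = [b"GET /index.html HTTP/1.0",
         b"GET /images/logo.gif HTTP/1.0",
         b"GET /about.html HTTP/1.0"]
freqs = np.array([byte_frequencies(p) for p in train])
mean, std = freqs.mean(axis=0), freqs.std(axis=0)

d_normal = simplified_mahalanobis(byte_frequencies(train[0]), mean, std)
d_odd = simplified_mahalanobis(byte_frequencies(b"\x90" * 24), mean, std)
print(d_normal < d_odd)
```

Because only byte counts enter the vector, two payloads containing the same bytes in a different order produce the same distance, which is the structural blind spot noted above.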
To include the structural information of the payload, Wang and Stolfo proposed
Supervised learning was employed to model normal traffic and attack traffic by storing
n-grams of normal packets and attack packets into two separate Bloom Filters (BFs). A
grams of incoming payload with respect to n-grams of normal filter value. The
. It is easy to see that as the value of n increases, the size of feature space increases
exponentially. This is the reason why in a real scenario a value of n greater than 2 is not
used.
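The growth of the feature space can be checked with a few lines, assuming one feature per possible n-gram over the 256 byte values (the function name is illustrative):

```python
def ngram_feature_space(n: int, alphabet_size: int = 256) -> int:
    """Number of distinct n-grams over an alphabet of the given size."""
    return alphabet_size ** n

for n in (1, 2, 3):
    print(n, ngram_feature_space(n))
```

For n = 2 this gives the 65536 dimensions mentioned for the GSAD model, and n = 3 already exceeds 16 million features.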
b) McPAD
Perdisci et al. proposed a Multi classifiers Payload Anomaly Detector (McPAD) [75].
McPAD measures the occurrence frequency of pairs of bytes that are ν positions apart and
from McPAD, the payload is extracted to perform feature extraction. After the extraction
of features, the same payload results are represented in m different feature spaces by
using a 2ν-grams sliding window. The dimensionality of each feature space is then
each working in a different space. Finally, the outputs of the classifiers are combined in
a fusion stage. 2ν-gram feature extraction technique may include partial structural
on payload, PCA [61] and MDM [117]. PCA selects important and suitable features
from network traffic data, and MDM extracts the hidden correlations between features
and the correlations among network packet payloads, which are used for the
classification of network traffic. Firstly, we present the framework of our real-time
framework, namely data preparation module, n-gram text categorization module, 3-tier
Iterative Feature Selection Engine (IFSEng), profile generation and traffic classification.
The complete framework of our proposed intrusion detection system has four stages as
shown in Figure 5.1. They are data preparation, data pre-processing, model generation
Figure 5.1: Framework for real-time payload based intrusion detection system
[Diagram blocks: Network traffic; Packet Filtering; n-gram Text Categorization; Tier 1: Data Analysis Using PCA; Tier 2: Principal Component Selection; Tier 3: Refinement of Feature selection; Profile Generation Using MDM; Generation and Verification of Model; Network Traffic Classification]
The first stage of this IDS consists of data preparation and n-gram text categorization
[40]. For data preparation, the incoming network traffic is filtered according to type of
application and payload length, and n-gram text categorization converts network traffic
packet payloads into a series of feature vectors. These feature vectors describe the
In the second stage, a 3-tier IFSEng, detailed in Subsection 5.3.2, is used for feature
subset selection. Each tier performs a specific task. At tier 1, the PCA technique [59] is used
in [118] and [119]. Tier 3 refines the optimal feature subset (PCs) and evaluates the
discriminative power of the feature subset to represent packet payloads. MDM shown in
[117] (to be further discussed in Subsection 5.3.2) is used to capture more complex non-
linear correlations among the selected features, and construct a distance map which
In the third stage of the framework, the finally selected PCs, (i.e., output of IFSEng)
are used to build a normal traffic profile. An MDM is created for normal network traffic
as a normal profile, which is used for classification of new incoming network traffic in
In the last stage, Mahalanobis Distance criterion is used to measure the similarity
between the pre-developed normal profile and the profile of a new incoming network
packet. The packet is classified as a normal or an attack packet depending upon the
amount of deviation of its profile from the normal profile. Detailed description of each
In this section, we provide a step-wise description and technical details of all modules
Data preparation is the first stage of the framework, where different datasets are
prepared. We group network traffic into various categories using Wireshark [89], which
is a traffic analyser, and separate the network traffic based on type of service,
destination address, payload length and direction of network traffic flow. The source of
network traffic can be a real network (for real-time operation) or collected tcpdump files.
n-gram Text Categorization is responsible for payload feature analysis and feature
construction as discussed in Section 3.1.2 of this thesis. It extracts raw features using n-
gram text categorization technique (here n=1) from the packet payload and converts
space.
c) 3-tier Iterative Feature Selection Engine
The 3-tier Iterative Feature Selection Engine (IFSEng) consists of “Data Analysis Using
PCA” (tier 1), “Principal Component Selection” (tier 2), and “Refinement of Feature
At tier 1, PCA [61] is used to analyse the original dataset. As a linear mathematical
orthonormalized coordinate system, where the data are maximally decorrelated. The
(eigenvalues) make more contributions to the data representation. The first few most
contributing axes are usually used to construct a new lower dimensional feature space to
the observations to make PCA work properly. The mean shifted dataset is represented by
, (5.1)
. (5.2)
Using eigen-decomposition, the covariance matrix can be decomposed into a
matrix W and a diagonal matrix . They satisfy the condition, . The columns
of the matrix W stand for the eigenvectors (called the principal components) of the
covariance matrix , and the elements along the diagonal of the matrix are the
terms of data representation. It does not determine the number of principal components
that should be retained. Thus, some other supplementary techniques are applied at tier 2.
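The tier 1 analysis can be illustrated with a standard PCA computation: mean-shift the data, form the covariance matrix, eigen-decompose it and sort the components by decreasing eigenvalue. The sketch below uses NumPy on synthetic data; the function name is illustrative, and the symbols of Equations 5.1 and 5.2 are not legible in the text, so the code follows the textbook formulation rather than the thesis notation.

```python
import numpy as np

def pca(data: np.ndarray):
    """data: (n_samples, n_features). Returns eigenvalues, eigenvectors W, centred data."""
    centered = data - data.mean(axis=0)               # mean-shifted dataset
    cov = np.cov(centered, rowvar=False)              # covariance matrix of the features
    eigenvalues, eigenvectors = np.linalg.eigh(cov)   # eigh: the covariance matrix is symmetric
    order = np.argsort(eigenvalues)[::-1]             # largest variance first
    return eigenvalues[order], eigenvectors[:, order], centered

# Synthetic data with very different variances per feature (not traffic data).
rng = np.random.default_rng(1)
data = rng.normal(size=(200, 5)) * np.array([5.0, 2.0, 1.0, 0.5, 0.1])
eigenvalues, W, centered = pca(data)

# Project onto the first two principal components (columns of W).
projected = centered @ W[:, :2]
print(projected.shape)
```

The decomposition orders the components but does not by itself say how many to keep, which is the question the selection criteria address.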
At tier 2, several techniques, such as cumulative energy [61], scree test [118, 120] and
parallel analysis criteria [119], help achieve one of the main goals of good pre-
Cumulative energy, scree test and parallel analysis criteria are utilized independently to
respectively. The selected k1, k2 and k3 principal components are three subsets of the k
(which equals 256 in our case) principal components contained in matrix W. These
mathematical and non-mathematical criteria are used to verify the outcomes of each
other. The subsets of principal components represent reduced feature spaces, which
provide the best representations determined by the criteria for a packet payload. By
spaces, the dimension of the feature vector can be reduced significantly to smaller values,
namely k1, k2 and k3. Meanwhile, the criteria guarantee that the reduced feature
vector can correctly represent the packet payload. A brief explanation of individual
corresponding eigenvalue. The greater an eigenvalue is, the larger energy the
cumulative energy of the first k1 components is defined by the sum of the energies across
the components from 1 through k1, and it is computed using Equation 5.3.
, (5.3)
(5.4)
In Equation 5.4, is the ratio of variation in the subspace to the total variation in the
Scree Test is a graphical method, first proposed by Cattell [118] in 1966. More
explanations of scree test are given by Nelson [121]. In a scree plot all eigenvalues are
plotted against all (k) principal components (eigenvectors) in the descending order. In
the scree plot, we look for the k2-th point, where sharp decrease in eigenvalue levels off
(the scree). This point is identified as an 'elbow'. After the k2-th point, the remaining
(k−k2) principal components (eigenvectors) are ignored and not used in the model. This
is based on the arguments that the most significant components extract a large
proportion of the variances from the covariance matrix, while the remaining
insignificant (k − k2) ones are associated with similarly low variances. The criticisms
of the scree test criterion are that there is no sharp transition where the scree begins, and the
decision is neither robust nor reproducible. Alternatively, the parallel analysis criterion is used
Parallel Analysis (PA) is a modification of Cattell's scree test. PA [117] alleviates the
significant for each component. This operation is repeated twice and the obtained
eigenvalues for each component are used to calculate means and Standard Deviations
(SD) in the two iterations. From the means and standard deviations, the 95th percentile
values are obtained (95th percentile = mean + 1.65 SD). If the eigenvalue of a component
exceeds the 95th percentile of the simulated values, then the component is retained.
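Two of the tier 2 criteria lend themselves to a short sketch. The cumulative energy rule keeps the smallest number of leading components whose eigenvalues reach a chosen share of the total (0.93 is used here because it appears in Table 5.1). The parallel analysis rule compares each eigenvalue with mean + 1.65 SD of eigenvalues obtained from random data of the same shape. The function names, the use of the correlation matrix, the normally distributed random data and the synthetic test data are assumptions, not details taken from the thesis.

```python
import numpy as np

def select_by_cumulative_energy(eigenvalues, threshold=0.93):
    """Smallest k1 whose leading eigenvalues (sorted descending) reach the energy threshold."""
    ratios = np.cumsum(eigenvalues) / np.sum(eigenvalues)
    k1 = int(np.searchsorted(ratios, threshold)) + 1
    return min(k1, len(ratios))

def select_by_parallel_analysis(data, n_iter=2, seed=0):
    """Count leading components whose eigenvalue exceeds mean + 1.65 SD of simulated ones."""
    rng = np.random.default_rng(seed)
    n, p = data.shape
    # Assumption: eigenvalues of the correlation matrix are compared; constant columns
    # would need to be dropped first because their correlation is undefined.
    real = np.linalg.eigvalsh(np.corrcoef(data, rowvar=False))[::-1]
    simulated = np.empty((n_iter, p))
    for i in range(n_iter):
        noise = rng.normal(size=(n, p))  # random data with the same shape as the real data
        simulated[i] = np.linalg.eigvalsh(np.corrcoef(noise, rowvar=False))[::-1]
    cutoff = simulated.mean(axis=0) + 1.65 * simulated.std(axis=0)
    keep = real > cutoff
    return p if keep.all() else int(np.argmin(keep))

# Synthetic data driven by two latent factors (illustration only).
rng = np.random.default_rng(42)
factors = rng.normal(size=(500, 2))
loadings = np.zeros((2, 10))
loadings[0, :5] = 1.0
loadings[1, 5:] = 1.0
data = factors @ loadings + 0.3 * rng.normal(size=(500, 10))

k1 = select_by_cumulative_energy(np.array([5.0, 3.0, 1.0, 0.5, 0.5]))
k3 = select_by_parallel_analysis(data)
print(k1, k3)
```

The number of simulation rounds is left as a parameter; the default of two follows the description above, although more rounds give a steadier cutoff.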
At tier 3, feature refinement and evaluation module is used. In the refinement stage, we
extend the range of the selected principal components, obtained from tier 2, on both the
upper and lower sides. Then, we observe the discriminative power of the subsets of
model using F-Value (as discussed in Section 2.4 of this thesis) using Equation 5.5.
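\[ F\text{-}Value = \frac{(1+\beta^{2}) \cdot Precision \cdot Recall}{\beta^{2} \cdot Precision + Recall}, \tag{5.5} \]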
A low value of precision means a higher degree of false positives and vice versa. A
lower value of recall represents a higher degree of false negatives and vice versa. In
Equation 5.5, β corresponds to the relative importance of precision versus recall and is
usually set to 1. On one hand, when precision and recall have equal weights and are both close to
1, the model can achieve an F-Value close to 1, which indicates good performance, meaning
that the classifier has nearly 0% false alarms and nearly 100% detection of attacks. On the other hand,
The selected kfinal principal components are the ones which enable the classifier to
achieve the greatest F-Value among the candidates k1, k2 and k3. Then, selected kfinal
principal components are used in the profile generation, which is briefly discussed in
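The tier 3 selection can be sketched as follows, using the standard F-measure with weight β computed from true positives, false positives and false negatives. The function names and the detection counts for the candidate numbers of principal components are invented for illustration and are not results from the thesis.

```python
def f_value(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """F-measure from true positives, false positives and false negatives."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denominator = beta ** 2 * precision + recall
    return (1 + beta ** 2) * precision * recall / denominator if denominator else 0.0

def pick_k_final(results: dict) -> int:
    """results maps a candidate number of PCs to its (tp, fp, fn) counts."""
    return max(results, key=lambda k: f_value(*results[k]))

# Made-up counts for three candidate subsets of principal components.
results = {6: (95, 10, 5), 7: (98, 4, 2), 8: (98, 6, 2)}
k_final = pick_k_final(results)
print(k_final, round(f_value(*results[k_final]), 4))
```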
Network traffic profile is generated using Mahalanobis Distance Map (MDM) which
captures complex non-linear correlations of the data. By using MDM (as described in
Section 3.1.2 of this thesis), we obtain the hidden correlations between the features of
, (5.6)
, (5.7)
, (5.8)
where represents the a-th projected feature in the projected feature vector, denotes
the average of each projected feature, defines the Mahalanobis distance between
the a-th projected feature and the b-th projected feature, is the covariance value of
each projected feature, and finally D is the MDM (the pattern of a network packet).
Distance map D is used to generate the network traffic profiles (normal and attack) of
the training and test data. These profiles are used for the classification of incoming
network traffic.
e) Traffic Classification
Mahalanobis distance is the criterion used to measure the similarity between the
developed profile and new incoming network traffic profile. Weight w is calculated
, (5.9)
where and are the average and variance of the (a,b)-th element in the
weight factor exceeds the calculated threshold, the input packet is considered as an
intrusion.
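Equations 5.6 to 5.9 are not legible in this copy, so the sketch below is only one plausible reading of the surrounding description and should not be taken as the thesis formulation. It assumes that the (a,b)-th element of the distance map is the squared difference between two projected features divided by the sum of their variances, that the normal profile stores the mean and variance of every map element over the normal training packets, and that the weight sums the squared, variance-scaled deviations from that profile. The mean + 3 SD threshold and all names are likewise assumptions.

```python
import numpy as np

EPS = 1e-12  # avoids division by zero on the diagonal of the map

def distance_map(y: np.ndarray, feature_var: np.ndarray) -> np.ndarray:
    """Assumed form: d(a, b) = (y_a - y_b)^2 / (var_a + var_b)."""
    diff = y[:, None] - y[None, :]
    return diff ** 2 / (feature_var[:, None] + feature_var[None, :] + EPS)

def build_profile(train: np.ndarray):
    """train: (n_packets, k_final) projected features of normal packets."""
    feature_var = train.var(axis=0)
    maps = np.array([distance_map(y, feature_var) for y in train])
    return feature_var, maps.mean(axis=0), maps.var(axis=0)

def weight(y: np.ndarray, profile) -> float:
    """Assumed form: sum over (a, b) of (d - mean_d)^2 / var_d."""
    feature_var, mean_map, var_map = profile
    d = distance_map(y, feature_var)
    return float(np.sum((d - mean_map) ** 2 / (var_map + EPS)))

# Synthetic projected features standing in for normal traffic.
rng = np.random.default_rng(3)
train = rng.normal(size=(300, 4))
profile = build_profile(train)

train_weights = np.array([weight(y, profile) for y in train])
threshold = train_weights.mean() + 3 * train_weights.std()  # assumed threshold rule

suspicious = np.array([10.0, -10.0, 10.0, -10.0])
print(weight(suspicious, profile) > threshold)
```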
5.4 Experimental Results and Analysis
In the following subsections, we first present experimental setup and brief information
on the dataset and types of attacks. We then discuss training and test of our model.
A series of experiments on the DARPA 1999 [96] dataset and the GATECH attack dataset
[97] are conducted to evaluate the performance of our proposed model. These two
datasets are used by the state-of-the-art payload-based IDSs that we compare against in this
chapter.
We experiment with several different threshold (δ) values. For evaluation of our
model, we use a threshold (δ) equal to ±3ζ, where ζ represents a single standard deviation,
because this value gives good results for detection rate, false positive rate
and F-Value.
5.4.2 Datasets
We extract week 1 and week 3 inbound 'HTTP request' traffic from the DARPA 1999 dataset
for the training of our model. The extracted normal traffic corresponds to two different
HTTP servers, as discussed in Chapter 4 of this thesis. The total numbers of packets
used for training of the model after filtering are 13,933 and 10,464 for hosts marx and
hume respectively.
In order to test the performance of our proposed model in detecting known attacks and
new attacks, we use attacks contained in DARPA 1999 dataset and GATECH attack
dataset. The labeled test data is further pre-processed to form two test datasets, which
contain instances that do not appear in our training dataset. For our experiments, we
As explained in Chapters 3 and 4 of this thesis, HTTP-based attacks are mainly from
the HTTP GET/POST requests to web servers. There are several HTTP-based attacks
provided by DARPA 1999 dataset, namely Apache2 attack, Crashiis attack, Back attack
and Phf attack. The GATECH attack dataset has several non-polymorphic HTTP attacks
provided by Ingham and Inoue [97] and several polymorphic HTTP attacks generated
by Perdisci et al. [83] using the CLET engine. The attacks, namely Generic attack,
Shell-code attack and CLET attack (polymorphic attack), are placed in different groups,
and each group has attacks of the same category for the presentation of results. All
The experimental approach involves the following procedure for training and testing of the
model:
1. We use 185 bytes of HTTP GET request payload for training and testing of the model.
2. As discussed in Subsection 5.3.2 of this thesis, tier 1 of IFSEng uses the PCA
technique to analyse raw data, by projecting raw data on a reduced feature space.
cumulative energy, scree test and parallel analysis criteria on the outcome of PCA.
First, cumulative energy criterion is applied for the selection, in which we consider
means that the first 7 principal components are selected as the best subspace to
represent the data by cumulative energy criterion. Then, we use scree test to draw
another set of principal components. Figure 5.2(a) shows full scree plot, where we
use k (k = 256 in our case) principal components (X-axis) of a particular dataset and
the corresponding variances, namely eigenvalues (Y-axis) to draw a scree plot, and
the PCs are sorted in descending order with respect to the values of the
the curve. To provide a clearer view, we magnify the scree plot and show the first 25
principal components in Figure 5.2(b). It can be observed from Figure 5.2(b) that
there is a sharp decrease of variance in the front part of the plot and then it starts
flattening out after the 6-th principal component. In Figure 5.2(b), we can observe
„elbow‟ somewhere in the range from 6 to 9 principal components and the first =
6 principal components are able to capture about 92 percent of the variance. After
the k2-th point, the remaining (k - k2) principal components capture only around 8
Figure 5.2: Scree test plot. (a) Full scree plot; (b) enlarged scree plot with the first
25 eigenvectors
We use k2 = 6 as the number of dominant principal components in our case. However, from
is not very clear what the most appropriate value of k2 is. To overcome this
ambiguity, we use the parallel analysis criterion, as described in the following and to
We verify the outcome of the scree plot by using the parallel analysis criterion, as discussed
in Subsection 5.3.2, on the same dataset. The result of parallel analysis also suggests
selecting the first 7 principal components, which is the same as what has been
The results of the three feature selection criteria are given in Table 5.1.
PC Selection Method    Cumulative Energy (0.93)    Scree Test    Parallel Analysis
Number of PCs          7                           6             7
training model generation and evaluation at tier 3 is performed using the F-Value metric
defined in Equation 5.7. The MDM represents the correlations among the features
obtained from the projection of the original feature vector onto the finally selected
4. For testing, we project the extracted feature vector of an incoming packet payload
onto the reduced feature space (the finally selected principal components) and use the
Mahalanobis distance dissimilarity criterion to detect intrusive behaviors. The
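The cumulative energy criterion used in step 2 of the procedure can be illustrated with a minimal sketch. It assumes the eigenvalues of the PCA have already been computed; the function name, the example eigenvalues and the way ties are handled are illustrative assumptions, not the thesis implementation. Only the 0.93 threshold is taken from Table 5.1.

```python
def select_by_cumulative_energy(eigenvalues, threshold=0.93):
    """Return the smallest number of leading principal components whose
    share of the total variance reaches `threshold`."""
    ordered = sorted(eigenvalues, reverse=True)  # largest variance first
    total = sum(ordered)
    if total <= 0:
        raise ValueError("eigenvalues must have a positive sum")
    running = 0.0
    for count, value in enumerate(ordered, start=1):
        running += value
        if running / total >= threshold:
            return count
    return len(ordered)


# Made-up eigenvalues for illustration only (not the thesis data).
example_eigenvalues = [40.0, 20.0, 12.0, 8.0, 6.0, 4.0, 3.5, 2.5, 2.0, 2.0]
selected = select_by_cumulative_energy(example_eigenvalues, threshold=0.93)
print(selected)
```

With these invented eigenvalues the first six components hold 90 percent of the variance and the first seven hold 93.5 percent, so the criterion stops at seven. The scree test and parallel analysis would be applied to the same sorted eigenvalues as independent checks.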
In the experiments, 10 days of normal 'HTTP GET request' traffic from the
DARPA 1999 dataset are used. The normal traffic is randomly divided into three subsets.
One of the subsets is selected randomly and used for training the model. The remaining
In the testing stage, an attack is considered detected as long as at least one of its attack
packets is identified as abnormal. We conduct our experiments using the features obtained from the
projection of the original feature vectors onto the optimal principal components determined
by the IFSEng for the various types of attacks (Apache2, Phf, Crashiis and Back attacks)
present in the DARPA 99 attack dataset. We further evaluate our model on GATECH attack
Experimental results are explained in two steps. In the first step of the experiments, we
experiments according to Figure 5.1, showing the RePIDS framework, to determine the
performance of our proposed model when using various subsets of principal components
different values of threshold varying from 2σ to 3.5σ. Results are presented in Table 5.2
for various feature subsets and using 3.5σ as the optimal value of threshold.
Table 5.2 shows the variation of FP, TN, TP and FN rates along the change of the
To obtain the optimal number of principal components, the F-Value is calculated for
The results in Figure 5.3 show that the best F-Value is achieved with 7 principal
components. In other words, the feature subspace of 7 principal components offers good
representation, discriminative power and high accuracy. The increase and decrease of
[Figure 5.3 (plot): Y-axis F-score, X-axis Principal Components (5 to 9); data labels in the plot: 0.9958, 0.9943, 0.9935, 0.99, 0.9865]
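As a reminder of how an F-Value of this kind is obtained from detection counts, the following sketch computes the standard F-measure (the weighted harmonic mean of precision and recall). Equation 5.7 is not reproduced in this section, so the exact form used in the thesis may differ; the function name and the example counts are assumptions for illustration only.

```python
def f_value(tp, fp, fn, beta=1.0):
    """Standard F-measure from true positives, false positives and
    false negatives. beta = 1 weights precision and recall equally."""
    if tp == 0:
        return 0.0  # no correct detections, so precision and recall are zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Invented counts, not results from the thesis.
score = f_value(tp=90, fp=10, fn=10)
print(round(score, 4))
```

Repeating this computation for each candidate number of principal components and keeping the one with the largest score is one simple way to carry out the tier 3 selection described above.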
It can be concluded that PCA and the three selection criteria help reduce the
IFSEng is high in the selected 7-dimensional feature space, which helps create more
accurate normal traffic profiles using the MDM, which is used for traffic classification.
To demonstrate how the MDM represents the correlations between the features, the MDMs
of the normal HTTP payload and some attack payloads are generated using projected
(a)
(b)
0 0.05178815 0.04735877 0.04525517 0.03765384 0.03965582 0.05104155
0.051788147 0 0.03508168 0.05975747 0.05529712 0.05478485 0.03144298
0.047358766 0.03508168 0 0.03686035 0.0250256 0.0571498 0.0332321
0.045255171 0.05975747 0.03686035 0 0.05269052 0.05324839 0.05400761
0.037653843 0.05529712 0.0250256 0.05269052 0 0.03450803 0.04522816
0.039655825 0.05478485 0.0571498 0.05324839 0.03450803 0 0.04336399
0.051041546 0.03144298 0.0332321 0.05400761 0.04522816 0.04336399 0
(c)
Figure 5.5: MDMs of attack payloads. (a) Apache2 attack; (b) Crashiis attack; (c) Phf attack
Figures 5.4-5.5 show the MDMs of the normal HTTP payload and some attack
payloads, respectively. It can be seen from Figures 5.4-5.5 that the MDM is a symmetric
matrix and that the elements along its diagonal are all equal to zero. This is
because the distance of a feature to itself is always zero. The MDMs also demonstrate that
the correlations between normal projected features are different from the correlations
between attack projected features. Besides, the 7-dimensional space helps
differentiate normal payloads and various attack payloads efficiently and accurately.
Figure 5.4 shows the MDM of the normal HTTP payload (normal profile), and Figures
5.5(a)–(c) show the MDMs of the attack profiles for the Apache2, Crashiis and Phf attacks.
Although we can directly compare the normal profile (model) and the attack profiles
(MDMs) to confirm the differences between normal and various attack payloads, this is a
time-consuming task. Instead, given the MDM profile of the training dataset and that of a new
incoming packet, the weight score w is calculated. If the deviation in the weight score w is
greater than the pre-selected threshold, the incoming packet is classified as an attack packet.
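The properties described above (a symmetric map with a zero diagonal, and a weight score compared against a threshold) can be illustrated with a small sketch. The distance formula and the weight score below are assumed forms chosen for illustration, since this section does not restate the thesis equations: the pairwise distance is the squared difference of two feature values scaled by the sum of their training variances, and the score is the summed squared deviation from the average training map. All function names and sample values are hypothetical.

```python
def feature_variances(samples):
    """Per-feature population variance over the training vectors."""
    count = len(samples)
    dims = len(samples[0])
    means = [sum(s[i] for s in samples) / count for i in range(dims)]
    return [sum((s[i] - means[i]) ** 2 for s in samples) / count
            for i in range(dims)]


def distance_map(vector, variances, eps=1e-12):
    """Assumed pairwise distance: (x_i - x_j)^2 / (var_i + var_j)."""
    dims = len(vector)
    return [[(vector[i] - vector[j]) ** 2 / (variances[i] + variances[j] + eps)
             for j in range(dims)] for i in range(dims)]


def average_map(maps):
    """Element-wise mean of several maps, used as the normal profile."""
    dims = len(maps[0])
    return [[sum(m[i][j] for m in maps) / len(maps) for j in range(dims)]
            for i in range(dims)]


def weight_score(packet_map, profile):
    """Assumed score: summed squared deviation from the normal profile."""
    return sum((a - b) ** 2
               for row_a, row_b in zip(packet_map, profile)
               for a, b in zip(row_a, row_b))


def is_attack(packet_map, profile, threshold):
    return weight_score(packet_map, profile) > threshold


# Invented 3-dimensional projected features, not thesis data.
training = [[1.0, 2.0, 3.0], [1.1, 2.1, 2.9], [0.9, 1.9, 3.1]]
variances = feature_variances(training)
profile = average_map([distance_map(v, variances) for v in training])
normal_map = distance_map([1.0, 2.0, 3.0], variances)
odd_map = distance_map([3.0, 1.0, 0.5], variances)
print(weight_score(normal_map, profile) < weight_score(odd_map, profile))
```

In this toy setting the vector that resembles the training data scores far lower than the unusual one, which is the behaviour the threshold test relies on.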
GATECH attack dataset using the same setup. Table 5.3 reports the FP rate, TN rate, TP
F-Value 0.976
It can be concluded from Table 5.3 that RePIDS has a high detection rate, a low false
positive rate and a low false negative rate. The F-Value achieved is 0.976, which
confirms that the model can detect attacks with high accuracy and demonstrates its good
performance.
In conclusion, the proposed RePIDS is able to detect novel attacks very well, with a
In this section, comparisons between RePIDS and the state-of-the-art PAYL and
McPAD anomaly-based intrusion detection systems are presented. Then, we further
compare the throughput of our proposed model with that of a real scenario in a
medium-sized enterprise network.
5.5.1 Detection Performance
To provide a reasonable comparison of these payload-based IDSs, the detection
performance of the RePIDS, PAYL and McPAD anomaly-based intrusion detection systems
is compared first. For PAYL and McPAD, we use the false positive rates and detection
rates reported in [83]. From Figures 5.6-5.7 in [83], we estimate the average detection
rates for generic, shell-code and polymorphic attacks. We use a false positive rate of 1%
to calculate the F-Values for PAYL and McPAD on the GATECH attack dataset. As
mentioned in [83], their results for the DARPA 1999 dataset are similar to those for the
GATECH attack dataset. Table 5.4 shows the comparison of F-Values for PAYL, McPAD and
RePIDS on the DARPA 1999 dataset and the GATECH attack dataset. From Table 5.4, we can
conclude that RePIDS shows a better F-Value than PAYL and McPAD on DARPA 99
* F-Values for DARPA 99 dataset and GATECH attack dataset for PAYL and McPAD have been
derived from [75].
algorithms used in RePIDS, PAYL and McPAD. Only the computation involved in the
test phase is taken into account in the analysis, because the training of the algorithms can
be performed off-line and therefore does not affect the efficiency of the algorithms in detection.
Given a payload P of length n and a fixed value of ν, the occurrence frequencies of 1-gram
and 2ν-grams can both be computed in O(n). The numbers of extracted features in
these algorithms are constant regardless of the actual values of n and ν (2^8 features
grams to the k feature clusters using a simple look-up table and a number of sum
operations that is always less than 2^16 (regardless of the value of k). Therefore, the
feature reduction processes of RePIDS and McPAD can be computed in O(1). However,
complexities of the feature extraction and reduction processes. Since RePIDS uses a
fixed payload length (185 bytes) to extract the occurrence frequency, the complete
different one-class classifiers used to make a decision about each payload P) times every
Once the features have been extracted and the dimensionality has been reduced to k,
payload P, RePIDS computes the Mahalanobis distance between the payload and the
classify the payload P represented by 256 features. Therefore, the classification process
McPAD has m classifiers. Each classifier computes the distance between the payload P,
represented by k feature clusters, and each of the s support vectors obtained during
O(ks). McPAD has to repeat the classification process m times and the results are then
O(mks).
in Table 5.5.
As shown in Table 5.5, the overall computational complexities of RePIDS, PAYL
and McPAD are O(1), O(n) and O(nm+mks), respectively. This indicates that RePIDS
has the lowest computational complexity of the three systems.
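The difference between the O(n) and O(1) feature extraction costs can be seen in a short sketch of 1-gram (byte) frequency counting. Counting over the whole payload touches every byte, whereas counting over a fixed 185-byte prefix, as RePIDS does, bounds the work by a constant. The function name and the sample payloads are illustrative assumptions.

```python
FIXED_LENGTH = 185  # payload bytes examined by RePIDS, as stated in the text


def byte_frequencies(payload, limit=None):
    """Relative frequency of each of the 256 byte values.

    With limit=None every byte is visited, so the cost grows with the
    payload length n. With a fixed limit at most `limit` bytes are
    visited, so the cost is bounded by a constant.
    """
    data = payload if limit is None else payload[:limit]
    counts = [0] * 256
    for value in data:
        counts[value] += 1
    total = len(data)
    if total == 0:
        return [0.0] * 256
    return [c / total for c in counts]


request = b"GET /index.html HTTP/1.1"
frequencies = byte_frequencies(request, FIXED_LENGTH)
long_payload = b"A" * 10000
bounded = byte_frequencies(long_payload, FIXED_LENGTH)
print(len(frequencies), bounded[ord("A")])
```

The resulting 256-element vector is the raw feature vector that would subsequently be projected onto the selected principal components.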
RePIDS with a similar environment used within a medium-sized enterprise network with
processed through such a network against the packet processing speed of our scheme
considering the most ideal parameters. On the one hand, the throughput calculated for a
medium-sized enterprise network under ideal parameters is 25600 packets per
second. However, for real-time applications using an IDS we expect the throughput to
be much lower. On the other hand, our proposed scheme could process 33146 packets per
second, which is about 1.3 times the packet processing speed on the enterprise
such consideration involving real throughput analysis with the most ideal network
parameters is beyond the scope of this thesis, and we intend to address it in our future
work.
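The throughput comparison reduces to simple arithmetic on the two figures quoted above. The sketch below only restates those numbers; the variable names are illustrative, and the per-packet time is a derived quantity rather than a measurement reported in the thesis.

```python
ENTERPRISE_PACKETS_PER_SECOND = 25600  # ideal-case network throughput (from the text)
REPIDS_PACKETS_PER_SECOND = 33146      # reported RePIDS processing speed (from the text)

# How many times faster RePIDS processes packets than the network delivers them.
ratio = REPIDS_PACKETS_PER_SECOND / ENTERPRISE_PACKETS_PER_SECOND

# Average time spent per packet at the reported speed, in microseconds.
per_packet_microseconds = 1_000_000 / REPIDS_PACKETS_PER_SECOND

print(round(ratio, 2), round(per_packet_microseconds, 1))
```

The ratio is slightly under 1.3, which corresponds to an average processing time of roughly 30 microseconds per packet.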
art PAYL and McPAD anomaly-based intrusion detection systems. Furthermore, in
terms of throughput, RePIDS can process more packets per second than the throughput
of a medium-sized enterprise network with a gateway speed of 1GB. Hence, our model,
5.6 Conclusions
(RePIDS) to detect attacks against Web applications through the analysis of HTTP
payloads using a 3-tier Iterative Feature Selection Engine (IFSEng) and Mahalanobis
network data. The proposed model uses a small, selected feature subspace to detect
The proposed 3-tier IFSEng is used to select an optimal feature subspace and reduce
the dimensionality of the data, which significantly influences the detection efficiency.
RePIDS has been thoroughly tested on the normal traffic of the DARPA dataset, and on
two different datasets of attacks, namely the DARPA 1999 and GATECH datasets.
Experimental results indicate that the method is effective in detecting attacks with high
detection rates and low false positive rates. RePIDS has achieved a high F-Value, 0.9958
the state-of-the-art PAYL and McPAD. In addition, we have also shown that the
Finally, in terms of throughput, RePIDS can process more packets per second than
the throughput of a medium-sized enterprise network with a gateway speed of 1GB.
operation.
CHAPTER 6
systems can only detect known attacks because they depend on the generation of
signatures.
In addition, with the popularity of the Internet and the increase in the number of attack
incidents on the Internet, web security is one of the key challenges in computer security
research. Moreover, web applications are generally large, complex and highly
customized. Protecting each web application requires signatures written explicitly for
that application. It is not possible to guarantee that the application is completely free of
also difficult. Hence, an application protected only by a signature-based IDS cannot be
detection systems are believed to provide a practical solution for identifying known and
novel attacks on the networks, and for the protection of web applications. An anomaly-based
IDS creates a statistical model of the normal behavior from a set of training data.
If any network activity deviates too far from the pre-developed normal model, then the
weaknesses. In this chapter, we summarize our thesis and review the contributions, and
6.1 Summary
frameworks and developed models which address three critical issues that severely
affect large scale deployment of payload-based anomaly detection systems in high speed
Inefficiency in operation.
This thesis described a number of novel frameworks using network payload for
detector, and Real-time Payload-based Intrusion Detection System (RePIDS), which can
detect novel attacks and protect networks at the application level. We have generated a
common profile using optimal features for a group of similar types of attacks.
6.1.1 Geometrical Structure Anomaly Detection Detector
The proposed GSAD anomaly detector uses pattern recognition techniques to identify
patterns of packet payloads. GSAD models the payload of network traffic using
geometrical structures of the payload features and correlations between the payload
features. This approach can detect new attacks without a priori knowledge of the attacks.
The n-gram text categorization and Mahalanobis Distance Map (MDM) approaches are
used to develop the payload profile. The MDM technique determines the hidden correlations
between payload features and partially includes payload structural information, which
helps to improve the detection rate and reduce the false positive rate. We have implemented
the GSAD model in the HTTP environment to detect web-based attacks coming through the
HTTP service at port 80, and evaluated it on two datasets, namely the DARPA 99 and
GATECH datasets. The GATECH dataset contains real traces of various attacks coming
through the HTTP service. The MDMs compute the correlations between the payload
features. The deviation between the average MDM profile of the training dataset and the MDM
profile of a new incoming packet has been used to classify the incoming packet as
either an attack packet or a normal packet. In Chapter 3 of this thesis, MDM images
between normal and various attack payloads. In addition, they have demonstrated
differences in the correlations between the features of various attack payloads and the
6.1.2 Two-tier LDA-Based Detector
We have proposed a novel framework for a two-tier detector, which uses the LDA technique
and a difference distance map (DDM) to order the potential features for payload feature
selection and to distinguish normal and attack patterns in the network traffic. Linear
features. The two-tier detector uses the packet payload length criterion to group packets
and forwards them either to the Statistical Signature Based Detector on the first tier or to
the Linear Discriminant Method (LDM) Based Detector on the second tier for further
analysis. Then, the detector analyzes the received packet and makes the final decision on
whether to raise an alarm. The Receiver Operating Characteristic (ROC) curves for the
proposed two-tier system are shown in Figure 4.11. The proposed two-tier system has
shown a 100% detection rate and a 3.38% false positive rate. The LDA-based approach
reduces the computational complexity dramatically while retaining the high detection
rates, providing a novel lightweight solution for network payload-based attack
detection.
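The routing step of the two-tier detector can be sketched as a simple dispatch on payload length. This summary does not state the length criterion or which lengths go to which tier, so the threshold value, the direction of the comparison and the stub detectors below are all hypothetical placeholders rather than the thesis design.

```python
from typing import Callable, Tuple

# Hypothetical cut-off in bytes; the actual criterion is not given in this summary.
LENGTH_THRESHOLD = 200


def route_packet(payload: bytes,
                 first_tier: Callable[[bytes], bool],
                 second_tier: Callable[[bytes], bool],
                 length_threshold: int = LENGTH_THRESHOLD) -> Tuple[str, bool]:
    """Forward a payload to one of two detectors based on its length.

    Returns the name of the tier that handled the packet and whether
    that tier raised an alarm. Sending short payloads to tier 1 is an
    assumption made for this sketch.
    """
    if len(payload) <= length_threshold:
        return "tier1", first_tier(payload)
    return "tier2", second_tier(payload)


# Stub detectors standing in for the signature-based and LDM-based tiers.
def never_alarm(payload: bytes) -> bool:
    return False


def always_alarm(payload: bytes) -> bool:
    return True


short_result = route_packet(b"GET / HTTP/1.0", never_alarm, always_alarm)
long_result = route_packet(b"A" * 500, never_alarm, always_alarm)
print(short_result, long_result)
```

Keeping the detectors as plain callables makes it easy to swap in a real signature matcher or an LDA-based classifier without changing the routing logic.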
use of resources, computational complexity and packet processing speed. RePIDS uses
construct important and suitable features and select dominant Principal Components by
means of cumulative energy, scree test and parallel analysis criteria on the outcome of
PCA. We have built a real-time payload-based intrusion detection system using suitable
features. As discussed in Chapter 5, RePIDS has two key components: the 3-Tier
IFSEng and the MDM. The 3-Tier IFSEng addresses the issues related to the quality of the
feature set, and the Mahalanobis Distance Map (MDM) extracts the hidden correlations between
features and the correlations among network packet payloads. Together, they have
facilitated effective and efficient detection of attack packets in the network traffic.
RePIDS has achieved high F-Values of 0.9958 on the DARPA dataset and 0.976 on the
GATECH dataset. This demonstrates that RePIDS can differentiate normal
performs better in comparison with the state-of-the-art PAYL and McPAD, and the
payload is lower than PAYL and much less than McPAD. Furthermore, we have shown
that RePIDS can process more packets per second than the throughput of a medium-sized
enterprise network with a gateway speed of 1GB. All these facts provide
substantial evidence that the proposed model, RePIDS, is capable of processing packets
in real-time operation.
group of similar types of attacks. Based on research presented in [122], we have used
similar concept and developed one common profile (signature) for the normal traffic that
could classify three different types of attacks, namely, Phf, Apache2 and Back, and
6.2 Thesis Contributions
• We have identified and addressed three critical issues that may have severely
networks.
in detecting new and variants of known attacks using DARPA dataset and
• We have built a novel two-tier detector using the LDA technique and the Difference
Distance Map (DDM) approach to order the potential features for payload feature
selection and to distinguish normal and attack patterns in the network traffic.
of similar types of attacks that could efficiently classify these different types of
detect attacks against Web applications through the analysis of HTTP payloads
Distance Map (MDM). The proposed model uses selected features from a low
Furthermore, RePIDS is capable of discriminating normal patterns and attack
patterns in real-time.
medium size enterprise network with a gateway speed of 1GB under similar
need to run RePIDS in real-time and choose the proper network parameters
RePIDS in real-time, the work can be extended and rigorous simulation can be done
this scheme, we need to develop more signatures for similar types of attacks and
group of similar types of attacks that could efficiently classify these different
compared, more tests on GATECH dataset and any other datasets may be
performed.
• We have evaluated our two-tier model on the DARPA 1999 dataset. To further test
the effectiveness and performance of this model, we need to test it on
the GATECH attack dataset. We also need to develop more signatures for similar
types of attacks.
detection of unencrypted (plain text) payload data only and does not look into
encrypted data. However, it can detect attacks coming through encrypted data
when used at the host machine using an appropriate encryption key. Hence, this
Control and Data Acquisition (SCADA) system [123] is one of them and the
environment. This is a new research area for use of intrusion detection system in
References
13. Gupta, K., et al. Attacking confidentiality: An agent based approach. in
Intelligence and Security Informatics. LNCS 2006: Springer Verlag, London.
14. Nascimento, G.M.B.A., Anomaly detection of web-based attacks. 2010.
15. Kruegel, C. and G. Vigna. Anomaly detection of web-based attacks. in 10th
ACM conference on Computer and communications security 2003. New York,
NY, USA: ACM.
16. SANS Institute - Internet Storm Center web site. 2011;
http://isc.sans.org/index.php?on=toptrends.
17. Anderson, J.P., Computer security threat monitoring and surveillance. Technical
Report, 1980. p. 56.
18. Kruegel, C., F. Valeur, and G. Vigna, Intrusion detection and correlation:
challenges and solutions. Vol. 14. 2005: Springer-Verlag New York Inc.
19. Cheswick, W.R., S.M. Bellovin, and A.D. Rubin, Firewalls and Internet
security: repelling the wily hacker. 2003: Addison-Wesley Longman Publishing
Co., Inc.
20. Mell, R., Intrusion detection systems. National Institute of Standards and
Technology (NIST), Special Publication, 2001. 51.
21. Schneier, B., Applied cryptography: protocols, algorithms, and source code in C.
2007: A1bazaar.
22. Debar, H., M. Dacier, and A. Wespi, Towards a taxonomy of intrusion-detection
systems. Computer Networks, 1999. 31(8): p. 805-822.
23. Debar, H., M. Dacier, and A. Wespi, A revised taxonomy for intrusion-detection
systems. Annals of Telecommunications, 2000. 55(7): p. 361-378.
24. Denning, D.E., An intrusion-detection model. IEEE Transactions on Software
Engineering, 1987(2): p. 222-232.
25. Vigna, G. and C. Kruegel, Host-based intrusion detection. Handbook of
Information Security. John Wiley and Sons, 2005.
26. Mischel, M., Modsecurity 2.5. 2009: Packt Pub.
27. Ghorbani, A.A., W. Lu, and M. Tavallaee, Network intrusion detection and
prevention: concepts and techniques. Vol. 47. 2009: Springer-Verlag New York Inc.
28. Vigna, G. and R.A. Kemmerer, NetSTAT: A network-based intrusion detection
system. Journal of Computer Security, 1999. 7: p. 37-72.
29. Roesch, M. Snort-lightweight Intrusion Detection for Networks. in 13th USENIX
Conference on System Administration. 1999: Seattle, Washington.
30. Ptacek, T.H., Insertion, evasion, and denial of service: Eluding network intrusion
detection. 1998, DTIC Document.
31. Tang, Y. and S. Chen, An automated signature-based approach against
polymorphic internet worms. IEEE Transactions on Parallel and Distributed
Systems, 2007. 18(7): p. 879-892.
32. Paxson, V. and M. Handley. Defending against network IDS evasion. 1999.
33. Chandola, V., A. Banerjee, and V. Kumar, Anomaly detection: A survey. ACM
Computing Surveys (CSUR), 2009. 41(3): p. 15.
34. Lazarevic, A., et al. A comparative study of anomaly detection schemes in
network intrusion detection. in Third SIAM International Conference on Data
Mining. 2003: SIAM.
35. Tombini, E., et al. A serial combination of anomaly and misuse IDSes applied to
HTTP traffic. in IEEE 20th Annual Computer Security Applications Conference.
2004. Tucson, AZ, USA.
36. Gupta, K.K., B. Nath, and R. Kotagiri, Layered Approach using Conditional
Random Fields for Intrusion Detection. IEEE Transactions on Dependable and
Secure Computing, 2010. 7(1): p. 35-49.
37. Damashek, M., Gauging similarity with n-grams: Language-independent
categorization of text. Science, 1995. 267(5199): p. 843.
38. Forrest, S., et al. A sense of self for unix processes. in SP '96 IEEE Symposium on
Security and Privacy 1996: IEEE Computer Society Washington, DC, USA.
39. Hofmeyr, S.A., S. Forrest, and A. Somayaji, Intrusion detection using sequences
of system calls. Journal of computer security, 1998. 6(3): p. 151-180.
40. Liao, Y. and V.R. Vemuri. Using text categorization techniques for intrusion
detection. in Proceedings of the 11th USENIX Security. 2002: USENIX
Association.
41. Dokas, P., et al. Data mining for network intrusion detection. 2002.
42. Fawcett, T., An introduction to ROC analysis. Pattern recognition letters, 2006.
27(8): p. 861-874.
43. Mukherjee, B., L.T. Heberlein, and K.N. Levitt, Network intrusion detection.
Network, IEEE, 1994. 8(3): p. 26-41.
44. Estevez-Tapiador, J.M., P. Garcia-Teodoro, and J.E. Diaz-Verdejo, Anomaly
detection methods in wired networks: a survey and taxonomy. Computer
communications, 2004. 27(16): p. 1569-1584.
45. Patcha, A. and J.M. Park, An overview of anomaly detection techniques:
Existing solutions and latest technological trends. Computer Networks, 2007.
51(12): p. 3448-3470.
46. Garcia-Teodoro, P., et al., Anomaly-based Network Intrusion Detection:
Techniques, Systems and Challenges. Computers & Security, 2009. 28(1-2): p.
18-28.
47. Ye, N., B. Harish, and T. Farley, Attack profiles to derive data observations,
features, and characteristics of cyber attacks. Information-Knowledge-Systems
Management, 2005. 5(1): p. 23-47.
48. Smaha, S.E. Haystack: An intrusion detection system. in 4th Aerospace
Computer Security Applications Conference. 1988. Orlando, FL: IEEE.
49. Lunt, T.F. Real-time intrusion detection. 1989: IEEE.
50. Anderson, D., T. Frivold, and A. Valdes, Next-generation intrusion detection
expert system (NIDES): A summary. 1995: SRI International, Computer Science
Laboratory.
51. Kruegel, C., et al., On the detection of anomalous system call arguments.
Computer Security–ESORICS 2003, p. 326-343. 2003: IEEE.
52. Maxion, R.A. and F.E. Feather, A case study of ethernet anomalies in a
distributed computing environment. IEEE Transactions on Reliability, 1990.
39(4): p. 433-443.
53. Mahoney, M. and P. Chan, Detecting novel attacks by identifying anomalous
network packet headers. Florida Institute of Technology Technical Report CS-
2001-2, 2001.
54. Mahoney, M. and P.K. Chan, Learning models of network traffic for detecting
novel attacks. Florida Institute of Technology Technical Report CS-2002-08,
2002.
55. Mahoney, M.V. and P.K. Chan. Learning nonstationary models of normal
network traffic for detecting novel attacks. in eighth ACM SIGKDD international
conference on Knowledge discovery and data mining 2002: ACM.
56. Lee, W. and D. Xiang. Information-theoretic measures for anomaly detection.
2001.
57. Biles, S., Detecting the unknown with snort and statistical packet anomaly
detection engine (SPADE). Computer Security Online Ltd., Tech. Rep, 2003.
58. Maggi, F., M. Matteucci, and S. Zanero, Detecting Intrusions through System
Call Sequence and Argument Analysis. IEEE Transactions on Dependable and
Secure Computing, 2010: p. 381-395.
59. Wattenberg-Simmross, F., et al., Anomaly Detection in Network Traffic Based
on Statistical Inference and Alpha-Stable Modeling, IEEE Transactions on
Dependable and Secure Computing, July 2011: p. 494-509.
60. Heckerman, D., A tutorial on learning with bayesian networks. Innovations in
Bayesian Networks, 2008: p. 33-82.
61. Jolliffe, I.T., Principal component analysis. Vol. 2. 2002: Wiley
Online Library.
62. Wang, W. and R. Battiti. Identifying intrusions in computer networks with
principal component analysis. in First International Conference on Availability,
Reliability and Security, ARES '06 2006: IEEE Computer Society Washington,
DC, USA.
63. Shyu, M.L., A novel anomaly detection scheme based on principal component
classifier. 2003, DTIC Document.
64. Ye, N., Y. Zhang, and C.M. Borror, Robustness of the Markov-chain model for
cyber-attack detection. IEEE Transactions on Reliability, 2004. 53(1): p. 116-123.
65. Zheng, Z., Z. Lan, and Y. Li, Toward Automated Anomaly Identification in
Large-Scale Systems, IEEE Transactions on Parallel and Distributed Systems,
2010: p. 381-395.
66. Xiang, Y., K. Li, and W. Zhou, Low-Rate DDoS Attacks Detection and
Traceback by Using New Information Metrics. IEEE Transactions on
Information Forensics and Security, 2011. 6(2): p. 426-437.
67. Xiang, Y., W. Zhou, and M. Guo, Flexible Deterministic Packet Marking an IP
Traceback System to Find the Real Source of Attacks. IEEE Transactions on
Parallel and Distributed Systems, 2009. 20(4): p. 567-580.
68. Lee, W., S.J. Stolfo, and K.W. Mok. Mining in a data-flow environment:
experience in network intrusion detection. in Fifth International Conference on
Knowledge Discovery and Data Mining (KDD), ACM 1999. 1999: ACM.
69. Lee, W. and S.J. Stolfo, A framework for constructing features and models for
intrusion detection systems. ACM Transactions on Information and System
Security (TISSEC), 2000. 3(4): p. 227-261.
70. Lee, W., S.J. Stolfo, and K.W. Mok, Adaptive intrusion detection: A data mining
approach. Artificial Intelligence Review, 2000. 14(6): p. 533-567.
71. Barbará, D., et al., ADAM: a testbed for exploring the use of data mining in
intrusion detection. Journal of ACM SIGMOD Record: Special Issue, 2001.
30(4): p. 15-24.
72. Bridges, S.M. and R.B. Vaughn. Fuzzy data mining and genetic algorithms
applied to intrusion detection. in NISSC. 2000.
73. Xin, J., J.E. Dickerson, and J.A. Dickerson. Fuzzy feature extraction and
visualization for intrusion detection. 2003.
74. Kim, D.S., H.N. Nguyen, and J.S. Park. Genetic algorithm to improve SVM
based network intrusion detection system. 2005.
75. Portnoy, L., E. Eskin, and S. Stolfo. Intrusion detection with unlabeled data
using clustering. 2001: Citeseer.
76. Ramadas, M., S. Ostermann, and B. Tjaden. Detecting anomalous network traffic
with self-organizing maps. in 6th International Symposium on RAID. 2003.
Pittsburgh, PA, USA: Springer.
77. Ramadas, M., S. Ostermann, and B. Tjaden. Detecting Anomalous Network
Traffic with Self-organizing Maps. in RAID 2003, LNCS 2820, ed. G. Vigna,
E. Jonsson, and C. Kruegel. 2003. p. 36-54.
78. Fossi, M., et al., Symantec Internet Security Threat Report trends for 2010.
Volume XVI.
79. Kohonen, T., The self-organizing map. Proceedings of the IEEE, 1990. 78(9): p.
1464-1480.
80. Wang, K. and S.J. Stolfo. Anomalous payload-based network intrusion detection.
in RAID, Lecture Notes in Computer Science. 2004: Springer.
81. Bolzoni, D., et al., POSEIDON: A 2-Tier anomaly based intrusion detection
system, in Proceedings of the Fourth IEEE International Workshop on
Information Assurance. 2006.
82. Wang, K., J. Parekh, and S. Stolfo. Anagram: A content anomaly detector
resistant to mimicry attack. 2006: Springer.
83. Perdisci, R., et al., McPAD: A multiple classifier system for accurate payload-
based anomaly detection. Computer Networks, 2009. 53(6): p. 864-881.
84. Rieck, K. and P. Laskov, Language models for detection of unknown attacks in
network traffic. Journal in Computer Virology, 2007. 2(4): p. 243-256.
85. Bolzoni, D., B. Crispo, and S. Etalle. ATLANTIDES: An architecture for alert
verification in network intrusion detection systems. in LISA'07 Proceedings of
the 21st conference on Large Installation System Administration Conference
2007: Usenix Association.
86. Bolzoni, D. and S. Etalle, Approaches in anomaly-based network intrusion
detection systems. Intrusion Detection Systems. Vol. 38. 2008: Springer Verlag.
p. 1-16.
87. Utsumi, A. and N. Tetsutani. Human detection using geometrical pixel value
structures. 2002: IEEE.
88. Theodoridis, S., et al., Introduction to pattern recognition: a matlab approach.
2009: Academic Pr.
89. Lamping, U. and E. Warnicke, Wireshark User's Guide. 2004.
90. Mahoney, M. and P. Chan. An analysis of the 1999 DARPA/Lincoln Laboratory
evaluation data for network anomaly detection. in Proceedings of Recent
Advances in Intrusion Detection (RAID). 2003: Springer.
91. Kendall, K., A database of computer attacks for the evaluation of intrusion
detection systems. 1999, Massachusetts Institute of Technology.
92. Fielding, R., et al., Hypertext transfer protocol--HTTP/1.1. 1999, RFC 2616,
June.
93. Dhamankar, R., et al., The top cyber security risks. TippingPoint, Qualys, the
Internet Storm Center and the SANS Institute faculty, Tech. Rep, 2009.
94. Balthrop, J., et al., Technological networks and the spread of computer viruses.
Science, 2004. 304(5670): p. 527-529.
95. Di Lucca, G.A., et al. Identifying cross site scripting vulnerabilities in web
applications. 2004: IEEE.
96. Lippmann, R., et al., The 1999 DARPA off-line intrusion detection evaluation.
Computer Networks, 2000. 34(4): p. 579-595.
97. Ingham, K. and H. Inoue. Comparing anomaly detection techniques for http. in
RAID'07, The 10th international conference on Recent advances in intrusion
detection 2007: Springer.
98. Chen, Y., et al., eds. Survey and taxonomy of feature selection algorithms in
intrusion detection system. Vol. 4318. 2006, Springer. 153-167.
99. Chen, C.M., Y.L. Chen, and H.C. Lin, An efficient network intrusion detection.
Computer communications. 33(4): p. 477-484.
100. Singh, S. and S. Silakari, Generalized Discriminant Analysis algorithm for
feature reduction in Cyber Attack Detection System. Arxiv preprint
arXiv:0911.0787, 2009: p. 173-180.
101. Shih, H.C., et al. Detection of Network Attack and Intrusion Using PCA-ICA. in
3rd International Conference on Innovative Computing Information and Control,
2008. ICICIC '08. 2008. Dalian, Liaoning: IEEE.
102. Venkatachalam, V. and S. Selvan, Performance comparison of Intrusion
detection system classifiers using various feature reduction techniques.
International journal of simulation, 2008. 9(1): p. 30-39.
103. Mahoney, M.V. Network traffic anomaly detection based on packet bytes. in
Proceedings of the 2003 ACM symposium on Applied computing 2003. New
York, NY, USA.
104. McLachlan, G.J., Discriminant analysis and statistical pattern
recognition. 1992: Wiley.
105. Bishop, C.M., Pattern recognition and machine learning. 1st
ed. Information Science and Statistics. Vol. 4. 2006: Springer New York.
106. Ringberg, H., et al., Sensitivity of PCA for traffic anomaly detection. ACM
SIGMETRICS Performance Evaluation Review, 2007. 35(1): p. 109-120.
107. Guyon, I. and A. Elisseeff, An introduction to variable and feature selection.
Journal of Machine Learning Research, 2003. 3: p. 1157-1182.
108. Chebrolu, S., A. Abraham, and J.P. Thomas, Feature deduction and ensemble
design of intrusion detection systems. Computers & Security, 2005. 24(4): p.
295-307.
109. Suebsing, A. and N. Hiransakolwong. Euclidean-based Feature Selection for
Network Intrusion Detection. in International Conference on Machine Learning
and Computing, 2009: IACSIT Press.
110. Summerville, D.H., N. Nwanze, and V.A. Skormin. Anomalous packet
identification for network intrusion detection. 2004: IEEE.
111. Yu, H. and J. Yang, A direct LDA algorithm for high-dimensional data-with
application to face recognition. Pattern Recognition, 2001. 34(10): p. 2067.
112. Sharma, S., et al. Feature extraction using non-linear transformation for robust
speech recognition on the Aurora database. 2000: IEEE.
113. Chung, S. and A. Mok. Allergy attack against automatic signature generation.
2006: Springer.
114. Düssel, P., et al., Cyber-critical infrastructure protection using real-time
payload-based anomaly detection. Lecture Notes in Computer Science, Critical
Information Infrastructures Security. Vol. 6027. 2010: Springer. p. 85-97.
115. Martinez, A.M. and A.C. Kak, PCA versus LDA. Pattern Analysis and Machine
Intelligence, IEEE Transactions on, 2001. 23(2): p. 228-233.
116. Bouzida, Y. and S. Gombault. Intrusion detection using principal component
analysis. in Seventh multi-conference on Systemics, Cybernetics and
Informatics. 2003. Orlando, Florida, USA: Citeseer.
117. Jamdagni, A., et al. Intrusion Detection Using Geometrical Structure. in Fourth
International Conference on Frontier of Computer Science and Technology,
2009. FCST '09. 2009. China: IEEE Computer Society Washington, DC, USA.
118. Cattell, R.B., The scree test for the number of factors. Multivariate behavioral
research, 1966. 1(2): p. 245-276.
119. Franklin, S.B., et al., Parallel analysis: a method for determining significant
principal components. Journal of Vegetation Science, 1995. 6(1): p. 99-106.
120. Cattell, R.B. and S. Vogelmann, A comprehensive trial of the scree and KG
criteria for determining the number of factors. Multivariate behavioral research,
1977. 12(3): p. 289-325.
121. Nelson, L.R., Some observations on the scree test, and on coefficient alpha. Thai
Journal of Educational Research and Measurement. 2005. 3(1): p. 1-17.
122. Chung, S. and A. Mok. Advanced Allergy Attacks: Does a Corpus Really Help?
2007: Springer.
123. Rrushi, D. and U. di Milano. SCADA Intrusion Prevention System. 2006.
124. Xiaoxiang, Z., Modbus Protocol and Programming. Electronic Engineer, 2005.