Analyzing Denial-Of-Service Attacks in KDD CUP 99 Data Set For Intrusion Detection System

Kyaw Thet Khaing, Thinn Thu Naing

University of Computer Studies, Yangon
Kyawthetkhaing.ucsy@gmail.com, thinnthu@gmail.com
Abstract

Cyber security has recently emerged as an established discipline for computer systems and infrastructures, with a focus on protecting the valuable information stored on those systems from adversaries who want to obtain, corrupt, damage, destroy or prohibit access to it. One security threat that is difficult to address using traditional network security techniques is the Denial of Service (DoS) attack. We have seen increasing numbers of DoS attacks against online services and web applications, either for extortion or for impairing and even disabling the competition. This paper examines how far DoS attacks dominate the other attack types. We evaluate the results on the four attack categories found in the KDD Cup 99 intrusion detection data sets, which are widely used as one of the few publicly available data sets for network-based anomaly detection systems such as an Intrusion Detection System (IDS).

1. Introduction

According to Wikipedia, a denial-of-service attack is an attempt to make a computer resource unavailable to its intended users. Although the means to carry out, motives for, and targets of a DoS attack may vary, it generally consists of the concerted efforts of a person or people to prevent an Internet site or service from functioning efficiently or at all, temporarily or indefinitely. Perpetrators of DoS attacks typically target sites or services hosted on high-profile web servers such as banks, credit card payment gateways, and even root name servers.

One common method of attack involves saturating the target (victim) machine with external communications requests, such that it cannot respond to legitimate traffic, or responds so slowly as to be rendered effectively unavailable. In general terms, DoS attacks are implemented by either forcing the targeted computer(s) to reset, or consuming its resources so that it can no longer provide its intended service, or obstructing the communication media between the intended users and the victim so that they can no longer communicate adequately.

Denial-of-service attacks are considered violations of the IAB's Internet proper use policy, and also violate the acceptable use policies of virtually all Internet Service Providers (ISPs). They also commonly constitute violations of the laws of individual nations [1].

We studied the details of the research done on attacks in the KDD Cup 99 intrusion detection data sets, which are widely used as one of the few publicly available data sets for network-based anomaly detection systems such as an intrusion detection system. These data sets are made public by the Massachusetts Institute of Technology (MIT) Lincoln Lab [3].

The first important deficiency in the KDD data set is the huge number of network traffic records. The "kddcup_data_corrected" data set consists of 4898431 connection records and takes 721 MB in file size. This large number of records will cause the evaluation results to be biased by the methods which have better detection rates on these records.

In addition, to analyze the difficulty level of the records in the KDD data set, we use the "kddcup_10_percent_corrected" data set as dataset1. It has only 494021 records, which is only ten percent of the entire data set. In the first data set, about 78% of the records are duplicated [2]. These redundant records will cause learning algorithms to be biased towards the more frequent records, and thus prevent them from learning infrequent records, which are usually more harmful to networks, such as U2R attacks. So, we used the "corrected" data set as dataset2 and the "kddtest+" data set, which is a new version of the KDD data set [2, 4], as dataset3. Dataset2 and dataset3 contain 292300 and 125973 records respectively.

The rest of the paper is organized as follows. Section II introduces the KDD Cup 99 data set, which is widely used in anomaly detection. In Section III, we
describe the Intrusion Detection System. A detailed analysis of DoS attacks and their several attack types is given in Section IV. We conclude the paper and discuss future extensions in Section V.

2. KDD CUP 99 Data Set Description

Since 1999, KDD’99 [3] has been the most widely used data set for the evaluation of anomaly detection methods. This data set is built on the data captured in the DARPA’98 IDS evaluation program [5]. DARPA’98 is about 4 gigabytes of compressed raw (binary) tcpdump data of 7 weeks of network traffic, which can be processed into about 5 million connection records, each with about 100 bytes. The two weeks of test data have around 2 million connection records. The KDD training data set consists of approximately 4,900,000 single connection vectors, each of which contains 41 features and is labeled as either normal or an attack, with exactly one specific attack type. The simulated attacks fall into one of the following four categories:

(1) Denial of Service Attack (DoS): an attack in which the attacker makes some computing or memory resource too busy or too full to handle legitimate requests, or denies legitimate users access to a machine.
(2) User to Root Attack (U2R): a class of exploit in which the attacker starts out with access to a normal user account on the system (perhaps gained by sniffing passwords, a dictionary attack, or social engineering) and is able to exploit some vulnerability to gain root access to the system.
(3) Remote to Local Attack (R2L): occurs when an attacker who has the ability to send packets to a machine over a network, but who does not have an account on that machine, exploits some vulnerability to gain local access as a user of that machine.
(4) Probing Attack: an attempt to gather information about a network of computers for the apparent purpose of circumventing its security controls.

It is important to note that the test data is not from the same probability distribution as the training data, and it includes specific attack types not in the training data, which makes the task more realistic. Some intrusion experts believe that most novel attacks are variants of known attacks and that the signatures of known attacks can be sufficient to catch novel variants. The data sets contain a total of 24 training attack types, with an additional 14 types in the test data only. The names and detailed descriptions of the training attack types are listed in [7].

KDD’99 features can be classified into three groups:

(1) Basic features: this category encapsulates all the attributes that can be extracted from a TCP/IP connection. Most of these features lead to an implicit delay in detection.
(2) Traffic features: this category includes features that are computed with respect to a window interval and is divided into two groups:
    (a) “same host” features: examine only the connections in the past 2 seconds that have the same destination host as the current connection, and calculate statistics related to protocol behavior, service, etc.
    (b) “same service” features: examine only the connections in the past 2 seconds that have the same service as the current connection.
    The two aforementioned types of “traffic” features are called time-based. However, there are several slow probing attacks that scan the hosts (or ports) using a much larger time interval than 2 seconds, for example, one scan every minute. As a result, these attacks do not produce intrusion patterns within a time window of 2 seconds. To solve this problem, the “same host” and “same service” features are re-calculated based on a connection window of 100 connections rather than a time window of 2 seconds. These features are called connection-based traffic features.
(3) Content features: unlike most of the DoS and Probing attacks, the R2L and U2R attacks don’t have any frequent sequential intrusion patterns. This is because the DoS and Probing attacks involve many connections to some host(s) in a very short period of time, whereas the R2L and U2R attacks are embedded in the data portions of the packets and normally involve only a single connection. To detect these kinds of attacks, we need features that can look for suspicious behavior in the data portion, e.g., the number of failed login attempts. These features are called content features.

3. Overview of Intrusion Detection System

3.1 Intrusion

Intrusions are actions that attempt to bypass the security mechanisms of computer systems. So, they are any set of actions that threaten the integrity,
availability, or confidentiality of a network resource. These properties have the following explanations:

Confidentiality – means that information is not made available or disclosed to unauthorized individuals, entities or processes.
Integrity – means that data has not been altered or destroyed in an unauthorized manner.
Availability – means that a system or a system resource is accessible and usable upon demand by an authorized system user [9].

3.2 Intrusion Detection System

An intrusion detection system is used to detect several types of malicious behavior that can compromise the security and trust of a computer system. This includes network attacks against vulnerable services, data-driven attacks on applications, host-based attacks such as privilege escalation, unauthorized logins and access to sensitive files, and malware (viruses, Trojan horses, and worms).

Intrusion detection technologies are indispensable for network and computer security, given the increasingly serious problems caused by cyber threats. Intrusion detection is the process of detecting these cyber attacks in a system or network by monitoring. An Intrusion Detection System (IDS) monitors network traffic for suspicious activity and warns the system or network administrator of malicious attacks. The goal of an Intrusion Detection System is to raise alerts and to protect the confidentiality, integrity and availability of critical networked information systems.

Intrusion detection can be divided into two main approaches, named misuse detection and anomaly detection. Misuse detection is based on a description of known malicious activities. This description is often modeled as a set of rules referred to as attack signatures. An anomaly detection IDS looks for anomalies, meaning anything out of the ordinary. It uses rules or predefined concepts about "normal" and "abnormal" system activity (called heuristics) to distinguish anomalies from normal system behavior and to monitor, report on, or block anomalies as they occur [7].

An intrusion detection system (IDS) is software and/or hardware designed to detect unwanted attempts at accessing, manipulating, and/or disabling computer systems, mainly through a network such as the Internet. These attempts may take the form of attacks by, for example, crackers, malware and/or disgruntled employees. An IDS cannot directly detect attacks within properly encrypted traffic.

An IDS can be composed of several components: Sensors, which generate security events; a Console, used to monitor events and alerts and to control the sensors; and a central Engine, which records the events logged by the sensors in a database and uses a system of rules to generate alerts from the security events received. There are several ways to categorize an IDS, depending on the type and location of the sensors and the methodology used by the engine to generate alerts. In many simple IDS implementations all three components are combined in a single device or appliance [8].

4. Detailed Analysis of DoS

Denial-of-service (DoS) attacks cost businesses millions of dollars each year because of system downtime, lost revenue and productivity, tarnished reputation, and the hours required by technical staff to locate the problem and resolve it. Once customers lose confidence in the security of the systems holding their confidential and financial information, they will often take their business elsewhere.

Classification of the types of DoS attack is also important: because the different types of DoS attacks employ slightly different attack mechanisms, the defenses against them are also different.

As noted earlier, the KDD data sets classify denial-of-service (DoS) attacks into 6 attack types: smurf, teardrop, neptune, land, pod, and back. Among them, smurf and neptune have a large number of records in the three data sets.

A smurf attack is a network-level attack that overwhelms the victim machine with Internet Control Message Protocol (ICMP) echo replies from computers in the same broadcast network. The attacker sends forged ICMP echo requests to an IP broadcast address using the IP address of the victim machine, making the computers in that network reply to the requests and flood the victim machine with ICMP echo replies.

The importance of classifying the DoS types is shown in Table 1. We can see that the smurf and neptune attacks account for a significantly larger number of records than the other types.

Table 1. Matrix of comparison among different types in DoS attack

Attack Types   Data Set 1   Data Set 2   Data Set 3
Smurf          280790       164091       2646
Back           2203         1098         956
Land           21           9            18
Pod            264          87           201
Teardrop       979          12           892
Neptune        107201       58001        41214
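
The per-type counts in Table 1 can be turned into shares of each data set's DoS records, which makes the weight of smurf and neptune easier to compare across the three data sets. The following is a minimal Python sketch and is not part of the original study: the counts are copied from Table 1, while the names DOS_COUNTS and dos_type_shares are illustrative assumptions.

```python
# Counts of DoS connection records per attack type, copied from Table 1.
# Each tuple is (Data Set 1, Data Set 2, Data Set 3).
# The names below are illustrative; they do not come from the paper.
DOS_COUNTS = {
    "smurf": (280790, 164091, 2646),
    "back": (2203, 1098, 956),
    "land": (21, 9, 18),
    "pod": (264, 87, 201),
    "teardrop": (979, 12, 892),
    "neptune": (107201, 58001, 41214),
}


def dos_type_shares(counts):
    """Return, per data set, each attack type's percentage of all DoS records."""
    n_sets = len(next(iter(counts.values())))
    # Column totals: the number of DoS records listed for each data set.
    totals = [sum(row[i] for row in counts.values()) for i in range(n_sets)]
    return [
        {name: 100.0 * row[i] / totals[i] for name, row in counts.items()}
        for i in range(n_sets)
    ]


if __name__ == "__main__":
    for i, shares in enumerate(dos_type_shares(DOS_COUNTS), start=1):
        # List the attack types from the largest share to the smallest.
        ranked = sorted(shares.items(), key=lambda kv: kv[1], reverse=True)
        line = ", ".join(f"{name} {pct:.2f}%" for name, pct in ranked)
        print(f"Data Set {i}: {line}")
```

With the counts from Table 1, smurf and neptune together make up more than 95% of the DoS records in each of the three data sets.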
The number of connection records for DoS attacks and the other categories in the data sets is shown in Table 2. Figures 1 and 2 compare DoS attacks with the other attacks in the data sets according to the number of connection records.

Table 2. Number of network connections in three data sets

No. of Connection   Dataset 1   Dataset 2   Dataset 3
Total               494021      292300      125973
DoS                 391458      223258      45927
U2R                 52          39          52
R2L                 1126        5993        995
Probe               4107        2377        11656
Normal              97278       60593       67343

Figure 1. Comparison of attack rate on three datasets

We can clearly see that connections belonging to DoS attacks dominate the network traffic in the data sets: in the first two data sets, even normal connections do not account for 25% of the traffic. The redundant records play a role here: in dataset3, the new version of the KDD data set, DoS attacks clearly fall below normal traffic. However, according to Figure 2, DoS attacks account for 86.81% of the overall network traffic.

Figure 2. Overall percentages of attacks in the three data sets

5. Conclusion

We have seen increasing numbers of denial of service (DoS) attacks against online services and web applications, either for extortion or for impairing and even disabling the competition. After analyzing the three KDD Cup 99 data sets above, the results show that DoS attacks are the most likely attacks and that their effect is a hazard to every system. This kind of attack should therefore be viewed as a risk management issue that can be dealt with effectively, like other business risks. This means minimizing exposure where possible and being prepared should an attack eventuate.

6. References

[1] "Denial-of-service attack", 2009. Available on: http://en.wikipedia.org/wiki/denial-of-service_attack
[2] M. Tavallaee, E. Bagheri, W. Lu, and A. Ghorbani, "A Detailed Analysis of the KDD CUP 99 Data Set", Submitted to second IEEE Symposium on Computational Intelligence for Security and Defense Applications (CISDA), 2009.
[3] KDD Cup 1999. Available on: http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html, December 2009.
[4] "NSL-KDD data set for network-based intrusion detection system." Available on: http://nsl.cs.unb.ca/NSL-KDD/, December 2009.
[5] KDD Cup DARPA 1998. Available on: http://kdd.ics.uci.edu/databases/kddcup98/kddcup98.html, December 2009.
[6] MIT Lincoln Labs. 1998 DARPA Intrusion Detection Evaluation. Available on: http://www.ll.mit.edu/mission/communications/ist/corpra/idvel/index.html, December 2009.
[7] M. Bahrololum, E. Salahi and M. Khaleghi, "Anomaly Intrusion Detection Design using Hybrid of Unsupervised and Supervised Neural Network", International Journal of Computer Network & Communications (IJCNC), Vol.1, No.2, July 2009.
[8] "Intrusion detection system", 2009. Available on: http://en.wikipedia.org/wiki/intrusion_detection_sytem
[9] V. Marinova-Boncheva, "A Short Survey of Intrusion Detection System", 2007.
