
UNSW-NB15: A Comprehensive Data Set For Network Intrusion Detection Systems (UNSW-NB15 Network Data Set)



Conference Paper · November 2015


DOI: 10.1109/MilCIS.2015.7348942


2 authors:

Nour Moustafa Jill Slay


UNSW Canberra La Trobe University

UNSW-NB15: A Comprehensive Data set for Network
Intrusion Detection systems
(UNSW-NB15 Network Data Set)

Nour Moustafa, IEEE student Member, Jill Slay


School of Engineering and Information Technology
University of New South Wales at the Australian Defence Force Academy
Canberra, Australia
E-mail: nour.abdelhameed@student, j.slay@{.adfa.edu.au}

Abstract— One of the major research challenges in this field is the unavailability of a comprehensive network-based data set that can reflect modern network traffic scenarios, a wide variety of low-footprint intrusions, and in-depth structured information about the network traffic. The KDD98, KDDCUP99 and NSLKDD benchmark data sets, which are used to evaluate network intrusion detection system research, were generated a decade ago. However, numerous current studies have shown that, for the current network threat environment, these data sets do not inclusively reflect network traffic and modern low-footprint attacks. To counter the unavailability of a network benchmark data set, this paper examines the creation of the UNSW-NB15 data set. This data set is a hybrid of real modern normal activities and contemporary synthesized attack activities of network traffic. Existing and novel methods are utilised to generate the features of the UNSW-NB15 data set. This data set is available for research purposes and can be accessed from http://www.cybersecurity.unsw.adfa.edu.au/ADFA%20NB15%20Datasets/.

Keywords- UNSW-NB15 data set; NIDS; low footprint attacks; pcap files; testbed

I. INTRODUCTION
Currently, due to the massive growth in computer networks and applications, many challenges arise for cyber security research. Intrusions/attacks can be defined as a set of events which are able to compromise the principles of computer systems, e.g. availability, authority, confidentiality and integrity [1]. Firewall systems cannot detect modern attack environments and are not able to analyse network packets in depth. For these reasons, IDSs are designed to achieve high protection for the cyber security infrastructure [2].

A Network Intrusion Detection System (NIDS) monitors network traffic flow to identify attacks. NIDSs are classified into misuse/signature-based and anomaly-based systems [4]. A signature-based NIDS matches traffic against the signatures of known attacks to detect intrusions. In an anomaly-based NIDS, however, a normal profile is created from the normal behavior of the network, and any deviation from this profile is considered an attack [3] [4]. Further, signature-based NIDSs cannot detect unknown attacks, and for this reason anomaly-based NIDSs are recommended in many studies [4] [5].

The effectiveness of a NIDS is evaluated based on its performance in identifying attacks, which requires a comprehensive data set that contains normal and abnormal behaviors [6]. The older benchmark data sets are KDDCUP 99 [7] and NSLKDD [8], which have been widely adopted for evaluating NIDS performance. Several studies [6][9][10][11] have found that evaluating a NIDS using these data sets does not reflect realistic output performance, for several reasons. First, the KDDCUP 99 data set contains a tremendous number of redundant records in the training set, which biases the detection results toward the frequent records [10]. Second, there are multiple missing records, which are a factor in changing the nature of the data [9]. Third, although the NSLKDD data set is an improved version of KDDCUP 99 that tackles several issues, such as the data imbalance among the normal/abnormal records and the missing values [12], it is not a comprehensive representation of a modern low-footprint attack environment.

The above reasons have posed a serious challenge for the cyber security research group at the Australian Centre for Cyber Security (ACCS, http://www.accs.unsw.adfa.edu.au/) and for other researchers in this domain around the globe. To counter this challenge, this paper presents an effort to create the UNSW-NB15 data set for evaluating NIDSs. The IXIA PerfectStorm tool (http://www.ixiacom.com/products/perfectstorm) is utilised in the Cyber Range Lab of the ACCS to create a hybrid of modern normal and abnormal network traffic. The abnormal traffic generated by the IXIA tool simulates nine families of attacks, which are listed in Table VIII. The IXIA tool contains information about new attacks that is updated continuously from the CVE site (https://cve.mitre.org/). This site is a dictionary of publicly known information security vulnerabilities and exposures. The tcpdump tool (http://www.tcpdump.org/) is used to capture the network traffic in the form of packets. The simulation period was 16 hours on Jan 22, 2015 and 15 hours on Feb 17, 2015, capturing 100 GB in total. Further, the captured traffic is split into pcap files of 1000 MB each using the tcpdump tool. To create reliable features from the pcap files, Argus (http://qosient.com/argus/index.shtml) and
Bro-IDS (https://www.bro.org/index.html) tools are utilised. Additionally, twelve algorithms are developed in C# to analyse the flows of the connection packets in depth. The data set is labelled from a ground truth table that contains all simulated attack types. This table is designed from an IXIA report that is generated during the simulation period. The key characteristic of the UNSW-NB15 data set is that it is a hybrid of real modern normal behaviors and synthetic attack activities.

The rest of the paper is organised as follows: section 2 examines the general goal and orientation of any IDS data set. Section 3 details the shortcomings of the existing benchmark data sets. The synthetic environment configuration and the generation of UNSW-NB15 are described in section 4. Section 5 presents the distribution of the data set records. Section 6 is a comparative analysis between the KDDCUP99 and UNSW-NB15 data sets. Section 7 describes the final shape of the UNSW-NB15 data set files. Finally, section 8 concludes the work and outlines future intentions.

II. THE GOAL AND ORIENTATION OF A NIDS DATA SET
A NIDS data set can be conceptualized as relational data [6]. Input to a NIDS is a set of data records. Each record consists of attributes of different data types (e.g., binary, float, nominal and integer) [6]. The label marks each record of the data as either normal (0) or abnormal (1). Labelling is done by matching each processed record, according to the particular NIDS scenario, with the ground truth table of all transaction records.

III. CRITICISMS OF EXISTING DATA SETS
The quality of a NIDS data set reflects two important characteristics: a comprehensive reflection of contemporary threats and an inclusive range of normal traffic. The quality of the data set ultimately affects the reliability of the outcome of any NIDS [6] [9]. In this section, the disadvantages of existing data sets for NIDS are explored from the perspective of data set quality. The most widely adopted data sets for NIDS are KDDCUP99 and its improved version, NSL-KDD.

A. KDDCup99 Data Set
To generate DARPA98 [13], the IST group of Lincoln Laboratory at MIT performed a simulation with normal and abnormal traffic in a military network (U.S. Air Force LAN) environment. The simulation ended with nine weeks of raw tcpdump files. The training data size was about four GB and consisted of compressed binary tcpdump files from seven weeks of network traffic. This was processed into approximately five million connection records. The simulation provided two weeks of test data, which contained two million connection records [7] [13].

To make the DARPA98 network data features more comprehensive, the same environment (U.S. Air Force LAN) was utilised, and the simulation ended with 41 features for each connection, along with the class label, using the Bro-IDS tool. The upgraded version of DARPA98 is referred to as KDDCUP99. In the KDDCUP99 data set, the extracted features were divided into three groups: intrinsic features, content features and traffic features. Further, attack records in this data set are categorised into four vectors (DoS, Probe, U2R, and R2L). The training set of KDDCUP99 included 22 attack types and the test data contained 15 attack types [13] [7].

A number of IDS researchers have utilised these datasets due to their public availability. However, many researchers have reported three major disadvantages of these datasets [6] [9] [10] [11] [12] which can affect the transparency of the IDS evaluation. First, all attack data packets have a time to live (TTL) value of 126 or 253, whereas most packets of the traffic have a TTL of 127 or 254. However, the TTL values 126 and 253 do not occur in the attack records of the training set [9]. Second, the probability distribution of the testing set is different from that of the training set, because new attack records were added to the testing set [10][12]. This skews or biases classification methods toward some records rather than balancing between the attack types and normal observations. Third, the data set is not a comprehensive representation of recently reported low-footprint attack projections [11].

B. NSLKDD Data Set
According to [12], an upgraded version of the KDD data set, referred to as NSLKDD, was created with three goals. The first goal was to remove duplicate records from the training and test sets of the KDDCUP99 data set in order to eliminate classifier bias toward frequently repeated records. The second was to select a variety of records from different parts of the original KDD data set in order to achieve reliable results from classifier systems. The third was to eliminate the imbalance among the numbers of records in the training and testing phases in order to decrease the False Alarm Rates (FARs). The major disadvantage of NSLKDD is that it does not represent modern low-footprint attack scenarios [9] [12].

IV. UNSW-NB15 DATA SET
In this section, the synthetic environment configuration and the generation of UNSW-NB15 are presented. The section mainly covers the testbed configuration details and the whole process involved in generating UNSW-NB15 from the configured testbed.

A. An IXIA tool Testbed Configuration
According to Fig. 1, the IXIA traffic generator is configured with three virtual servers. Servers 1 and 3 are configured for the normal spread of the traffic, while server 2 forms the abnormal/malicious activities in the network traffic. To establish the intercommunication between the servers and to acquire public and private network traffic, there are two virtual interfaces with the IP addresses 10.40.85.30 and 10.40.184.30. The servers are connected to hosts via two routers. Router 1 has the 10.40.85.1 and 10.40.182.1 IP addresses, whereas router 2 is configured with 10.40.184.1 and
10.40.183.1 IP addresses. These routers are connected to the firewall device, which is configured to pass all the traffic, whether normal or abnormal. The tcpdump tool is installed on router 1 to capture the pcap files during the simulation uptime. Moreover, the central intent of this whole testbed was to capture the normal or abnormal traffic that originated from the IXIA tool and was dispersed among the network nodes (e.g., servers and clients). Importantly, the IXIA tool is utilised as a generator of attack traffic as well as normal traffic, and the attack behaviour is fed from the CVE site for the purpose of a realistic representation of a modern threat environment.

Figure 1. The Testbed Visualization for UNSW-NB15

Due to the speed of network traffic and the way modern attacks exploit it, the IXIA tool is configured to generate one attack per second during the first simulation to capture the first 50 GB. On the other hand, the second simulation is configured to make ten attacks per second to extract another 50 GB.

B. Traffic Analysis
The traffic analysis describes the cumulative flows during the simulation periods in which the UNSW-NB15 data set was generated. Table I provides the data set statistics: the simulation period, the number of flows, the total source bytes, the destination bytes, the number of source packets, the number of destination packets, the protocol types, the number of normal and abnormal records, and the number of unique source/destination IP addresses.

TABLE I. DATA SET STATISTICS
Statistical features     16 hours          15 hours
No._of_flows             987,627           976,882
Src_bytes                4,860,168,866     5,940,523,728
Des_bytes                44,743,560,943    44,303,195,509
Src_Pkts                 41,168,425        41,129,810
Dst_pkts                 53,402,915        52,585,462
Protocol types: TCP      771,488           720,665
Protocol types: UDP      301,528           688,616
Protocol types: ICMP     150               374
Protocol types: Others   150               374
Label: Normal            1,064,987         1,153,774
Label: Attack            22,215            299,068
Unique: Src_ip           40                41
Unique: Dst_ip           44                45

Fig. 2 shows the concurrent transactions over time during the 16 hours of the simulation on Jan 22, 2015 and the 15 hours on Feb 17, 2015. The x-axis shows the time in intervals of 10 seconds and the y-axis represents the number of Kbytes sniffed during each simulation period.

Figure 2. The Concurrent Transactions of Flows during the Simulation Periods (panels A and B).

C. Architectural Framework
The whole architecture involved in generating the final shape of UNSW-NB15, from pcap files to CSV files with 49 features (the attributes in any CSV file), is presented in Fig. 3. All 49 features of the UNSW-NB15 data set are described in Tables II-VII, along with an explanation of the generation sequence for ease of understanding.

Figure 3. Framework Architecture for Generating UNSW-NB15 data set

When the simulation was running on the testbed presented in Fig. 1, the pcap files were generated by using the tcpdump tool. The features of the UNSW-NB15 data set are extracted by using the Argus and Bro-IDS tools and twelve algorithms developed in the C# programming language, as shown in Fig. 3. Moreover, these features are matched according to the shared flow features listed in Table II. These tools are installed and configured on Linux Ubuntu 14.04. The detailed format of the UNSW-NB15 data set is described in the following sections.

D. The extracted features from the Argus and Bro-IDS Tools
The Argus tool processes raw network packets (e.g., pcap files) and generates attributes/features of the network flow packets. The Argus tool consists of an Argus-server and Argus-clients. The Argus-server writes the packets received from the pcap files into
Argus files in the binary format. The Argus clients extract the features from the Argus files.

The Bro-IDS tool is an open-source network traffic analyser. It is predominantly a security monitor that inspects all network traffic for malicious activities. The Bro-IDS tool is configured to generate three log files from the pcap files. First, the conn file records all connection information seen in the pcap files. Second, the http file includes all HTTP requests and replies. Third, the ftp file records all activities of an FTP service.

Finally, the output files of the two tools, Argus and Bro-IDS, are stored in a SQL Server 2008 (http://www.microsoft.com/en-au/download/details.aspx?id=26113) database to match the Argus and Bro-IDS generated features by using the flow features listed in Table II.

TABLE II. FLOW FEATURES
#   Name         T  Description
1   srcip        N  Source IP address
2   sport        I  Source port number
3   dstip        N  Destination IP address
4   dsport       I  Destination port number
5   proto        N  Transaction protocol

E. The matched features of the Argus and Bro-IDS Tools
These features include a variety of packet-based features and flow-based features. The packet-based features assist the examination of the payload besides the headers of the packets. In contrast, for the flow-based features, and to keep the computational analysis low, only connected packets of the network traffic are considered instead of observing all the packets going through a network link. Moreover, the flow-based features are based on a direction, an inter-arrival time and an inter-packet length [6] (listed in Tables III and IV, and also used in the connection features of Table VI). The matched features are categorised into three groups, Basic, Content, and Time, which are described in Tables III, IV and V, respectively.

TABLE III. BASIC FEATURES
#   Name         T  Description
6   state        N  The state and its dependent protocol, e.g. ACC, CLO, else (-)
7   dur          F  Record total duration
8   sbytes       I  Source to destination bytes
9   dbytes       I  Destination to source bytes
10  sttl         I  Source to destination time to live
11  dttl         I  Destination to source time to live
12  sloss        I  Source packets retransmitted or dropped
13  dloss        I  Destination packets retransmitted or dropped
14  service      N  http, ftp, ssh, dns .., else (-)
15  sload        F  Source bits per second
16  dload        F  Destination bits per second
17  spkts        I  Source to destination packet count
18  dpkts        I  Destination to source packet count

Importantly, features 1-35 represent the integrated information gathered from data packets. The majority of the features are generated from packet headers, as reflected in Tables II-V. In addition, the UNSW-NB15 data set includes additional flow-based features, as described in the following section.

TABLE IV. CONTENT FEATURES
#   Name         T  Description
19  swin         I  Source TCP window advertisement
20  dwin         I  Destination TCP window advertisement
21  stcpb        I  Source TCP sequence number
22  dtcpb        I  Destination TCP sequence number
23  smeansz      I  Mean of the flow packet size transmitted by the src
24  dmeansz      I  Mean of the flow packet size transmitted by the dst
25  trans_depth  I  The depth into the connection of http request/response transaction
26  res_bdy_len  I  The content size of the data transferred from the server's http service.

TABLE V. TIME FEATURES
#   Name         T  Description
27  sjit         F  Source jitter (mSec)
28  djit         F  Destination jitter (mSec)
29  stime        T  Record start time
30  ltime        T  Record last time
31  sintpkt      F  Source inter-packet arrival time (mSec)
32  dintpkt      F  Destination inter-packet arrival time (mSec)
33  tcprtt       F  The sum of 'synack' and 'ackdat' of the TCP.
34  synack       F  The time between the SYN and the SYN_ACK packets of the TCP.
35  ackdat       F  The time between the SYN_ACK and the ACK packets of the TCP.

F. The additional features from the matched features
This section provides the generation details of the twelve additional features of the UNSW-NB15 data set (Table VI), which are derived from the matched features (Tables II-IV). Table VI is divided into two parts according to the nature and purpose of the additional generated features. Features 36-40 are considered general purpose features, whereas features 41-47 are labelled connection features. Among the general purpose features, each feature has its own purpose from the defence point of view, whereas the connection features are created solely to provide defence in connection-attempt scenarios. Attackers might scan hosts in a capricious way, for example once per minute or once per hour [12]. In order to identify these attackers, the records are sorted by the last time feature so that features 36-47 of Table VI capture similar characteristics of the connection records over each 100 sequentially ordered connections.

TABLE VI. ADDITIONAL GENERATED FEATURES
#   Name              T  Description
General purpose features
36  is_sm_ips_ports   B  If the source (1) and destination (3) IP addresses are equal and the port numbers (2)(4) are equal, this variable takes value 1, else 0
37  ct_state_ttl      I  No. for each state (6) according to a specific range of values for source/destination time to live (10) (11).
38  ct_flw_http_mthd  I  No. of flows that have methods such as Get and Post in the http service.
39  is_ftp_login      B  If the ftp session is accessed by user and password then 1, else 0.
40  ct_ftp_cmd        I  No. of flows that have a command in the ftp session.
Connection features
41  ct_srv_src        I  No. of connections that contain the same service (14) and source address (1) in 100 connections according to the last time (30).
42  ct_srv_dst        I  No. of connections that contain the same service (14) and destination address (3) in 100 connections according to the last time (30).
43  ct_dst_ltm        I  No. of connections of the same destination address (3) in 100 connections according to the last time (30).
44  ct_src_ltm        I  No. of connections of the same source address (1) in 100 connections according to the last time (30).
45  ct_src_dport_ltm  I  No. of connections of the same source address (1) and the destination port (4) in 100 connections according to the last time (30).
46  ct_dst_sport_ltm  I  No. of connections of the same destination address (3) and the source port (2) in 100 connections according to the last time (30).
47  ct_dst_src_ltm    I  No. of connections of the same source (1) and destination (3) address in 100 connections according to the last time (30).

G. The labelled features
To label this data set, the IXIA tool generated a report about the attack data. This report is arranged in the shape of a ground truth table to match all transaction records. This table consists of eleven attributes: start time, last time, attack category, attack subcategory, protocol, source address, source port, destination address, destination port, attack name and attack reference. As listed in Table VII, the data set is labelled with an attack category (attack_cat) and a label for each record, which is 0 if the record is normal and 1 if the record is an attack.

TABLE VII. LABELLED FEATURES
#   Name              T  Description
48  attack_cat        N  The name of each attack category. In this data set, there are nine categories (Fuzzers, Analysis, Backdoors, DoS, Exploits, Generic, Reconnaissance, Shellcode and Worms)
49  Label             B  0 for normal and 1 for attack records
Type (T.) N: nominal, I: integer, F: float, T: timestamp and B: binary

V. DATA SET RECORDS DISTRIBUTION
Table VIII represents the distribution of all records of the UNSW-NB15 data set. The major categories of the records are normal and attack. The attack records are further classified into nine families according to the nature of the attacks.

TABLE VIII. DATA SET RECORD DISTRIBUTION
Type            No. Records  Description
Normal          2,218,761    Natural transaction data.
Fuzzers         24,246       Attempting to cause a program or network to be suspended by feeding it randomly generated data.
Analysis        2,677        It contains different attacks of port scan, spam and html files penetrations.
Backdoors       2,329        A technique in which a system security mechanism is bypassed stealthily to access a computer or its data.
DoS             16,353       A malicious attempt to make a server or a network resource unavailable to users, usually by temporarily interrupting or suspending the services of a host connected to the Internet.
Exploits        44,525       The attacker knows of a security problem within an operating system or a piece of software and leverages that knowledge by exploiting the vulnerability.
Generic         215,481      A technique that works against all block-ciphers (with a given block and key size), without consideration of the structure of the block-cipher.
Reconnaissance  13,987       Contains all Strikes that can simulate attacks that gather information.
Shellcode       1,511        A small piece of code used as the payload in the exploitation of a software vulnerability.
Worms           174          The attacker replicates itself in order to spread to other computers. Often, it uses a computer network to spread itself, relying on security failures on the target computer to access it.

VI. COMPARISON OF THE KDDCUP99 AND UNSW-NB15 DATA SET
Table IX shows a comparative analysis of the KDDCUP99 and UNSW-NB15 data sets. The table consists of eight parameters: the number of networks, the number of unique IP addresses, the type of data generation, the duration of the data generation, its output format, the attack vectors, the tools that are used to extract the features, and the number of features for each data set. It can be observed that the UNSW-NB15 data set has different attack families, which ultimately reflect modern low-footprint attacks.

TABLE IX. COMPARISON OF KDD CUP 99 AND UNSW-NB15
#  Parameters                      KDDCUP99 [7]                            UNSW-NB15
1  No. of networks                 2                                       3
2  No. of distinct ip address      11                                      45
3  Simulation                      Yes                                     Yes
4  The duration of data collected  5 weeks                                 16 hours, 15 hours
5  Format of data collected        3 types (tcpdump, BSM and dump files)   Pcap files
6  Attack families                 4                                       9
7  Feature Extraction tools        Bro-IDS tool                            Argus, Bro-IDS and new tools
8  No. of features extraction      42                                      49
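
As a quick consistency check on Table VIII, the short Python sketch below sums the per-category record counts and reports each category's share of the data set. The counts are copied from Table VIII; the names RECORDS and share are illustrative only and are not part of the data set. The total (2,540,044) agrees with the file sizes given in Section VII (three files of 700,000 records plus one of 440,044), and the attack total (321,283) agrees with the sum of the two Attack entries in Table I.

```python
# Record counts per category, copied from Table VIII.
RECORDS = {
    "Normal": 2_218_761,
    "Fuzzers": 24_246,
    "Analysis": 2_677,
    "Backdoors": 2_329,
    "DoS": 16_353,
    "Exploits": 44_525,
    "Generic": 215_481,
    "Reconnaissance": 13_987,
    "Shellcode": 1_511,
    "Worms": 174,
}

total = sum(RECORDS.values())        # all records in the data set
attacks = total - RECORDS["Normal"]  # records in the nine attack families


def share(category: str) -> float:
    """Percentage of all records that belong to the given category."""
    return 100.0 * RECORDS[category] / total


if __name__ == "__main__":
    # Print the categories from largest to smallest, then the attack total.
    for name, count in sorted(RECORDS.items(), key=lambda kv: -kv[1]):
        print(f"{name:<15}{count:>12,}{share(name):>8.2f}%")
    print(f"{'All attacks':<15}{attacks:>12,}{100.0 * attacks / total:>8.2f}%")
```

Normal records make up roughly 87% of the data set, and the smallest family (Worms) has only 174 records, so the categories are far from balanced.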

VII. FINAL SHAPE OF THE UNSW-NB15 DATA SET FILES
In this section, the final shape of the UNSW-NB15 data set is described. The purpose of this section is to guide researchers on how to use and manipulate the final CSV files of the UNSW-NB15 data set. Four CSV files of the data records are provided, and each CSV file contains attack and normal records. The names of the CSV files are UNSW-NB15_1.csv, UNSW-NB15_2.csv, UNSW-NB15_3.csv and UNSW-NB15_4.csv.

In each CSV file, all the records are ordered according to the last time attribute. Further, each of the first three CSV files contains 700000 records and the fourth file contains 440044 records. The ground truth table is named UNSW-NB15_GT.csv. The list of events file is named UNSW-NB15_LIST_EVENTS and contains the attack categories and subcategories. The interested reader can obtain the raw pcap files by e-mailing the authors.

VIII. CONCLUSION AND FUTURE WORK
This paper has argued that the existing benchmark data sets do not comprehensively represent modern network traffic and attack scenarios. UNSW-NB15 is created by establishing a synthetic environment at the UNSW cyber security lab. The key tool utilised, IXIA, provided the capability to generate a modern representation of real normal and synthetic abnormal network traffic in the synthetic environment. UNSW-NB15 represents nine major families of attacks by utilising the IXIA PerfectStorm tool. There are 49 features, developed using the Argus and Bro-IDS tools and twelve algorithms, which cover the characteristics of network packets. In contrast, the existing benchmark data sets, such as KDD98, KDDCUP99 and NSLKDD, cover a limited number of attacks and packet information that is outdated. Moreover, UNSW-NB15 is compared with the KDDCUP99 data set on some key features, and the comparison shows its benefits. In future, it is expected that the UNSW-NB15 data set can be helpful to the NIDS research community and be considered a modern NIDS benchmark data set.

ACKNOWLEDGMENT
This work is supported by the Cyber Range Lab of the Australian Centre for Cyber Security (ACCS) at UNSW in Canberra. The authors are grateful to the manager of the Cyber Range Lab.

REFERENCES
[1] R. Heady, G. Luger, A. Maccabe, M. Servilla, "The architecture of a network level intrusion detection system", Tech. rep., Computer Science Department, University of New Mexico, New Mexico, 1990.
[2] M. Aydın, M. Ali, A. Halim Zaim, and K. Gökhan Ceylan, "A hybrid intrusion detection system design for computer network security", Computers & Electrical Engineering, 2009, p 517-526.
[3] S. Axelsson, "Intrusion detection systems: A survey and taxonomy", Technical report, 2000, Vol. 99.
[4] C. Dartigue, H. Jang and W. Zeng, "A new data-mining based approach for network intrusion detection", Communication Networks and Services Research Conference, CNSR'09, Seventh Annual, IEEE, 2009, p 372-377.
[5] J. Zhang and Z. Mohammad, "Anomaly based network intrusion detection with unsupervised outlier detection", Communications, 2006, ICC'06, IEEE International Conference on, Vol. 5, IEEE.
[6] P. Gogoi et al., "Packet and flow based network intrusion dataset", Contemporary Computing, Springer Berlin Heidelberg, 2012, p 322-334.
[7] KDDCup1999. Available on: http://kdd.ics.uci.edu/databases/kddcup99/KDDCUP99.html, 2007.
[8] NSLKDD. Available on: http://nsl.cs.unb.ca/NSLKDD/, 2009.
[9] J. McHugh, "Testing intrusion detection systems: a critique of the 1998 and 1999 DARPA intrusion detection system evaluations as performed by Lincoln Laboratory", ACM Transactions on Information and System Security, 3, 2000, p 262-294.
[10] V. Mahoney and K. Philip, "An analysis of the 1999 DARPA/Lincoln Laboratory evaluation data for network anomaly detection", Recent Advances in Intrusion Detection, Springer Berlin Heidelberg, 2003.
[11] A. Vasudevan, E. Harshini, and S. Selvakumar, "SSENet-2011: a network intrusion detection system dataset and its comparison with KDD CUP 99 dataset", Internet (AH-ICI), 2011, Second Asian Himalayas International Conference on, IEEE.
[12] M. Tavallaee, E. Bagheri, W. Lu, and A. Ghorbani, "A detailed analysis of the KDD CUP 99 data set", Proceedings of the Second IEEE Symposium on Computational Intelligence for Security and Defence Applications, 2009.
[13] DARPA98. Available on: http://www.ll.mit.edu/mission/communications/cyber/CSTcorpora/ideval/data/, 1998.
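
As a supplementary usage sketch for the four CSV files described in Section VII, the Python code below tallies records per attack category using only the standard library. It rests on assumptions that the paper does not state: that the CSV files carry no header row, that the columns appear in the feature order of Tables II-VII, and that attack_cat may be empty for normal records. The names FEATURES and count_categories are hypothetical helpers introduced for illustration.

```python
import csv
from collections import Counter
from typing import Iterable

# Feature names 1-49 in the order of Tables II-VII.
# Assumption: this is also the column order of the CSV files.
FEATURES = [
    "srcip", "sport", "dstip", "dsport", "proto",                   # Table II
    "state", "dur", "sbytes", "dbytes", "sttl", "dttl", "sloss",
    "dloss", "service", "sload", "dload", "spkts", "dpkts",         # Table III
    "swin", "dwin", "stcpb", "dtcpb", "smeansz", "dmeansz",
    "trans_depth", "res_bdy_len",                                   # Table IV
    "sjit", "djit", "stime", "ltime", "sintpkt", "dintpkt",
    "tcprtt", "synack", "ackdat",                                   # Table V
    "is_sm_ips_ports", "ct_state_ttl", "ct_flw_http_mthd",
    "is_ftp_login", "ct_ftp_cmd", "ct_srv_src", "ct_srv_dst",
    "ct_dst_ltm", "ct_src_ltm", "ct_src_dport_ltm",
    "ct_dst_sport_ltm", "ct_dst_src_ltm",                           # Table VI
    "attack_cat", "Label",                                          # Table VII
]


def count_categories(sources: Iterable[Iterable[str]]) -> Counter:
    """Count records per category.

    Each source is an open text file, or any iterable of CSV lines.
    """
    counts: Counter = Counter()
    for lines in sources:
        for row in csv.reader(lines):
            if len(row) != len(FEATURES):
                continue  # skip blank or malformed lines
            record = dict(zip(FEATURES, row))
            if record["Label"].strip() == "0":
                counts["Normal"] += 1
            else:
                # Fall back to a placeholder if an attack row has no category.
                counts[record["attack_cat"].strip() or "Unlabelled attack"] += 1
    return counts
```

With the files downloaded locally, the function could be called on file objects opened with newline="" for UNSW-NB15_1.csv to UNSW-NB15_4.csv, and the result compared against Table VIII. If the real files differ from the assumptions above (for example, if they include a header row), the column list or the row filter would need adjusting.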
