
Approach Using Genetic Algorithm for Intrusion Detection System 3

This paper proposes a genetic algorithm (GA) approach for an intrusion detection system (IDS) that generates rules from network audit data to classify attacks. The system utilizes the DARPA benchmark dataset to optimize the detection rate and minimize search time through a two-module structure: one for rule generation and another for real-time intrusion detection. Experimental results indicate a high match percentage for various attack types, demonstrating the effectiveness of the GA in enhancing network security.
IJCSN International Journal of Computer Science and Network, Volume 5, Issue 3, June 2016

ISSN (Online): 2277-5420 www.IJCSN.org


Impact Factor: 1.02
544

Approach Using Genetic Algorithm for Intrusion Detection System

Abhijeet Karve

Government College of Engineering, Aurangabad, Dr. Babasaheb Ambedkar Marathwada University, Aurangabad, Maharashtra- 431005, India

Abstract - Nowadays it is very important to maintain a high level of security to ensure safe and trusted communication of information between organizations. But data communication over the internet or any other network is always under threat of intrusion and misuse. Different soft computing approaches have been proposed to detect attacks. In this paper we propose a genetic algorithm that generates rules from network audit data and uses a fitness function to select among them. The generated rules are used to detect or classify attacks. Using a Genetic Algorithm (GA) we can classify different types of attack. To implement the system and measure its performance we used the DARPA benchmark dataset and obtained a reasonable detection rate.

Keywords - Computer & Network Security, DARPA 98 Dataset, Genetic Algorithm (GA), Intrusion Detection System (IDS).

1. Introduction

Local networks and the internet have been growing at an exponential rate in recent years. While we enjoy the advantages that the new technology has brought us, computer systems are exposed to increasing security threats that originate from external or internal hosts. Although different protection mechanisms are available, it is almost impossible to have a totally secure system. Therefore intrusion detection technology, which monitors traffic and identifies network intrusions, becomes more and more important.

There are two major categories of analysis techniques for an IDS (Intrusion Detection System): anomaly detection and misuse detection. Anomaly detection uses established normal profiles to identify any unacceptable deviation as the result of an attack. In a misuse detection system, also known as a signature-based detection system, well-known attacks are represented by signatures.

The misuse approach uses several techniques grouped into three classes:

1) The rule-based approaches or expert systems,
2) Approaches based on signature
3) Genetic Algorithms GA.

In this paper, we present a GA approach for network intrusion detection. Genetic algorithms are used to optimize the search of attack scenarios in audit files. Our implemented system contains two modules, each operating at a different stage. In the training stage, a set of rules is generated from the audit data. In the intrusion detection stage, the generated rules are used to classify incoming network connections in real time. The main goal is to optimize the number of signatures in the audit file so as to minimize the search time and increase the detection rate of attacks. The system is tested using the Defense Advanced Research Projects Agency (DARPA) data set, which has become a standard test set for intrusion detection.

2. Related Work

Dheeraj Pal [2] proposed attribute subset selection with information gain and a genetic algorithm for the creation of detection rules. In this approach, the GA is used to detect attacks with rules that use fewer attributes, on the Network Security Laboratory-Knowledge Discovery and Data Mining (NSL-KDD) dataset.

Weiming Hu, Jun Gao [8] proposed two online Adaboost-based intrusion detection algorithms. In the first algorithm, a traditional online Adaboost process is used
where decision stumps are used as weak classifiers. In the second algorithm, an improved online Adaboost process is proposed, and online Gaussian mixture models (GMMs) are used as weak classifiers. They further propose a distributed intrusion detection framework, in which a local parameterized detection model is constructed in each node using the online Adaboost algorithm. A global detection model is constructed in each node by combining the local parametric models using a small number of samples in the node. This combination is achieved using an algorithm based on particle swarm optimization (PSO) and support vector machines. The global model in each node is used to detect intrusions.

Zhenwei Yu, Jeffrey Tsai [9] proposed automatically tuning the detection model on-the-fly according to the feedback provided by the system operator when false predictions are encountered. The system is evaluated using the KDDCup’99 intrusion detection dataset. They used an anomaly detection technique.

Dong Song, Malcolm I. Heywood and A. Nur Zincir-Heywood [10] introduced the hierarchical RSS-DSS algorithm for dynamically filtering large datasets based on the concepts of training pattern age and difficulty. Such a scheme provides the basis for training genetic programming (GP) on a data set of half a million patterns in 15 min. They also used an anomaly detection technique.

2.1 Genetic Algorithm

A Genetic Algorithm is an optimization technique using an evolutionary process [5] [6]. A solution of a problem is represented as a data structure known as a chromosome. An evaluation function is used to calculate the goodness of each chromosome according to the desired solution; this function is known as the “Fitness Function”. The GA process begins with a series of randomly generated initial solutions (the random population), and through a combination of algorithms similar to an evolutionary process (often a combination of elitism, crossover, and mutation) the process works towards evolving solutions having better “goodness” as evaluated by the fitness function.

In every generation the fitness of these chromosomes is checked. The fitness function is used to determine the fitness of the chromosomes, and the fittest chromosomes are selected. Chromosomes with a poor fitness value are discarded. The selected chromosomes undergo crossover and mutation to form a new population. This new population is used for the next generation. Normally, the algorithm terminates when either a set number of generations or a satisfactory fitness level has been achieved. A genetic algorithm is composed of three operators: reproduction (selection), crossover (recombination) and mutation.

Fig 1: Structure of simple Genetic algorithm

The basic concepts of Genetic Algorithms are simple, yet the process of choosing the gene representation, a good fitness function, and even the application of recombination [Whitley] can be the key to successful use of Genetic Algorithms.

2.2 DARPA Data Set

A key dependency of the work done by Gong and Li and, as will be shown, of net GA is the usage of DARPA data sets for training data. Creating this training data is not a trivial task and is considered beyond the scope of this project. The MIT Lincoln Laboratory provides an excellent description of the process followed for creating the data. This DARPA training data is the result of test network traffic, a Sun Microsystems Solaris system, and the use of Sun's Basic Security Module [Sun]. The data sets used in both papers were created in 1998 [4]. Today's attacks have changed with regard to rule based systems, but the training data still works well for developing Genetic Algorithms.

3. Proposed System

3.1 Data Representation

Genetic Algorithms are used with net GA as follows: rules are randomly created to match attacks, and each rule is encoded as an integer array with the seven elements shown in Figure 2. The first six attributes of the chromosome match the gene characteristics of an attack. The seventh attribute describes the attack type that the first six attributes identify when they match. This representation uses the same approach as used by Gong [3].
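The integer-array encoding described in Section 3.1 can be illustrated with a minimal Python sketch (the paper's own implementation is in Java). It assumes that each of the seven attributes is flattened into the number of genes listed in Table 1 (3 + 1 + 1 + 1 + 4 + 4 + 1 = 15 integers) and uses the sample Neptune rule of this section as input. The function name `encode_rule` and the numeric codes in `PROTOCOLS` and `ATTACKS` are hypothetical, since the paper does not list its protocol or attack numbering.

```python
# Hypothetical integer codes; the paper does not give its own numbering.
PROTOCOLS = {"finger": 1, "telnet": 2, "http": 3}
ATTACKS = {"normal": 0, "neptune": 1, "smurf": 2}


def encode_rule(duration, protocol, source_port, destination_port,
                source_ip, destination_ip, attack_name):
    """Flatten the seven rule attributes into a list of 15 integer genes."""
    hours, minutes, seconds = (int(part) for part in duration.split(":"))
    genes = [hours, minutes, seconds]                            # Duration: 3 genes
    genes.append(PROTOCOLS[protocol])                            # Protocol: 1 gene
    genes.append(source_port)                                    # Source_port: 1 gene
    genes.append(destination_port)                               # Destination_port: 1 gene
    genes.extend(int(octet) for octet in source_ip.split("."))       # Source_IP: 4 genes
    genes.extend(int(octet) for octet in destination_ip.split("."))  # Destination_IP: 4 genes
    genes.append(ATTACKS[attack_name])                           # Attack_name: 1 gene
    return genes


# The sample Neptune rule from this section, encoded as a chromosome.
rule = encode_rule("0:0:1", "finger", 18989, 79,
                   "99.19.99.19", "192.168.254.10", "neptune")
```

Under these assumptions the first 14 genes form the condition A and the last gene is the attack type B, which keeps crossover and mutation simple because they can operate on a flat list of integers.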
Fig 2: Structure of Proposed System

Table 1. Chromosome Representation for Rule

Feature Name       Format    Number of genes
Duration           h:m:s     3
Protocol           Int       1
Source_port        Int       1
Destination_port   Int       1
Source_IP          a.b.c.d   4
Destination_IP     a.b.c.d   4
Attack_name        Int       1

In order to evaluate a rule represented by a chromosome, the DARPA audit data is parsed and loaded into a list of audit connections. The attributes loaded from the DARPA audit data directly match the attributes used in the chromosome representation. The gene representation follows the simple rule if A then B: if the first six attributes, logically and-ed together, are true (A), then the rule matches the attack (B). The following sample rule [5] classifies a network connection as the denial-of-service attack Neptune.

if (duration = “0:0:1” and protocol = “finger” and source_port = 18989 and destination_port = 79 and source_ip = “99.19.99.19” and destination_ip = “192.168.254.10”) then (attack_name = “Neptune”)

The above rule specifies that if a network packet originates from IP address 99.19.99.19 and port number 18989 and is sent to IP address 192.168.254.10 at port number 79 using the finger protocol, with a connection duration of 1 second, then it is most likely a Neptune attack, which eventually puts the destination host out of service.

3.2 Fitness Function

Every chromosome is selected after the fitness function has been applied to it. To determine the fitness of a rule, the support confidence framework [7] is used. If a rule is represented as if A then B [5], then the fitness of the rule is as follows:

Support = |A and B| / N
Confidence = |A and B| / |A|
Fitness = w1 * Support + w2 * Confidence

Here, N is the total number of network connections in the audit data, |A| stands for the number of network connections matching the condition A, and |A and B| is the number of network connections that match the rule if A then B. The weights w1 and w2 are used to control the balance between the two terms and have the default values of w1=0.2 and w2=0.8.

3.3 Crossover and Mutation

Crossover is one of the important steps in GA. There are three types of crossover technique: one point, two point and uniform crossover. In this paper we used two point crossover. Crossover involves splitting two chromosomes and then combining the first part of one chromosome with the second part of the other chromosome.

Each gene in each chromosome is checked for possible mutation by generating a random number between zero and one; if this number is less than or equal to the given mutation probability, then the gene value is changed. Mutations create diversity so that the search reaches domain regions that may otherwise be excluded.

3.4 Algorithm Steps

Listing 1 shows the major steps of the employed detection algorithm as well as the training process. It first generates the initial population, sets the default parameters, and loads the network audit data. Then the initial population is evolved for a number of generations.

Algorithm: Rule set generation using genetic algorithm.
Input: Network audit data, number of generations, and population size.
Output: A set of classification rules.

1. Initialize the population
2. W1 = 0.2, W2 = 0.8, T = 0.5
3. N = total number of records in the training set
4. for each chromosome in the population
5.     A = 0, AB = 0
6.     for each record in the training set
7.         If the record matches the chromosome
8.             AB = AB + 1
9.         End if
10.        If the record matches only the “condition” part
11.            A = A + 1
12.        End if
13.    End for
14.    Fitness = W1 * AB / N + W2 * AB / A
15.    If Fitness > T
16.        Select the chromosome into new population
17.    End if
18. End for
19. for each chromosome in the new population
20.    Apply crossover operator to the chromosome
21.    Apply mutation operator to the chromosome
22. End for
23. If number of generations is not reached, then goto line 4

Listing 1. Major steps of the detection algorithm.

In each generation, the qualities of the rules are first calculated, then a number of best-fit rules are selected, and finally the GA operators are applied to the selected rules. The training process starts by randomly generating an initial population of rules (line 1). The weights and fitness threshold values are initialized in line 2. Line 3 calculates the total number of records in the audit data. Lines 4-18 calculate the fitness of each rule and select the best-fit rules into the new population. Lines 19-22 apply the crossover and mutation operators to each rule in the new population. Finally, line 23 checks and decides whether to terminate the training process or to enter the next generation to continue the evolution process.

4. Experimental Results

The genetic algorithm for rule generation is implemented in Java (JDK 1.8), using NetBeans 8.0 as the development environment. Two subsets were developed from the DARPA 1998 data. Table 2 gives the distributions of record types in both the training and testing data sets. The first row gives the number of normal network records. The second row gives the distribution of the Smurf attack, whereas the third row gives the distribution of the Neptune attack.

The implementation is done in two phases. In the first phase the classification rules are generated using the genetic algorithm, with the support confidence function as the fitness function. The GA parameters used were w1 = 0.2, w2 = 0.8, 200 generations, a population of 2000 rules, and a mutation rate of 0.001. In the second (testing / detection) phase, for each test record, an initial population is made using the record and mutations applied to different features. This population is compared with each of the chromosomes prepared in the training phase. The portion of the population that is more loosely related to the training data than the rest is removed. Crossover and mutation occur in the rest of the population, which becomes the population of the new generation. The process runs until the last generation is finished. The group of the chromosome that is the closest relative of the only surviving chromosome of the test data is returned as the predicted type.

Table2: Experimental Result

Record Type   Training   Testing   Match %
Normal        73         64        87%
Smurf         799        790       98%
Neptune       96         96        100%

4.1 Execution Time

The time taken by a genetic algorithm to reach a required solution is an important aspect. This execution time increases linearly as the population size increases for an equal number of generations. Graph 4.1 shows the population size and corresponding execution time taken
by the GA. The maximum number of generations is set to 200.

Fig 3. Effects of GA Population Size on Execution Time

4.2 Population Size

Graph 4.2 shows the percentage detection for different numbers of generations of the GA. As the number of generations is increased, the detection rate improves at the cost of increased time required for the generation of rules. The best results are obtained after 200 generations.

Fig 4. Effects of Population on Detection Accuracy

5. Conclusion

The IDS is implemented using GA in two steps. In the first step, GA is used to generate classification rules, whereas in the second step these rules are used for intrusion detection. This reduces the search space and yields more accurate results while using a smaller population and fewer generations compared to Gong et al.’s approach. This has reduced the time required for the generation of the fittest rules. The given system is run for different numbers of generations. As the number of generations is increased, more accurate intrusion detection rates are obtained.

Future Scope

The system is able to detect Denial of Service attacks, as the main focus was on those only. The system can be modified accordingly to detect all types of attacks, as mentioned by Li. Feature selection can work if the selected features remain the same for the training and testing datasets.

References
[1] W. Li, “A Genetic Algorithm Approach to Network Intrusion Detection”, SANS Institute, USA, 2004.
[2] Dheeraj Pal and Amrita Parashar, “Improved Genetic Algorithm for Intrusion Detection System”, 2014 Sixth International Conference on Computational Intelligence and Communication Networks.
[3] Li, Wei. 2002. “The integration of security sensors into the Intelligent Intrusion Detection System (IIDS) in a cluster environment.” Master’s Project Report. Department of Computer Science, Mississippi State University.
[4] MIT Lincoln Laboratory, DARPA datasets, MIT, USA, http://www.ll.mit.edu/IST/ideval/data/data_index.html (accessed in November 2004).
[5] H. Pohlheim, “Genetic and Evolutionary Algorithms: Principles, Methods and Algorithms”, http://www.geatbx.com/docu/index.html (accessed in January 2005).
[6] M. Crosbie and E. Spafford, “Applying Genetic Programming to Intrusion Detection”, Proceedings of the AAAI Fall Symposium, 1995.
[7] W. Lu and I. Traore, “Detecting New Forms of Network Intrusion Using Genetic Programming”, Computational Intelligence, vol. 20, no. 3, Blackwell Publishing, Malden, pp. 475-494, 2004.
[8] Weiming Hu, Jun Gao, Yanguo Wang, Ou Wu, and Stephen Maybank, “Online Adaboost-Based Parameterized Methods for Dynamic Distributed Network Intrusion Detection”, IEEE Trans. On Cybernetics.
[9] Zhenwei Yu, Jeffrey J. P. Tsai and Thomas Weigert, “An Automatically Tuning Intrusion Detection System”, IEEE Trans. On Systems, Man and Cybernetics-Part B: Cybernetics, Vol. 37, No. 2, April 2007.
[10] Dong Song, Malcolm I. Heywood and A. Nur Zincir-Heywood, “Training Genetic Programming on Half a Million Patterns: An Example from Anomaly Detection”, IEEE Transactions on Evolutionary Computation, Vol. 9, No. 3, June 2005.
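As a supplementary illustration of the support-confidence fitness of Section 3.2 and the selection step in lines 4-18 of Listing 1, the following Python sketch may help (the paper's own implementation is in Java). It is a sketch under stated assumptions, not the author's code: rules and audit records are modelled as plain dictionaries, the names `matches_condition`, `fitness` and `select` are hypothetical, the four toy records are invented for the example, and |A| is counted over all records matching the condition, following the definition in Section 3.2. Crossover and mutation are left out.

```python
def matches_condition(rule, record):
    """True when every condition attribute of the rule equals the record's value."""
    return all(record[name] == value for name, value in rule["condition"].items())


def fitness(rule, records, w1=0.2, w2=0.8):
    """Fitness = w1 * support + w2 * confidence for a rule 'if A then B'."""
    n = len(records)                                               # N
    a = sum(1 for r in records if matches_condition(rule, r))      # |A|
    ab = sum(1 for r in records
             if matches_condition(rule, r)
             and r["attack"] == rule["attack"])                    # |A and B|
    if n == 0 or a == 0:
        return 0.0  # assumption: a rule whose condition matches nothing scores zero
    return w1 * ab / n + w2 * ab / a


def select(population, records, threshold=0.5):
    """Keep the rules whose fitness exceeds the threshold T (lines 15-17 of Listing 1)."""
    return [rule for rule in population if fitness(rule, records) > threshold]


# Toy audit data, invented for illustration only.
records = [
    {"protocol": "finger", "destination_port": 79, "attack": "neptune"},
    {"protocol": "finger", "destination_port": 79, "attack": "neptune"},
    {"protocol": "finger", "destination_port": 79, "attack": "normal"},
    {"protocol": "http", "destination_port": 80, "attack": "normal"},
]
rule_good = {"condition": {"protocol": "finger", "destination_port": 79},
             "attack": "neptune"}
rule_bad = {"condition": {"protocol": "http", "destination_port": 80},
            "attack": "smurf"}

selected = select([rule_good, rule_bad], records)
```

With these toy records, `rule_good` has support 2/4 and confidence 2/3, so its fitness is 0.2 * 0.5 + 0.8 * 2/3, which is above the threshold T = 0.5, while `rule_bad` matches one record with the wrong attack type and scores zero.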
