Approach Using Genetic Algorithm for Intrusion Detection System
Abstract - Nowadays it is very important to maintain a high level of security to ensure safe and trusted communication of information between organizations. However, data communication over the Internet or any other network is always under threat of intrusion and misuse. Various soft computing approaches have been proposed to detect attacks. In this paper we propose a genetic algorithm that generates rules from network audit data and uses a fitness function to select among them. The generated rules are used to detect or classify attacks. Using the Genetic Algorithm (GA), we can classify different types of attack. To implement the system and measure its performance we used the DARPA benchmark dataset and obtained a reasonable detection rate.

Keywords - Computer & Network Security, DARPA 98 Dataset, Genetic Algorithm (GA), Intrusion Detection System (IDS).

1. Introduction

Local networks and the Internet have been growing at an exponential rate in recent years. While we enjoy the advantages that the new technology has brought us, computer systems are exposed to increasing security threats that originate from external or internal hosts. Although different protection mechanisms are available, it is almost impossible to have a totally secure system. Therefore intrusion detection technology, which monitors traffic and identifies network intrusions, is becoming more and more important.

There are two major categories of analysis techniques for an IDS (Intrusion Detection System): anomaly detection and misuse detection. Anomaly detection uses established normal profiles to identify any unacceptable deviation as the result of an attack. In a misuse detection system, also known as a signature-based detection system, well-known attacks are represented by signatures.

The misuse approach uses several techniques grouped into three classes:

1) Rule-based approaches or expert systems,
2) Approaches based on signatures,
3) Genetic Algorithms (GA).

In this paper, we present a GA approach for network intrusion detection. Genetic algorithms are used to optimize the search for attack scenarios in audit files. Our implemented system contains two modules, each operating at a different stage. In the training stage, a set of rules is generated from the audit data. In the intrusion detection stage, the generated rules are used to classify incoming network connections in real time. The main goal is to optimize the number of signatures in the audit file so as to minimize the search time and increase the detection rate of attacks. The system is tested using the Defense Advanced Research Projects Agency (DARPA) data set, which has become a standard test set for intrusion detection.

2. Related Work

Dheeraj Pal and Amrita Parashar [2] proposed attribute subset selection with information gain and a genetic algorithm for the creation of detection rules. In this approach, the GA is used to detect attacks with the help of rules that use fewer attributes, on the Network Security Laboratory-Knowledge Discovery and Data Mining (NSL-KDD) dataset.

Weiming Hu, Jun Gao et al. [8] proposed two online Adaboost-based intrusion detection algorithms. In the first algorithm, a traditional online Adaboost process is used
IJCSN International Journal of Computer Science and Network, Volume 5, Issue 3, June 2016
ISSN (Online): 2277-5420 www.IJCSN.org
Impact Factor: 1.02
545
where decision stumps are used as weak classifiers. In the second algorithm, an improved online Adaboost process is proposed, and online Gaussian mixture models (GMMs) are used as weak classifiers. They further propose a distributed intrusion detection framework, in which a local parameterized detection model is constructed in each node using the online Adaboost algorithm. A global detection model is constructed in each node by combining the local parametric models using a small number of samples in the node. This combination is achieved using an algorithm based on particle swarm optimization (PSO) and support vector machines. The global model in each node is used to detect intrusions.

achieved. A genetic algorithm is composed of three operators: reproduction (or selection), crossover (or recombination), and mutation.
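The three GA operators named above, together with the support-confidence fitness used in Listing 1, could be sketched in Java roughly as follows. This is a minimal illustration under stated assumptions, not the implementation evaluated in this paper: the class name GaRuleSketch, the integer gene encoding with the class label as the last gene, the WILDCARD value -1, threshold-based selection as the reproduction step, single-point crossover, and per-gene mutation are all hypothetical choices. The sketch also assumes that A counts every record matching the condition part of a rule, which is the usual support-confidence definition.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical sketch of support-confidence fitness and the three GA operators. */
public class GaRuleSketch {
    static final int WILDCARD = -1; // assumed gene value meaning "match anything"

    /** True if every non-wildcard condition gene equals the record's feature. */
    static boolean matchesCondition(int[] rule, int[] record) {
        for (int i = 0; i < rule.length - 1; i++) {
            if (rule[i] != WILDCARD && rule[i] != record[i]) return false;
        }
        return true;
    }

    /** Fitness = w1 * AB / N + w2 * AB / A, following line 14 of Listing 1. */
    static double fitness(int[] rule, int[][] records, double w1, double w2) {
        int a = 0;   // records matching the condition part (assumed meaning of A)
        int ab = 0;  // records matching condition and outcome (AB)
        int last = rule.length - 1;
        for (int[] rec : records) {
            if (matchesCondition(rule, rec)) {
                a++;
                if (rule[last] == rec[last]) ab++;
            }
        }
        if (a == 0) return 0.0; // guard added here to avoid division by zero
        return w1 * ab / records.length + w2 * ab / (double) a;
    }

    /** Reproduction/selection: keep the rules whose fitness exceeds threshold t. */
    static List<int[]> select(List<int[]> population, int[][] records,
                              double w1, double w2, double t) {
        List<int[]> kept = new ArrayList<>();
        for (int[] rule : population) {
            if (fitness(rule, records, w1, w2) > t) kept.add(rule);
        }
        return kept;
    }

    /** Single-point crossover: genes before 'point' from p1, the rest from p2. */
    static int[] crossover(int[] p1, int[] p2, int point) {
        int[] child = new int[p1.length];
        for (int i = 0; i < child.length; i++) {
            child[i] = (i < point) ? p1[i] : p2[i];
        }
        return child;
    }

    /** Mutation: each condition gene is redrawn from [0, maxValue) with probability 'rate'. */
    static int[] mutate(int[] rule, double rate, int maxValue, Random rng) {
        int[] out = rule.clone();
        for (int i = 0; i < out.length - 1; i++) {
            if (rng.nextDouble() < rate) out[i] = rng.nextInt(maxValue);
        }
        return out;
    }

    public static void main(String[] args) {
        // Toy audit records: two features followed by a class label.
        int[][] records = { {1, 0, 1}, {1, 1, 1}, {1, 0, 0}, {0, 0, 0} };
        int[] rule = {1, WILDCARD, 1};
        System.out.println("fitness = " + fitness(rule, records, 0.2, 0.8));
    }
}
```

With the four toy records in main, the rule {1, *, 1} matches the condition part in three records and the full rule in two, so its fitness is 0.2 * 2/4 + 0.8 * 2/3, about 0.63, which exceeds the threshold T = 0.5 used in Listing 1.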
Input: Network audit data, number of generations, and population size.
Output: A set of classification rules.

1.  Initialize the population
2.  W1 = 0.2, W2 = 0.8, T = 0.5
3.  N = total number of records in the training set
4.  For each chromosome in the population
5.      A = 0, AB = 0
6.      For each record in the training set
7.          If the record matches the chromosome
8.              AB = AB + 1
9.          End if
10.         If the record matches only the “condition” part
11.             A = A + 1
12.         End if
13.     End for
14.     Fitness = W1 * AB / N + W2 * AB / A
15.     If Fitness > T
16.         Select the chromosome into new population
17.     End if
18. End for
19. For each chromosome in the new population
20.     Apply crossover operator to the chromosome
21.     Apply mutation operator to the chromosome
22. End for
23. If number of generations is not reached, then goto line 4

Listing 1. Major steps of the detection algorithm.

In each generation, the qualities of the rules are first calculated, then a number of best-fit rules are selected, and finally the GA operators are applied to the selected rules. The training process starts by randomly generating an initial population of rules (line 1). The weights and the fitness threshold are initialized in line 2. Line 3 calculates the total number of records in the audit data. Lines 4-18 calculate the fitness of each rule and select the best-fit rules into the new population. Lines 19-22 apply the crossover and mutation operators to each rule in the new population. Finally, line 23 decides whether to terminate the training process or to enter the next generation and continue the evolution process.

4. Experimental Results

The genetic algorithm for rule generation is implemented in Java (JDK 1.8), using NetBeans 8.0 as the development environment. Two subsets were derived from the DARPA 1998 data. Table 1 gives the distribution of record types in both the training and testing data sets. The first row gives the number of normal network records, the second row gives the number of Smurf attack records, and the third row gives the number of Neptune attack records.

The implementation is done in two phases. In the first phase, the classification rules are generated using the genetic algorithm, with the support-confidence function as the fitness function. The GA parameters used were w1 = 0.2, w2 = 0.8, 200 generations, a population of 2000 rules, and a mutation rate of 0.001. In the second (testing/detection) phase, for each test record an initial population is created from the record by mutating different features. This population is compared with each chromosome prepared in the training phase. The portion of the population that is more loosely related to the training data than the rest is removed. Crossover and mutation are applied to the rest of the population, which becomes the population of the next generation. The process runs until the last generation has finished. The group of the training chromosome that is the closest relative of the only surviving chromosome of the test record is returned as the predicted type.

Table 2: Experimental Result

Record Type    Training    Testing    Match %
Normal         73          64         87%
Smurf          799         790        98%
Neptune        96          96         100%

4.1 Execution Time

The time taken by a genetic algorithm to reach a required solution is an important aspect. For an equal number of generations, this execution time increases linearly as the population size increases. Graph 4.1 shows the population size and the corresponding execution time taken
by the GA. The maximum number of generations is set to 200.

Fig 3. Effects of GA Population Size on Execution Time

4.2 Population Size

Graph 4.2 shows the detection percentage for different numbers of generations of the GA. As the number of generations is increased, the detection rate improves at the cost of increased time required for the generation of rules. The best results are obtained after 200 generations.

Fig 4. Effects of Population on Detection Accuracy

5. Conclusion

The IDS is implemented using a GA in two steps. In the first step, the GA is used to generate classification rules, whereas in the second step these rules are used for intrusion detection. This reduces the search space and yields more accurate results while using a smaller population and fewer generations compared to Gong et al.’s approach. This has reduced the time required for the generation of the fittest rules. The system was run for different numbers of generations. As the number of generations is increased, more accurate intrusion detection rates are obtained.

Future Scope

References

[1] W. Li, “A Genetic Algorithm Approach to Network Intrusion Detection”, SANS Institute, USA, 2004.
[2] Dheeraj Pal and Amrita Parashar, “Improved Genetic Algorithm for Intrusion Detection System”, 2014 Sixth International Conference on Computational Intelligence and Communication Networks.
[3] W. Li, “The integration of security sensors into the Intelligent Intrusion Detection System (IIDS) in a cluster environment”, Master’s Project Report, Department of Computer Science, Mississippi State University, 2002.
[4] MIT Lincoln Laboratory, DARPA datasets, MIT, USA, http://www.ll.mit.edu/IST/ideval/data/data_index.html (accessed in November 2004).
[5] H. Pohlheim, “Genetic and Evolutionary Algorithms: Principles, Methods and Algorithms”, http://www.geatbx.com/docu/index.html (accessed in January 2005).
[6] M. Crosbie and E. Spafford, “Applying Genetic Programming to Intrusion Detection”, Proceedings of the AAAI Fall Symposium, 1995.
[7] W. Lu and I. Traore, “Detecting New Forms of Network Intrusion Using Genetic Programming”, Computational Intelligence, vol. 20, no. 3, Blackwell Publishing, Malden, pp. 475-494, 2004.
[8] Weiming Hu, Jun Gao, Yanguo Wang, Ou Wu, and Stephen Maybank, “Online Adaboost-Based Parameterized Methods for Dynamic Distributed Network Intrusion Detection”, IEEE Trans. on Cybernetics.
[9] Zhenwei Yu, Jeffrey J. P. Tsai and Thomas Weigert, “An Automatically Tuning Intrusion Detection System”, IEEE Trans. on Systems, Man and Cybernetics-Part B: Cybernetics, vol. 37, no. 2, April 2007.
[10] Dong Song, Malcolm I. Heywood and A. Nur Zincir-Heywood, “Training Genetic Programming on Half a Million Patterns: An Example from Anomaly Detection”, IEEE Transactions on Evolutionary Computation, vol. 9, no. 3, June 2005.