
Proceedings of the Fourth International Conference on Trends in Electronics and Informatics (ICOEI 2020)

IEEE Xplore Part Number: CFP20J32-ART; ISBN: 978-1-7281-5518-0

Optimizing a New Intrusion Detection System Using Ensemble Methods and Deep Neural Network
Ajeet Rai
Data Science and MIS
iServeU Technology
Bhubaneswar, India
Ajeetrai2293@gmail.com

Abstract—In the past few years, cyber attacks have become a major issue in cybersecurity. Researchers have worked on intrusion detection systems for the last few decades and many approaches have been developed, yet these approaches will not be adequate for the intrusion detection systems of the coming days. Therefore, with the advances in technology, the current systems have to be replaced with new ones. In this paper, ensemble learning strategies are examined for the intrusion detection system: boosting and bagging methods such as Distributed Random Forest (DRF), Gradient Boosting Machine (GBM) and XGBoost are implemented using the Python library H2O for the new intrusion detection system. A Deep Neural Network (DNN) is likewise implemented using the H2O library, and our model beats the previous Deep Neural Network (DNN) result after applying the genetic algorithm for feature selection. Our results also outperform many classical machine learning models.

Keywords—cybersecurity, intrusion detection, machine learning, distributed random forest, gradient boosting machine, xgboost, deep neural network, ensemble methods, h2o

I. INTRODUCTION

In the year 2018, several cyber-attacks drew everyone's attention to the flaws of cyber security. These cyber-attacks made it clear that the existing intrusion detection systems are not enough to prevent such attacks. In the present era, people are ever closer to devices connected through the internet, and the threat of these attacks has therefore increased. To recognize these attacks, researchers have been working on the problem for many years. Web traffic is abnormal during cyber-attacks, which are rarely seen, so researchers have treated this traffic as anomalous and have developed many anomaly detection methods. Initially, rule-based intrusion detection techniques were used; later, data mining, machine learning, and deep learning methods were applied as well. In the literature, attacks are categorized into four groups, namely Denial of Service (DoS), User to Root (U2R), Probe (Probing), and Remote to Local (R2L). In machine learning, anomaly detection can be framed in two ways: web traffic can be labeled as either normal or abnormal, which is a binary classification problem, or the process of identifying the four types of abnormal web traffic can be treated as a multiclass classification problem. In this paper, only binary classification is discussed.

There are several datasets for intrusion detection systems that can be used as benchmarks, but models trained on these old datasets fail to detect current anomalous web traffic, and hence the data need to be refreshed after a certain interval of time. Researchers have used many machine learning algorithms to build intrusion detection systems. Classical algorithms such as Support Vector Machine (SVM), Decision Tree (DT), K-Nearest Neighbors (KNN), etc. have been used so far, and feature selection techniques like filter, wrapper, and embedded methods are used to improve these models. In this paper, machine learning algorithms that have become very popular in recent years are used. Ensemble models are applied very frequently to real-world problems and often outperform all other models. Many feature selection techniques have been used to obtain better results, and researchers have also used bio-inspired algorithms to find the best combination of features for the intrusion detection system. Here, a special kind of metaheuristic optimization technique, the genetic algorithm, is used for feature selection. This paper focuses on ensemble methods and neural networks, which in many cases work better than other machine learning models. In the literature, these models combined with the genetic algorithm have not been used for intrusion detection systems. Also, our results are better than the previous deep neural network results after using the genetic algorithm for feature selection.

II. RELATED WORK

In earlier studies, machine learning models like Support Vector Machine (SVM), K-Nearest Neighbor (KNN), Decision Tree (DT), Random Forest, etc., hybrid algorithms, and, in deep learning, Artificial Neural Network (ANN), Convolutional Neural Network (CNN), Recurrent Neural Network (RNN), and Deep Belief Network (DBN) have been implemented for the intrusion detection system. Many researchers implement similar models, so only a few of them are mentioned here. The Support Vector Machine (SVM) performs well for almost all researchers. Kotpalliwar and Wajgi [7] used a Support Vector Machine (SVM) on a few portions of the KDD Cup 99 dataset and got 99.9% accuracy on the validation data.


Saxena and Richariya [24] used SVM with Particle Swarm Optimization (PSO) for feature selection and got quite good results. K-Nearest Neighbor (KNN) and K-means were implemented by Sharifi et al. [19], who got around 90% accuracy. A decision tree was used by Ingre et al. [20], who applied Correlation Feature Selection (CFS) so that 14 features were extracted. C4.5 was also used by Rai and Devi [22] with 79.52% accuracy. An unsupervised deep learning model, a Deep Belief Network (DBN) with Logistic Regression, was used by Alrawashdeh and Purdy [21], where accuracy was 97.9% on a few portions of the KDD Cup 99 dataset. Yin et al. [23] reported an accuracy of 83.28% on the validation dataset after implementing a Recurrent Neural Network (RNN) on the NSL-KDD dataset. Santhosh and Arghir-Nicolae [9] used the H2O library for the first time in intrusion detection to implement a deep neural network (DNN) and got 83% accuracy on the NSL-KDD dataset. Besides the above-mentioned models, many researchers have used different combinations of models to improve previous results. Trying different machine learning algorithms alone is not sufficient to improve models; data pre-processing and feature engineering make a crucial difference in performance. Researchers have tried many feature selection techniques to improve models for the intrusion detection system, not only the most frequently used techniques but some bio-inspired algorithms as well. Aghdam and Kabiri [6] proposed a feature selection method based on ant colony optimization for the intrusion detection system. Another bio-inspired algorithm, the Whale Optimization Algorithm (WOA), was used by Sukriti and Batla on the same NSL-KDD dataset. Other algorithms such as firefly, particle swarm optimization, Eigenvector Centrality, and the genetic algorithm have also been used by researchers previously.

III. METHODOLOGY

A problem with the Decision Tree algorithm is that it is prone to overfitting. To overcome this problem, more than one tree can be used in an algorithm; some algorithms that use many trees are Distributed Random Forest (DRF), Gradient Boosting Machine (GBM) and XGBoost. Another algorithm, the Deep Neural Network (DNN), is well known for extracting hidden features from data. These algorithms are used here and their performance is studied.

A. Dataset

The very popular NSL-KDD dataset has been used for our experiment; it has served many researchers as a benchmark. The KDD Cup 1999 dataset, based on the DARPA 1998 dataset, is one of the most widely used benchmark datasets for the intrusion detection system, but it has inherent redundant-record problems. In 2009 the new NSL-KDD dataset was generated without the problems present in the KDD Cup 1999 dataset. The NSL-KDD dataset consists of three different subsets, namely KDDTrain+, KDDTest+ and KDDTest-21. KDDTrain+ is the training dataset whereas the remaining two are validation datasets; KDDTest-21 is more complex than KDDTest+. The dataset has a total of 41 features and a target outcome with the values normal and anomaly. Among all features, 3 are categorical whereas the remaining are continuous.

Figure 1. Flow chart for the proposed methodology

B. Data pre-processing

Before modeling, a very important step, data pre-processing, is required. While pre-processing, no missing values were found in the data, but some features were categorical. Therefore, the label and categorical features were first encoded and then one-hot encoding was performed, after which a new feature matrix with 122 features was obtained. Since the features are not on the same scale, feature scaling was also required. There are many ways of scaling features; here standardization was used, after which each feature has a mean of 0 and a standard deviation of 1. The formula below is used for standardization:

z = (x − μ) / σ ….(1)

where x is the input value, μ the mean, and σ the standard deviation.

One of the biggest advantages of tree-based models is that they are not sensitive to missing values and outliers, so these are not discussed further.
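The pre-processing described above, label encoding, one-hot encoding of the three categorical features, and standardization, can be sketched in a few lines of Python. This is a minimal illustration only: the CSV file names and the column names (protocol_type, service, flag, label) are assumptions about how the NSL-KDD files were exported, not details given in the paper.

```python
# Minimal pre-processing sketch for NSL-KDD (assumed CSV export with named columns).
import pandas as pd
from sklearn.preprocessing import StandardScaler

CATEGORICAL = ["protocol_type", "service", "flag"]   # the 3 categorical features

train = pd.read_csv("KDDTrain+.csv")                  # hypothetical file names
test = pd.read_csv("KDDTest+.csv")

# Binary target: normal -> 0, anomaly -> 1
y_train = (train.pop("label") != "normal").astype(int)
y_test = (test.pop("label") != "normal").astype(int)

# One-hot encode the categorical features; align columns so train/test match.
X_train = pd.get_dummies(train, columns=CATEGORICAL)
X_test = pd.get_dummies(test, columns=CATEGORICAL)
X_test = X_test.reindex(columns=X_train.columns, fill_value=0)   # ~122 columns

# Standardize: z = (x - mu) / sigma, using statistics of the training set only.
scaler = StandardScaler()
X_train = pd.DataFrame(scaler.fit_transform(X_train), columns=X_train.columns)
X_test = pd.DataFrame(scaler.transform(X_test), columns=X_train.columns)
```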


C. Feature selection

One way to reduce overfitting is to remove low-importance features. Since there are 122 features, some of them can be removed so that the model complexity is reduced and the chances of overfitting are also reduced. There are many kinds of metaheuristic algorithms; one of them, the genetic algorithm, is used here for feature selection, and after applying it only 43 features are left. A genetic algorithm is a search-based metaheuristic optimization technique inspired by nature and based on Charles Darwin's theory of natural evolution. The genetic algorithm involves the terms initial population, fitness value, selection, crossover, offspring, and mutation. The initial population contains a fixed number of individuals, also known as chromosomes; chromosomes are sets of genes arranged in an array. So the initial population is a set of chromosomes, and chromosomes are sets of genes, where each gene holds a binary value, either 0 or 1.

Figure 2. The internal architecture of the genetic algorithm

In the genetic algorithm, first of all, possible solutions to a problem are taken; these solutions are the chromosomes. A fitness function is defined such that the chromosomes are evaluated and a fitness score is given to each individual. The fitness scores are then used to rank the individuals, and the two fittest individuals, called the parents, are selected for the next step. Next, a point is chosen at random at which genes are interchanged between chromosomes; if point 2 is chosen, the first and second genes of individual 1 exchange their binary values with those of individual 2, and this step is known as crossover. The crossover produces offspring, two new individuals formed by exchanging genes between individual 1 and individual 2, and the new individuals 3 and 4 are added to the population. The final step is mutation, where binary values are flipped from 0 to 1 or vice versa to maintain diversity in the population. The process stops when the new offspring are no longer different from the previous ones; the individuals that remain are the solution.

Figure 3. Fitness value over different iterations

In particular, for feature selection, a chromosome is nothing but a set of features in which 1 means the variable is taken and 0 means it is not. The population size is kept at 50, which means there are 50 combinations of variables in total. Since the genetic algorithm is computationally expensive, only 10 iterations were used. In Figure 3, green shows the best fitness value and the dotted blue line represents the mean fitness value. The original author of the genetic algorithm feature selection implementation used the formula below for the fitness function, and the same formula is used here:

Fitness = ROC AUC of the random forest / number of selected features ….(2)

So, first of all, individual 1 consists of the values 1 and 0, where 1 represents the presence of a variable and 0 its absence. In the first iteration, individual 1 is used for modeling, with a random forest as the model. After the first iteration, the ROC of the random forest divided by the total number of selected features gives the fitness value for individual 1, and similarly for the other individuals, so the selection is based on the ROC value. The next steps are then executed, and a set of the 43 best features out of 122 is obtained. The R language has been used to implement the genetic algorithm for feature selection.
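The genetic algorithm itself was run in R (following [11], [12]); the sketch below is a rough Python approximation given only to make the procedure concrete. The population size of 50, the 10 iterations, the 3-fold cross-validated random forest, and the fitness of equation (2) follow the text above, while the mutation rate, the replacement strategy, and all other details are assumptions.

```python
# Genetic-algorithm feature selection sketch (Python approximation of the R procedure).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(42)

def fitness(mask, X, y):
    """Equation (2): cross-validated ROC AUC divided by the number of selected features."""
    if mask.sum() == 0:
        return 0.0
    clf = RandomForestClassifier(n_estimators=100, n_jobs=-1, random_state=0)
    auc = cross_val_score(clf, X[:, mask.astype(bool)], y, cv=3, scoring="roc_auc").mean()
    return auc / mask.sum()

def genetic_feature_selection(X, y, n_features=122, pop_size=50, n_iter=10, mutation_rate=0.02):
    """X: numpy array of encoded features, y: binary target. Returns a boolean feature mask."""
    pop = rng.integers(0, 2, size=(pop_size, n_features))           # initial chromosomes
    for _ in range(n_iter):
        scores = np.array([fitness(ind, X, y) for ind in pop])
        order = np.argsort(scores)[::-1]
        parents = pop[order[:2]]                                    # two fittest individuals
        point = rng.integers(1, n_features)                         # random crossover point
        child1 = np.concatenate([parents[0][:point], parents[1][point:]])
        child2 = np.concatenate([parents[1][:point], parents[0][point:]])
        for child in (child1, child2):                              # mutation: flip a few bits
            flips = rng.random(n_features) < mutation_rate
            child[flips] = 1 - child[flips]
        pop[order[-2:]] = [child1, child2]                          # offspring replace the weakest
    best = pop[np.argmax([fitness(ind, X, y) for ind in pop])]
    return best.astype(bool)                                        # e.g. 43 of 122 features kept
```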
D. Ensemble Learning

Due to the complexity of the data, classical machine learning techniques are not always sufficient for modeling; in many cases they overfit depending on the data, so ensemble techniques can be the solution. Ensemble methods can be categorized into basic and advanced methods. Max voting, averaging, and weighted averaging are the basic ensemble methods, in which different algorithms are trained on the data and their results are combined by averaging to obtain a more powerful model. Advanced ensemble techniques are categorized into stacking, bagging, boosting and blending, and these groups can be subdivided further. Here, bagging and boosting techniques are used for modeling.
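A minimal illustration of these basic combination rules, using hypothetical predictions from three models:

```python
# Basic ensemble combinations: max voting, averaging, weighted averaging.
import numpy as np

# Hypothetical predictions from three models on five samples.
labels = np.array([[1, 0, 1, 1, 0],       # model 1 class labels
                   [1, 1, 1, 0, 0],       # model 2
                   [0, 0, 1, 1, 0]])      # model 3
probs = np.array([[0.9, 0.4, 0.8, 0.7, 0.2],
                  [0.8, 0.6, 0.9, 0.4, 0.1],
                  [0.3, 0.2, 0.7, 0.6, 0.3]])

max_vote = (labels.mean(axis=0) >= 0.5).astype(int)             # majority class per sample
average = probs.mean(axis=0)                                    # simple average of probabilities
weighted = np.average(probs, axis=0, weights=[0.5, 0.3, 0.2])   # weighted average
```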


1) Tree-based model

Tree models divide the feature set X into several categories C using certain rules, and for each category C the prediction is evaluated using a constant function. Different tree methods such as CART, CHAID and C4.5 have been developed by researchers; in particular, GBM uses the CART method. In the CART method the tree is developed from top to bottom until a stopping criterion is met. For splitting nodes, CART uses different measures like Chi-Square, information gain, the Gini index, etc. Out of the four models used here, three of them, Distributed Random Forest (DRF), Gradient Boosting Machine (GBM) and XGBoost, are tree-based models.

2) Distributed Random Forest (DRF)

If a single decision tree model is built, there is a high chance that the model will overfit. To avoid this, the bagging method builds many decision trees on parts of the data sampled with replacement; this overcomes the high-variance problem, and the method is called distributed random forest. After training, the results of all decision trees are simply combined to get a better result. In a distributed random forest, the first decision tree is built on 2/3rd of the entire data, then a second decision tree is built on the same amount of data but resampled with replacement, and similarly n decision trees are built. In a classification problem the majority vote is taken, whereas in a regression problem the average is used to combine the results of all decision trees. The tuning parameters here are the number of variables sampled at a time for training each tree and the number of trees.
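A sketch of how a DRF model can be trained with the Python H2O library used in this work. The file names and hyperparameter values are illustrative assumptions; only the API calls themselves (H2ORandomForestEstimator, train, model_performance) are the standard H2O interface.

```python
# Distributed Random Forest with H2O (hyperparameters are illustrative).
import h2o
from h2o.estimators import H2ORandomForestEstimator

h2o.init()
train = h2o.import_file("KDDTrain_plus_encoded.csv")   # hypothetical pre-processed files
test = h2o.import_file("KDDTest_plus_encoded.csv")
train["label"] = train["label"].asfactor()             # binary target: normal / anomaly
test["label"] = test["label"].asfactor()
features = [c for c in train.columns if c != "label"]

drf = H2ORandomForestEstimator(ntrees=200,   # number of trees (assumed value)
                               mtries=-1,    # variables sampled per split (-1 = default)
                               nfolds=5,     # 5-fold cross-validation as in the paper
                               seed=1)
drf.train(x=features, y="label", training_frame=train)
print(drf.model_performance(test_data=test).auc())
```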
3) Gradient Boosting Machine

Gradient Boosting Machine (GBM) is an ensemble technique from the boosting family that can be used for both classification and regression problems. In boosting, the predictors of iteration "n" learn from the residuals of the predictors of iteration "n-1", so boosting techniques work sequentially. GBM can be a great solution if the model suffers from variance and bias problems. It uses CART as the base predictor and was developed by J. H. Friedman [2]. In short, GBM combines weak learners to form a single new strong learner. The general aim of machine learning is to find an approximate function F(x) that maps the input values to the target outcome by learning the patterns in the feature matrix. The loss function is calculated using

Loss function = Σ (y_predicted − y_actual)² ….(3)

where the sum runs over all n observations. The aim is to minimize this loss function. At any iteration of GBM there is some approximate function, say F_a. GBM tries to improve F_a by adding a term α such that the new function is defined as

F_a+1 = F_a + α ….(4)

To satisfy the above equation, GBM finds the best α such that

F_a+1 = F_a + α = F(x) ….(5)

This process continues and the residual improves at each iteration.
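A corresponding H2O GBM sketch, reusing the frames prepared in the DRF example above; the tree count, learning rate, and depth are assumed values, not settings reported in the paper.

```python
# Gradient Boosting Machine with H2O (hyperparameters are illustrative).
from h2o.estimators import H2OGradientBoostingEstimator

gbm = H2OGradientBoostingEstimator(ntrees=300,       # sequential CART trees (assumed)
                                   learn_rate=0.05,  # shrinkage applied to each new tree (assumed)
                                   max_depth=6,      # depth of each weak learner (assumed)
                                   nfolds=5,
                                   seed=1)
gbm.train(x=features, y="label", training_frame=train)   # frames from the DRF sketch
print(gbm.model_performance(test_data=test).auc())
```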
4) XGBoost

Unlike bagging, boosting follows a sequential approach: many models are built one after another. The first model is built from a simple initial guess, the second model is based on the residual, and the updated model is built by adding the two. Let us assume a feature matrix X and a target variable Y, and let f0(x) be an initial model built from a simple guess. The second model is built on the difference between the target outcome and the guess, which is nothing but the residual:

Y − f0(x) ….(6)

The average of this residual is calculated at each leaf of the tree, which gives the second model

h0(x) = mean of the residuals, i.e. a model fitted on the residuals ….(7)

so that the new model f1(x) is

f1(x) = f0(x) + h0(x) ….(8)

Many iterations can then be performed just like the first one:

f2(x) = f1(x) + h1(x)
f3(x) = f2(x) + h2(x)
...
fn(x) = fn-1(x) + hn-1(x) ….(9)

XGBoost uses the patterns in the residuals to build the next model, so once maximum accuracy is reached the residuals no longer contain any pattern; in other words, XGBoost stops when an iteration finds no pattern in the residuals. Hence, a strong XGBoost model is built from weak learners.
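The additive recursion of equations (6)-(9) can be made concrete with a short scikit-learn sketch. This is a didactic illustration of residual fitting only, not the actual XGBoost implementation, which additionally uses regularization and second-order gradients; in the experiments the model is trained through H2O's XGBoost estimator.

```python
# Didactic illustration of equations (6)-(9): fit each new tree on the current residuals.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def boost(X, y, n_rounds=10, max_depth=3):
    f = np.full(len(y), y.mean())           # f0(x): constant initial guess
    trees = []
    for _ in range(n_rounds):
        residual = y - f                     # equation (6): y - f_i(x)
        h = DecisionTreeRegressor(max_depth=max_depth).fit(X, residual)   # equation (7)
        f = f + h.predict(X)                 # equation (8): f_{i+1} = f_i + h_i
        trees.append(h)
    return trees, f                          # equation (9) after n rounds

# In practice the paper trains XGBoost through H2O (H2OXGBoostEstimator) on the NSL-KDD frames.
```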


5) Deep Neural Network

The amount of quality data always influences a model's performance. With a large amount of data, deep learning algorithms may work well compared to classical machine learning algorithms. Since enough data is available, a feed-forward neural network with many layers, known as a deep neural network, has also been used. A neural network with one hidden layer is known as a shallow neural network, whereas one with many hidden layers is known as a deep neural network. In our case the deep neural network is a feed-forward network because its graph does not contain any loops. In a shallow neural network, the input values are combined to form the function given below:

Σ xi wi + b ….(10)

where xi are the input values, wi the weights, and b the bias term.

This function is then passed into an activation function. There are many activation functions: sigmoid, threshold, rectifier, hyperbolic tangent, etc. In a shallow neural network the sigmoid activation function is used in the case of binary classification:

σ(z) = 1 / (1 + e^(−z)) ….(11)

After passing through the sigmoid activation function, the result is binary and is compared with the actual result; the difference between the two is measured by the loss function

Loss function = (ŷ − y)² ….(12)

where ŷ is the output value and y the actual value.

The objective is to reduce this loss function, and to achieve that a gradient descent algorithm is used, in which the weight values are updated using backpropagation. The set of inputs forms the input layer, the second layer is the hidden layer, and the last is the output layer. In a shallow neural network there is only one hidden layer, whereas in a deep neural network there can be many.
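In this work the deep neural network is also trained through H2O, with the frames and feature list prepared as in the DRF sketch. The paper does not report the network architecture, so the hidden-layer sizes, activation, and number of epochs below are assumptions for illustration only.

```python
# Feed-forward deep neural network with H2O (architecture values are assumptions).
from h2o.estimators import H2ODeepLearningEstimator

dnn = H2ODeepLearningEstimator(hidden=[128, 64, 32],    # three hidden layers (assumed sizes)
                               activation="Rectifier",  # ReLU activation in the hidden layers
                               epochs=50,               # passes over the training data (assumed)
                               nfolds=5,
                               seed=1)
dnn.train(x=features, y="label", training_frame=train)   # frames as in the earlier sketches
print(dnn.model_performance(test_data=test).auc())
```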
IV. EXPERIMENT RESULTS

For our experiments, both R and Python have been used: R for the implementation of the genetic algorithm, and Python for the remaining work, data pre-processing and modeling. There are many libraries in R and Python that can be used for machine learning and related work; to implement machine learning one either has to write code from scratch or use built-in libraries like scikit-learn. In particular, the Python H2O library has been used for the implementation of the machine learning models. H2O is a machine learning platform available for many programming languages that can be used on large datasets, which is the reason this library has been chosen. 5-fold cross-validation is used for each model, and 3-fold cross-validation for the random forest inside the genetic algorithm. The results below show the performance of each model with and without feature selection on the KDDTest+ and KDDTest-21 datasets.

Table 1. Performance of different models on KDDTest+

Model                              Without Feature Selection   With Feature Selection
Distributed Random Forest (DRF)    0.9212                      0.9108
Gradient Boosting Machine (GBM)    0.9062                      0.9150
XGBoost                            0.9132                      0.9274
Deep Neural Network (DNN)          0.8490                      0.8822

Figure 4. Comparing different models for KDDTest+

Table 2. Performance of different models on KDDTest-21

Model                              Without Feature Selection   With Feature Selection
Distributed Random Forest (DRF)    0.8729                      0.8646
Gradient Boosting Machine (GBM)    0.8591                      0.8776
XGBoost                            0.8645                      0.8874
Deep Neural Network (DNN)          0.7414                      0.8344

Figure 5. Comparing different models for KDDTest-21
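In outline, the two tables can be produced by training every model twice, once on all 122 encoded features and once on the 43 features returned by the genetic algorithm, and scoring both test sets. The sketch below assumes hypothetical pre-processed CSV exports and uses AUC as the reported metric, which is an assumption since the tables do not name the metric explicitly; the 43 selected feature names would come from the genetic algorithm output.

```python
# Compare every model with and without the GA-selected features (sketch only).
import h2o
from h2o.estimators import (H2ORandomForestEstimator, H2OGradientBoostingEstimator,
                            H2OXGBoostEstimator, H2ODeepLearningEstimator)

h2o.init()
train = h2o.import_file("KDDTrain_plus_encoded.csv")      # hypothetical pre-processed exports
test_plus = h2o.import_file("KDDTest_plus_encoded.csv")
test_21 = h2o.import_file("KDDTest_21_encoded.csv")
for frame in (train, test_plus, test_21):
    frame["label"] = frame["label"].asfactor()
features = [c for c in train.columns if c != "label"]
selected_features = features[:43]                          # placeholder: the 43 GA-selected names

model_factories = {
    "DRF": lambda: H2ORandomForestEstimator(ntrees=200, nfolds=5, seed=1),
    "GBM": lambda: H2OGradientBoostingEstimator(ntrees=300, learn_rate=0.05, nfolds=5, seed=1),
    "XGBoost": lambda: H2OXGBoostEstimator(ntrees=300, learn_rate=0.05, nfolds=5, seed=1),
    "DNN": lambda: H2ODeepLearningEstimator(hidden=[128, 64, 32], nfolds=5, seed=1),
}

for name, make_model in model_factories.items():
    for tag, cols in [("all features", features), ("GA features", selected_features)]:
        model = make_model()
        model.train(x=cols, y="label", training_frame=train)
        auc_plus = model.model_performance(test_data=test_plus).auc()
        auc_21 = model.model_performance(test_data=test_21).auc()
        print(f"{name:8s} {tag:12s} KDDTest+: {auc_plus:.4f}  KDDTest-21: {auc_21:.4f}")
```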


V. CONCLUSION

Our experiment showed the need to update previous results with new emerging tools and techniques. Nowadays many machine learning tools can be used to implement machine learning models very efficiently, and the H2O library is one of them. Our results show that ensemble learning performs well compared to other machine learning results; to conclude, ensemble methods can outperform many of the existing algorithms. Our experiment had some limitations: in the genetic algorithm, only part of the dataset was used with 10 iterations, so in future work the results may be improved by using the whole dataset with more iterations. Also, our Deep Neural Network (DNN) model outperforms the previous results after using the genetic algorithm for feature selection.

VI. REFERENCES

[1] Mahbod Tavallaee, Ebrahim Bagheri, Wei Lu, and Ali A. Ghorbani. 2009. A detailed analysis of the KDD CUP 99 data set. In Proceedings of the Second IEEE International Conference on Computational Intelligence for Security and Defense Applications (CISDA'09). IEEE Press, 53-58.
[2] Friedman, Jerome H. "Greedy Function Approximation: A Gradient Boosting Machine." Annals of Statistics (2001): 1189-1232.
[3] Demystifying Information Security Using Data Science. Retrieved from https://www.analyticsvidhya.com/blog/2018/02/demystifying-security-data-science/
[4] Gradient Boosting Machine (GBM). Retrieved from http://docs.h2o.ai/h2o/latest-stable/h2o-docs/data-science/gbm.html
[5] Boosting algorithm: GBM. Retrieved from https://towardsdatascience.com/boosting-algorithm-gbm-97737c63daa3
[6] Hosseinzadeh Aghdam, Mehdi & Kabiri, Peyman. (2016). Feature Selection for Intrusion Detection System Using Ant Colony Optimization. International Journal of Network Security. 18. 420-432.
[7] M. V. Kotpalliwar and R. Wajgi, "Classification of Attacks Using Support Vector Machine (SVM) on KDDCUP'99 IDS Database," 2015 Fifth International Conference on Communication Systems and Network Technologies, Gwalior, 2015, pp. 987-990.
[8] Mark J. van der Laan, Eric C. Polley, and Alan E. Hubbard. "Super Learner." Statistical Applications in Genetics and Molecular Biology. Volume 6, Issue 1. (September 2007).
[9] S. Parampottupadam & A.-N. Moldovan, "Cloud-Based Real-Time Network Intrusion Detection Using Deep Learning", Cyber Security 2018, Glasgow, Scotland, UK, 12th June 2018.
[10] Introduction to Genetic Algorithms — Including Example Code. Retrieved from https://towardsdatascience.com/introduction-to-genetic-algorithms-including-example-code-e396e98d8bf3
[11] genetic-algorithm-feature-selection. Retrieved from https://github.com/pablo14/genetic-algorithm-feature-selection
[12] Feature Selection using Genetic Algorithms in R. Retrieved from https://blog.datascienceheroes.com/feature-selection-using-genetic-algorithms-in-r/
[13] Artificial neural network. Retrieved from https://www.slideshare.net/KirillEremenko/deep-learning-az-artificial-neural-networks-ann-module-1
[14] Lingaraj, Haldurai. (2016). A Study on Genetic Algorithm and its Applications. International Journal of Computer Sciences and Engineering. 4. 139-143.
[15] Adetunmbi, Adebayo & Adeola, S. Oladele & Abosede, D. O. (2010). Analysis of KDD '99 Intrusion Detection Dataset for Selection of Relevance Features. Proceedings of the World Congress on Engineering and Computer Science. 1. 20-22.
[16] Sahu, Santosh & Sarangi, Sauravranjan & Jena, Sanjay. (2014). A detailed analysis on intrusion detection datasets.
[17] Hoque, Mohammad Sazzadul & Mukit, Md & Bikas, Md. Abu Naser. (2012). An Implementation of Intrusion Detection System Using Genetic Algorithm. International Journal of Network Security & Its Applications. 4. 109-120. 10.5121/ijnsa.2012.4208.
[18] Sharma, Rupam & Kalita, Hemanta Kumar & Borah, Parashjyoti. (2015). Analysis of Machine Learning Techniques Based Intrusion Detection Systems. 44. 10.1007/978-81-322-2529-4_51.
[19] Sharifi, Aboosaleh & Amirgholipour Kasmani, Saeed & Pourebrahimi, Alireza. (2015). Intrusion Detection Based on Joint of K-Means and KNN. Journal of Convergence Information Technology (JCIT). 10. 42-51.
[20] Ingre, Bhupendra & Yadav, Anamika & Soni, Atul. (2017). Decision Tree Based Intrusion Detection System for NSL-KDD Dataset. 10.1007/978-3-319-63645-0_23.
[21] Alrawashdeh, Khaled & Purdy, Carla. (2016). Toward an Online Anomaly Intrusion Detection System Based on Deep Learning. 195-200. 10.1109/ICMLA.2016.0040.
[22] Rai, Kajal & Devi, Mandalika & Guleria, Ajay. (2016). Decision Tree Based Algorithm for Intrusion Detection. International Journal of Advanced Networking and Applications. 07. 2828-2834.
[23] Chuan-long, Yin & Yue-fei, Zhu & Jin-long, Fei & Xin-zheng, He. (2017). A Deep Learning Approach for Intrusion Detection Using Recurrent Neural Networks. IEEE Access. PP. 1-1. 10.1109/ACCESS.2017.2762418.
[24] Saxena, Harshit & Richariya, Vineet. (2014). Intrusion Detection in KDD99 Dataset using SVM-PSO and Feature Reduction with Information Gain. International Journal of Computer Applications. 98. 25-29. 10.5120/17188-7369.

