Machine Learning For Intrusion Detection in Cyber Security: Applications, Challenges, and Recommendations

Innovative Computing Review (ICR)

Volume 2 Issue 2, Fall 2022

ISSN(P): 2791-0024 ISSN(E): 2791-0032
Homepage: https://journals.umt.edu.pk/index.php/ICR

Title: Machine Learning for Intrusion Detection in Cyber Security: Applications, Challenges, and Recommendations

Author (s): Aqib Ali1, Samreen Naeem1, Sania Anam2, Muhammad Munawar Ahmed3

Affiliation (s):
1 College of Automation, Southeast University, Nanjing, China.
2 Govt Associate College for Women Ahmadpur East, Bahawalpur, Pakistan.
3 Islamia University Bahawalpur, Bahawalpur, Pakistan.
DOI: https://doi.org/10.32350.icr.22.03

History: Received: October 10, 2022, Revised: November 11, 2022, Accepted: December 2, 2022
Citation: A. Ali, S. Naeem, S. Anam, and M. M. Ahmed, “Machine learning for intrusion
detection in cyber security: Applications, challenges, and recommendations,”
UMT Artif. Intell. Rev., vol. 2, no. 2, pp. 41-64, 2022, doi:
https://doi.org/10.32350.icr.22.03
Copyright: © The Authors
Licensing: This article is open access and is distributed under the terms of
Creative Commons Attribution 4.0 International License
Conflict of Interest: Author(s) declared no conflict of interest

A publication of
School of Systems and Technology
University of Management and Technology, Lahore, Pakistan
Machine Learning for Intrusion Detection in Cyber Security: Applications, Challenges, and Recommendations
Aqib Ali1*, Samreen Naeem1, Sania Anam2, and Muhammad Munawar Ahmed3
1 College of Automation, Southeast University, Nanjing, China.
2 Department of Computer Science, Govt. Associate College for Women Ahmadpur East, Bahawalpur, Pakistan.
3 Department of Information Technology, Islamia University Bahawalpur, Pakistan.
* Corresponding Author: aqibcsit@gmail.com

Abstract-Modern life revolves around networks, and cybersecurity has emerged as a critical field of study. An Intrusion Detection System (IDS), a fundamental cybersecurity tool, monitors the health of the software and hardware running on a network. After decades of research, existing IDSs have developed the capability to confront the hurdles of improving detection accuracy, reducing false alarm rates, and detecting unexpected attacks. Many researchers have concentrated on designing IDSs that employ machine learning approaches to overcome these difficulties. Machine learning approaches can discover the important distinctions between normal and aberrant data with great accuracy. Moreover, these approaches generalize well, which allows them to detect unknown attacks. The survey conducted in the current study offers a taxonomy of machine learning based IDS that uses data objects as the critical dimension to classify and summarize the IDS literature. This form of classification structure is appropriate for cyber security researchers.

Index Terms-classification, feature optimization, Intrusion Detection System, machine learning classification

I. Introduction
The Internet has become a vital aspect of modern life as the digital world has grown considerably [1]. With the emergence of smart cities, self-driving cars, health monitoring via wearables, and mobile banking, among many other things, reliance on the Internet is on the rise. While these technologies assist individuals and societies at a large scale, they also pose several concerns [2]. For instance, hackers could take advantage of weaknesses, resulting in theft and sabotage that harm people worldwide. Cyberattacks

may be costly to organizations in terms of both financial losses and reputational damage. As a result, network security has become a significant concern. Traditional measures, such as firewalls, encryption, and antivirus software packages, play a considerable part in safeguarding an organization's network infrastructure. These approaches, however, provide only the first line of protection and cannot fully defend networks and systems against malware and evolving attacks. Consequently, some intruders are nevertheless able to gain access, which may result in a breach [3].

Intrusion Detection (ID) refers to securing computer systems against illegal use, such as hacking, and against misuse of lawful access, such as insider threats. A breach in a computer system may result in data loss, restricted access to internet resources, the loss of sensitive data, and the exploitation of private resources [4]. Denning (1987) was the first to construct an Intrusion Detection System (IDS). Since then, IDS has been a hot research area as a vital tool for computer network security. Given the current level of cybercrime, there is little doubt that the IDS plays a critical role. The classification of IDS taxonomy is shown in Fig. 1.

An IDS can be regarded as a hardware or software system that monitors a computer or network, detects attacks or intrusions, and raises a warning [5]. This warning report helps the administrator or user to locate and resolve the system or network vulnerability. An attempt to access the data, change it, or render the system unworkable after an intrusion may be deliberate and may constitute a criminal act. Cybersecurity helps to prevent and detect illicit computer activity. Data in both hardware and software is safeguarded against destruction and disruption [6]. Computer security prevents intruders from using computers for their personal benefit. Firewalls, security suites, antivirus software, and other cybersecurity tools are examples of ways to protect a system. Data availability at the right moment, asset authentication, document confidentiality, and the integrity of all specified data are the four primary categories classed under this domain [7].

Fig. 1. Classification of IDS taxonomy

According to the World Internet Statistics report, the Internet's growth rate from 2000 to 2019 was 1.114 percent, with more than two quintillion bytes of data created per day [8]. This demonstrates that data accumulated rapidly from diverse sources, while hacking tools and procedures also developed rapidly. To secure data from intrusion, information security and data analysis are necessary measures. A typical detection system cannot detect intruders because of the enormous volume and high velocity of the data. Big data approaches are therefore employed to handle intrusions efficiently. The 7 Vs define big data as Volume: data size, Velocity: data generation pace, Variety: diverse sorts of data, Value: the data's worth, Veracity: the data's dependability, Variability: the data's meaning changing over time, and Visualization: the data's ease of access or reading [9].

Because of the exponential rate of data expansion, traditional data management systems become incredibly complicated and time- and resource-intensive. The accumulation of vast data is inherently complex, necessitating robust technology and intelligent algorithms to handle it. An IDS is crucial for identifying attacks. It monitors network traffic to detect unusual behaviors and known threats. Administrators can then be informed when such behavior is discovered, so that trouble can be avoided. ML algorithms may be used to efficiently handle and categorize the attacks [10]. Intrusion detection is

divided into two kinds based on how it works, as follows:

A. Active IDS
Active IDS monitor traffic as passive IDS do, but they also prevent attacks by blocking suspicious traffic.

B. Passive IDS
These IDS merely monitor and analyze traffic, notifying the administrator of attacks and vulnerabilities [11].

II. Applications of Intrusion Detection Systems
Intrusion Detection Systems are vital for preventing cyber-attacks. Transactions and data processing occur through the Internet, which is very susceptible to fraudulent activities. It is therefore essential to emphasize information security. Fig. 2 summarizes the IDS based applications.

A. IDS for Internet of Things
The Internet of Things (IoT) is a network of things or devices that can detect, collect, and transmit data without human or computer interaction. Low-power IoT devices use lightweight protocols. Reference [12] discussed smart grid IoT devices. Attackers may manipulate the sensor data. Physical, side channel, environmental, cryptanalysis, black hole, and Sybil attacks are common types of IoT attacks. Reference [13] proposed a lightweight supervised intrusion detection approach in which an SVM was built to identify attacks, with DDoS as the target.

Fig. 2. Application of IDS

B. IDS of Smart City
Reference [14] described intrusion detection for smart cities. The authors used a smart water distribution system dataset. DDoS attacks on smart cities must be detected. The approach suggested in that paper consists of two parts, that is, an RBM and a classifier. The RBM model supports unsupervised learning of high-level features. Classification is then used to distinguish DDoS attacks. The FFNN, AFNN, RF, and SVM classifiers were employed. The RBM

model processes the K-Means method and contains up to 5 layers, giving five sub-versions of the clustering algorithm, each with a distinct k value. Each of the four classifiers is applied to each of the five cluster-generated datasets, so 20 tests are run.

C. IDS for Big Data
Big data is heterogeneous and may be structured, unstructured, or semi-structured. Traditional intrusion management cannot handle such excessive data; therefore, ML is needed for Big Data IDS. Reference [15] used Apache Spark for big data intrusion detection. Preprocessing standardizes the features to unit variance using Spark MLlib. ChiSqSelector, with its numTopFeatures parameter, is used to pick features, and an SVM performs the classification. The SVM's soft margin reduces misclassification, and the slack variable trades off margin against classification error. Their results reveal faster and more efficient big data intrusion detection.

D. IDS for Fog Computing
Fog computing is a novel processing paradigm that moves analytics to the edge to boost performance. Fog computing has cloud, fog, and user levels. The fog service layer has globally dispersed fog nodes, each comprising a router, gateway, and edge server. Fog nodes allow heterogeneous processing, which makes them vulnerable to attacks like DDoS, Remote-to-Local (R2L), User-to-Root (U2R), and PROBE. Reference [16] examined the DDoS attack process in fog computing and studied fog nodes and hypergraph-based DDoS. The load factor helps to determine the fog node status: the fog node's load threshold determines its condition. This approach is used to assess a DDoS attack's association with cloud nodes.

E. IDS for Mobile
People in the modern era increasingly use mobile phones to communicate and store their sensitive information. Mobile vulnerabilities include app, device, network, online source, and content vulnerabilities. Therefore, an IDS is needed to handle these vulnerabilities and threats. Reference [17] presented a 5G-oriented cyber protection architecture to recognize cyber threats in 5G mobile networks. Intrusions were identified by dividing anomaly detection into two levels, that is, ASD and NAD. The NAD uses a supervised variant of LSTM (Long Short-Term Memory networks), while the ASD module uses a supervised or semi-supervised two-level form of DBN and SAE.

III. The Role of Machine Learning in IDS
Classification is one of the core machine learning tasks and is a form of supervised learning. It is employed in both binary and multiclass intrusion detection systems. In supervised learning the data is always labeled, with each record in a dataset assigned to a specific class. Based on a classification model, an IDS categorizes all network traffic as normal or abnormal. The enormous volume of data makes it difficult to build the model. Classification techniques therefore require data preparation, which can address various challenges in model construction, especially with high-dimensional data. The confusion matrix and accuracy-based performance criteria determine the best classification method [18]. Two phases, training and testing, are involved in categorizing the data in a dataset. A target classifier is learned during the training phase. During the second phase, that is, the test phase, the generated model predicts class labels for the provided data. It is important to determine how much time each classifier takes during the training and testing phases. Data preparation, performed before classifiers are applied, helps the classification model to reduce time and complexity by eliminating unnecessary data, and it improves the performance of classification methods. For IDS dataset classification, the cross-validation procedure divides the data evenly into two groups, that is, one group is used for testing and the remainder is used for training. Only a few algorithms can reliably discriminate between unusual attacks and typical attacks [19]. The general methodology is shown in Fig. 3.

Fig. 3. Generalized machine learning based IDS methodology
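The confusion-matrix criteria mentioned in Section III can be made concrete with a short sketch. The Python example below is illustrative only and is not taken from any of the surveyed papers. The function names, the label encoding (1 = attack, 0 = normal), and the toy labels are assumptions. It derives accuracy, precision, detection rate (recall), and false positive rate from the four confusion-matrix counts of a binary IDS.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels where 1 = attack, 0 = normal."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1  # attack correctly flagged
        elif truth == 0 and pred == 1:
            fp += 1  # normal traffic raised a false alarm
        elif truth == 0 and pred == 0:
            tn += 1  # normal traffic correctly passed
        else:
            fn += 1  # attack missed
    return tp, fp, tn, fn


def binary_metrics(y_true, y_pred):
    """Compute common IDS evaluation metrics from the confusion matrix."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)

    def ratio(numerator, denominator):
        # Guard against empty denominators (e.g. no predicted attacks).
        return numerator / denominator if denominator else 0.0

    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "precision": ratio(tp, tp + fp),
        "detection_rate": ratio(tp, tp + fn),       # also called recall
        "false_positive_rate": ratio(fp, fp + tn),  # false alarm rate
    }


# Hypothetical ground truth and predictions for ten traffic records.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
metrics = binary_metrics(y_true, y_pred)
```

Reporting the detection rate and false positive rate alongside accuracy matters for IDS work because attack records are often a small minority, so accuracy alone can look high even when many attacks are missed.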

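The data preparation and cross-validation steps described in Section III can also be sketched briefly. The example below is a minimal illustration under stated assumptions, not a method from the surveyed papers: `min_max_scale` and `k_fold_indices` are hypothetical helper names, the rows are invented, and a k-fold split is used as one common form of cross-validation. In practice the scaling bounds would normally be computed on the training portion only.

```python
def min_max_scale(rows):
    """Scale every column of a list of numeric rows to the range [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    scaled = []
    for row in rows:
        scaled.append([
            # A constant column carries no information; map it to 0.0.
            (value - lo) / (hi - lo) if hi > lo else 0.0
            for value, lo, hi in zip(row, lows, highs)
        ])
    return scaled


def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs; each sample is tested exactly once."""
    indices = list(range(n_samples))
    for fold in range(k):
        test_idx = indices[fold::k]  # every k-th sample, offset by the fold number
        held_out = set(test_idx)
        train_idx = [i for i in indices if i not in held_out]
        yield train_idx, test_idx


# Hypothetical records: (connection duration, bytes transferred).
scaled = min_max_scale([[0, 100], [5, 300], [10, 200]])
folds = list(k_fold_indices(6, 3))
```

Scaling keeps features with large numeric ranges from dominating distance-based or margin-based classifiers, and rotating the held-out fold gives every record a turn in the test set.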

The most commonly used machine learning classifiers in IDS are discussed below and shown in Fig. 4.

Fig. 4. Machine Learning Classifiers Categorization

A. Neural Networks
Neural networks handle information in a way similar to the brain and other biological nervous systems. Artificial neural networks [20] can recognize and categorize network activity based on various characteristics.

B. Bayesian Networks
A Bayesian Network is a probabilistic learning model represented as an acyclic graph of conditional and unconditional relationships. Reference [21] proposed an automated intrusion detection system.

C. Genetic Algorithms
A genetic algorithm, based on a natural selection process, is used to address both constrained and unconstrained problems. Genetic algorithms may be employed effectively to identify intrusions [22].

D. Artificial Neural Networks (ANN)
ANNs are based on biological neural networks. They learn to perform tasks from examples, even with noisy and partial data. ANNs were designed to address brain-like problems. Data-intensive applications employ such systems. ANN types, contributions, and intrusion detection performance are discussed in [23].

E. Support Vector Machine (SVM)
This supervised model performs classification, regression, and outlier detection. It separates data linearly using a hyperplane: SVM maps the data into a feature space and separates it into classes using the hyperplane with the largest class margin, and it can be extended to multiclass problems as an ensemble. SVM also performs well on nonlinear data. Researchers have used SVM to detect intrusions. References [24]–[25] used SVM to detect network breaches. The authors state that high-quality training data improves detection efficiency, and they presented an SVM-based IDS. To improve SVM detection, they applied a logarithmic marginal density ratio transformation (LMDRT). The results indicated an excellent detection rate (DR) and decent efficiency.

F. Naïve Bayes (NB)
Naïve Bayes is a classification method based on Bayes' theorem. This classifier assumes that, given the class, each characteristic's probability is independent of the others [26]. The method calculates the probability of an instance under each class and chooses the class with the highest likelihood. NB is also used to detect intrusions.

G. Logistic Regression (LR)
LR estimates a binary outcome (zero or one) from independent variables. It fits the data to a logistic function to predict the probability of an event. Reference [27] presented a network anomaly detection method based on nonlinear invariant features of Internet traffic. The findings demonstrated that this approach separates a wide variety of volumetric DoS attacks with great accuracy and precision.

H. Decision Tree (DT)
A chart or tree model is used to make decisions and examine the potential consequences of those decisions, including the outcomes of random events. A classification tree has symbolic labels, while a regression tree has continuous values. The method sorts a sample through a tree of options, with each decision affecting the next. CART is one algorithm that creates decision trees, and DTs are used to detect intrusions. Reference [28] suggested a hybrid intrusion detection approach based on both misuse and anomaly detection. The experiment used NSL-KDD. The proposed strategy improved DR, FPR, and complexity. The recommended

method's time-saving techniques were not great. However, future work on improving the C4.5 decision tree algorithm would prove helpful.

I. Random Forest (RF)
As the name indicates, RF builds a forest of decision trees. It is made by combining many decision trees and averaging their predictions. A single tree's prediction is typically less accurate, and a forest generally performs better with more trees. Reference [29] suggested an RF intrusion detection model. RF surpassed other conventional classifiers in classifying attacks.

J. Clustering K-means
Clustering with K-means is an unsupervised ML algorithm. Unsupervised algorithms work on unlabeled data. The algorithm searches the data for groups and assigns items to groups based on their similarities and differences. K-means is used for pattern matching in time series data. K-means cannot handle non-spherical clusters well. Researchers have used K-means to detect intrusions [30].

K. Fuzzy Logic (FL)
Fuzzy logic is used to assess the safety of a setting and as a starting point for scientific research. For quantitative and security reasons, fuzzy logic has been employed for intrusion detection. Fuzzy logic allows an item to belong to several classes at once, which proves useful when class differences are unclear. Fuzzy theory may identify intruders when the normal and abnormal classes are not crisply defined [31].

L. Swarm Intelligence (SI)
SI solves complicated problems through interactions between agents and their environment. SI requires self-organization and division of labor. Self-organization is a system's capacity to reorganize its agents without outside aid. Parallel task execution allows it to solve complicated challenges. ACO and PSO are swarm-inspired algorithms. ACO replicates ant behavior and solves discrete optimization problems, whereas PSO solves nonlinear optimization problems [32]–[33].

These approaches allow algorithms to move beyond merely static program instructions, producing data-driven predictions or decisions by constructing a model from sample inputs. They may be utilized in various computing tasks for which explicit methods cannot be designed or programmed, such as detecting network intrusions and security breaches.
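Of the classifiers described in this section, Naïve Bayes is compact enough to sketch in full. The Python example below is a minimal Gaussian variant written for illustration; it is not the implementation used in any cited study. The function names, the two features (packets per second and mean packet size), and the six training records are all assumptions. It estimates a prior and per-feature mean and variance for each class, then picks the class with the highest log posterior, treating features as independent given the class.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Estimate class priors and per-feature mean/variance for each class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)

    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        variances = [
            # The small constant keeps the variance strictly positive.
            sum((value - mean) ** 2 for value in col) / n + 1e-9
            for col, mean in zip(columns, means)
        ]
        model[label] = (n / len(X), means, variances)
    return model


def predict_gaussian_nb(model, row):
    """Return the class with the highest log posterior for one sample."""
    best_label, best_score = None, -math.inf
    for label, (prior, means, variances) in model.items():
        score = math.log(prior)
        for value, mean, var in zip(row, means, variances):
            # Log of the Gaussian density; features are summed independently.
            score += -0.5 * math.log(2 * math.pi * var) - (value - mean) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training records: (packets per second, mean packet size in bytes).
X = [[10, 500], [12, 480], [11, 520], [900, 60], [950, 64], [870, 58]]
y = ["normal", "normal", "normal", "attack", "attack", "attack"]
model = fit_gaussian_nb(X, y)

quiet_flow = predict_gaussian_nb(model, [14, 510])
flood_flow = predict_gaussian_nb(model, [920, 62])
```

Working in log space avoids numerical underflow when many per-feature probabilities are multiplied, which is the usual design choice for this classifier.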

IV. Literature Review
Several studies have been conducted in the previous decade to enhance IDS to detect and prevent cyberattacks. This section examines the data preparation, feature selection, number of features selected, classification methods, and evaluation measures used in intrusion detection classification.

Reference [34] developed a machine learning-based wireless network IDS. Before training, the preparation phase converts the dataset values to integers and scales and normalizes large data ranges into smaller ones. That study employed multiple ML classifiers and focused on improving the efficiency of feature optimization for classification algorithms, which improves accuracy and detection time. Feature subsets of 32, 10, 7, and 5 features were chosen for training the model. The random forest classifier with the 32 selected attributes showed the best performance in the experiments, reaching 99.64% accuracy, 0.995 precision, and 0.966 recall. The suggested system used the AWID wireless dataset. Comparing the proposed approach with various classification methods helped to validate the results.

Reference [35] observed the performance of four ML classifiers. Apache Spark tools were used to classify network traffic for intrusion detection. The model uses 42 features from the UNSW-NB15 public network intrusion dataset. Among the classifiers, random forest had the highest accuracy, 97.49%, with a specificity of 97.75% and a sensitivity of 93.53%.

Reference [36] suggested a Hybrid Filter-based Selection Algorithm (HFSA). HFSA optimized a subset of the most relevant and highest-ranking features for the classifier. This model uses real-time packets captured with Jpcap. Naive Bayes classifies the traffic as normal or harmful. Preprocessing involves two stages. Firstly, the input is transformed into quantitative values. Secondly, during data normalization, each record's attributes are scaled to the range (0,1). Naive Bayes feature selection and Naive Bayes regression methods were used to detect six standard classes. HFSA improves the classification system. The model's total accuracy was 92%, with a further reported figure of 95% (presumably precision) and 90% recall.

Reference [37] introduced an IDS based on SVM and Naive Bayes algorithms. A correlation-based feature subset selection method

selected 24 of the 42 NSL-KDD features. The data preparation step converts the features to binary numbers and normalizes the data. SVM showed a promising overall accuracy of 93.95%.

Reference [38] proposed a supervised approach to detect malicious network traffic. That study employed ANN and SVM algorithms to classify the data. Both filter and wrapper feature selection were used, that is, Chi-Square and correlation. The 25,191-record NSL-KDD training dataset was used. The wrapper technique selected 17 of the 41 features as essential. The chi-squared filter selected 35 of the more interesting and relevant attributes for the training model. The wrapper strategy, which picks 17 features, gave the maximum ANN accuracy of 94.02%.

Reference [39] introduced a new feature classification and selection technique using ART and Random Forest. The system is called HAIDS (Hybrid Anomaly-Based Intrusion Detection System). The hybrid technique showed an accuracy of 87.74%.

Reference [40] introduced an IDS based hybrid system that combined KNN, ELM, and HELM. On KDD Cup 99, the suggested system achieved 84.29% accuracy and a 77.18% attack detection rate.

Reference [41] developed an embedded approach for SVM-based intrusion detection that uses Naive Bayes. The embedded model was applied to several datasets, including NSL-KDD and Kyoto 2006+, to identify different sorts of attacks. Comparing the embedded system against a single SVM algorithm, the study found that combining Naive Bayes with SVM improves detection accuracy. NSL-KDD gave the highest accuracy, 99.36%.

Reference [42] presented an IDS that uses a hybrid classification algorithm with profile augmentation. The hybrid classification method uses Naive Bayes and SVM. It also preprocesses the data: normalizing the data, scaling attributes to (0,1), and picking suitable features from the real-time dataset improve model accuracy. This hybrid technique achieved an overall accuracy of 93.10%.

Reference [43] proposed a hybrid IDS classification approach in 2020. A Decision Tree (J48) was hybridized with SVM. The SVM copes with high dimensionality. Particle Swarm Optimization (PSO) was utilized to select features, choosing nine out of 42 that were meaningful. Training and testing

were carried out using KDD99. The dataset was split proportionally. The results revealed that a split of 70% testing and 30% training proved best for accuracy and false alarm rate. The hybrid model achieved 99.1% accuracy.

Reference [44] considered the electricity smart grid and aimed to identify harmful attacks. A Hybrid Decision Trees (HDT) approach was devised to identify the attacks. The proposed hybrid method's performance was also compared with SVM. The trials demonstrated that the proposed strategy (HDT) was more efficient, with a measured accuracy of 97.2193% on NSL-KDD.

Reference [45] suggested a DDoS detection approach to increase network security in 2020. Classification was carried out using K-Nearest Neighbor and Naive Bayes, while feature selection employed correlation. The proposed model was compared against models trained on NSL-KDD and KDD Cup 99. KNN with eight features surpassed Naive Bayes. Its performance was calculated to be 98.51 percent and its accuracy 98.9%.

Reference [46] explained the use of feature reduction in the classification model. An intelligent IDS was presented employing various ML classifiers. That study compared the results of classifiers using all 41 features against feature sets of 11, 12, 13, and 15 features. The best feature reduction enhanced precision in the experiment. The Random Forest classification algorithm performed best, with 99.63% accuracy on the DoS class.

Reference [47] developed an intrusion detection system employing a random forest classifier with PCA. Decision trees, naive Bayes, and SVM were compared to the suggested technique. The proposed approach obtained the maximum accuracy of 96.78 percent and an error rate of 0.21 percent, and it had the fastest model build time (3.42).

Reference [48] provided a technique for anomaly-based IDS using ML classifiers. The CSE-CIC-IDS2018 dataset used for the model has 80 features. This ensemble feature optimization approach used Chi-square and correlation to rank features. The hybrid technique picked 23 of the 80 features. The suggested model outperformed three other classifiers, with an overall accuracy of 98.8%.
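Several of the studies reviewed above rank features with a chi-square filter before training a classifier. The sketch below shows the idea for the simplest case, binary features against a binary attack label, using the standard 2x2 contingency-table statistic. It is an illustration under stated assumptions rather than the procedure of any cited paper: the function names, feature names, and data are hypothetical, and real IDS datasets also contain numeric and multi-valued features that need a more general treatment.

```python
def chi_square_2x2(feature, labels):
    """Chi-square statistic for one binary feature against binary labels."""
    a = sum(1 for f, y in zip(feature, labels) if f == 1 and y == 1)
    b = sum(1 for f, y in zip(feature, labels) if f == 1 and y == 0)
    c = sum(1 for f, y in zip(feature, labels) if f == 0 and y == 1)
    d = sum(1 for f, y in zip(feature, labels) if f == 0 and y == 0)
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    # A zero denominator means a constant feature or label: no association.
    return n * (a * d - b * c) ** 2 / denominator if denominator else 0.0


def top_k_features(columns, labels, k):
    """Rank named feature columns by chi-square score and keep the best k."""
    scores = {name: chi_square_2x2(col, labels) for name, col in columns.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


# Hypothetical binary features for eight records (1 = attack, 0 = normal).
labels = [1, 1, 1, 1, 0, 0, 0, 0]
columns = {
    "syn_flag_burst": [1, 1, 1, 1, 0, 0, 0, 0],  # matches the label exactly
    "uses_tcp":       [1, 0, 1, 0, 1, 0, 1, 0],  # unrelated to the label
    "failed_login":   [1, 1, 1, 0, 0, 0, 0, 1],  # partly related
}
selected, scores = top_k_features(columns, labels, k=2)
```

A filter like this scores each feature independently of the classifier, which makes it cheap, whereas the wrapper methods mentioned in the review evaluate feature subsets by training the classifier itself.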

V. Comparative Analysis of Various ML Algorithms Used for IDS
The current study surveyed intrusion detection using ML algorithms. Various IDS applications were laid out, along with a performance evaluation. The survey's results are summarized in Table I.
Table I
Summary of Literature Review
Ref | Dataset | Feature Optimization Approach | Classifier | Accuracy
[34] | AWID | ZeroR | Random Forest | 99.64%
[35] | UNSW-NB15 | ZeroR | Random Forest | 97.49%
[36] | KDD Cup 99 | HFSA | Naïve Bayes | 92%
[37] | NSL-KDD | CFS Subset Eval | SVM | 93.95%
[38] | NSL-KDD | Correlation Chi-Square | ANN | 94.02%
[39] | UNSW-NB15 | Regression Tree | Random Forest | 87.74%
[40] | NSL-KDD | Software SDN | KNN | 84.29%
[41] | NSL-KDD | Naïve Bayes | Hybrid SVM | 99.36%
[42] | Real World Log | Naïve Bayes | Naïve Bayes | 95.3%
[43] | KDD'99 | PSO | J48 | 99.1%
[44] | NSL-KDD | CART tree | Decision Tree | 97.21%
[45] | KDD Cup 99 | Correlation | KNN | 98.9%
[46] | NSL-KDD | Feature reduction PCA - RFE | Random Forest | 99.63%
[47] | KDD | PCA | Random Forest | 96.78%
[48] | CSE-CIC-IDS2018 | Chi-square Correlation | Decision Tree | 98.8%

Fig. 5. Graphical comparative analysis of various ML algorithms used for IDS (accuracy axis from 70% to 100%)
According to the literature review, Random Forest and Decision Tree models were the ones most often compared and recommended by researchers. The highest accuracy, 99.64%, was obtained using Random Forest, as shown in Fig. 5.
VI. Research Challenges
This section discusses IDS research challenges, as shown in Fig. 6.

Fig. 6. IDS based research challenges

A. No Systematic Dataset
The current study emphasizes the lack of an up-to-date dataset reflecting recent network threats. Most of the proposed approaches could not detect zero-day attacks because their models had not been trained on adequate kinds and patterns of attack. Both earlier and newer attacks must be evaluated and confirmed for an effective IDS model. By incorporating the maximum number of attacks in a dataset, ML/DL models may learn more patterns and guard against as many intrusions as possible. Dataset creation is expensive and requires expertise. One of the research problems for IDS is building an up-to-date dataset with sufficient examples of practically all attack types. The dataset should be updated periodically and made accessible to benefit researchers [49].

B. Lower Detection Accuracy Owing to an Imbalanced Dataset
According to the current study, most of the proposed IDS approaches have poorer detection accuracy for particular attack types than for the model overall. Unbalanced data causes this difficulty: low-frequency attacks have lower detection accuracy than frequent attacks. Two solutions have been proposed to combat this problem. Firstly, create a balanced, up-to-date dataset. Secondly, increase the number of minority attack instances to balance the dataset. Recently, researchers applied SMOTE, Random Over Sampler, and the ADASYN algorithm to reduce the dataset imbalance ratio and improve performance [50].

C. Real-World Performance
Real-world performance is another IDS research problem. Most suggested approaches are lab-tested using public datasets. None of the surveyed methods is field-tested, so it is unclear how they would function in real-world situations. Old datasets are still being used for testing. A proposed procedure must be as effective in the field as in lab testing. The suggested solution should be evaluated in real time to ensure its usefulness for current networks [51].

D. Complex Models Take Resources
Most IDS strategies proposed by researchers need a lot of processing time and computational resources (almost 80 percent of the DL-based or ML-based methods). This may add processing costs and degrade the IDS performance. A multi-core GPU may speed up the computation and reduce time; however, it is expensive. Similarly, an efficient feature selection method is needed

to choose the most significant features for speedy processing. Researchers are exploring different optimization techniques for feature selection; however, there is still room for improvement. More study is needed to develop an efficient approach [52].

E. Lightweight IoT Security
An IDS can secure the IoT network and its sensor nodes. In the IoT, sensor nodes collect and exchange critical data online, yet they have limited CPU, storage, and battery life. An IDS may be installed where internet traffic enters the IoT network, or it may be distributed over the sensor nodes. In the first case, the NIDS must identify malicious attacks efficiently and faces the same obstacles. In the second case, resource-limited sensor nodes need a lightweight IDS paradigm. Designing a lightweight IDS model that is efficient in processing power, training time, and intrusion detection rate remains a problem [53].

VII. Conclusion
The effectiveness of various machine learning strategies needs to be assessed, since these strategies play an important role in enhancing IDS performance. Classification algorithms play a crucial part in helping IDSs differentiate between multiple forms of attacks. The current article aimed to evaluate the performance of various algorithms using a variety of criteria and to compare their results. A variety of metrics were used to evaluate the performance of the classifiers, out of which the random forest method produced satisfactory results. It proved to be one of the most accurate methods for identifying the various kinds of attacks. To obtain good performance from the model, most researchers chose to construct IDSs by using a hybrid classification method rather than an individual classifier. In big data sets, successful size reduction lowers the complexity and leads to the selection of outstanding features, which, in turn, improves classification performance in terms of accuracy and speed.

Conflict of Interest
The authors declare that they have no conflict of interest regarding the publication of this paper.

Acknowledgment
The authors would like to thank the referees for their careful reading and for their comments, which significantly improved the paper. Additionally, thanks to Dr. Salman Qadri, (Associate Professor,

Chairman Department of Computer Science, MNS University of Agriculture, Multan, Pakistan) and Dr. Farrukh Jamal, (Assistant Professor, Department of Statistics, The Islamia University of Bahawalpur, Pakistan) for their motivational support.

References
[1] I. Levin and M. Dan, “Culture and society in the digital age,” Information, vol. 12, no. 2, Art. no. 68, Feb. 2021, doi: https://doi.org/10.3390/info12020068
[2] N. A. Usmani, T. Ahmed, and M. Faisal, “An IoT-based Framework toward a Feasible Safe and Smart City Using Drone Surveillance,” in Smart Cities, K. Kumar, G. Saini, D. Manh Nguyen, N. Kumar, and R. Shah, Eds., CRC Press, 2022, pp. 97–112.
[3] K. F. Steinmetz, A. Pimentel, and W. R. Goe, “Performing social engineering: A qualitative study of information security deceptions,” Comput. Hum. Behav., vol. 124, Art. no. 106930, 2021, doi: https://doi.org/10.1016/j.chb.2021.106930
[4] Z. Ahmad, A. K. Shahid, C. S. Wai, J. Abdullah, and F. Ahmad, “Network intrusion detection system: A systematic study of machine learning and deep learning approaches,” Trans. Emerg. Telecommun. Technol., vol. 32, no. 1, Art. no. 4150, 2021, doi: https://doi.org/10.1002/ett.4150
[5] M. Sarhan, S. Layeghy, and M. Portmann, “Towards a standard feature set for network intrusion detection system datasets,” Mobile Netw. Appl., vol. 1, pp. 1–14, 2021, doi: https://doi.org/10.1007/s11036-021-01843-0
[6] A. Thakkar and R. Lohiya, “A survey on intrusion detection system: feature selection, model, performance measures, application perspective, challenges, and future research directions,” Artif. Intell. Rev., vol. 55, pp. 453–563, 2021, doi: https://doi.org/10.1007/s10462-021-10037-9
[7] R. Leszczyna, “Review of cybersecurity assessment methods: Applicability perspective,” Comput. Secur., vol. 108, Art. no. 102376, 2021, doi: https://doi.org/10.1016/j.cose.2021.102376
[8] H. Wu, N. Ba, S. Ren, et al., “The impact of internet development on the health of
Chinese residents: Transmission mechanisms and empirical tests,” Socio-Econom. Plann. Sci., vol. 81, Art. no. 101178, 2021, doi: https://doi.org/10.1016/j.seps.2021.101178
[9] H. Wu, Y. Hao, S. Ren, X. Yang, and G. Xie, “Does internet development improve green total factor energy efficiency? Evidence from China,” Energy Policy, vol. 153, Art. no. 112247, 2021, doi: https://doi.org/10.1016/j.enpol.2021.112247
[10] A. Churcher, R. Ullah, J. Ahmad, et al., “An experimental analysis of attack classification using machine learning in IoT networks,” Sensors, vol. 21, no. 2, Art. no. 446, 2021, doi: https://doi.org/10.3390/s21020446
[11] J. Perháč, V. Novitzká, W. Steingartner, and Z. Bilanová, “Formal model of IDS based on BDI logic,” Math., vol. 9, no. 18, Art. no. 2290, 2021, doi: https://doi.org/10.3390/math9182290
[12] N. Abosata, S. A. Rubaye, G. Inalhan, and C. Emmanouilidis, “Internet of things for system integrity: a comprehensive survey on security, attacks and countermeasures for industrial applications,” Sensors, vol. 21, no. 11, Art. no. 3654, 2021, doi: https://doi.org/10.3390/s21113654
[13] A. Khraisat and A. Alazab, “A critical review of intrusion detection systems in the internet of things: techniques, deployment strategy, validation strategy, attacks, public datasets and challenges,” Cybersecur., vol. 4, no. 1, pp. 1–27, 2021, doi: https://doi.org/10.1186/s42400-021-00077-7
[14] D. Chen, P. Wawrzynski, and Z. Lv, “Cyber security in smart cities: A review of deep learning-based applications and case studies,” Sustain. Cities Soci., vol. 66, Art. no. 102655, 2021, doi: https://doi.org/10.1016/j.scs.2020.102655
[15] M. Mahdavisharif, S. Jamali, and R. Fotohi, “Big data-aware intrusion detection system in communication networks: a deep learning approach,” J. Grid Comput., vol. 19, no. 4, pp. 1–28, 2021, doi: https://doi.org/10.1007/s10723-021-09581-z

[16] P. Kumar, G. P. Gupta, and R. Tripathi, “Design of anomaly-based intrusion detection system using fog computing for IoT network,” Aut. Control Comput. Sci., vol. 55, no. 2, pp. 137–147, 2021, doi: https://doi.org/10.3103/S0146411621020085
[17] V. Ponnusamy, M. Humayun, N. Jhanjhi, A. Yichiet, and M. F. Almufareh, “Intrusion detection systems in internet of things and mobile ad-hoc networks,” Comput. Syst. Sci. Eng., vol. 40, no. 3, pp. 1199–1215, 2022, doi: https://doi.org/10.32604/csse.2022.018518
[18] Y. Jiang and Y. Atif, “A selective ensemble model for cognitive cybersecurity analysis,” J. Netw. Comput. Appl., vol. 193, Art. no. 103210, 2021, doi: https://doi.org/10.1016/j.jnca.2021.103210
[19] I. Castiglioni, L. Rundo, M. Codari, et al., “AI applications to medical images: From machine learning to deep learning,” Physica Med., vol. 83, pp. 9–24, 2021, doi: https://doi.org/10.1016/j.ejmp.2021.02.006
[20] A. O. Drewek, M. Pietrołaj, and J. Rumiński, “A survey of neural networks usage for intrusion detection systems,” J. Ambient Intell. Humaniz. Comput., vol. 12, no. 1, pp. 497–514, 2021, doi: https://doi.org/10.1007/s12652-020-02014-x
[21] P. G. George and V. R. Renjith, “Evolution of safety and security risk assessment methodologies towards the use of bayesian networks in process industries,” Process Saf. Environ. Prot., vol. 149, pp. 758–775, 2021, doi: https://doi.org/10.1016/j.psep.2021.03.031
[22] A. J. Obaid, K. A. Alghurabi, S. A. Albermany, and S. Sharma, “Improving extreme learning machine accuracy utilizing genetic algorithm for intrusion detection purposes,” in Research in Intelligent and Computing in Engineering, R. N. Kumar, N. H. Quang, V. Kumar Solanki, M. Cardona, and P. K. Pattnaik, Eds., Singapore: Springer, 2021, pp. 171–177, doi: https://doi.org/10.1007/978-981-15-7527-3_17
[23] M. Choraś and M. Pawlicki, “Intrusion detection approach based on optimised artificial
neural network,” Neurocomput., vol. 452, pp. 705–715, 2021, doi: https://doi.org/10.1016/j.neucom.2020.07.138
[24] M. Ajdani and H. Ghaffary, “Design network intrusion detection system using support vector machine,” Int. J. Commun. Syst., vol. 34, no. 3, Art. no. 4689, 2021, doi: https://doi.org/10.1002/dac.4689
[25] M. Mohammadi, T. A. Rashid, S. H. T. Karim, et al., “A comprehensive survey and taxonomy of the SVM-based intrusion detection systems,” J. Netw. Comput. Appl., vol. 178, Art. no. 102983, 2021, doi: https://doi.org/10.1016/j.jnca.2021.102983
[26] M. Zubair, A. Ali, S. Naeem, F. Jamal, and C. Chesneau, “Emotion recognition from facial expression using machine vision approach,” J. Appl. Emerg. Sci., vol. 10, no. 1, pp. 12–21, 2020.
[27] X. Duan, S. Ying, W. Yuan, H. Cheng, and X. Yin, “QLLog: A log anomaly detection method based on Q-learning algorithm,” Info. Process. Manag., vol. 58, no. 3, Art. no. 102540, 2021, doi: https://doi.org/10.1016/j.ipm.2021.102540
[28] R. Kajal, D. Syamala, and G. Ajay, “Decision tree-based Algorithm for Intrusion Detection,” Int. J. Adv. Netw. Appl., vol. 7, no. 4, pp. 2828–2834, 2021.
[29] N. Kaur, M. Bansal, and S. S. Sran, “Scrutinizing attacks and evaluating performance appraisal parameters via feature selection in intrusion detection system,” Res. Squ., vol. 10, pp. 1–14, 2021, doi: https://doi.org/10.21203/rs.3.rs-748765/v1
[30] Q. V. Dang, “Studying the fuzzy clustering algorithm for intrusion detection on the attacks to the domain name system,” in 2021 5th World Conf. Smart Trends Syst. Secur. Sustainab. (WorldS4), London, United Kingdom, 29–30 July, 2021, IEEE, pp. 271–274, doi: https://doi.org/10.1109/WorldS451998.2021.9514038
[31] M. Almseidin, J. Al-Sawwa, and M. Alkasassbeh, “Anomaly-based Intrusion Detection System Using Fuzzy Logic,” in 2021 Int. Conf. Inform. Technol., IEEE, Amman, Jordan, July 14–15, 2021, pp. 290–295, doi:

https://doi.org/10.1109/ICIT52682.2021.9491742
[32] A. Alsaleh and W. Binsaeedan, “The influence of salp swarm algorithm-based feature selection on network anomaly intrusion detection,” IEEE Access, vol. 9, pp. 112466–112477, Aug. 2021, doi: https://doi.org/10.1109/ACCESS.2021.3102095
[33] J. E. Fontecha, P. Agarwal, M. N. Torres, S. Mukherjee, L. J. Walteros, and J. P. Rodríguez, “A two-stage data-driven spatiotemporal analysis to predict failure risk of urban sewer systems leveraging machine learning algorithms,” Risk Anal., vol. 41, no. 12, pp. 122–151, Dec. 2021, doi: https://doi.org/10.1111/risa.13742
[34] R. Abdulhammed, M. Faezipour, A. Abuzneid, and A. Alessa, “Effective features selection and machine learning classifiers for improved wireless intrusion detection,” in 2018 Int. Symp. Netw, Comput. Commun., Rome, Italy, June 19–21, 2018, pp. 1–6, doi: https://doi.org/10.1109/ISNCC.2018.8530969
[35] A. Ali and S. Naeem, “The controller parameter optimization for nonlinear systems using particle swarm optimization and genetic algorithm,” J. Appl. Emerg. Sci., vol. 12, no. 1, 2022.
[36] K. S. Bhosale, M. Nenova, and G. Iliev, “Data mining based advanced algorithm for intrusion detections in communication networks,” in 2018 Int. Conf. Comput. Tech. Electron. Mechanic. Syst., Belgaum, India, Dec. 21–22, 2018, pp. 297–300, doi: https://doi.org/10.1109/CTEMS.2018.8769173
[37] K. K. Gulla, P. Viswanath, S. B. Veluru, and R. R. Kumar, “Machine learning based intrusion detection techniques,” in Handbook of computer Networks and Cyber Security, B. Gupta, G. Perez, D. Agrawal, and D. Gupta, Eds., Springer, 2020, pp. 873–888.
[38] K. A. Taher, B. M. Y. Jisan, and M. M. Rahman, “Network intrusion detection using supervised machine learning technique with feature selection,” in 2019 Int. Conf. Robot. Elect. Signal Process. Tech., 10–12 Jan. 2019, pp. 643–64, doi:

https://doi.org/10.1109/ICREST.2019.8644161
[39] S. Naeem and A. Ali, “Bees algorithm based solution of non-convex dynamic power dispatch issues in thermal units,” J. Appl. Emerg. Sci., vol. 12, no. 1, 2022.
[40] M. Latah and L. Toker, “An efficient flow-based multi-level hybrid intrusion detection system for software-defined networks,” CCF Trans. Netw., vol. 3, no. 3, pp. 261–271, 2020, doi: https://doi.org/10.1007/s42045-020-00040-z
[41] J. Gu and S. Lu, “An effective intrusion detection approach using SVM with naïve Bayes feature embedding,” Comput. Secur., vol. 103, Art. no. 102158, 2021, doi: https://doi.org/10.1016/j.cose.2020.102158
[42] P. Pokharel, R. Pokhrel, and S. Sigdel, “Intrusion detection system based on hybrid classifier and user profile enhancement techniques,” in 2020 Int. Work. Big Data Inform. Secur., pp. 137–144, 2020.
[43] A. Kumari and A. K. Mehta, “A hybrid intrusion detection system based on decision tree and support vector machine,” in 2020 IEEE 5th Int. Conf. Comput. Commun. Autom., Greater Noida, India, Oct. 30–31, 2020, pp. 396–400.
[44] S. M. Taghavinejad, M. Taghavinejad, L. Shahmiri, M. Zavvar, and M. H. Zavvar, “Intrusion detection in iot-based smart grid using hybrid decision tree,” in 2020 6th Int. Conf. Web Res., Tehran, Iran, Apr. 22–23, 2020, pp. 152–156, doi: https://doi.org/10.1109/ICWR49608.2020.9122320
[45] A. V. Kachavimath, S. V. Nazare, and S. S. Akki, “Distributed denial of service attack detection using naïve bayes and k-nearest neighbor for network forensics,” in 2020 2nd Int. Conf. Innov. Mecha. Indust. Appl., Bangalore, India, Mar. 5–7, 2020, pp. 711–717, doi: https://doi.org/10.1109/ICIMIA48430.2020.9074929
[46] G. Sah and S. Banerjee, “Feature reduction and classifications techniques for intrusion detection system,” in 2020 Int. Conf. Commun. Sig. Process., Chennai, India, July 28–30, 2020, pp. 1543–1547, doi:
https://doi.org/10.1109/ICCSP48568.2020.9182216
[47] S. Waskle, L. Parashar, and U. Singh, “Intrusion detection system using PCA with random forest approach,” in 2020 Int. Conf. Electron. Sustain. Commun. Syst., Coimbatore, India, July 2–4, 2020, pp. 803–808, doi: https://doi.org/10.1109/ICESC48915.2020.9155656
[48] Q. R. S. Fitni and K. Ramli, “Implementation of ensemble learning and feature selection for performance improvements in anomaly-based intrusion detection systems,” in 2020 IEEE Int. Conf. Indust. 4.0, Artif. Intell. Commun. Technol., Bali, Indonesia, July 7–8, 2020, pp. 118–124, doi: https://doi.org/10.1109/IAICT50021.2020.9172014
[49] M. Ghurab, G. Gaphari, F. Alshami, R. Alshamy, and S. Othman, “A detailed analysis of benchmark datasets for network intrusion detection system,” Asian J. Res. Comput. Sci., vol. 7, no. 4, pp. 14–33, 2021.
[50] M. Ragab and M. F. S. Sabir, “Outlier detection with optimal hybrid deep learning enabled intrusion detection system for ubiquitous and smart environment,” Sustain. Energy Technol. Assess., vol. 52, Art. no. 102311, Aug. 2022, doi: https://doi.org/10.1016/j.seta.2022.102311
[51] Z. Wang, Y. Liu, D. He, and S. Chan, “Intrusion detection methods based on integrated deep learning model,” Comput. Secur., vol. 103, Art. no. 102177, Apr. 2021, doi: https://doi.org/10.1016/j.cose.2021.102177
[52] N. Jose and J. Govindarajan, “DOMAIN-Based intelligent network intrusion detection system,” in Invent. Comput. Info. Technol., S. Smys, V. E. Balas, and R. Palanisamy, Eds., Singapore, Springer, pp. 449–462, 2022, doi: https://doi.org/10.1007/978-981-16-6723-7_34
[53] S. Roy, J. Li, B. J. Choi, and Y. Bai, “A lightweight supervised intrusion detection mechanism for IoT networks,” Future Gener. Comput. Syst., vol. 127, pp. 276–285, Feb. 2022, doi: https://doi.org/10.1016/j.future.2021.09.027