Ensemble-Based Botnet Attack Detection and Classification Using Machine Learning Algorithms on NBaIoT Dataset

1st Mamta Rawat, Department of Computer Science (NSUT), New Delhi, India, mamta.rawat.phd22@nsut.ac.in
2nd Avneet Singh Bedi, Department of Computer Science (NSUT), New Delhi, India, avneets2103@gmail.com
3rd Balvinder Singh, Department of Computer Science (NSUT), New Delhi, India, balvindersingh485280@gmail.com
4th Sneha Gupta, Department of Computer Science (NSUT), New Delhi, India, gsneha20012004@gmail.com
5th Gaurav Singal, Department of Computer Science (NSUT), New Delhi, India, gaurav.singal@nsut.ac.in
6th Preeti Kaur, Department of Computer Science (NSUT), New Delhi, India, preeti.kaur@nsut.ac.in
2024 IEEE Region 10 Symposium (TENSYMP) | 979-8-3503-6486-6/24/$31.00 ©2024 IEEE | DOI: 10.1109/TENSYMP61132.2024.10752221

Abstract—Through botnet assaults, Mirai and BASHLITE pose serious risks to the Internet of Things (IoT). A reliable detection and classification model is the need of the hour. This article addresses the problem by proposing an ensemble-based algorithm to detect and classify different categories of Mirai and BASHLITE attacks. We use NBaIoT, a heterogeneous dataset collected from many purposefully compromised devices, which exposes subtle attack categories. Interestingly, certain devices resist Mirai infection, which enriches our analysis. By utilizing important data types and various time-based window sizes, our approach seeks to create an effective model for identifying and classifying IoT botnet attacks, strengthening the IoT ecosystem against new attacks.
Index Terms—Internet of Things, Intrusion Detection, Botnet, Mirai, BASHLITE

This project work was sponsored by the SERB SURE funding agency, India. Application Number: SUR/2022/001939.

I. INTRODUCTION
Understanding IoT requires envisioning a vast interconnected web of devices, ranging from smart thermostats to security cameras, that together form a digitally woven tapestry. However, this interconnectedness brings vulnerabilities, as cyber threats adapt and evolve to exploit potential weaknesses in this expansive ecosystem [1]. In this vast landscape, a lurking threat has emerged: the IoT botnet attack. These attacks manipulate IoT devices into a network, commonly referred to as a botnet [2]. This web of compromised devices becomes a powerful tool for cyber attackers, particularly for executing distributed denial-of-service (DDoS) attacks [3], [4].
Our research examines these attacks, focusing on the notorious Mirai and BASHLITE botnets, with the ultimate goal of developing a robust model for detecting and classifying them. To understand their methods and impact, consider first BASHLITE, also known as Gafgyt: malicious software that specifically targets Linux systems. Its primary aim is to infect these systems and launch DDoS attacks, which can reach 400 Gbps [5]. BASHLITE debuted in 2014 by exploiting a vulnerability in the bash shell known as the Shellshock bug. Its source code leaked in 2015, leading to a proliferation of variants and to infections reaching one million devices by 2016. The primary targets of BASHLITE are IoT devices: a staggering 96 percent of the identifiable devices in its botnets are IoT devices, including cameras and DVRs [6].
Mirai, another infamous botnet, has a distinct modus operandi. It targets smart devices running on ARC processors and transforms them into remotely controlled bots, forming what is commonly known as a botnet [7]. This botnet is in turn employed to launch DDoS attacks, harnessing the combined power of the infected devices. Mirai scans the Internet for vulnerable IoT devices, exploiting default username and password combinations to gain unauthorized access. Once inside, it infects the device and adds it to its legion of controlled devices [8].
The major contributions of the paper are:
• To thoroughly analyze the NBaIoT (Network Based Anomaly detection in IoT) dataset and compare the performance of various machine learning (ML) and deep learning (DL) techniques on it.
• To use various ML and DL paradigms to build a reliable detection model, using a variation of the stacking ensemble technique, that is resource-efficient enough to operate on IoT devices.
• To improve upon potentially outdated models applied to extensive datasets by developing innovative, advanced models for classifying different attacks that combine various basic models into a more comprehensive solution.
The rest of the paper is organized as follows: Section II discusses the literature, Section III describes the methodology, and Section IV presents the results. Finally, Section V concludes the paper and outlines future directions.

Authorized licensed use limited to: Netaji Subhas University of Technology New Delhi. Downloaded on December 12,2024 at 07:10:36 UTC from IEEE Xplore. Restrictions apply.

II. LITERATURE REVIEW
Many studies have addressed botnet detection in IoT environments using ML and DL techniques on different datasets. This section discusses some of the existing work on botnet detection in IoT environments.
Husain et al. trained and evaluated different classifiers, including a neural network, multi-logistic regression, nonlinear SVM, XGBoost, Naive Bayes, and random forest, on the UNSW-NB15 dataset, and observed that XGBoost performed best with an accuracy of 88%, followed by random forest with 87.89% [9].
Many researchers have used ML [10] on flow-based features. The problem with flow-based features is that they cannot fully represent the network's communication patterns, so the authors of [11] proposed a graph-based ML model for botnet detection. Compared with existing flow-based and graph-based techniques, the proposed technique was found to be more accurate and precise [11]. With the aim of improving accuracy in finding vulnerabilities in IoT devices, Hemanth et al. [12] suggested using a convolutional neural network (CNN). The well-known UNSW-NB15 dataset was used to assess the proposed model, and the results show that it achieves better accuracy than other existing work.
In another work, several experiments were conducted using a deep neural network (DNN) to detect IoT attacks. The widely used KDD-Cup'99, NSL-KDD, and UNSW-NB15 datasets were used to assess the model, which achieved more than 90% accuracy on all of them [13].
Different ML approaches, including a multi-layer perceptron artificial neural network (MLP ANN), K-nearest neighbour (KNN), and Naive Bayes, were used to detect DDoS attacks on the BoT-IoT dataset. The algorithms were applied to two versions of the dataset: the original dataset and a class-balanced dataset created by applying the SMOTE technique to it. Applying SMOTE improved accuracy, precision, and recall, and across all experiments KNN performed best [14].
The authors of [15] proposed using both supervised and unsupervised ML algorithms for botnet detection in IoT devices.

III. METHODOLOGY
This section explains the data processing and model development pipeline used to build the proposed solution, including how the models were trained on pre-existing datasets and how extensive datasets were used to validate and tune them. It also highlights the importance of combining models and how combined models can help overcome the limitations of existing models.
A. Dataset Description
The dataset we used is the NBaIoT dataset [16]. It consists of traffic from nine real IoT devices intentionally compromised using two botnets: Mirai and Gafgyt/BASHLITE. For each instance, the dataset provides 115 features (the same 23 features computed over 5 time windows) that capture a snapshot of the traffic around a packet. Ten different attacks (5 each for Mirai and Gafgyt) were executed and recorded. The NBaIoT dataset comprises 7062606 instances, each with 115 associated features.
The dataset is organized into 11 distinct classes, ranging from benign traffic to highly specific attack types. For Gafgyt, the attack types are combo, junk, scan, TCP, and UDP, while Mirai contributes ack, scan, syn, UDP, and UDP plain. This fine-grained labeling allows each attack type to be studied individually. Figure 1 shows how the data is distributed among the different categories.

Fig. 1: Data Distribution in Different Categories

For effective inference and feature extraction, our methodology identifies and prioritizes five types of data. These include the statistics of the recent traffic from the packet's source host keyed on its MAC and IP addresses (MI-dir), the same statistics keyed on the source IP alone (H), the statistics of the recent traffic from the packet's source host to its destination host (HH), the jitter of that host-to-host traffic (HH-jit), and the traffic measure from

the packet's host + port to the packet's destination host + port (HpHp) [17]. These data types collectively serve as the keystones for unraveling the complexities embedded within the network traffic, setting the stage for the subsequent phases of our research.
To capture network traffic statistics, we consider different time-based window sizes, denoted by Lambda (L). Ranging from shorter time frames such as L5 and L3 to longer time frames such as L0.01, these windows provide a nuanced view of the traffic and help detect both short-lived and prolonged attacks [17]. Devices 3 and 7 are noteworthy: Mirai did not infect them, so the dataset contains no Mirai-related data for these devices. This peculiarity adds diversity to the dataset and supports an in-depth analysis of malicious behavior across the IoT ecosystem.

B. Data Analysis and Pre-processing
The raw dataset comprises 7062606 instances with 115 features each, i.e., 7062606 × 115. Every input column is purely numerical, and there are no missing values in the raw data. The raw data contains 4579930 duplicates and is highly imbalanced across categories. Outliers were also present, but we decided not to remove them: this is a model for anomaly detection, and outliers may point towards anomalies. The steps taken to pre-process the raw data are shown in Algorithm 1.

Algorithm 1 Algorithm to Pre-process Dataset
Require: Raw Data
Ensure: Separate .csv files for various data frames.
1: Accumulate data from all the different files into a combined data frame.
2: Add the labels for each instance.
3: Remove duplicate data.
4: Balance the dataset by reducing the attack instances.
5: Scale the features.
6: Reduce dimensions using PCA.
7: Divide the data into train, test, and validation sets.
8: Save each data frame into a separate .csv file to avoid mixing them up.

TABLE I: Values assigned to Different Parameters of XGBoost and RF Model for Binary Classification
Model | Parameter | Value
XGBoost | alpha | 0.06
XGBoost | gamma | 0.3
XGBoost | max_depth | 4
XGBoost | min_child_weight | 5
XGBoost | n_estimators | 80
RF | n_estimators | 100
RF | max_features | 0.4
RF | min_samples_leaf | 10
RF | max_depth | 9

TABLE II: Value of Different Parameters of ANN, RNN and Final model for Binary Classification
Model | Layer | Activation function | No. of perceptrons
ANN | L1 | sigmoid | 32
ANN | L2 | ReLU | 24
ANN | L3 | sigmoid | 1
RNN | L1 | sigmoid | 48
RNN | L2 | sigmoid | 1
Final model | L1 | ReLU | 16
Final model | L2 | ReLU | 8
Final model | L3 | sigmoid | 1
Note: L stands for Layer.

C. Binary Classification Model
The binary classification model is trained to classify between the benign state and the botnet attack state. Different models were trained on the same data of 496614 × 116, and their outputs were finally used for ensemble learning to make a final prediction. We used the following models.
1 Random Forests: RF is a well-known ensemble algorithm that can be used for both classification and regression problems [18]. Its parameters were chosen by hyperparameter tuning with Randomized Search CV. Table I shows the final hyperparameters.
2 Artificial Neural Network (ANN): An ANN is an interconnected network of neurons [19] with three types of layers: an input layer, one or more hidden layers, and an output layer, which receive input, perform computation, and produce output, respectively. Hyperparameters were chosen using Keras-tuner. It is a feed-forward neural network in which all layers are 'Dense' and the model is 'Sequential'. The model uses 3 layers in total. The final hyperparameters are shown in Table II.
3 Recurrent Neural Network (RNN): An RNN is a type of neural network designed specifically to handle sequential data such as time series [20]. It uses feedback loops to remember previous inputs. Hyperparameters such as the number of hidden layers, the type of activation function, and the size of each layer were chosen using Keras-tuner. Table II shows the final hyperparameters.
4 XGBoost: eXtreme Gradient Boosting is a supervised ML algorithm that makes gradient boosting more efficient and more powerful [21]. Its parallel processing capabilities allow it to handle large datasets and complicated models. Its parameters were chosen by hyperparameter tuning with Randomized Search cross-validation (CV). The final hyperparameters are shown in Table I.
The above models have some flaws, however. First, not every model is equally good, so a weight parameter must be associated with each model's outcome to predict the final outcome. Second, it is unclear how to decide the values of these weight parameters for each model. The solution proposed in this paper is to predict the probabilities associated with the predictions of 5 different models and to concatenate these output probabilities into a new data frame, which is used to train a final neural network that learns the weight associated with each model. This yields a neural network of models; it can also be viewed as complex feature engineering, since we create new features to fit our final neural network.
5 Final model: In the proposed model, we produced the confidence of prediction of each model for both

the classes: 0[Benign] and 1[attack]. The data frames produced from these predictions were then concatenated into a final input data frame, as shown in Figure 2. This step was performed for the training, validation, and testing data.
The new probability-based data frames were used to train another neural network, which was finally checked on the Train output. Hyperparameters such as the number of hidden layers, the type of activation functions, and the size of each layer were chosen using Keras-tuner. It is a feed-forward neural network in which all layers are 'Dense' and the model is 'Sequential'. Table II shows the final hyperparameters.

Fig. 2: Proposed Framework of Merged models

Accuracies obtained by the various models for binary classification are shown in Figure 3. As the figure shows, the final model gives the highest accuracy: 99.996%, 99.997%, and 99.983% on the train, validation, and test data, respectively. Table III shows the recall and precision values obtained for the different data frames. The table shows that the final model matches or outperforms the other models: it has the highest attack recall and benign precision, and trails XGBoost by only 0.001 percentage points in benign recall and attack precision. The confusion matrices in Figures 4 and 5 show similar results.

TABLE III: Value of Recall and Precision (%) Obtained Using Different Models for Binary Classification
Model | Recall (Benign) | Recall (Attack) | Precision, validation data (Benign) | Precision, validation data (Attack) | Precision, test data (Benign) | Precision, test data (Attack)
XG | 99.996 | 99.969 | 99.970 | 99.996 | 99.969 | 99.996
SVM | 99.906 | 99.961 | 99.962 | 99.903 | 99.961 | 99.906
RF | 99.971 | 99.949 | 99.949 | 99.970 | 99.949 | 99.970
ANN | 99.985 | 99.967 | 99.967 | 99.985 | 99.967 | 99.985
RNN | 99.985 | 99.967 | 99.967 | 99.985 | 99.967 | 99.985
Final | 99.995 | 99.973 | 99.973 | 99.995 | 99.973 | 99.995

Fig. 3: Accuracies of Different Models for Binary Classification
Fig. 4: Confusion Matrix for XGBoost
Fig. 5: Confusion Matrix for Final Model

D. Multi-Class Classification Model
The multi-class classification model is trained to classify between the benign state and 10 different types of botnet attack states. Different models were trained on the same data of 304926 × 116, and their outputs were finally used for ensemble learning to make a final prediction. We used the following models for multi-class classification.
1 Random Forests: Parameters were chosen by hyperparameter tuning using Randomized Search CV. The final hyperparameters are shown in Table IV.

TABLE IV: Values assigned to Different Parameters of XGBoost and RF Model for Multi-Class Classification
Model | Parameter | Value
XGBoost | alpha | [0.01, 0.03, 0.06]
XGBoost | gamma | [0.1, 0.2, 0.3]
XGBoost | max_depth | [3, 5, 7]
XGBoost | min_child_weight | [3, 5, 7]
XGBoost | n_estimators | [20, 40, 60, 80]
RF | n_estimators | 50
RF | max_features | 0.6
RF | min_samples_leaf | 10

2 ANN: Hyperparameters were chosen using Keras-tuner. It is a feed-forward neural network with a total of five layers; all layers are 'Dense' and the model is 'Sequential'. Table V shows the activation function and the number of perceptrons used in each layer.
3 RNN: Hyperparameters such as the number of hidden layers, the type of activation function, and the size of each layer were chosen using Keras-tuner. It is a recurrent neural network whose first layer is an LSTM layer. The parameters used are shown in Table V.
4 XGBoost: Parameters were chosen by hyperparameter tuning using Randomized Search CV over the parameter grid shown in Table IV, and the best model found by the search was used further.
Fig. 6: Accuracies of Different Models for Multi-class Classification
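TABLE V: Value of Different Parameters for Multi-Class Classification
Model | Layer | Activation function | No. of perceptrons
ANN | L1 | tanh | 72
ANN | L2 | ReLU | 56
ANN | L3 | ReLU | 56
ANN | L4 | ReLU | 72
ANN | L5 | softmax | 11
RNN | L1 | tanh | 48
RNN | L2 | sigmoid | 16
RNN | L3 | tanh | 8
RNN | L4 | ReLU | 8
RNN | L5 | softmax | 11
Final model | L1 | sigmoid | 40
Final model | L2 | sigmoid | 40
Final model | L3 | sigmoid | 40
Final model | L4 | softmax | 11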

5 Final model: In the proposed model, we produced the confidence of prediction of each model for all the classes: 0[Benign], 1[gafgyt.combo], 2[gafgyt.junk], 3[gafgyt.scan], 4[gafgyt.tcp], 5[gafgyt.udp], 6[mirai.ack], 7[mirai.scan], 8[mirai.syn], 9[mirai.udp], 10[mirai.udpplain]. The data frames produced from these predictions were then concatenated into a final input data frame. This step was performed for the training, validation, and testing data.
The new probability-based data frames were used to train another neural network, which was finally checked on the Train Output. Hyperparameters such as the number of hidden layers, the type of activation functions, and the size of each layer were chosen using Keras-tuner. It is a feed-forward neural network in which all layers are 'Dense' and the model is 'Sequential'. A total of 3 layers are used. The parameters used in the final model are shown in Table V.
Fig. 7: Confusion matrix for RF
Fig. 8: Confusion matrix for ANN
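The final-model step (item 5 in both model lists) trains a neural network on the concatenated class probabilities of the base models. A minimal sketch of that idea with scikit-learn follows, under stated assumptions: the data is synthetic, three scikit-learn classifiers stand in for the paper's five base models (XGBoost, SVM, RF, ANN, and RNN), `MLPClassifier` stands in for the Keras meta-network, and `stack_probabilities` is a hypothetical helper name. It illustrates the technique and is not the authors' implementation.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier


def stack_probabilities(models, X):
    """Hypothetical helper: concatenate each fitted model's predict_proba
    output column-wise. With k models and c classes the result has k * c
    columns (e.g. 5 models x 11 classes = 55 meta-features)."""
    return np.hstack([m.predict_proba(X) for m in models])


# Synthetic stand-in for the pre-processed NBaIoT features (assumption).
X, y = make_classification(n_samples=3000, n_features=20, n_informative=10,
                           n_classes=3, random_state=0)
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.4, stratify=y, random_state=0)
X_valid, X_test, y_valid, y_test = train_test_split(
    X_rest, y_rest, test_size=0.5, stratify=y_rest, random_state=0)

# Base learners: stand-ins (assumption) for XGBoost, SVM, RF, ANN and RNN.
base_models = [
    RandomForestClassifier(n_estimators=50, min_samples_leaf=10, random_state=0),
    LogisticRegression(max_iter=1000),
    MLPClassifier(hidden_layer_sizes=(32, 24), max_iter=500, random_state=0),
]
for model in base_models:
    model.fit(X_train, y_train)

# Probability data frames for the training, validation and test splits.
Z_train = stack_probabilities(base_models, X_train)
Z_valid = stack_probabilities(base_models, X_valid)
Z_test = stack_probabilities(base_models, X_test)

# The final network learns how much weight to give each base model's output.
meta_model = MLPClassifier(hidden_layer_sizes=(16, 8), max_iter=1000,
                           random_state=0)
meta_model.fit(Z_train, y_train)

print("meta-features per instance:", Z_train.shape[1])  # 3 models x 3 classes
print("validation accuracy:", meta_model.score(Z_valid, y_valid))
print("test accuracy:", meta_model.score(Z_test, y_test))
```

One design caveat: here, as in the procedure described above, the meta-network is trained on probabilities that the base models produce for their own training data, and these tend to be more confident than probabilities on unseen data. scikit-learn's `StackingClassifier` (with `stack_method="predict_proba"`) instead trains its final estimator on cross-validated, out-of-fold predictions, which is a common way to reduce that optimism.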
Accuracies obtained by the various models are shown in Figure 6. As the figure shows, the final model gives the highest accuracy: 99.99%, 99.88%, and 99.90% on the train, validation, and test data, respectively.
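The RF and XGBoost items in both model lists select their hyperparameters with randomized search cross-validation. A minimal scikit-learn sketch of that step on synthetic data follows. The RF search space is hypothetical: the paper reports only the final RF values, so the candidate lists below are loosely based on the values reported in Tables I and IV. If the `xgboost` package is installed, an `xgboost.XGBClassifier` could be searched the same way with the Table IV grid (XGBoost's `alpha` parameter is also exposed as `reg_alpha`).

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Synthetic stand-in data (assumption; not the NBaIoT features).
X, y = make_classification(n_samples=1000, n_features=20, n_informative=8,
                           random_state=0)

# Hypothetical RF search space, loosely based on Tables I and IV.
param_distributions = {
    "n_estimators": [50, 100],
    "max_features": [0.4, 0.6],
    "min_samples_leaf": [5, 10],
    "max_depth": [4, 9, None],
}

# Sample 5 of the 24 combinations and score each with 3-fold cross-validation.
search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions=param_distributions,
    n_iter=5,
    cv=3,
    scoring="accuracy",
    random_state=0,
)
search.fit(X, y)

print("best parameters:", search.best_params_)
print("best CV accuracy:", round(search.best_score_, 3))
best_rf = search.best_estimator_  # refit on all of X, y (refit=True by default)
```

Randomized search scores a fixed number of sampled combinations (`n_iter`) instead of the full grid, which keeps tuning affordable when each fit is expensive.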
IV. RESULTS
Binary classification model: The differences in accuracy among the models for IoT botnet detection can be attributed to their distinct methodologies and capabilities in handling data features. XGBoost surpasses the other individual models due to its advanced boosting technique, which iteratively corrects errors and fine-tunes the model. The ensemble model combines the strengths of all the individual models while mitigating their weaknesses, and achieves the highest overall accuracy of 99.98% on the test data.
Fig. 9: Confusion matrix for XGBoost
Fig. 10: Confusion matrix for Final Model
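The recall and precision values reported in Table III follow directly from confusion matrices like the one in Figure 5. A minimal NumPy sketch, assuming scikit-learn's convention that rows are true classes and columns are predicted classes; `per_class_metrics` is a hypothetical helper, and the counts are invented for illustration rather than taken from the paper.

```python
import numpy as np


def per_class_metrics(cm):
    """Return (precision, recall, accuracy) from a square confusion matrix.

    Rows are true classes and columns are predicted classes, so
    recall = diagonal / row sums and precision = diagonal / column sums.
    Assumes every class has at least one true and one predicted instance.
    """
    cm = np.asarray(cm, dtype=float)
    correct = np.diag(cm)
    recall = correct / cm.sum(axis=1)
    precision = correct / cm.sum(axis=0)
    accuracy = correct.sum() / cm.sum()
    return precision, recall, accuracy


# Hypothetical binary confusion matrix: index 0 = Benign, 1 = Attack.
cm = [[9990, 10],
      [5, 19995]]
precision, recall, accuracy = per_class_metrics(cm)
print("recall (%):", np.round(100 * recall, 3))
print("precision (%):", np.round(100 * precision, 3))
print("accuracy (%):", round(100 * accuracy, 3))
```

The same function works unchanged for the 11-class case, where it returns one precision and one recall value per class.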

Fig. 11: Recall and Precision of Different Models for Multi-class Classification

Multi-class classification: The slight differences in accuracy among the models in differentiating between 10 types of IoT botnet attacks highlight their varied strengths in handling complex classification tasks. RNNs and RF offer solid performance (as shown by the confusion matrices in Figure 7 and Figure ??) but fall slightly behind due to their inherent limitations with static and high-dimensional data, respectively. In contrast, ANNs (Figure 8) excel at capturing intricate patterns and relationships within the data, which makes them particularly effective for classifying multiple botnet types compared with the simpler binary classification. XGBoost (Figure 9) also stands out with its advanced boosting technique, providing exceptional accuracy through iterative error correction and fine-tuning. The ensemble model (Figure 10), which combines the strengths of ANN and XGBoost along with the other models, harnesses these advantages to achieve the highest overall accuracy, demonstrating the effectiveness of ensemble methods in complex multi-class classification scenarios.

V. CONCLUSION AND FUTURE WORK
The article presents a reliable method for detecting and classifying various types of botnet attacks using an ensemble algorithm that combines the prediction probabilities of different models. Using the confidence of predictions, rather than the predictions themselves, can be more precise and accurate, because prediction confidence can be a better basis for producing the output than the predictions alone. This was evident on this dataset, where the final model outshone the individual models even though the individual models already had high accuracies. Although the proposed model gives good results in terms of accuracy, recall, precision, and other metrics, it does not consider the privacy of users' sensitive data, and it is computationally intensive.
Future work includes applying this variation of ensemble stacking to other datasets, with different individual models and types of datasets, to find conclusive evidence for this idea. Future work may also use techniques such as federated learning and pruning to overcome the identified limitations.

REFERENCES
[1] G. Singal, V. Laxmi, M. S. Gaur, S. Todi, V. Rao, M. Tripathi, and R. Kushwaha, "Multi-constraints link stable multicast routing protocol in MANETs," Ad Hoc Networks, vol. 63, pp. 115–128, 2017.
[2] M. Feily, A. Shahrestani, and S. Ramadass, "A survey of botnet and botnet detection," in 2009 Third International Conference on Emerging Security Information, Systems and Technologies. IEEE, 2009, pp. 268–273.
[3] N. Ahuja, G. Singal, and D. Mukhopadhyay, "DLSDN: Deep learning for DDoS attack detection in software defined networking," in 2021 11th International Conference on Cloud Computing, Data Science & Engineering (Confluence). IEEE, 2021, pp. 683–688.
[4] N. Ahuja, G. Singal, D. Mukhopadhyay, and N. Kumar, "Automated DDoS attack detection in software defined networking," Journal of Network and Computer Applications, vol. 187, p. 103108, 2021.
[5] A. Marzano, D. Alexander, O. Fonseca, E. Fazzion, C. Hoepers, K. Steding-Jessen, M. H. Chaves, Í. Cunha, D. Guedes, and W. Meira, "The evolution of BASHLITE and Mirai IoT botnets," in 2018 IEEE Symposium on Computers and Communications (ISCC). IEEE, 2018, pp. 00813–00818.
[6] G. Bastos, A. Marzano, O. Fonseca, E. Fazzion, C. Hoepers, K. Steding-Jessen, Í. Cunha, D. Guedes, and W. Meira, "Identifying and characterizing BASHLITE and Mirai C&C servers," in 2019 IEEE Symposium on Computers and Communications (ISCC). IEEE, 2019, pp. 1–6.
[7] M. Antonakakis, T. April, M. Bailey, M. Bernhard, E. Bursztein, J. Cochran, Z. Durumeric, J. A. Halderman, L. Invernizzi, M. Kallitsis et al., "Understanding the Mirai botnet," in 26th USENIX Security Symposium (USENIX Security 17), 2017, pp. 1093–1110.
[8] W. T. Strayer, D. Lapsely, R. Walsh, and C. Livadas, "Botnet detection based on network behavior," Botnet Detection: Countering the Largest Security Threat, pp. 1–24, 2008.
[9] A. Husain, A. Salem, C. Jim, and G. Dimitoglou, "Development of an efficient network intrusion detection model using extreme gradient boosting (XGBoost) on the UNSW-NB15 dataset," in 2019 IEEE International Symposium on Signal Processing and Information Technology (ISSPIT). IEEE, 2019, pp. 1–7.
[10] R. Kumar, M. Swarnkar, G. Singal, and N. Kumar, "IoT network traffic classification using machine learning algorithms: An experimental analysis," IEEE Internet of Things Journal, vol. 9, no. 2, pp. 989–1008, 2021.
[11] A. Alharbi and K. Alsubhi, "Botnet detection approach using graph-based machine learning," IEEE Access, vol. 9, pp. 99166–99180, 2021.
[12] D. Hemanth et al., "Intrusion detection system using convolutional neural network on UNSW NB15 data-set," Advances in Parallel Computing Technologies and Applications, vol. 40, pp. 1–8, 2021.
[13] S. Choudhary and N. Kesswani, "Analysis of KDD-Cup'99, NSL-KDD and UNSW-NB15 datasets using deep learning in IoT," Procedia Computer Science, vol. 167, pp. 1561–1573, 2020.
[14] S. Pokhrel, R. Abbas, and B. Aryal, "IoT security: Botnet detection in IoT using machine learning," arXiv preprint arXiv:2104.02231, 2021.
[15] M. G. Desai, Y. Shi, and K. Suo, "A hybrid approach for IoT botnet attack detection," in 2021 IEEE 12th Annual Information Technology, Electronics and Mobile Communication Conference (IEMCON). IEEE, 2021, pp. 0590–0592.
[16] Y. Meidan, M. Bohadana, Y. Mathov, Y. Mirsky, A. Shabtai, D. Breitenbacher, and Y. Elovici, "N-BaIoT—Network-based detection of IoT botnet attacks using deep autoencoders," IEEE Pervasive Computing, vol. 17, no. 3, pp. 12–22, 2018.
[17] H. Hamid, R. M. Noor, S. N. Omar, I. Ahmedy, S. S. Anjum, S. A. A. Shah, S. Kaur, F. Othman, and E. M. Tamil, "IoT-based botnet attacks systematic mapping study of literature," Scientometrics, vol. 126, pp. 2759–2800, 2021.
[18] L. Breiman, "Random forests," Machine Learning, vol. 45, pp. 5–32, 2001.
[19] B. Yegnanarayana, Artificial Neural Networks. PHI Learning Pvt. Ltd., 2009.
[20] L. R. Medsker, L. Jain et al., "Recurrent neural networks," Design and Applications, vol. 5, no. 64-67, p. 2, 2001.
[21] T. Chen and C. Guestrin, "XGBoost: A scalable tree boosting system," in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016, pp. 785–794.
