Comparative Analysis of Machine Learning Algorithms on the Bot-IoT Dataset
Jay Soni (20CS43)
Computer Science & Engineering Department
Engineering College, Ajmer
ABSTRACT
This paper presents a comparative analysis of five machine learning algorithms—Decision Tree,
K-Nearest Neighbors (KNN), Naive Bayes, Support Vector Machine (SVM), and Neural
Network—applied to the Bot-IoT dataset. The dataset, collected from IoT devices, contains
various features related to network traffic and communication patterns. The aim of this study is
to assess the accuracy of these algorithms in distinguishing between normal and botnet traffic.
Experimental results demonstrate the varying performance of the algorithms and provide
insights into their suitability for intrusion detection in IoT environments.
1. Introduction
In today's interconnected world, the Internet of Things (IoT) has become an integral part
of our lives, enabling communication and data exchange among various devices and
systems. However, this increasing connectivity also brings about new security
challenges, especially in the context of cyber threats and attacks. One of the significant
concerns is the emergence of botnets, which are networks of compromised devices
controlled by malicious actors for various illicit activities, such as distributed
denial-of-service (DDoS) attacks and data exfiltration. The ability to detect and mitigate
such threats is crucial for ensuring the security and reliability of IoT networks.
Intrusion detection plays a pivotal role in identifying and thwarting cyber threats within
IoT environments. Machine learning algorithms have gained prominence as effective
tools for intrusion detection due to their capability to analyze large volumes of data and
uncover patterns indicative of malicious activities. In this context, the Bot-IoT dataset
serves as a valuable resource for evaluating the performance of machine learning
algorithms in distinguishing normal network traffic from botnet-related activities.
This study focuses on the comparative analysis of five prominent machine learning
algorithms: Decision Tree, K-Nearest Neighbors (KNN), Naive Bayes, Support Vector
Machine (SVM), and Neural Network. Each of these algorithms approaches the task of
intrusion detection from a different perspective, drawing on its own strengths in
feature representation, pattern recognition, and decision-making. By evaluating the
accuracy of these algorithms on the Bot-IoT dataset, we aim to gain insights into their
suitability for identifying botnet-related activities within IoT environments.
2. Related Work
In recent years, the increasing prevalence of IoT devices and the corresponding surge in
cyber threats have spurred extensive research in the domain of intrusion detection for
IoT environments. Detecting malicious activities in these interconnected networks is
challenging due to the diverse range of devices, communication protocols, and data
characteristics. Various studies have explored the application of machine learning
algorithms for identifying anomalies and attacks within IoT ecosystems.
Khan and Khan [1] presented a comprehensive survey of intrusion detection techniques for
IoT networks. The study highlighted the significance of machine learning approaches
and discussed the challenges posed by the dynamic nature of IoT data. Similarly,
Antoniades and Loukas [2] conducted an empirical evaluation of machine learning algorithms
for intrusion detection in IoT environments. Their findings underscored the importance of
algorithm selection based on the characteristics of the dataset.
The following sections detail the methodology used for the comparative
analysis of the algorithms on the Bot-IoT dataset.
3. Methodology
This section outlines the steps and techniques used to compare the
machine learning algorithms on the Bot-IoT dataset.
Algorithm Selection: Five machine learning algorithms were chosen for this
comparative analysis: Decision Tree, K-Nearest Neighbors (KNN), Naive Bayes, Support
Vector Machine (SVM), and Neural Network. Each algorithm possesses unique
characteristics and capabilities that make it well-suited for the task of intrusion detection
within IoT environments.
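The five algorithm families named above could be instantiated as follows. This is a minimal sketch assuming scikit-learn; the paper does not state which library or hyperparameters were used, so the `build_models` helper and every setting in it are illustrative assumptions.

```python
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def build_models(random_state=42):
    """Return one classifier per algorithm family compared in the study.

    Hypothetical helper: all hyperparameters are library defaults or
    illustrative guesses, not the settings used by the author.
    """
    return {
        "Decision Tree": DecisionTreeClassifier(random_state=random_state),
        "KNN": KNeighborsClassifier(n_neighbors=5),
        "Naive Bayes": GaussianNB(),
        "SVM": SVC(kernel="rbf"),
        "Neural Network": MLPClassifier(
            hidden_layer_sizes=(32,), max_iter=300, random_state=random_state
        ),
    }


models = build_models()
for name in models:
    print(name)
```

Keeping the models in one dictionary makes it straightforward to loop over them with identical training and testing data.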
Model Training and Evaluation: The dataset was divided into training and testing
subsets using a standard 80-20 split. The training subset was used to train each of the
selected algorithms, while the testing subset was reserved for evaluating their
performance. To ensure fair comparison, the same training and testing data were used
for all algorithms.
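An 80-20 split that is shared by all algorithms might look like the sketch below. It assumes scikit-learn and uses synthetic data as a stand-in for the Bot-IoT features; the fixed `random_state` and the stratified sampling are assumptions added here for reproducibility and are not stated in the paper.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the dataset: 1000 flows with 6 numeric features.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 6))
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # 1 = botnet, 0 = normal (made-up labels)

# One split, created once and reused for every algorithm.
# stratify=y keeps the class ratio similar in both subsets (an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

print(X_train.shape, X_test.shape)
```

Fixing the seed means every algorithm sees exactly the same training and testing rows, which is what makes the comparison fair.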
For each algorithm, the training data underwent feature scaling to ensure uniformity in
feature ranges. The scaled features were then used to train the respective algorithm
models. After training, the models were evaluated on the testing data to compute
accuracy, precision, recall, and F1-score.
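The scale, train, and evaluate sequence can be sketched as follows, again assuming scikit-learn and synthetic data. The paper does not name the scaling method, so `StandardScaler` is an assumption, as is fitting the scaler on the training subset only (a common way to keep test-set statistics out of training).

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic features on a large scale, with made-up binary labels.
rng = np.random.default_rng(0)
X = rng.normal(loc=50.0, scale=10.0, size=(500, 4))
y = (X[:, 0] > 50.0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Learn the scaling parameters from the training data only,
# then apply the same transformation to both subsets.
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Train one model (a decision tree here) and score it on the held-out data.
model = DecisionTreeClassifier(random_state=42).fit(X_train_scaled, y_train)
accuracy = accuracy_score(y_test, model.predict(X_test_scaled))
print(f"accuracy = {accuracy:.3f}")
```

Scaling matters most for distance-based and gradient-based models such as KNN, SVM, and neural networks; decision trees are largely insensitive to it.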
4. Experimental Results
[Figure: Boxplot for Accuracy, Precision, Recall and F1-Score]
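For reference, the four reported metrics can be computed from the confusion counts of a binary classifier. The sketch below uses plain Python and made-up labels, not results from this study; `binary_metrics` is a hypothetical helper name.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up example: 1 = botnet, 0 = normal.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
scores = binary_metrics(y_true, y_pred)
print(scores)
```

In this example there are 3 true positives, 1 false negative, 1 false positive, and 5 true negatives, giving an accuracy of 0.8 and precision, recall, and F1 of 0.75 each.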
5. Discussion and Analysis
This section discusses the performance of each machine learning algorithm on the
Bot-IoT dataset, offering insights into their strengths and limitations.
Among the tested algorithms, the Decision Tree demonstrated the highest accuracy,
precision, recall, and F1-score, all surpassing 0.9. Decision Trees excel at partitioning
feature space to distinguish classes, making them effective for intrusion detection.
Future research can explore ensemble methods and hyperparameter tuning to enhance
algorithm performance.
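To illustrate how a decision tree partitions feature space into threshold rules, the sketch below fits a shallow tree on synthetic data and prints the learned rules. It assumes scikit-learn; the feature names `packet_rate` and `duration` are hypothetical and are not taken from the Bot-IoT feature set.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic data: traffic is labelled "botnet" (1) when packet_rate exceeds 60.
rng = np.random.default_rng(1)
packet_rate = rng.uniform(0, 100, size=200)
duration = rng.uniform(0, 10, size=200)
X = np.column_stack([packet_rate, duration])
y = (packet_rate > 60).astype(int)

# A shallow tree is enough here because one threshold separates the classes.
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# export_text renders the learned splits as human-readable rules.
rules = export_text(tree, feature_names=["packet_rate", "duration"])
print(rules)
```

The printed rules show the threshold the tree chose on `packet_rate`, which is the kind of interpretable decision boundary that makes trees attractive for intrusion detection.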
In conclusion, the Decision Tree's superior performance highlights its potential for robust
intrusion detection, but algorithm selection should align with deployment needs.
6. Conclusion
In this study, we compared five prominent machine learning algorithms for intrusion
detection within IoT environments using the Bot-IoT dataset: Decision Tree,
K-Nearest Neighbors (KNN), Naive Bayes, Support Vector Machine (SVM), and Neural
Network. Each algorithm's performance was evaluated based on accuracy, precision,
recall, and F1-score.
Among the algorithms, the Decision Tree emerged as the top performer, achieving the
highest scores across all metrics. Its ability to create decision rules based on features
allowed it to effectively distinguish between normal and botnet traffic. Naive Bayes and
Support Vector Machine achieved competitive accuracy scores, indicating their
effectiveness in IoT intrusion detection. However, consideration of algorithm complexity,
interpretability, and generalization ability is crucial for practical deployment.
This analysis provides valuable insights for selecting suitable machine learning
algorithms for IoT intrusion detection, contributing to enhanced network security and
resilience in the face of evolving cyber threats.
As IoT ecosystems continue to grow, future research can explore ensemble techniques,
hyperparameter optimization, and real-time adaptation to further advance intrusion
detection capabilities.
By understanding the strengths and limitations of each algorithm, practitioners can make
informed decisions to protect IoT networks and ensure the integrity of interconnected
devices.
With this conclusion, the study encapsulates its outcomes and implications, paving the
way for informed decision-making in IoT intrusion detection strategies.
7. References:
[1] Khan, R., and S. U. Khan. "A comprehensive survey of recent trends in IoT
security." IEEE Communications Surveys & Tutorials 20, no. 3 (2017): 2887-2925.
[2] Antoniades, D., and G. Loukas. "Adversarial machine learning in the Internet
of Things: A systematic survey." IEEE Access 7 (2019): 107652-107674.
[3] Koroniotis, Nickolaos, Nour Moustafa, Elena Sitnikova, and Benjamin Turnbull.
"Towards the development of realistic botnet dataset in the Internet of Things for
network forensic analytics: Bot-IoT dataset." Future Generation Computer
Systems 100 (2019): 779-796.
[4] Koroniotis, Nickolaos, Nour Moustafa, Elena Sitnikova, and Jill Slay. "Towards
developing network forensic mechanism for botnet activities in the IoT based on
machine learning techniques." In International Conference on Mobile Networks
and Management, pp. 30-44. Springer, Cham, 2017.
[5] Koroniotis, Nickolaos, Nour Moustafa, and Elena Sitnikova. "A new network
forensic framework based on deep learning for Internet of Things networks: A
particle deep framework." Future Generation Computer Systems 110 (2020):
91-106.
[6] Koroniotis, Nickolaos, and Nour Moustafa. "Enhancing network forensics with
particle swarm and deep learning: The particle deep framework." arXiv preprint
arXiv:2005.00722 (2020).
[7] Koroniotis, Nickolaos, Nour Moustafa, Francesco Schiliro, Praveen Gauravaram,
and Helge Janicke. "A Holistic Review of Cybersecurity and Reliability
Perspectives in Smart Airports." IEEE Access (2020).
[8] Koroniotis, Nickolaos. "Designing an effective network forensic framework for the
investigation of botnets in the Internet of Things." PhD diss., The University of
New South Wales Australia, 2020.