P&D Research Paper
Abstract— Within the domain of IoT forensics, the prevailing challenge lies in securing resource-limited IoT devices. Currently,
frameworks like N2N (Node to Node) are in use, but they grapple with the constraints of these devices. To surpass existing
methodologies, we propose a novel framework rooted in Distributed Edge Computing. By pairing machine learning models with
ADASYN oversampling, we aim to improve attack detection beyond what conventional SMOTE-based approaches achieve. This
approach not only strengthens IoT security but also reduces the strain on device resources.
Through these advancements, our research lays a foundation for a more resilient IoT ecosystem, crucial in the face of the
burgeoning IoT landscape.
In this pursuit, we extend the capabilities of IoT forensics by introducing a comprehensive framework that not only adapts to
resource limitations but also enhances security protocols. By integrating Distributed Edge Computing and advanced machine
learning, we usher in a new era of IoT forensics that is both robust and efficient. Through meticulous experimentation and
evaluation, we demonstrate the tangible benefits of our approach, paving the way for more secure and resilient IoT deployments
in real-world scenarios.
Index Terms— Internet of Things (IoT), IoT Forensics, Distributed Edge Computing, Machine Learning, Botnet Detection, Data
Preprocessing, Outlier Treatment, Feature Transformation, Categorical Variables, ADASYN, Ensemble Techniques,
Comparative Analysis, Performance Evaluation, Cybersecurity, Resource-Constrained Environments.
1 INTRODUCTION
environments. Overall, this framework represents a significant leap forward in IoT forensics, offering a more efficient,
privacy-preserving, and scalable approach to digital investigations in IoT ecosystems.
Figure 1:1 Distributed Edge Computing Framework
3.2 Data Pre-processing Techniques
In data preprocessing, crucial for optimal algorithmic performance, we utilized the pandas library for dataset
manipulation [11]. Initially, we refined the dataset structure by excluding redundant columns like 'id' and 'attack_cat'.
Managing outliers was a priority, adjusting extreme values to align with the 95th percentile. For highly variable numeric
features, we applied log-transformation, enhancing algorithmic efficiency. Handling non-numeric categorical data, we
retained the most frequent entries and grouped the rest. Employing OneHotEncoding [12], we converted these entries into a
numeric matrix, aligning with machine learning algorithms' preferences. We partitioned the dataset for comprehensive model
evaluation, using one subset for learning and the other for validation. Post-encoding, we utilized StandardScaler for
consistent feature scaling. Addressing data imbalance, we integrated ADASYN to augment underrepresented categories [4].
These measures elevate input data quality, enhancing predictive precision for advanced analytical pursuits.
3.3 Improved Machine Learning Models
Building upon the foundation of refined data pre-processing techniques, our research places a strong emphasis on elevating
the performance of machine learning models. This endeavor involves the integration of advanced methodologies to augment
the predictive capabilities of the framework. A pivotal advancement lies in the adoption of ADASYN [14], a powerful
oversampling technique, as a replacement for conventional approaches like SMOTE. This strategic shift is geared towards
addressing class imbalances more effectively, thereby fortifying the models against skewed data distributions.
Additionally, ensemble learning takes center stage in our approach. By fusing predictions from distinct models, we not
only bolster accuracy but also establish a more comprehensive and stable basis for decision-making. The fusion of Logistic
Regression with Decision Tree models, as well as Random Forest with Gradient Boosting, demonstrates substantial
performance gains. These improvements collectively forge a more resilient machine learning foundation, poised to deliver
superior results in the domain of IoT forensics.
4 PERFORMANCE ANALYSIS
4.1 Dataset Description
The cornerstone of any machine learning endeavor hinges on the quality and relevance of the dataset utilized for training
and evaluation. In this study, we leveraged the widely recognized UNSW-NB15 dataset [9], meticulously curated for network
intrusion detection systems, and sourced from the esteemed platform, Kaggle [7]. This dataset encapsulates a diverse array
of network traffic scenarios, including normal, attack, and mixed instances, providing a comprehensive representation of
real-world situations. It encompasses a total of 49 features, comprising both categorical and numerical attributes,
offering a holistic perspective on network behavior. Furthermore, the dataset is enriched with labels that categorize
instances into various attack categories, facilitating the training of models for specific threat identification.
4.2 Evaluation metrics
Throughout our research, we utilized multiple machine learning algorithms to gauge their performance on the dataset in
question, employing key metrics such as Accuracy, Recall, Precision, and F1-Score to compare and analyze their
effectiveness.
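For reference, the four metrics named in Section 4.2 can all be derived from the counts of true and false positives and negatives. The following is a minimal sketch for the binary case (attack versus normal), assuming labels encoded as 1 and 0. The function name binary_metrics and the example labels are illustrative and are not taken from the study.

```python
def binary_metrics(y_true, y_pred):
    """Return accuracy, precision, recall and F1 for 0/1 labels (1 = attack)."""
    pairs = list(zip(y_true, y_pred))
    if not pairs:
        raise ValueError("at least one labelled sample is required")

    # Confusion-matrix counts.
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    # Guard against empty denominators when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Illustrative labels only (hypothetical, not drawn from UNSW-NB15).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 0, 0, 0, 1, 0, 1]
scores = binary_metrics(y_true, y_pred)
for name, value in scores.items():
    print(f"{name}: {value:.4f}")
```

Where scikit-learn is available, accuracy_score, precision_score, recall_score, and f1_score from sklearn.metrics compute the same quantities.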
4 IEEE TRANSACTIONS ON XXXXXXXXXXXXXXXXXXXX, VOL. #, NO. #, MMMMMMMM 1996
Starting with the Logistic Regression model, we observed an accuracy of 92.80%. The model's recall, precision, and F1-Score
were also in the vicinity of 92.80%. It took about 1.38 seconds for the model to be trained, marking it as relatively

higher at 97.69%. However, a notable point here is the time taken for training, which was 4.17 seconds, a bit higher
compared to previous models but justifiable given the model's ensemble nature.
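The accuracies and training times reported above compare individual models with ensembles. As a minimal sketch of how such a comparison might be set up, the following times the training of a Logistic Regression model and of a Logistic Regression plus Decision Tree ensemble, then reports test accuracy for each. It assumes scikit-learn, a synthetic imbalanced dataset standing in for UNSW-NB15, and soft voting as the fusion rule (the text does not state which fusion method was used). The helper name fit_and_score is hypothetical.

```python
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced binary data standing in for the real dataset (assumption).
X, y = make_classification(
    n_samples=2000, n_features=20, weights=[0.8, 0.2], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)


def fit_and_score(model):
    """Train the model and return (test accuracy, training time in seconds)."""
    start = time.perf_counter()
    model.fit(X_train, y_train)
    elapsed = time.perf_counter() - start
    accuracy = accuracy_score(y_test, model.predict(X_test))
    return accuracy, elapsed


single = LogisticRegression(max_iter=1000)

# Soft voting averages the class probabilities of the member models.
# The choice of soft voting is an assumption, not taken from the paper.
ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("dt", DecisionTreeClassifier(random_state=0)),
    ],
    voting="soft",
)

results = {
    "logistic_regression": fit_and_score(single),
    "lr_plus_decision_tree": fit_and_score(ensemble),
}

for name, (accuracy, seconds) in results.items():
    print(f"{name}: accuracy={accuracy:.4f}, train_time={seconds:.3f}s")
```

The figures this prints depend on the synthetic data and the machine it runs on, so they are not comparable to the values reported in the text.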
6 END SECTIONS
6.1 Future Work
The next phase of IoT forensics research should focus on dynamic threat response strategies for real-time adaptation to
evolving attack patterns. Integrating real-time monitoring and exploring blockchain integration for enhanced data integrity
are crucial steps. Optimizing machine learning model deployment on edge devices is vital for resource efficiency. Rigorous
robustness testing and scalability assessment under diverse conditions are imperative. Additionally, refining user
authentication mechanisms will ensure secure access. These efforts promise to fortify IoT security, enabling adaptability
and resilience against evolving cyber threats.
6.2 Conclusion
In summary, our research in IoT forensics addresses the challenge of resource constraints in IoT devices by integrating
Distributed Edge Computing with advanced machine learning techniques [18]. This framework not only enhances threat
detection but also fortifies IoT security sub-
[1] Z. Arshad, H. Rahman, J. Tariq, A. Riaz, A. Imran, A. Yasin and I. Ihsan, "Digital Forensics
Analysis of IoT Nodes using Machine Learning," Journal of Computing & Biomedical Informatics,
November 2022.
[2] S. J. Bigelow, "What is edge computing? Everything you need to know," TechTarget, December 2021.
[Online]. Available: https://www.techtarget.com/searchdatacenter/definition/edge-computing.
[3] A. U. Rehman, K. Alissa, T. Alyas, K. Zafar, Q. Abbas, N. Tabassum and S. Sakib, "Botnet Attack
Detection in IoT Using Machine Learning," Computational Intelligence and Neuroscience, October 2022.
[5] M. Z. Arshad, H. Rahman, J. Tariq, A. Riaz, A. Imran and I. Ihsan, "Digital Forensics Analysis
of IoT Nodes using Machine Learning".
[7] K. Cao, Y. Liu, G. Meng and Q. Sun, "An Overview on Edge Computing Research," IEEE Access,
vol. 8, 1 May 2020.
[8] W. Yu, F. Liang, X. He, W. G. Hatcher, C. Lu, J. Lin and X. Yang, "A Survey on the Edge
Computing for the Internet of Things," 2017.
[9] I. Psychoula, D. Singh, L. Chen, F. Chen, A. Holzinger and H. Ning, "Users' Privacy Concerns
in IoT Based Applications," 2018.
[10] B. Chen, J. Wan, A. Celesti, D. Li, H. Abbas and Q. Zhang, "Edge Computing in IoT-Based
Manufacturing," IEEE Communications Magazine, vol. 56, 2018.
[11] H. El-Sayed, S. Sankar, M. Prasad, D. Puthal, A. Gupta, M. Mohanty and Chin-Teng, "Edge of
Things: The Big Picture on the Integration of Edge, IoT and the Cloud in a Distributed
Computing Environment," 2017.
[13] A. Y. Hussein, P. Falcarin and A. T. Sadiq, "Enhancement performance of random forest
algorithm via one hot," vol. 9, August 2021.
[15] N. Moustafa and J. Slay, "UNSW-NB15: a comprehensive data set for network intrusion
detection systems (UNSW-NB15 network data set)," December 2015.
[17] M. A. Samara, I. Bennis, A. Abouaissa and P. Lorenz, "A Survey of Outlier Detection
Techniques in IoT: Review and Classification," 2022.
[20] J. Siłka, M. Wieczorek and M. Woźniak, "BiLSTM deep neural network model for imbalanced
medical data of IoT systems," Future Generation Computer Systems, vol. 141, 2023.
[22] J. Okwuibe, M. Liyanage, M. Ylianttila and T. Taleb, "Survey on Multi-Access Edge
Computing for Internet of Things Realization," 2018.
[23] U. Y. Khan and T. R. Soomro, "Applications of IoT: Mobile Edge Computing Perspectives".
[24] D. S. M. Kumar and D. Majumder, "Healthcare Solution based on Machine Learning
Applications in".
[25] V. Prakash, A. Williams, L. Garg, C. Savaglio and S. Bawa, "Cloud and Edge Computing-Based
Computer Forensics: Challenges and Open Problems," Electronics, 2021.