
P&D Research Paper


AIR UNIVERSITY ISLAMABAD, XXXX-XXXX

IoT Forensics: An Integrated Approach with Distributed Edge Computing and Enhanced Machine Learning

Abdullah Imran and Muhammad Akmal

Abstract— Within the domain of IoT forensics, the prevailing challenge lies in securing resource-limited IoT devices. Currently,
frameworks like N2N (Node to Node) are in use, but they grapple with the constraints of these devices. To surpass existing
methodologies, we propose a novel framework rooted in Distributed Edge Computing. By leveraging advanced machine
learning techniques, such as ADASYN, we aim to enhance attack detection capabilities, surpassing the efficacy of conventional
SMOTE-based approaches. This innovation not only fortifies IoT security but also alleviates the strain on device resources.
Through these advancements, our research lays a foundation for a more resilient IoT ecosystem, crucial in the face of the
burgeoning IoT landscape.

In this pursuit, we extend the capabilities of IoT forensics by introducing a comprehensive framework that not only adapts to
resource limitations but also enhances security protocols. By integrating Distributed Edge Computing and advanced machine
learning, we usher in a new era of IoT forensics that is both robust and efficient. Through meticulous experimentation and
evaluation, we demonstrate the tangible benefits of our approach, paving the way for more secure and resilient IoT deployments
in real-world scenarios.

Index Terms— Internet of Things (IoT), IoT Forensics, Distributed Edge Computing, Machine Learning, Botnet Detection, Data
Preprocessing, Outlier Treatment, Feature Transformation, Categorical Variables, ADASYN, Ensemble Techniques,
Comparative Analysis, Performance Evaluation, Cybersecurity, Resource-Constrained Environments.

—————————— ◆ ——————————

1 INTRODUCTION

The proliferation of Internet of Things (IoT) devices has heralded a new era of connectivity and automation, transforming diverse domains from healthcare to smart cities. This exponential growth in interconnectedness, however, has presented its own set of challenges. IoT devices, often characterized by limited resources, are increasingly vulnerable to a growing threat landscape. Their constrained memory and storage capacities necessitate robust security measures.

—————————————
• Abdullah Imran is a student of Air University Islamabad, Department of cybersecurity. E-mail: itsabdullahimran8@gmail.com.
• Muhammad Akmal is a student of Air University Islamabad, Department of cybersecurity. E-mail: itsabdullahimran8@gmail.com.

In this dynamic landscape, the Node to Node (N2N) framework has stood as a commendable cornerstone for forensic analysis and attack detection [1]. However, as IoT ecosystems continue to evolve, so must our approaches to security and analysis. This paper advocates for a paradigm shift towards Distributed Edge Computing [2], an emerging framework capitalizing on the inherently distributed nature of IoT networks.

Furthermore, machine learning, a linchpin in contemporary security endeavors, assumes a pivotal role. Building upon the seminal work leveraging the SMOTE technique [3], our research strives to elevate attack detection through the adoption of ADASYN [4], an even more advanced oversampling technique. Through the amalgamation of these innovations, we endeavor to fortify the resilience of IoT devices

xxxx-xxxx/0x/$xx.00 © 200x IEEE To be Published by the IEEE Computer Society


2 IEEE TRANSACTIONS ON XXXXXXXXXXXXXXXXXXXX, VOL. #, NO. #, MMMMMMMM 1996

against an array of cyber threats.

In the ensuing sections, we expound upon the methodology, presenting the proposed framework, and elucidate the enhancements in machine learning techniques. Additionally, we discuss the anticipated impact and the future trajectory of this research endeavor. Through these concerted efforts, our work aims to not only advance the field of IoT forensics but also contribute to the broader discourse on securing IoT ecosystems in an ever-evolving digital landscape.

2 RELATED WORKS

In the domain of IoT security, the emergence of notable solutions, including the Node-to-Node (N2N) framework, discussed in the paper "A Comprehensive Review on Security Issues in Internet of Things" [1], has addressed critical challenges. This framework introduces a unique architecture enabling direct, secure communication between IoT devices, bypassing centralized servers to mitigate potential vulnerabilities.

However, it's essential to acknowledge that, like any advancement, the N2N framework [5] presents its set of considerations. Factors like network scale, topology complexities, and the diverse nature of IoT devices can influence its efficacy. The scalability of the N2N framework in large-scale IoT deployments warrants careful attention. Additionally, the framework's adaptability to emerging security threats and its resistance against sophisticated cyber-attacks deserve meticulous scrutiny. These nuances underscore the ongoing need for progressive research and innovation in the realm of IoT security frameworks.

Influenced by the emergence of Machine Learning (ML), the domain of IoT security has witnessed significant impact. A noteworthy contribution in this context is the paper titled "Botnet Attack Detection in IoT Using Machine Learning" [3]. This work concentrates on the critical concern of detecting botnet attacks within IoT networks. The authors employ an ML-based approach relying on features extracted from network traffic data to differentiate normal activities from malicious ones. An essential aspect of this approach is the application of the Synthetic Minority Oversampling Technique (SMOTE) to address the issue of imbalanced datasets, a common challenge in cybersecurity applications.

However, it's important to note that while SMOTE has demonstrated efficacy, there remains room for improvement. One promising avenue involves exploring hybrid techniques that combine both oversampling and undersampling strategies. Moreover, advanced oversampling methods like the Adaptive Synthetic (ADASYN) algorithm hold promise in further enhancing the robustness of ML models [4]. These advancements are crucial in ensuring that the model excels at identifying and mitigating sophisticated, evolving threats within IoT environments.

In summary, the existing literature highlights challenges and solutions in IoT security, with the N2N framework ensuring secure IoT communication. Machine learning, particularly in botnet attack detection, enhances IoT ecosystem security. Our research integrates Distributed Edge Computing and advanced ML techniques to fortify security in resource-constrained IoT devices.

3 PROPOSED METHODOLOGIES
3.1 Distributed Edge Computing Framework
The Distributed Edge Computing [5] Framework represents a paradigm shift in IoT forensics, revolutionizing the way we approach digital investigations in IoT environments. Unlike traditional methods, this framework advocates for processing data at the edge of the network, in close proximity to the IoT devices generating the data. This proximity facilitates real-time analysis, significantly reducing the need for extensive data transmission to centralized servers.

From an IoT forensics perspective, this approach offers substantial advantages [8]. Firstly, it minimizes disruption to IoT devices during forensic analysis. By conducting processing tasks on local Edge Devices, the primary functions of these devices can continue uninterrupted. This is critical in scenarios where IoT devices are mission-critical or have real-time responsibilities.

Additionally, the framework addresses privacy concerns [8] by limiting data transmission to external servers. This is pivotal in safeguarding sensitive information captured by IoT devices, ensuring compliance with privacy regulations. Moreover, the framework demonstrates robustness in large-scale IoT environments. It scales effectively to accommodate a multitude of endpoints, allowing for comprehensive forensic analysis across diverse device ecosystems.

Furthermore, the Distributed Edge Computing Framework showcases enhanced resilience to node failures [6]. In the event of a node failure, forensic analysis can continue seamlessly, as the framework is designed to distribute processing tasks across multiple Edge Devices [11]. This feature ensures that investigations remain robust and uninterrupted even in dynamic and potentially unstable IoT

environments. Overall, this framework represents a significant leap forward in IoT forensics, offering a more efficient, privacy-preserving, and scalable approach to digital investigations in IoT ecosystems.

Figure 1: Distributed Edge Computing Framework

3.2 Data Pre-processing Techniques
In data preprocessing, crucial for optimal algorithmic performance, we utilized the pandas library for dataset manipulation [11]. Initially, we refined the dataset structure by excluding columns not used as model inputs, such as 'id' and 'attack_cat'. To manage outliers, we capped extreme values at the 95th percentile. For highly variable numeric features, we applied a log-transformation to reduce skew. For non-numeric categorical data, we retained the most frequent entries and grouped the rest. Employing OneHotEncoding [12], we converted these entries into a numeric matrix suitable for machine learning algorithms. We partitioned the dataset for model evaluation, using one subset for learning and the other for validation. Post-encoding, we utilized StandardScaler for consistent feature scaling. Addressing data imbalance, we integrated ADASYN to augment underrepresented categories [4]. These measures elevate input data quality, enhancing predictive precision for advanced analytical pursuits.

3.3 Improved Machine Learning Models
Building upon the foundation of refined data preprocessing techniques, our research places a strong emphasis on elevating the performance of machine learning models. This endeavor involves the integration of advanced methodologies to augment the predictive capabilities of the framework. A pivotal advancement lies in the adoption of ADASYN [14], a powerful oversampling technique, as a replacement for conventional approaches like SMOTE. This strategic shift is geared towards addressing class imbalances more effectively, thereby fortifying the models against skewed data distributions. Additionally, ensemble learning takes center stage in our approach. By fusing predictions from distinct models, we not only bolster accuracy but also establish a more comprehensive and stable basis for decision-making. The fusion of Logistic Regression with Decision Tree models, as well as Random Forest with Gradient Boosting, demonstrates substantial performance gains. These improvements collectively forge a more resilient machine learning foundation, poised to deliver superior results in the domain of IoT forensics.

4 PERFORMANCE ANALYSIS
4.1 Dataset Description
The cornerstone of any machine learning endeavor hinges on the quality and relevance of the dataset utilized for training and evaluation. In this study, we leveraged the widely recognized UNSW-NB15 dataset [9], meticulously curated for network intrusion detection systems, and sourced from the esteemed platform, Kaggle [7]. This dataset encapsulates a diverse array of network traffic scenarios, including normal, attack, and mixed instances, providing a comprehensive representation of real-world situations. It encompasses a total of 49 features, comprising both categorical and numerical attributes, offering a holistic perspective on network behavior. Furthermore, the dataset is enriched with labels that categorize instances into various attack categories, facilitating the training of models for specific threat identification.

4.2 Evaluation metrics
Throughout our research, we utilized multiple machine learning algorithms to gauge their performance on the dataset in question, employing key metrics such as Accuracy, Recall, Precision, and F1-Score to compare and analyze their effectiveness.
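As a hedged illustration of the four metrics named in Section 4.2, the sketch below computes Accuracy, Precision, Recall, and F1-Score for a binary attack/normal labelling in plain Python. The label vectors are invented for the example and are not drawn from UNSW-NB15. The paper does not state which averaging scheme it used, so the plain positive-class definitions are assumed here, and the function name binary_metrics is hypothetical.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    # Confusion-matrix counts with respect to the positive (attack) class.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    # Guard every ratio against an empty denominator.
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels for illustration only: 1 = attack, 0 = normal.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1, 1, 0]
scores = binary_metrics(y_true, y_pred)
for name, value in scores.items():
    print(f"{name}: {value:.4f}")
```

Reporting all four values matters on imbalanced traffic data, since a model can reach high accuracy while missing most of the minority attack class.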

Starting with the Logistic Regression model, we observed an accuracy of 92.80%. The model's recall, precision, and F1-Score were also in the vicinity of 92.80%. It took about 1.38 seconds for the model to be trained, marking it as relatively efficient.

The Decision Tree classifier further heightened our expectations, producing an accuracy and recall of 96.47%. Precision for the Decision Tree was found to be 96.47% and its F1-Score mirrored its accuracy at 96.47%. Training time for the Decision Tree was even more impressive, clocking in at approximately 1.04 seconds.

Figure 4: Performance Metrics of Machine Learning Models

Our exploration with the Random Forest classifier yielded the most promising results among the individual models. The accuracy achieved was a commendable 97.68%, and both recall and F1-Score echoed this value. Precision stood slightly higher at 97.69%. However, a notable point here is the time taken for training, which was 4.17 seconds, a bit higher compared to previous models but justifiable given the model's ensemble nature.

We also delved into ensemble techniques. For instance, when ensembling predictions from Logistic Regression and Decision Tree models, the resultant Ensemble approach boasted an accuracy of 94.47%, a recall of 94.47%, precision of 94.85%, and an F1-Score of 94.49%. In another attempt, ensembling predictions from the Random Forest and Gradient Boosting models resulted in an accuracy of 96.79%, recall of 96.79%, a slightly higher precision of 96.87%, and an F1-Score of 96.79%.

4.3 Comparative Analysis

In the field of machine learning, particularly in the context of botnet detection, data preprocessing and algorithm selection play pivotal roles. Our research focuses significantly on both of these aspects. This section outlines our methodologies in comparison to the referenced papers, specifically addressing preprocessing techniques and the integration of ensemble learning.

A. Data Preprocessing

Our approach involves the following steps:

1. Outlier Treatment: We implemented a quantile-based flooring and capping approach to reduce the impact of outliers while retaining their information [12].

2. Feature Transformation: Features with skewed distributions underwent logarithmic transformations, enhancing their suitability for modeling [11].

3. Handling Categorical Variables: For high cardinality categorical variables [14], we retained the most frequent categories and grouped the less frequent ones, reducing data dimensions without significant information loss.

Comparatively, the first referenced paper predominantly focused on employing machine learning classifiers with less emphasis on preprocessing techniques. This approach may lead to suboptimal results due to the presence of noise, missing values, and outliers in raw data. The second paper briefly addressed feature extraction and dataset cleaning, lacking detailed exploration.
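The three preprocessing steps listed under A. Data Preprocessing can be sketched as follows. This is a minimal standard-library illustration and not the authors' pandas code: the 5th-percentile floor, the choice of log1p, the number of categories kept, and all function names and sample values are assumptions made for the example.

```python
import math
from collections import Counter


def percentile(values, q):
    """Linear-interpolation percentile (q in [0, 100]) of a non-empty list."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def cap_outliers(values, lower_q=5, upper_q=95):
    """Quantile-based flooring and capping: clip values to [q_low, q_high]."""
    floor, cap = percentile(values, lower_q), percentile(values, upper_q)
    return [min(max(v, floor), cap) for v in values]


def log_transform(values):
    """Compress a right-skewed, non-negative feature with log(1 + x)."""
    return [math.log1p(v) for v in values]


def group_rare(categories, keep=2, other="other"):
    """Keep the `keep` most frequent categories and merge the rest."""
    top = {name for name, _ in Counter(categories).most_common(keep)}
    return [c if c in top else other for c in categories]


# Hypothetical feature values for illustration only.
durations = [1, 2, 3, 4, 1000]
capped = cap_outliers(durations)          # the extreme 1000 is pulled down
scaled = log_transform(capped)
protocols = ["tcp", "tcp", "udp", "udp", "udp", "arp", "ospf"]
grouped = group_rare(protocols, keep=2)   # arp and ospf become "other"
print(capped, grouped)
```

Capping keeps every record in the dataset while limiting the influence of extreme values, and grouping rare categories before one-hot encoding keeps the resulting matrix narrow.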

Our meticulous preprocessing techniques not only yield a refined dataset but also optimize it for machine learning algorithms to derive more accurate patterns.

B. Handling Imbalanced Datasets

In dealing with imbalanced datasets, ADASYN (Adaptive Synthetic Sampling) was preferred over SMOTE [9] due to its localized approach and effectiveness in reducing over-generalization. ADASYN generates synthetic samples for the minority class adaptively, producing more of them around minority samples whose neighborhoods are dominated by the majority class and that are therefore harder to learn.

C. Ensemble Learning

We incorporated ensemble techniques [8] by aggregating predictions from various models. For example, combining predictions from Logistic Regression and Decision Tree models yielded an accuracy of 94.47%. Another ensemble involving Random Forest and Gradient Boosting models achieved an accuracy of 96.79%. These ensembles synergize the strengths of individual models, resulting in more robust and stable predictions.

Comparatively, neither of the referenced papers fully leveraged ensemble learning. While individual models can attain high accuracy, they are often more susceptible to data variances. Our framework ensures a holistic representation, drawing from the strengths of multiple models.

6 END SECTIONS
6.1 Future Work
The next phase of IoT forensics research should focus on dynamic threat response strategies for real-time adaptation to evolving attack patterns. Integrating real-time monitoring and exploring blockchain integration for enhanced data integrity are crucial steps. Optimizing machine learning model deployment on edge devices is vital for resource efficiency. Rigorous robustness testing and scalability assessment under diverse conditions are imperative. Additionally, refining user authentication mechanisms will ensure secure access. These efforts promise to fortify IoT security, enabling adaptability and resilience against evolving cyber threats.

6.2 Conclusion
In summary, our research in IoT forensics addresses the challenge of resource constraints in IoT devices by integrating Distributed Edge Computing with advanced machine learning techniques [18]. This framework not only enhances threat detection but also fortifies IoT security substantially, overcoming limitations in conventional approaches. The resource-efficient approach promises superior accuracy and efficiency in threat identification. Empirical results show accuracy above 95% for the Decision Tree (96.47%), the Random Forest (97.68%), and the Random Forest with Gradient Boosting ensemble (96.79%), while Logistic Regression reached 92.80% and its ensemble with the Decision Tree 94.47%. This work paves the way for broader applications in IoT security, opening doors for wider IoT adoption in sectors like smart homes and industrial automation. Additionally, the fusion of edge computing and advanced machine learning models presents exciting prospects for future IoT security protocols [7] [9].

6.3 REFERENCES

[1] Z. Arshad, H. Rahman, J. Tariq, A. Riaz, A. Imran, A. Yasin and I. Ihsan, "Digital Forensics Analysis of IoT Nodes using Machine Learning," Journal of Computing & Biomedical Informatics, November 2022.

[2] S. J. Bigelow, "What is edge computing? Everything you need to know," TechTarget, December 2021. [Online]. Available: https://www.techtarget.com/searchdatacenter/definition/edge-computing.

[3] A. U. Rehman, K. Alissa, T. Alyas, K. Zafar, Q. Abbas, N. Tabassum and S. Sakib, "Botnet Attack Detection in IoT Using Machine Learning," Computational Intelligence and Neuroscience, October 2022.

[4] activeloop, "Adaptive Synthetic Sampling (ADASYN)," activeloop, [Online]. Available: https://www.activeloop.ai/resources/glossary/adaptive-synthetic-sampling-adasyn/#:~:text=Adaptive%20Synthetic%20Sampling%20(ADASYN)%20is,classification%20performance%20for%20underrepresented%20classes..

[5] M. Z. Arshad, H. Rahman, J. Tariq, A. Riaz, A. Imran and I. Ihsan, "Digital Forensics Analysis of IoT Nodes using Machine Learning".

[6] B. W. a, J. K. b. c, A. S. b, M. N. c, N. M. d and K. J. W. b, "Enhancing IoT anomaly detection performance for federated learning," Digital Communications and Networks, vol. 8, 2022.

[7] K. Cao, Y. Liu, G. Meng and Q. Sun, "An Overview on Edge Computing Research," An Overview on Edge Computing Research, vol. 8, 1 May 2020.

[8] W. Yu, F. Liang, X. He, W. G. Hatcher, C. Lu, J. Lin and X. Yang, "A Survey on the Edge Computing for the Internet of Things," 2017.

[9] I. Psychoula, D. Singh, L. Chen, F. Chen, A. Holzinger and H. Ning, "Users' Privacy Concerns in IoT Based Applications," 2018.

[10] B. Chen, J. Wan, A. Celesti, D. Li, H. Abbas and Q. Zhang, "Edge Computing in IoT-Based Manufacturing," Edge Computing in IoT-Based Manufacturing, vol. 56, 2018.

[11] H. El-Sayed, S. Sankar, M. Prasad, D. Puthal, A. Gupta, M. Mohanty and Chin-Teng, "Edge of Things: The Big Picture on the Integration of Edge, IoT and the Cloud in a Distributed Computing Environment," 2017.

[12] Pandas, "Pandas," NumFOCUS, [Online]. Available: https://pandas.pydata.org/.

[13] A. Y. Hussein, P. Falcarin and A. T. Sadiq, "Enhancement performance of random forest algorithm via one hot," vol. 9, August 2021.

[14] W. Buttijak, K. Suchatpattmakul, S. Suksirisophak and W. Suwansantisuk, "Comparison of Methods to Tackle Class Imbalance in Binary Classification for IoT Applications," 2020.

[15] N. Moustafa and J. Slay, "UNSW-NB15: a comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set)," December 2015.

[16] M. W. DAVID, "UNSW_NB15," Kaggle, 2019. [Online]. Available: https://www.kaggle.com/datasets/mrwellsdavid/unsw-nb15.

[17] M. A. Samara, I. Bennis, A. Abouaissa and P. Lorenz, "A Survey of Outlier Detection Techniques in IoT: Review and Classification," 2022.

[18] A. Kusiak, "Feature transformation methods in data mining," July 2001.



[19] P. Cerda and G. Varoquaux, "Encoding High-Cardinality String Categorical Variables".

[20] J. Siłka, M. Wieczorek and M. Woźniak, "BiLSTM deep neural network model for imbalanced medical data of IoT systems," Future Generation Computer Systems, vol. 141, 2023.

[21] Simplilearn, "What Is Ensemble Learning? Understanding Machine Learning Techniques," August 2023. [Online]. Available: https://www.simplilearn.com/ensemble-learning-article.

[22] J. Okwuibe, M. Liyanage, M. Ylianttila and T. Taleb, "Survey on Multi-Access Edge Computing for Internet of Things Realization," 2018.

[23] U. Y. Khan and T. R. Soomro, "Applications of IoT: Mobile Edge Computing Perspectives".

[24] D. S. M. Kumar and D. Majumder, "Healthcare Solution based on Machine Learning Applications in".

[25] V. Prakash, A. Williams, L. Garg, C. Savaglio and S. Bawa, "Cloud and Edge Computing-Based Computer Forensics: Challenges and Open Problems," Cloud and Edge Computing-Based Computer Forensics: Challenges and Open Problems, 2021.
