Research Paper
Abstract—As smart devices and the Internet develop, Internet of Things (IoT) technologies have become an important factor in our lives. IoT helps manufacturing companies monitor the status of every machine in real time, the quality of products, and the environment variables within the factory. This not only allows managers to reduce the risk of damage and loss, but also helps them make decisions from a higher overall standpoint. In addition, IoT has changed people’s lives and behavior. People now rely on IoT devices and services more than ever. However, anomalies can cause security and safety issues for an IoT network. It is important to detect anomalies and alert the user to prevent damage or loss. In this paper, we propose using Machine Learning and Deep Learning methods to detect anomalies in a network. The experiments were performed on the IoT-23 dataset. The performance and time cost of these models are compared to identify the algorithm that delivers high performance in less time.

Index Terms—Internet of Things, security, malicious node, anomaly detection, Machine Learning, Deep Learning.

I. INTRODUCTION

The Internet of Things (IoT) is a revolution in the global information industry after the Internet. The IoT is a smart network that allows devices to exchange information and communicate with each other through the Internet. With IoT, humans can track, monitor, locate, identify and manage things [1]. Since the revolution of the Internet and mobile devices, IoT has become an evolving and hot research topic in computer science. The number of IoT devices on the Internet is increasing every year and in every sector, such as Smart Healthcare, Smart Transportation, Smart Governance, Smart Agriculture, Smart Grid, Smart Home, Smart Supply Chain, etc. [2].

Because of the convenience brought by IoT, human behavior has also changed. Younger generations are more accustomed to using services from IoT devices such as smart bulbs, smart ovens, smart refrigerators, AC, temperature sensors, smoke detectors, etc. [3]. However, as IoT develops, users’ concerns about privacy and security have increased. As all the devices are connected to the Internet and to each other, attackers have more ways to access information. The connected devices collect and store data containing personal information. Most users have little knowledge of IoT technology, and hackers can steal information from users or even control their smart devices [4]. This not only hinders the advancement of IoT technology but also slows down the development of IoT infrastructure. Therefore, providing security and privacy for these constantly and heavily connected devices has become a major challenge. Another key issue in providing security and privacy to these devices is managing the huge amount of data they generate, which is quite difficult using general data collection, storage and processing techniques [18].

With the development of Machine Learning (ML) and Deep Learning (DL), learning algorithms can learn from training data and adapt in order to improve performance and make informed and intelligent decisions. A learning algorithm that has been trained on the data is able to distinguish regular benign traffic from malicious traffic. In other words, it can detect abnormal behaviour in the network, thereby preventing unauthorized access. Learning algorithms are broadly classified into two categories: Supervised Learning and Unsupervised Learning. We use lightweight machine learning methods and neural networks to improve accuracy in detecting malicious nodes. The central unit in the model captures IoT traffic data and sends the data to a selected trained Machine Learning or Deep Learning model. Multiple trained Machine Learning and Deep Learning models are tested. The reason for choosing multiple models is to fit the individual needs of different users or groups. In other words, it is important to find an efficient model for each type of user.

The large volume of data in the IoT network and its heterogeneity make it difficult to improve security while meeting all requirements such as cost effectiveness, reliability, performance, etc. In some cases, improving one feature may affect the performance of other features [16]. For example, increasing the number of security checks and protocols in all data transfers may increase the cost and latency of that particular application, making it unsuitable for certain users. Also, an increase in the number of connected devices increases the chance for an attacker to gain access to the network through a weakly protected node or device, for example a smart bulb. Most of the devices currently available on the market do not have security features such as firewalls, anti-virus, etc. As IoT devices are resource constrained, it is important for these
devices to detect an intrusion with less complexity and time. The use of Machine Learning (ML) and Deep Learning (DL) techniques helps to reduce this complexity, as these models learn from the training data. It is important for the central unit to classify the integrity of each message. The privacy and security issues of IoT motivate research into frameworks for automatic detection of attacks and anomalies in IoT sensors [14]. In this paper, we propose to use ML/DL algorithms such as Support Vector Machines, Decision Trees, Naive Bayes and Convolutional Neural Networks for anomaly detection, and to determine the better algorithm based on accuracy and time cost. We used the IoT-23 dataset to implement the ML/DL methods. The paper is organized as follows: Section II reviews the literature, Section III explains the methodology, and Section IV discusses the results with evaluation metrics and comparison. Section V concludes the paper with a few suggestions for future work. Finally, the references for this study are included.

II. LITERATURE REVIEW

In this section, different anomaly detection algorithms and methodologies are briefly discussed. There are a number of different mechanisms to improve the safety and privacy of IoT devices. For example, in [12], a chaos-based encryption technique is used to generate symmetric keys to provide secure data transmission between the server and the IoT device, which guarantees data integrity and authenticity. In [13], a mechanism with low computational complexity is proposed that uses a random hopping sequence and random permutations to hide valuable information. Moreover, in [14], Doshi presented a method to detect DDoS attacks in the network layer with low-cost machine learning approaches, including KNN, LSVM, NN, Decision Tree, and Random Forest. This method can identify, by IP address, which node is attacking the central unit. It was reported to achieve high testing accuracy for all five machine learning algorithms. In [21], anomaly detection is done using fog computing, which clusters the different types of anomalies present in the sensor layer or edge nodes, performing the computation in the fog layer of the network rather than in the cloud or the sensor layer. Using the fog computing method makes it easier to detect an anomaly. In [17], the authors implement a malware detection system using k-NN and random forest classifiers to build the model. The device filters TCP packets and selects important features such as frame numbers, length, labels, etc. The k-NN algorithm assigns traffic to a class, while the random forest classifier builds decision trees to detect the malware. The authors of [22] propose a new methodology that uses game theory and the Nash equilibrium to help resource-constrained IoT devices detect an anomaly using an Intrusion Detection System (IDS), activating it only when needed. When an attack occurs, the attack pattern (signature) is stored and the model is trained; whenever the pattern repeats, it is identified by the signature detection technique and the anomaly is detected. Using an IDS all the time can be resource consuming, so game theory and the Nash equilibrium are used to determine when to activate the IDS to detect an anomaly, add a new rule to the signature patterns and build the model. Machine Learning and Deep Learning methods are discussed in [16]. The various types of attacks at different levels of the IoT infrastructure and the possible Machine Learning solutions to these attacks are explained there, along with the challenges caused by the lack of proper security data and the low quality of the available data; the performance of the learning algorithms could be the key to providing and improving the security and privacy of IoT devices. In this paper, we calculate the accuracy and time cost of the models and compare them to find the model that gives the highest accuracy in the least amount of time for detecting and preventing malware attacks in resource-constrained IoT devices.

III. METHODOLOGY

A. Proposed Model

This study proposes an anomaly detection system model for IoT security. Fig. 1 shows the proposed anomaly detection system model. In our proposed model, a traffic capture unit captures traffic flow from the sensors to the central unit. The captured traffic flow is sent to a compute unit, which can be a cloud or local computer. The compute unit then runs multiple Machine Learning (ML) and Deep Learning (DL) models in order to obtain the performance and cost of each individual model. The compute unit also stores the traffic flow in its database for future studies or model re-calibration. After obtaining the performance and cost of the ML/DL models, the user or system selects the model to be used for anomaly detection. When anomalies are detected, the compute unit sends messages or commands back to the central unit, such as dropping packets, running a malware scan, requesting physical inspection, marking the IP address and alerting the user. With our proposed model, users can choose the ML/DL model based on performance and cost, such as accuracy and time cost. Since every user has a different situation and usage of an anomaly detection system for IoT security, it is important to offer the best fit for different users. Moreover, since our proposed model captures traffic flow and stores it in the database, a new dataset can be generated and used for future re-calibration of the existing ML/DL models to further improve performance.

Machine Learning algorithms such as Support Vector Machine (SVM), Random Forest, Naive Bayes, Nearest Neighbours, etc. and Deep Learning methods such as Convolutional Neural Networks (CNN) are trained with the data, and computation is then done to detect anomalies in the system, either on a local machine or in the cloud. The dataset is divided into training and testing data, and conclusions can then be drawn from the results obtained for each trained algorithm. If an anomaly is detected, certain actions can be taken based on the result, such as dropping packets, blacklisting the sender’s IP address, alerting the user, physical inspection and more. The system can then be scanned to detect any malware present, and physical inspection can be done on the marked devices.
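The train, test and compare loop described in this subsection can be sketched as follows. This is a minimal illustration, not the authors' implementation: `MajorityClassifier` is a hypothetical stand-in for any model exposing `fit` and `predict`, the 80/20 hold-out split is an assumption, and so is the choice to let the time cost cover training plus prediction, since the text does not say how time was measured.

```python
import time
from collections import Counter


class MajorityClassifier:
    """Hypothetical stand-in model: always predicts the most common training label."""

    def fit(self, features, labels):
        self.label_ = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self, features):
        return [self.label_ for _ in features]


def split(rows, labels, test_fraction=0.2):
    """Hold out the last test_fraction of the records for testing (no shuffling)."""
    cut = len(rows) - int(len(rows) * test_fraction)
    return rows[:cut], labels[:cut], rows[cut:], labels[cut:]


def evaluate(model, train_x, train_y, test_x, test_y):
    """Train and test one model; return (testing accuracy, time cost in seconds)."""
    start = time.perf_counter()
    model.fit(train_x, train_y)
    predicted = model.predict(test_x)
    elapsed = time.perf_counter() - start  # assumption: training + prediction time
    correct = sum(p == t for p, t in zip(predicted, test_y))
    return correct / len(test_y), elapsed


# Toy records: one numeric feature per flow, labels taken from the dataset's vocabulary.
rows = [[i] for i in range(10)]
labels = ["Benign"] * 6 + ["DDoS"] * 2 + ["Benign", "DDoS"]
train_x, train_y, test_x, test_y = split(rows, labels)
accuracy, seconds = evaluate(MajorityClassifier(), train_x, train_y, test_x, test_y)
print(f"accuracy={accuracy:.2f}")
```

Any of the models named in the text could be passed to `evaluate` in place of the stand-in, as long as it offers the same two methods.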
Fig. 1. The proposed anomaly detection system model

The results obtained are compared with each other in order to identify the most efficient method for real-time data. The factors taken into consideration are the "accuracy" and the "time cost" of the algorithm. For example, even a model that gives 100 percent accuracy is not suitable for an IoT network if it takes a lot of time, because the devices are resource constrained. Therefore, our proposed model aims to offer an optimal solution for different types of users, such as a big company with lots of resources that aims for the highest accuracy or a small company that worries about cost efficiency.

TABLE I
VARIABLES AND DEFINITION FOR ZEEK FILES

B. Dataset

The dataset in this study, the IoT-23 dataset, was obtained from [20]. It is a very recent dataset, published in January 2020, consisting of network traffic from 3 different smart home IoT devices: Amazon Echo, Philips HUE and Somfy Door Lock. It is a large dataset of real and labeled IoT malware infections and benign traffic, made especially for developing Machine Learning algorithms. It consists of 23 captures (also called scenarios), of which 20 are malicious captures and 3 are benign captures. Captures from infected devices carry the possible name of the malware sample executed in each scenario.

The malware labels for the IoT-23 dataset are: Attack, C&C, C&C-FileDownload, C&C-HeartBeat, C&C-HeartBeat-Attack, C&C-HeartBeat-FileDownload, C&C-Mirai, C&C-Torii, DDoS, FileDownload, Okiru, Okiru-Attack, PartOfAHorizontalPortScan.

In addition, Zeek is software that performs network analysis. The IoT-23 dataset we used is in the conn.log.labeled format, which is the Zeek conn.log file generated by the Zeek network analyser from the original pcap file. The variable types and definitions for the IoT-23 dataset are shown in Table I.

Since the dataset is huge, we decided to take part of the records from each individual dataset and combine them into a new dataset. By doing this, our computer can handle the workload of the new dataset, and the new dataset retains most of the attack types of the IoT-23 dataset.

C. Data Preprocessing

First, we used the Python library Pandas to load all 23 datasets of the IoT-23 dataset separately into data frames, skipping the first 10 rows and reading the one hundred thousand rows after them. Then we combined all 23 data frames into a new data frame. Next, we dropped the variables that have no impact on the results. These variables are: ts, uid, id.orig_h, id.orig_p, id.resp_h, id.resp_p, service, local_orig, local_resp, history. Furthermore, we gave dummy values to the proto and conn_state variables and replaced all the missing
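The loading, combining, dropping and dummy-encoding steps listed so far might look roughly like the sketch below. It is an illustration under assumptions rather than the authors' script: the function names are hypothetical, the files are assumed to be tab-separated, the column names must be supplied by the caller because the header rows are skipped, and the missing-value step is left out because the text does not state which replacement value was used.

```python
import io

import pandas as pd

# Variables the text lists as dropped, written with Zeek's underscore spelling.
DROP_COLUMNS = ["ts", "uid", "id.orig_h", "id.orig_p", "id.resp_h",
                "id.resp_p", "service", "local_orig", "local_resp", "history"]
# Variables the text says were given dummy values.
DUMMY_COLUMNS = ["proto", "conn_state"]


def load_capture(source, column_names, skip_rows=10, max_rows=100_000):
    """Read one capture, skipping the first rows and keeping at most max_rows.

    Assumption: the file is tab-separated and has no usable header row.
    """
    return pd.read_csv(source, sep="\t", header=None, names=column_names,
                       skiprows=skip_rows, nrows=max_rows)


def preprocess(frames):
    """Combine the captures, drop the listed variables, one-hot encode the rest."""
    combined = pd.concat(frames, ignore_index=True)
    combined = combined.drop(columns=DROP_COLUMNS, errors="ignore")
    present = [name for name in DUMMY_COLUMNS if name in combined.columns]
    return pd.get_dummies(combined, columns=present)


# Tiny in-memory stand-in for one capture file (hypothetical, reduced column set).
DEMO_COLUMNS = ["ts", "uid", "proto", "duration", "conn_state", "label"]
demo_text = "".join(f"#header line {i}\n" for i in range(10)) + (
    "1.0\tC1\ttcp\t0.5\tS0\tBenign\n"
    "2.0\tC2\tudp\t0.1\tSF\tDDoS\n"
)
demo = preprocess([load_capture(io.StringIO(demo_text), DEMO_COLUMNS)])
print(sorted(demo.columns))
```

With real files, `load_capture` would be called once per capture with the path of its conn.log.labeled file, and the resulting list of frames passed to `preprocess`.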
Method           Testing Accuracy   Time Cost
Naive Bayes      0.30               6 seconds
SVM              0.69               5849 seconds
Decision Tree    0.73               3 seconds
CNN              0.6935             242 seconds

TABLE IX
RESULTS COMPARISON WITH PAPER [19]

Method                      Testing Accuracy
Naive Bayes (ours)          0.30
Naive Bayes (paper [19])    0.23
SVM (ours)                  0.69
SVM (paper [19])            0.67

Layer (type)           Output Shape     Number of Parameters
Input (Dense)          (None, 2000)     50000
dense_1 (Dense)        (None, 1500)     3001500
dropout_1 (Dropout)    (None, 1500)     0
dense_2 (Dense)        (None, 800)      1200800
dropout_2 (Dropout)    (None, 800)      0
dense_3 (Dense)        (None, 400)      320400
dropout_3 (Dropout)    (None, 400)      0
dense_4 (Dense)        (None, 150)      60150
dropout_4 (Dropout)    (None, 150)      0
Output (Dense)         (None, 12)       1812
Total parameters: 4,634,662
Trainable parameters: 4,634,662
Non-trainable parameters: 0
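The parameter counts listed for the layers can be checked with the usual formula for a fully connected layer, (inputs + 1) × units, where the extra 1 accounts for the bias. The short sketch below does this. The input width of 24 features is not stated in the text; it is inferred here from the first layer's 50000 parameters, since (24 + 1) × 2000 = 50000.

```python
def dense_params(inputs, units):
    """Weights plus one bias per unit for a fully connected layer."""
    return (inputs + 1) * units


def layer_param_counts(input_features, layer_units):
    """Parameter count of each dense layer in a stack, in order."""
    counts = []
    previous = input_features
    for units in layer_units:
        counts.append(dense_params(previous, units))
        previous = units
    return counts


# Dense layer widths as listed in the table; dropout layers add no parameters.
LAYER_UNITS = [2000, 1500, 800, 400, 150, 12]
# Assumption: 24 input features, inferred from the first layer's parameter count.
INPUT_FEATURES = 24

counts = layer_param_counts(INPUT_FEATURES, LAYER_UNITS)
print(counts)       # [50000, 3001500, 1200800, 320400, 60150, 1812]
print(sum(counts))  # 4634662
```

The per-layer values and the total agree with the figures reported in the table.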
TABLE VII
CNN RESULTS

Training Accuracy   Training Loss   Testing Accuracy   Testing Loss
0.6937              0.8583          0.6935             0.8602
Time cost: 242 seconds

As shown in Table VII, the testing accuracy of the CNN model is 69 percent and the execution time is around 4 minutes. Although CNN has lower accuracy and a higher time cost than Decision Trees, CNN can have better performance when dealing with a more complex dataset.
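The idea of choosing a model by accuracy and time cost can be made concrete with the reported figures. The sketch below is one possible selection rule, not a procedure the paper specifies: it picks the most accurate method that fits within an optional time budget and breaks ties by lower time cost. The names `Result` and `select_model` and the budget parameter are illustrative assumptions.

```python
from typing import NamedTuple


class Result(NamedTuple):
    method: str
    accuracy: float  # testing accuracy
    seconds: float   # time cost in seconds


# Testing accuracy and time cost as reported in the comparison table.
RESULTS = [
    Result("Naive Bayes", 0.30, 6),
    Result("SVM", 0.69, 5849),
    Result("Decision Tree", 0.73, 3),
    Result("CNN", 0.6935, 242),
]


def select_model(results, max_seconds=None):
    """Return the most accurate result within the time budget, or None if none fits."""
    eligible = [r for r in results
                if max_seconds is None or r.seconds <= max_seconds]
    if not eligible:
        return None
    # Higher accuracy wins; on equal accuracy the faster method wins.
    return max(eligible, key=lambda r: (r.accuracy, -r.seconds))


best = select_model(RESULTS)
print(best.method)  # Decision Tree
```

A user with a tighter or looser time budget would pass a different `max_seconds` and could end up with a different method on another dataset.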
of different learning algorithms and methods. Based on our results, Naive Bayes has the worst performance of all the learning algorithms and methods, and Decision Trees show the highest accuracy with the least time cost among all the ML/DL methods. In the future, more datasets from different environments should be tested with the ML/DL methods used in this study. This can help to further clarify the performance, time cost and comparison between the methods.

REFERENCES

[1] S. Chen, H. Xu, D. Liu, B. Hu and H. Wang, "A Vision of IoT: Applications, Challenges, and Opportunities With China Perspective," in IEEE Internet of Things Journal, vol. 1, no. 4, pp. 349-359, Aug. 2014, doi: 10.1109/JIOT.2014.2337336.
[2] G. Shen and B. Liu, "The visions, technologies, applications and security issues of Internet of Things," 2011 International Conference on E-Business and E-Government (ICEE), Shanghai, China, 2011, pp. 1-4, doi: 10.1109/ICEBEG.2011.5881892.
[3] Huang, Y., Benford, S., Price, D., Patel, R., Li, B., Ivanov, A., and Blake, H. (2020). Using Internet of Things to Reduce Office Workers’ Sedentary Behavior: Intervention Development Applying the Behavior Change Wheel and Human-Centered Design Approach. JMIR mHealth and uHealth, 8(7), e17914–. https://doi.org/10.2196/17914
[4] Almusaylim, Z., and Zaman, N. (2018). A review on smart home present state and challenges: linked to context-awareness internet of things (IoT). Wireless Networks, 25(6), 3193–3204. https://doi.org/10.1007/s11276-018-1712-5
[5] Singh, K., and Singh, N. (2020). An ensemble hyper-tuned model for IoT sensors attacks and anomaly detection. Journal of Information and Optimization Sciences, 1–25. https://doi.org/10.1080/02522667.2020.1799515
[6] Kumar, S., Vealey, T., and Srivastava, H. (2016). Security in Internet of Things: Challenges, Solutions and Future Directions. 5772–5781. https://doi.org/10.1109/HICSS.2016.714
[7] Tahsien, S., Karimipour, H., and Spachos, P. (2020). Machine learning based solutions for security of Internet of Things (IoT): A survey. Journal of Network and Computer Applications, 161, 102630–. https://doi.org/10.1016/j.jnca.2020.102630
[8] Tahsien, S., Karimipour, H., and Spachos, P. (2020). Machine learning based solutions for security of Internet of Things (IoT): A survey. Journal of Network and Computer Applications, 161, 102630–. https://doi.org/10.1016/j.jnca.2020.102630
[9] A. Mosenia and N. K. Jha, "A Comprehensive Study of Security of Internet-of-Things," in IEEE Transactions on Emerging Topics in Computing, vol. 5, no. 4, pp. 586-602, 1 Oct.-Dec. 2017, doi: 10.1109/TETC.2016.2606384.
[10] J. Deogirikar and A. Vidhate, "Security attacks in IoT: A survey," 2017 International Conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud) (I-SMAC), Palladam, 2017, pp. 32-37, doi: 10.1109/I-SMAC.2017.8058363.
[11] M. Nawir, A. Amir, N. Yaakob and O. B. Lynn, "Internet of Things (IoT): Taxonomy of security attacks," 2016 3rd International Conference on Electronic Design (ICED), Phuket, 2016, pp. 321-326, doi: 10.1109/ICED.2016.7804660.
[12] T. Song, R. Li, B. Mei, J. Yu, X. Xing and X. Cheng, "A Privacy Preserving Communication Protocol for IoT Applications in Smart Homes," in IEEE Internet of Things Journal, vol. 4, no. 6, pp. 1844-1852, Dec. 2017, doi: 10.1109/JIOT.2017.2707489.
[13] M. N. Aman, B. Sikdar, K. C. Chua and A. Ali, "Low Power Data Integrity in IoT Systems," in IEEE Internet of Things Journal, vol. 5, no. 4, pp. 3102-3113, Aug. 2018, doi: 10.1109/JIOT.2018.2833206.
[14] Doshi, R., Apthorpe, N., and Feamster, N. (2018, April 11). Machine Learning DDoS Detection for Consumer Internet of Things Devices. https://doi.org/10.1109/SPW.2018.00013
[15] V. Hassija, V. Chamola, V. Saxena, D. Jain, P. Goyal and B. Sikdar, "A Survey on IoT Security: Application Areas, Security Threats, and Solution Architectures," in IEEE Access, vol. 7, pp. 82721-82743, 2019, doi: 10.1109/ACCESS.2019.2924045.
[16] M. A. Al-Garadi, A. Mohamed, A. K. Al-Ali, X. Du, I. Ali and M. Guizani, "A Survey of Machine and Deep Learning Methods for Internet of Things (IoT) Security," in IEEE Communications Surveys and Tutorials, vol. 22, no. 3, pp. 1646-1685, thirdquarter 2020, doi: 10.1109/COMST.2020.2988293.
[17] L. Xiao, X. Wan, X. Lu, Y. Zhang and D. Wu, "IoT Security Techniques Based on Machine Learning: How Do IoT Devices Use AI to Enhance Security?," in IEEE Signal Processing Magazine, vol. 35, no. 5, pp. 41-49, Sept. 2018, doi: 10.1109/MSP.2018.2825478.
[18] F. Hussain, R. Hussain, S. A. Hassan and E. Hossain, "Machine Learning in IoT Security: Current Solutions and Future Challenges," in IEEE Communications Surveys and Tutorials, vol. 22, no. 3, pp. 1686-1721, thirdquarter 2020, doi: 10.1109/COMST.2020.2986444.
[19] N. A. Stoian, "Machine Learning for anomaly detection in IoT networks : Malware analysis on the IoT-23 data set," http://purl.utwente.nl/essays/81979
[20] IoT-23 Dataset, "https://www.stratosphereips.org/datasets-iot23"
[21] L. Lyu, J. Jin, S. Rajasegarar, X. He and M. Palaniswami, "Fog-Empowered Anomaly Detection in IoT Using Hyperellipsoidal Clustering," in IEEE Internet of Things Journal, vol. 4, no. 5, pp. 1174-1184, Oct. 2017, doi: 10.1109/JIOT.2017.2709942.
[22] H. Sedjelmaci, S. M. Senouci and M. Al-Bahri, "A lightweight anomaly detection technique for low-resource IoT devices: A game-theoretic methodology," 2016 IEEE International Conference on Communications (ICC), Kuala Lumpur, Malaysia, 2016, pp. 1-6, doi: 10.1109/ICC.2016.7510811.