
Prevention and detection of DDOS attack in virtual cloud computing environment using Naïve Bayes algorithm of machine learning

The document discusses the prevention and detection of Distributed Denial of Service (DDoS) attacks in virtual cloud computing environments using the Naïve Bayes algorithm. It highlights the increasing prevalence of DDoS attacks due to the scalability and accessibility of cloud computing, which poses significant security challenges. The research emphasizes the need for effective detection and mitigation strategies to enhance the resilience of cloud-based services against such cyber threats.


Measurement: Sensors 31 (2024) 100991

Contents lists available at ScienceDirect

Measurement: Sensors

journal homepage: www.sciencedirect.com/journal/measurement-sensors

Prevention and detection of DDOS attack in virtual cloud computing environment using Naïve
Bayes algorithm of machine learning
Yongqiang Shang
Xinyang Agriculture and Forestry University, Department of Information Engineering Department, Xinyang, Henan, 464000, China

ARTICLE INFO

Keywords: Machine learning; Cyber attack; Virtual cloud computing environment; Cloud computing; Naive Bayes

ABSTRACT

The popularity of cloud computing, with its scalability and accessibility, has ushered in a new era of innovation. Consumers who subscribe to a cloud-based service and use the associated pay-as-you-go features have unlimited access to the associated applications and technologies. In addition to lowering prices, this model has increased the reliability and accessibility of the offerings. One of the most crucial aspects of cloud technology is on-demand access to personal services, which is also one of its most significant advantages. Cloud-based apps are available on demand from anywhere in the world at a reduced cost. Although safety concerns trouble its users, cloud computing thrives because of its instantaneous services. There are various violations, but they all accomplish something similar: taking the systems offline. Distributed denial of service (DDoS) attacks are among the most harmful forms of online assault. For fast and accurate DDoS attack detection, this research introduces the DDoS attack and a method to defend against it, making the system more resistant to such attacks. In this scenario, numerous hosts are used to carry out a distributed denial of service assault against cloud-based web pages, sending possibly millions or even trillions of packets. An OS such as ParrotSec is used to pave the way for the attack and make it possible. In the last phase, the most effective algorithms, such as Naive Bayes and Random Forest, are used for detection and mitigation. Another major topic is the study of the many cyber attacks that can be launched against cloud computing.

1. Introduction

A DDoS attack is a distributed attack mode in which an attacker controls a large number of attack machines and sends DoS attack instructions to them. In the latest Internet security report, DDoS attacks remain one of the major cybersecurity threats. The inexpensive pricing and "pay-as-you-go" access to computational features and amenities on demand make cloud-based services a formidable competitor to the conventional IT solutions available in prior eras. The use of cloud computing is gaining popularity rapidly. Whether entirely or largely, governments and companies have moved their IT infrastructures onto the cloud. Cloud-based infrastructure offers various advantages compared to traditional, on-site infrastructures. The removal of expenses associated with operation and impairment, as well as the accessibility of materials on request, are only a few of the advantages. However, there are many concerns that cloud consumers have, and the research addresses these issues. The majority of these inquiries centre on safeguarding operational concepts and information. Many security-related attacks can be prevented in conventional IT systems that do not use cloud computing. Focused cloud-based crimes are already using their innovations. Many security vulnerabilities in cloud computing are unique compared to their predecessors in non-cloud computing environments because data and business logic are stored on an external cloud server that lacks accessible oversight. The denial-of-service (DoS) assault is one technique that has been in the spotlight recently. Denial-of-service incidents are directed at the server rather than the people it supports. DoS attackers attempt to flood live servers by masquerading as genuine users to overload the service's capacity to handle incoming inquiries [1]. Cloud computing is an Internet-based service that enables users to access configurable computing resource sharing pools (including server, storage, application software, services, networks, etc.) to achieve online access to computing resources on demand. As a mixture of emerging technologies and business models, cloud computing has developed rapidly in recent years due to its advantages of super-large scale, virtualization, high reliability, good scalability and on-demand services. To overcome this issue, multiple inquiries are sent to the server simultaneously. The term "distributed denial of service," or DDoS, refers to a variation on the classic "denial of service"

E-mail address: YongqiangShang2@163.com.


https://doi.org/10.1016/j.measen.2023.100991
Received 25 July 2023; Received in revised form 13 November 2023; Accepted 18 December 2023
Available online 20 December 2023
2665-9174/© 2024 The Author. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
that uses numerous computers to attack and impair a single service simultaneously. Among the most important and possibly catastrophic risks, among many others, is the growing number of distributed denial of service attacks observed. A quarter or more of the world's organizations have experienced a distributed denial of service attack. The authors show great foresight in predicting that DDoS attacks will increasingly focus on cloud-based assets and amenities. Multiple assaults in the past two years corroborate the paper's predictions about future attacks. There have been many attacks recently, but only a few have gained widespread notoriety and interest from scientists. In 2015, Lizard Squad hacked Microsoft and Sony's cloud-based gaming systems, causing both firms to shut down their services on Christmas Day. Distributed denial of service attacks hit Rackspace, a cloud computing services provider, hard. Another massive distributed denial of service attack was launched against Amazon EC2 cloud servers, serving as a striking example of an attack. Company activities were severely disrupted, money was lost, and there were immediate and long-term effects on the attacked businesses. In recent years, DDoS attacks have become more frequent, the botnets used by attackers have become larger, and network traffic usage has reached a height of 1000G. For cloud computing platforms, DDoS attacks from outside are similar to DDoS attacks from traditional networks. According to the basic principle and characteristics of DDoS attacks, the defense is mainly divided into four stages: detection (Detecting), analysis (Analyzing), defense (Resisting) and counterattack (Counterattack). Detection and analysis technology is the key to successful defense against DDoS attacks.

According to research published by Verisign iDefense Security Intelligence Solutions, distributed denial of service (DDoS) assaults have been particularly damaging to the internet and SaaS (Software as a Service) business throughout the past few quarters. More than 75 % of known countermeasures against DDoS assaults utilized services provided by the cloud [2]. "Financial damages" refers to one of the worst possible results of a Distributed Denial of Service attack in the cloud. The median price of a distributed denial of service assault is put at $482,000, according to some estimates. Some of the financial losses suffered in Q1 2015 have been detailed in new disclosures from Neustar. Studies show that, on average, more than $72K is stolen in a single hour. Distributed denial of service (DDoS) attacks take on new significance in cloud computing. This variation directly results from the operational difficulties introduced by an assault on the victim network [3]. In the environment of cloud computing, DoS attack technology is also undergoing new changes, and is manifested in a variety of forms. The attack may come from outside the server cluster or from inside the server cluster. At present, the more popular attack method is for the attacker to attack a specific cloud computing platform or server cluster. This attack method causes great harm and takes various forms, and it is difficult to quickly carry out fault positioning and troubleshooting.

Clouds that provide Infrastructure as a Service (IaaS) to their clients contain virtual machines (VMs) that host the amenities for the clients. The flexibility and on-demand nature of the cloud is made possible by the abstraction of servers. It allows virtual machines to acquire and distribute capabilities on the fly as needed. The advantages of cloud computing, such as upon-request processing and easily accessible assets, have contributed significantly to its recent meteoric rise. As a result, the cloud can now support a more significant number of virtual machines (VMs) with a far greater capacity to meet their resource requirements. This is because a cloud-based virtual machine can access practically unlimited resources. A Distributed Denial of Service (DDoS) assault, also known as an Economic Denial of Sustainability (EDoS) attack or a Fraudulent Resource Consumption (FRC) attack, is the result of this "adaptability" or "auto-scaling," which results in financial losses. Data center distributed denial of service attacks are the focus of this study. We also define these assaults, compare them to more conventional DDoS attacks, and analyze and classify the numerous developments in this field. We will provide a comprehensive taxonomy of these functions to make this analysis more approachable. The popularization of computer networks has changed the way we work and live, and the promotion of cloud computing in recent years has provided a more convenient platform for network resource sharing. A cloud computing platform organically integrates computer infrastructure (including server resources, storage resources, network resources, etc.) through virtualization technology, so as to realize resource sharing among multiple users and greatly reduce the cost of using resources, making it possible to provide cheap and high-performance services for users.

1.1. DDoS attack and cloud features

A denial of service (DoS) attack is a destructive attack on the target server through abnormal methods, resulting in its inability to provide services to normal network users. Currently, distributed denial of service (DDoS) assaults have achieved much success in cloud computing, where hackers make use of the "pay-as-you-go" model. Many factors contribute to cloud computing's meteoric rise in popularity, but these three features stand out as particularly crucial. On the other hand, DDoS attackers have found that the same set of features dramatically aids them in achieving the objectives of their cyber attacks. In the sections that follow, we will examine each of these features more closely. Fig. 1 describes the cloud architecture which was affected by DDoS attack [5]. There are many methods and forms of DoS attack, which are summarized in the following situations: illegal occupation and consumption of computer resources such as CPU, network bandwidth and storage space; changing or even destroying the configuration information of the target server; changing or even destroying the key node equipment in the physical network; and accessing the services by programming.

1.1.1. Automatic sizing

Physical virtualization provides the capacity to scale down, up, and re-resource a live VM. A VM's processing power, primary memory, storage area, and data transfer capacity can all be increased as needed, thanks to these features. When some of the assigned resources are not being used or needed, this can be utilized to free up some of those capabilities. Multiple vendors of services employ this method of resource distribution, which is made practical by automatic scaling and web-based tools. This allows those who use the cloud to calculate their needed facilities using utilization rates or similar metrics. It is possible to extend this functionality to automatically deploy new virtual machines (VMs) on top of existing physical servers and remove them when they are no longer needed. Upward scaling, which refers to adding more machines, and

horizontal scaling, which refers to adding more data centres or clouds, are two of the most crucial computing features for utility purposes. Distributing an application across multiple cloud-hosted physical servers is one way to increase its capacity. High-speed connections and ample storage space are the two most essential factors in determining adaptability. The virtualization of OSes is crucial when contemplating the scalability of virtual machines (VMs). The process of replicating a virtual machine and then releasing it is quick. To alleviate strain, duplicate virtual machines might be launched on different servers [4]. This action can be taken at any time when it is required. Streaming virtual machine deployments are an additional significant expansion accelerator because they allow migrating an active virtual server to a different nation's more comprehensive hardware server with practically no interruption. This guarantees ongoing adaptability, which is further strengthened in this manner.

Fig. 1. Cloud architecture DDoS attacks.

1.1.2. Pay-as-you-go reporting

On-demand utility services have grown in popularity due to their convenience and the simplified resource reporting and invoicing they provide. Customers of cloud computing services can take advantage of the "Pay-as-you-go" model without making any upfront financial commitments for resources. The administrator of a virtual machine (VM) may want to dynamically adjust the number of resources that are accessible, either by adding more or taking them away [23]. Another perk of adopting a cloud-based system is that you can get more use out of your hardware without worrying about things like electricity, space, cooling, and maintenance. DDoS attacks in the cloud are only possible to comprehend with a firm grasp of the financial aspects of doing so. Since most cloud instances are billed hourly, the minimum possible time frame for accounting is typically 70 min. Funds could be allocated in three ways: a predetermined amount, a pay-as-you-go system, or auctions. The size and volume of data transferred in and out of a computer network also determine its usefulness. The "pay as you go" models are experimental and still in the prototype phase [6].

1.1.3. Multi-tenancy

Multi-tenancy allows several Virtual Machines (VMs) belonging to different VM proprietors to coexist on just one hardware system. Increasing hardware utilization and, consequently, one's return on investment (ROI) can be accomplished through multi-tenancy. On the same physical server, a single user may want to run multiple instances of the same program or entirely distinct ones using different virtual machines.

1.2. Cloud-based DDoS attack situation

The attack depicted in Fig. 1 is quite typical. The cloud requires enormous computers that can service multiple users in a standardized setting. An attacker's purpose may not always be limited to a "Denial of Service" but can include reducing the profitability of cloud subscribers. How to prevent assaults like this has been a hot topic since the inception of cloud computing. The term "Fraudulent Resource Consumption" (FRC) attack has been used in many other works to characterize this type of attack. In distributed denial of service attacks targeting web pages, hackers plant bots and trojans on compromised systems all over the Internet. A DDoS attack will be executed as an EDoS attack if the target service is hosted in the cloud. "Booters" are businesses that connect their clients with a botnet to launch distributed denial of service (DDoS) attacks on their rivals' web pages. Attacks like these can be spurred on by everything from commercial competition to political rivalry to ransom to full-out cyber war between nations [22]. Because a DDoS attack consumes system resources and leaves the system unable to provide normal services, network managers can optimize and reinforce the system to improve its tolerance of DDoS attacks, and even block some DDoS attack packets. First, improve the network planning and design scheme to eliminate the unreasonable factors of the network structure; then address the security vulnerabilities and hidden dangers in the network system; lastly, scan the key network equipment, such as firewalls and routers, to find the bottlenecks of network equipment and optimize performance. The approach to cloud computing provides customers with several opportunities and benefits; nevertheless, DDoS attackers also have access to these features and may find them helpful. In order to accomplish "Denial of Service," an attacker launching a DDoS attack will send out a flood of fake inquiries. Fig. 2 describes the classification, prevention and mitigation of DDoS attacks [7]. Although DDoS attack technology is varied, the phenomena it causes in the system have many similarities. Therefore, through the implementation of a distributed detection system, we will strive to detect DDoS attack behavior as early as possible and accurately locate the source and characteristics of attacks. A network abnormal-traffic analysis system and DDoS detection tools can find abnormal traffic and DDoS behavior in the network in time and improve the overall detection and analysis ability of the system.

However, the targeted system must expend many resources to counter this hack. This "overload" condition would be seen as feedback by the "auto-scaling" function, which would then add more CPUs (or other resources) to the VM's existing amount of readily available assets. First, a virtual machine will enter its "normal load VM" phase. Let us assume the DDoS attack has commenced, and the VM is now overburdened as an immediate consequence of the attack. As soon as the cloud detects an overload, its auto-scaling features will kick in, and it will choose among the many methods described in the literature for allocating resources to virtual machines, migrating them, and relocating them. When a virtual machine (VM) gets overloaded, it can be
given more resources, transferred to a server with more available resources, or have a copy of itself launched on a separate server [20]. If there is no countermeasure to halt this procedure, further resources will be added. This can continue until the service provider makes a payment or the cloud service provider exhausts all available resources, whichever comes first. The eventual outcome of this is "Service Denial" [8]. The vast majority of DDoS attacks are organized and premeditated destructive acts. Relying solely on a technical department or an enterprise, it is impossible to completely solve the problem of security protection, let alone to quickly track and locate the source of attacks. For the defense against DDoS attacks, cloud computing platform suppliers, communication operators and government departments need to establish a cooperation mechanism to complete security defense.

Fig. 2. Cloud DDoS classification, prevention, and mitigation.

Consequently, this results in billing for resources only when used, which raises the risk of incurring financial losses over the set limit. To keep things manageable, we might run virtual machines with a static resource profile, in which case the SLA will not cover the provisioning of extra resources on demand. A "Denial of Service" (DoS) attack would immediately wipe out the cloud's valuable features in this situation. Fig. 3 is the description of tiers of cloud-based DDoS defense.

Fig. 3. Several tiers of cloud-based DDoS defence.

2. Related works

DDoS attacks, which target computer systems, are becoming increasingly common. DDoS perpetrators have expanded their reach into practically every area of technology, especially the cloud, the IoT, and the edge. Distributed denial of service (DDoS) attacks flood the targeted machine or host with so much traffic that it crashes or exhausts all available resources (including the network). Multiple strategies for defense have been proposed, but they have yet to be successful due to attackers' ability to educate themselves to employ recently discovered computerized ways of attack. Because of this, we presented a machine-learning-based approach to spotting distributed denial-of-service (DDoS) attacks in the cloud. K Nearest Neighbour, Random Forest, and Naive Bayes are three different categorization machine learning techniques that help the system detect a distributed denial of service attack with a 99.75% success rate. In our study, we offer a machine learning-based approach to identifying and blocking DDoS attacks against servers situated in the cloud. Data mining for relevant statistics largely determined our recommended technique's efficacy. Table 1 illustrates the results, from which it can be deduced that the suggested method has an excellent success rate (about 99.78 %) in identifying DDoS attacks while producing few errors. Since we focused primarily on the supervised learning method in this study, future studies may investigate unsupervised or reinforcement learning methods [9].

Table 1
Dataset snapshot.

No.  Time      Source       Destination  Protocol  Length  Information
1    113.6020  11.0.2.20    194.172.8.2  DNS       77      Standard query 0x0aa0 A
2    113.6037  11.0.2.20    11.0.2.20    DNS       174     Standard query response 0x375e AAAA
3    113.6039  91.32.134.1  11.0.2.20    TCP       66      82 > 59,501 (ACK) Seq = 310 Ack = 18 Win = 66,646 Len = 0
4    113.6039  91.32.134.1  11.0.2.20    TCP       66      82 > 59,523 (ACK) Seq = 310 Ack = 18 Win = 66,646 Len = 0
5    113.6039  91.32.134.1  11.0.2.20    TCP       66      82 > 59,547 (ACK) Seq = 310 Ack = 18 Win = 66,646 Len = 0

Distributed denial of service (DDoS) attacks are more challenging to execute on the public internet than on a conventional network. There is more than one threat to the cloud, and its surroundings are under attack from several directions. Existing machine learning techniques, such as neural classifiers, can be used to identify DDoS attacks. This research aims to shed light on the results of an investigation into distributed denial of service (DDoS) attacks in cloud settings. The number of false positives rises when artificial intelligence methods are applied for detection. The ANN, SVM, kNN, J48, Feature rank and Feature selection algorithms frequently detect Distributed Denial of Service (DDoS) attacks in a cloud context [10].

The goal of this research was to examine several works associated with the identification of network assaults in both traditional and cloud-


based infrastructures. In the following paper, we will examine the wide variety of attacks that could occur in a cloud environment. There is sometimes a conflation between the terms "bandwidth reduction" and "resource reduction" when describing the impacts of distributed denial of service (DDoS) attacks. Most distributed denial of service (DDoS) assaults in the cloud are SYN Flood or Flash Crowd assaults. The analysis found that low-rate TCP denial of service assaults and performance decreases are two of the most prevalent attack categories [11].

To spot distributed denial of service (DDoS) attacks, researchers are trying out various machine-learning algorithms, some of which have shown greater precision than others. In experiments, real-time network logs, KDD, NSL-KDD, and CIDDS datasets were used to identify network attacks. Linear regression and logistic regression algorithms, also used to predict DDoS attacks, have been found to have high false positive rates when implemented in several databases. To improve precision and recognition rates, however, it is constantly necessary to increase the number of records used for training and testing the dataset, which is a difficult task in and of itself. DDoS assaults cover a wide array of topics. Therefore, researchers can use many machine-learning techniques and classifiers in future studies. Furthermore, regression analysis has received more usage in recently released literature [12]. As a potential research strategy, we can reduce dimensionality and then use the remaining data for regression evaluation.

The number of people using online resources has increased recently due to the COVID-19 outbreak. As a direct consequence, there has been an increase in the number of end users subscribing to various cloud-based applications, which provide various services to the end user. DDoS assaults, on the other hand, are aimed at interrupting cloud computing services' availability and processing power. This has the effect of negatively impacting both the performance and accessibility of cloud computing resources. There is currently no reliable method for detecting or filtering DDoS attacks, so they are a reliable tool for anyone looking to launch cyberattacks. Recently, scientists have started experimenting with machine learning (ML) techniques to develop effective ML-based tactics to detect distributed denial of service (DDoS) attacks [13]. In this scenario, we offer a method for detecting distributed denial of service (DDoS) assaults in a cloud computing environment by combining big data with deep learning methods. The proposed method employs big data sparking innovation to examine many incoming packets and a deep learning machine learning algorithm to filter fraudulent transmissions. Both of these technologies are used to make the methodology more effective. The testing and training phases were done with the KDDCUP99 dataset, and the final result attained a precision of 99.82 %. Even if the number of people using smart devices proliferates, the computing power and resources available in these devices still need improvement.

The cloud-based system offers multiple solutions for overcoming the issue of scarce resources by allowing for their cooperative use. The cloud computing platform is periodically targeted by attackers while being susceptible to a wide range of cyber threats. As such, we provide access to a DDoS warning system that is capable of detecting a DDoS attack in a timely and accurate fashion. To prevent malicious or undesirable communications from reaching a cloud computing environment, we offer an approach that employs big data and deep learning techniques. We hope to eventually implement our suggested approach along with additional methods to enhance its overall functioning and test its usefulness on a wide range of datasets [14].

Additionally, a more effective DDoS attack avoidance mechanism might be constructed and recommended as a future work of this study in order to manage DDoS attacks in a cloud computing environment in an efficient manner. The examination of various DDoS prevention strategies that have been used in the past, as well as those that are considered state-of-the-art, is the only purpose of this work. The scope of future study may be expanded to include the presentation of a novel and effective DDoS prevention method to deal with the attacks [15].

The term "cloud computing" describes a new and attractive model for administering and distributing offerings over the World Wide Web. Because of this, information retention strategies are changing across the IT environment. Data security must be considered when handling massive amounts of data storage. Intruders pose one of the biggest challenges to data security in the modern Internet environment. The resources, data, and applications stored on the public internet are vulnerable to assault due to the system's connection. Intrusion Detection Systems (IDS) are employed in the cloud to monitor malicious behavior on both the network and the host systems. Because it creates so much illicit information online, detecting a Distributed Denial of Service (DDoS) attack is challenging for an IDS. Cybersecurity analytics can aid in the detection of intrusions through the use of methods for data mining. Many distinct approaches have been developed with machine learning methods as their foundation [16].

Selecting features is another effective method for decreasing the dataset's dimensionality. This research proposes two distinct approaches for utilizing the dataset generated via NSL-KDD. The first is Learning Vector Quantization (LVQ), a filtering technique. The second technique is dimensionality reduction by principal component analysis (PCA). Naive Bayes (NB), Support Vector Machine (SVM), and Decision Tree (DT) categorization were applied to the characteristics chosen from each technique, and the results were compared for their ability to identify DDoS attacks [17]. The results show that the LVQ-based DT method is superior to the alternatives when it comes to spotting attacks. Unauthorized access to confidential data must be detected as the first step in securing that information [18].

The NSL-KDD standard is the foundation for a cloud-based intrusion detection system. In this study, we explore data pertaining to distributed denial of service attacks. LVQ, PCA, and other feature selection methods were used to classify the attacks using machine learning techniques such as neural networks, support vector machines, and decision trees. In order to properly categorize DDoS attacks, it was necessary to look at how well various techniques worked [19–21]. The PCA selected 21 features from a possible 42, while the LVQ selected only 20. The results suggest that LVQ-based feature selection in the DT model may be more accurate than the other methods in identifying attacks. As mentioned earlier, the model also outperforms its predecessors in terms of accuracy, recall, specificity, and F-score.
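The studies surveyed in this section compare classifiers by accuracy, recall, specificity, and F-score. As a minimal sketch of how these measures follow from a binary confusion matrix (with "attack" as the positive class), the Python snippet below may help. The function name `detection_metrics` and the example counts are illustrative assumptions and are not taken from any of the cited studies.

```python
def detection_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute detection metrics from binary confusion-matrix counts.

    tp: attack flows flagged as attack   fn: attack flows missed
    tn: benign flows passed as benign    fp: benign flows flagged as attack
    """
    total = tp + fp + tn + fn
    # Each ratio falls back to 0.0 when its denominator is zero.
    accuracy = (tp + tn) / total if total else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0            # sensitivity
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # F-score (F1) is the harmonic mean of precision and recall.
    f_score = (2 * precision * recall / (precision + recall)
               if (precision + recall) else 0.0)
    return {
        "accuracy": accuracy,
        "recall": recall,
        "specificity": specificity,
        "precision": precision,
        "f_score": f_score,
    }


if __name__ == "__main__":
    # Hypothetical counts for 100 attack flows and 100 benign flows.
    metrics = detection_metrics(tp=90, fp=5, tn=95, fn=10)
    for name, value in metrics.items():
        print(f"{name}: {value:.4f}")
```

Reporting specificity alongside recall matters for DDoS detection because a detector that flags everything as an attack reaches perfect recall while blocking all legitimate traffic; a single accuracy figure can hide this when the classes are imbalanced.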

3. Materials and methods

3.1. Naive Bayes algorithm

The premise that the most straightforward answers often turn out to be the most enlightening is evident in Naive Bayes and can be observed in everyday situations. Machine learning has come a long way in recent years, but its continued development shows that it can still be kept very straightforward without compromising efficiency, accuracy, or dependability. Naive Bayes serves many functions and is particularly strong at problems in natural language processing (NLP). In machine learning, the Naive Bayes technique is a standard statistical method for solving classification problems based on Bayes’ theorem. The following paragraphs explain the Naive Bayes algorithm and its core concepts. The speed with which a Naive Bayes (NB) model can be built makes it particularly useful when dealing with vast amounts of data. The approach has been widely used because of its simplicity and its ability to outperform more complex classification techniques. The foundation of Bayesian classification is the assumption that predictors can be treated independently: a Naive Bayes classifier assumes that the presence of one feature in a class does not influence the presence of any other feature, which simplifies the computation.
The Naive Bayes classifier is a popular supervised machine learning approach in applications such as text classification. Since it models the distribution of inputs for a given class or category, it belongs to the group of learning algorithms known as generative learning approaches. It relies on the assumption that the attributes of the input data are conditionally independent given the class. This allows the system to generate predictions quickly and accurately.
Naive Bayes classifiers, which implement Bayes’ theorem, are often regarded as suited to basic probabilistic classification tasks. The theorem combines empirical evidence with prior context when determining the credibility of a hypothesis. The classifier assumes that the attributes of the input data are unrelated to one another, whereas real-world scenarios usually play out differently. Although based on an unduly naive premise, the Naive Bayes classifier sees widespread application, because it serves its purpose well and has proven highly efficient in several practical settings.
Naive Bayes classifiers are among the simplest Bayesian network models and can achieve high levels of reliability when used in conjunction with kernel density estimation. Despite their simplicity, they are used less than other Bayesian network models. When the distribution of the input data is not known, using a kernel function to approximate the probability density function of the input data can help the classifier perform better. The naive Bayes classifier is therefore an effective machine learning technique for various purposes, including but not limited to text categorization, spam filtering, and sentiment analysis. Thomas Bayes is credited with developing the method for predicting a probability from a set of known probabilities, now known as Bayes’ theorem. Fig. 4 shows the layout of Naive Bayes.

Fig. 4. Procedure for Naive Bayes.

3.2. Understanding Naive Bayes and machine learning

Machine learning has two main branches: supervised learning and unsupervised learning. Classification and regression are two subsets of supervised learning. Classification is where the Naive Bayes method excels. The Naive Bayes method has been used for face recognition: people’s faces and facial features, such as noses, mouths, and eyes, can be recognized using this classification method. In meteorology, it can be used to predict whether the coming weather will be pleasant or unpleasant. Doctors can use the classifier to support diagnoses, for example to assess a patient’s likelihood of developing cancer, cardiovascular disease, or other disorders. Using a Naive Bayes classifier, Google News can decide whether a news piece is about politics, the world, or any other topic. The Naive Bayes classifier has the advantages of being simple, easy to implement, and requiring little training data. It handles both continuous and discrete data. It remains stable with many predictors and data points. It is fast, can be used to make predictions in real time, and is insensitive to irrelevant features.

4. Proposed method

Gathering relevant data should be the initial step. By collecting relevant data, we can locate and exploit several security holes in the victim’s computers in our attack. All available information regarding running services, open and closed ports, and other security holes is compiled during the information-gathering phase. Here, the attacker has a better chance of learning the weak spots of the victim, making further attacks much simpler. The cloud service provider assigns a different port number to each of its services. In most cases, FTP uses port 21 (port 990 is used for implicit FTPS), and HTTP uses port 80. Ports 20 through 23 carry FTP data, FTP control, SSH, and Telnet, respectively.
In short, information gathering provides an attacker with the data needed to launch a successful attack on a target system. To learn more about a network, we can employ the Nmap scanner. It needs only the target machine’s IP address to perform a full scan that reveals the targeted system’s activity, services, open ports, and so on. Once an exposed port is found, the activity on it can be observed regardless of the operating system the target runs. The attack plan would then involve a distributed denial of service attack using methods like the "ping of death." A distributed denial of service (DDoS) assault is one of the most damaging types of cyberattacks since it disrupts the entire system. Due to the flood of packets caused by the DDoS assault, all services are

either momentarily or completely inaccessible. ParrotSec, like Kali and Ubuntu, can be managed through a command-line interface, with the shell or terminal serving as the primary interface for entering instructions. Because ParrotSec provides the tooling, typing "PING IP" into the console is enough to carry out the command. Since the victim site would receive over 65 thousand packets, all services would be taken down. This is how an assault could be generated. The subsequent stage is detection. In this case, the target is a website hosted in the cloud, and Nmap is used to scan the entire site in order to locate any security flaws, which exposes any underlying problems. After the exposed ports have been identified, a Python script that carries out a distributed denial of service attack is created and run.
Wireshark thoroughly analyzes each incoming packet. The packet analysis produced a large data set, which is then passed to a classifier. The experimental setting demonstrates that both the random forest and the naive Bayes classifier, both of which are well known, produce excellent results. While various other methods may be used for detection (support vector machines, k-nearest neighbors, k-means, etc.), Naive Bayes is still the most effective.
In this work, naive Bayes is applied to the problem of predicting application-layer packets during distributed denial-of-service attacks.
Notwithstanding its apparent simplicity, the Naive Bayes algorithm can make precise predictions using the current data. The data set under consideration was trained with naive Bayes, and then a fresh data set was built using the cross-validation technique with 65 folds. This was done to determine where the files were coming from and where they were going. The true positive rate, false positive (false alarm) rate, false negative rate, and other metrics can be derived from this fresh data set. Naive Bayes, like any prediction technique, produces a mix of correct and incorrect results. A false negative is treated as an alarm for the benefit of internet users. Naive Bayes and random forest both correctly identified the true positives as ordinary packets, whereas the false negatives were classified as DDoS attacks.

5. Experimentation & results

5.1. Data pre-processing

In data mining, preliminary data processing is the most efficient first step. It streamlines complex information into something everyone can understand. Because real-time data is often unreliable and vague, it must be pre-processed before it becomes valuable information. Weka includes numerous options for preprocessing filters. A single filter, such as normalization, is chosen from the available options. Data normalization, in the sense of removing redundancy, refers to removing superfluous or duplicate information from a dataset.

5.2. Training data set

Collecting training data is part of constructing a machine-learning model, since a learning algorithm requires data to train it. The training data is a subset of the full dataset, which is used for both training and evaluation. Separating the dataset into training and testing sets is an essential first step when developing a machine learning-based model. The trained model is then needed to generate further predictions on newly acquired data.

5.3. Prediction algorithm

Following the development and validation of the data set, various algorithms have been developed to predict several of the issues. In this particular scenario, the task is to identify whether DDoS messages are harmful or not.

5.4. Prediction of naive bayes

The percentages of true positives and false positives are displayed in this figure.
The percentage of false positives is seen as an indicator of a distributed denial of service (DDoS) attack or of fake data packets. In contrast, the proportion of true positives represents normal traffic. In this case, the mean for genuine packets is 0.973, while the mean for fraudulent transmissions is approximately 0.05.

5.5. Proposed formula for naive bayes

$$P(x \mid y) = \frac{P(y \mid x)\,P(x)}{P(y)}$$

where P(y|x) is the conditional probability of y given x, P(x) is the prior probability of the class, P(y) is the prior probability of the predictor, and P(x|y) is the posterior probability of class x given predictor y.

5.6. Basic theory

5.6.1. Three-way handshake

The communication model between machines is depicted in Fig. 2, and it must be adhered to for the communication to succeed. This protocol is known as the three-way handshake. In the present context, the exchange takes place between the server and the attacker. When establishing a standard TCP connection, the client (here, the attacker) contacts the server by sending a SYN packet. In response, the server allocates a buffer for the client and replies with a SYN-ACK packet. At this stage, the connection is in a state referred to as "half-open," and the server waits for an ACK response from the client in order to complete the connection setup. The exchange that ends once the connection has been successfully established is called the three-way handshake.
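
The handshake steps described above can be sketched as a small state tracker. This is a minimal illustration under stated assumptions, not the paper's implementation: the class `ToyServer`, its `backlog_size` limit, and the example IP addresses are hypothetical, and no real packets are sent. It shows how clients that never send the final ACK remain in the half-open state (SYN received, final ACK pending) and occupy the server's buffer.

```python
from dataclasses import dataclass, field


@dataclass
class ToyServer:
    """Tracks connection states for a toy TCP three-way handshake."""
    backlog_size: int = 4                            # hypothetical limit on half-open entries
    half_open: set = field(default_factory=set)      # SYN received, SYN-ACK sent
    established: set = field(default_factory=set)    # final ACK received

    def on_syn(self, client: str) -> str:
        """Steps 1 and 2: the client sent SYN; reserve a slot and answer SYN-ACK."""
        if client in self.half_open or client in self.established:
            return "IGNORED"
        if len(self.half_open) >= self.backlog_size:
            return "DROPPED"  # backlog full, so the new client is refused
        self.half_open.add(client)
        return "SYN-ACK"

    def on_ack(self, client: str) -> str:
        """Step 3: the client sent the final ACK; the connection is established."""
        if client not in self.half_open:
            return "IGNORED"
        self.half_open.remove(client)
        self.established.add(client)
        return "ESTABLISHED"


server = ToyServer(backlog_size=2)
# A well-behaved client completes all three steps.
replies = [server.on_syn("10.0.0.5"), server.on_ack("10.0.0.5")]
# Two clients that never send the final ACK occupy the whole backlog.
stalled = [server.on_syn("10.0.0.66"), server.on_syn("10.0.0.67")]
# The next legitimate SYN is refused because no slot is free.
late = server.on_syn("10.0.0.7")
```

In this sketch the stalled entries are never released; a real TCP stack would also apply timeouts to half-open entries, which the toy model leaves out for brevity.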
On the other hand, TCP SYN Flood attacks are intended to exploit this three-way handshake by saturating the server with an excessive number of SYN requests. TCP SYN Flood is a prominent example of a denial-of-service (DoS) attack. For a packet capture program to identify a TCP SYN Flood, it must monitor a copy of the server’s traffic over a sustained connection. The arrival of a new incoming IP address at the server typically coincides with the appearance of TCP SYN Flood properties. IP addresses that repeatedly appear at the server within a predetermined period are counted, and the result is used to derive the characteristics of a DDoS attack.

5.6.2. Naive Bayes algorithm

Bayes’ theorem is a simple computational approach for calculating conditional probabilities. A conditional probability quantifies the likelihood of one event given the presumption, premise, declaration, or fact that a second event has already occurred, that is, the chance of something happening after something else has happened. The posterior probability can be computed using the following formula based on Bayes’ theorem.

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$

Here P(A|B) is the conditional probability of A given that B is true, and P(B|A) is the conditional probability of B given that A is true. P(A) is the probability of event A, and P(B) is the probability of event B. The output of the packet-capturing software, namely the IP address and packet length obtained, serves as the input to the computation. The calculation uses the Naive Bayes method with the Gaussian distribution. After the computation, the outcomes are displayed on a two-dimensional plot. The Gaussian Naive Bayes approach, which requires the calculation of the mean and standard deviation for analysis, is applied once the quantitative input has been gathered. Table 1 shows a sample of the dataset format.

5.6.3. MATLAB classification using the Naive Bayes algorithm

Matlab is the application we employ for classification because it is not only user-friendly but also highly effective in producing clear visual output. Matlab includes a built-in tool that allows users to perform Naive Bayes classification. Using this method, we can also classify network traffic as either K, L, or Q to gain further insight into the type of data transmitted over an internet connection. This concept will be challenging to grasp for a significant number of individuals. The Matlab script for the Naive Bayes classification and its parameters are displayed in the following figure. The results of classifying the data obtained from the system are shown in the figure. The nonlinear blue line bounds the normal class, whose members are shown as green circles. The other class is shown as red squares, which depict threats. Fig. 5 shows the DDoS attack detection using MATLAB.

Fig. 5. Categorization outcome using MATLAB module.

6. Conclusion

The key goals of this study are to learn how to recognize and prevent distributed denial-of-service attacks. The first and most crucial step is determining which ports can be exploited. Nevertheless, this approach is not risk-free because susceptible ports are more likely to be exploited. Given ParrotSec’s track record for stability and performance, we decided it would be the ideal choice for our company’s computer system. Since a DDoS attack involves sending one million separate packets toward the target, starting with an online website would be best. The targeted website was taken offline after it became clear that an assault had happened. Machine learning is also useful in the detection process. Using this data, the most popular and accessible tool, Weka, is trained, employing pre-processing techniques and the "discretize" filter to achieve the desired effect. This phase is therefore useful for both forecasting and detecting. We employed both methods and compared the findings on the same platform, and we found that the naive Bayes method provides the most trustworthy conclusions. PCA selected 21 features from the possible 42 features, while LVQ selected only 20 features. The results suggest that LVQ-based feature selection in the DT model may be more accurate than other methods in identifying attacks. As mentioned earlier, the model also outperformed the previous models in terms of accuracy, recall, specificity, and F-score. It was shown that the naive Bayes model had significantly better predictive power than the random forest model. There is a chance that a false positive warning will be triggered for packet transmissions within a network. Moreover, when compared to the random forest, naive Bayes produces considerably more accurate predictions. It was demonstrated that the Naive Bayes algorithm outperformed the random forest technique in identifying the false and actual rate of transmissions. Detection is not carried out in real time. Although attacks can be detected, a real-time alarm cannot be realized in an environment of high cluster security, so the feasibility of real-time monitoring under the Hadoop platform should be studied further.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

No data was used for the research described in the article.

Acknowledgement

The study was supported by Key R&D and Promotion Special Project (Science and Technology Research) in Henan Province (232102210146).

References

[1] X. Jing, Z. Yan, W. Pedrycz, Security data collection and data analytics in the internet: a survey, IEEE Commun. Surv. Tutorials 21 (1) (2019) 586–618.
[2] K.J. Singh, K. Thongam, T. De, Detection and differentiation of application layer DDoS attack from flash events using fuzzy-GA computation, IET Inf. Secur. 12 (6) (2018) 502–512.
[3] T. Subbulakshmi, S. Mercy Shalinie, C. Suneel Reddy, A. Ramamoorthi, Detection and classification of DDoS attacks using fuzzy inference system, Commun. Comput. Inf. Sci. 89 CCIS (2010) 242–252.
[4] N. Tabassum, M.S. Khan, S. Abbas, T. Alyas, A. Athar, M.A. Khan, EAI Endorsed Transactions Intelligent reliability management in hyper-convergence cloud infrastructure using fuzzy inference system, vol. 4, no. 5, pp. 1–12.
[5] A.K. Soliman, C. Salama, H.K. Mohamed, Detecting DNS reflection amplification DDoS attack originating from the cloud, in: Proc. - 2018 13th Int. Conf. Comput. Eng. Syst. ICCES 2018, 2019, pp. 145–150.
[6] P. Arun Raj Kumar, S. Selvakumar, Detection of distributed denial of service attacks using an ensemble of adaptive and hybrid neuro-fuzzy systems, Comput. Commun. 36 (3) (2013) 303–319.
[7] A.S. Boroujerdi, S. Ayat, A robust ensemble of neuro-fuzzy classifiers for DDoS attack detection, in: Proc. 2013 3rd Int. Conf. Comput. Sci. Netw. Technol. ICCSNT 2013, 2014, pp. 484–487.
[8] L. Kwiat, C.A. Kamhoua, K.A. Kwiat, J. Tang, Risks and benefits: game-theoretical analysis and algorithm for virtual machine security management in the cloud, Assur. Cloud Comput. (2018) 49–80.
[9] H.S. Mondal, M.T. Hasan, M.B. Hossain, M.E. Rahaman, R. Hasan, Enhancing secure cloud computing environment by Detecting DDoS attack using fuzzy logic, in: 3rd Int. Conf. Electr. Inf. Commun. Technol. EICT 2017, 2018-Janua, 2018, pp. 1–4. December.
[10] P. Mishra, E.S. Pilli, V. Varadharajan, U. Tupakula, Intrusion detection techniques in cloud environment: a survey, J. Netw. Comput. Appl. 77 (October 2016) (2017) 18–47.
[11] R. Biswas, J. Wu, Filter assignment policy against distributed denial-of-service attack, Proc. Int. Conf. Parallel Distrib. Syst. - ICPADS 2018–Decem (2019) 537–544.
[12] S. Abbas, T. Alyas, A. Athar, M.A. Khan, A. Fatima, W.A. Khan, EAI Endorsed Transactions Cloud Services Ranking by Measuring Multiple Parameters Using AFIS, 2014, pp. 1–7.
[13] K. Iqbal, M. Adnan, S. Abbas, Z. Hasan, A. Fatima, Intelligent transportation system (ITS) for smart-cities using mamdani fuzzy inference system, Int. J. Adv. Comput. Sci. Appl. 9 (2) (2018) 94–105.

[14] R.L. Neupane, T. Neely, P. Calyam, N. Chettri, M. Vassell, R.
Durairajan, Intelligent defense using pretense against targeted
attacks in cloud platforms, Future Generat.
Comput. Syst. 93 (2019) 609–626.
[15] T. Alyas, M.S. Khan, Intelligent reliability management in software
based cloud ecosystem using AGI 17 (12) (2017) 134–139.
[16] N.S. Naz, S. Abbas, M. Adnan, B. Abid, N. Tariq, M. Farrukh, Efficient
load balancing in cloud computing using multi-layered mamdani
fuzzy inference expert system, Int. J. Adv. Comput. Sci. Appl. 10 (3)
(2019) 569–577.
[17] Rudol, Implementasi keamanan jaringan komputer pada virtual
private network (vpn) menggungakan, Implementasi Keamanan Jar.
Komput. Pada Virtual Priv.
Netw. Menggungakan Ipsec 2 (1) (2017) 65–68.
[18] W. Alosaimi, M. Alshamrani, K. Al-Begain, Simulation-based study of
distributed denial of service attacks prevention in the cloud, Proc. -
NGMAST 2015 9th Int.
Conf. Next Gener. Mob. Appl. Serv. Technol. (2016) 60–65.
[19] N.C.S.N. Iyengar, G. Ganapathy, Chaotic theory based defensive
mechanism against distributed Denial of Service Attack in cloud
computing environment, Int. J. Secur. its Appl. 9 (9) (2015) 197–212.
[20] S.A. Miller, O. Behalf, C. America, CASE STUDY HYPERCONVERGENCE
VS CLOUD, 2017, pp. 134–139.
[21] T. Alyas, M.S. Khan, Intelligent reliability management in software
based cloud ecosystem using AGI 17 (12) (2017) 134–139.
[22] R.E. Spiridonov, V.D. Cvetkov, O.M. Yurchik, Data Mining for Social
Networks Open Data Analysis, 2017, pp. 395–396.
[23] L. Wang, Y. Ma, J. Yan, V. Chang, A.Y. Zomaya, pipsCloud: high
performance cloud computing for remote sensing big data
management and processing, Future Generat. Comput. Syst. 78
(2018) 353–368.
