Predicting SQL injection and cross site scripting vulnerabilities through mining input sanitization patterns

Published: 01 October 2013

Abstract

Context: SQL injection (SQLI) and cross-site scripting (XSS) have been the two most common and serious web application vulnerabilities of the past decade. To mitigate these two security threats, many vulnerability detection approaches based on static and dynamic taint analysis have been proposed. There are also vulnerability prediction approaches based on machine learning, which have shown that static code attributes such as code complexity measures are cheap and useful predictors. However, current prediction approaches target general vulnerabilities, and most locate vulnerable code only at the software component or file level. Some also rely on process attributes that are often difficult to measure.

Objective: This paper aims to provide an alternative or complementary solution to existing taint analyzers by proposing static code attributes that can be used to predict which specific program statements, rather than which software components, are likely to be vulnerable to SQLI or XSS.

Method: Based on observations of the input sanitization code commonly implemented in web applications to avoid SQLI and XSS vulnerabilities, we propose a set of static code attributes that characterize such code patterns. We then build vulnerability prediction models from historical information, reflecting the proposed static attributes and known vulnerability data, to predict SQLI and XSS vulnerabilities.

Results: We developed a prototype tool called PhpMinerI for data collection and used it to evaluate our models on eight open source web applications. Our best model achieved, on average, 93% recall with an 11% false alarm rate in predicting SQLI vulnerabilities, and 78% recall with a 6% false alarm rate in predicting XSS vulnerabilities.

Conclusion: The experimental results show that our proposed vulnerability predictors are useful and effective at predicting SQLI and XSS vulnerabilities.
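The core idea of the method can be sketched as follows: for each statement that reaches a sensitive sink, count how often user-input sources and known sanitization functions appear, and use such counts (together with vulnerability labels) as features for a classifier. The sketch below is a hypothetical simplification, not the paper's actual attribute set (PhpMinerI is the authors' real tool); the `extract_attributes` helper and the attribute choices are illustrative, though the PHP function and superglobal names are real PHP APIs. The second helper just restates how the recall and false alarm (false positive) rates reported in the Results would be computed from a confusion matrix.

```python
import re

# Hypothetical, simplified attribute extractor: the paper mines a richer set
# of static attributes from input sanitization patterns. The function names
# below are real PHP sanitization APIs; the grouping is illustrative only.
SANITIZERS = {
    "sqli": ("mysqli_real_escape_string", "addslashes", "intval"),
    "xss": ("htmlspecialchars", "htmlentities", "strip_tags"),
}
SOURCES = ("$_GET", "$_POST", "$_COOKIE", "$_REQUEST")

def extract_attributes(stmt):
    """Count user-input sources and sanitizer calls in one sink statement."""
    attrs = {"input_sources": sum(stmt.count(s) for s in SOURCES)}
    for kind, funcs in SANITIZERS.items():
        attrs[kind + "_sanitizers"] = sum(
            len(re.findall(r"\b%s\s*\(" % re.escape(f), stmt)) for f in funcs
        )
    return attrs

php_stmt = 'mysql_query("SELECT * FROM t WHERE id=" . intval($_GET["id"]));'
print(extract_attributes(php_stmt))
# {'input_sources': 1, 'sqli_sanitizers': 1, 'xss_sanitizers': 0}

# Recall and false alarm (false positive) rate, the two measures the
# abstract reports, from confusion-matrix counts:
def recall_and_fpr(tp, fn, fp, tn):
    return tp / (tp + fn), fp / (fp + tn)

print(recall_and_fpr(tp=93, fn=7, fp=11, tn=89))  # (0.93, 0.11)
```

Feature vectors like these, labeled with known vulnerability data, could then be fed to any standard classifier; the paper evaluates several and reports the best-performing model.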




Published In

Information and Software Technology, Volume 55, Issue 10
October 2013, 165 pages

Publisher

Butterworth-Heinemann, United States


Author Tags

  1. Data mining
  2. Empirical study
  3. Input sanitization
  4. Static code attributes
  5. Vulnerability prediction
  6. Web application vulnerability

Qualifiers

  • Article

Cited By

  • (2025) DCAFixer: An Automatic Tool for Bug Detection and Repair for Database Java Client Applications. IEEE Transactions on Dependable and Secure Computing 22(1), 327-342. doi:10.1109/TDSC.2024.3396667
  • (2025) A survey on various security protocols of edge computing. The Journal of Supercomputing 81(1). doi:10.1007/s11227-024-06678-6
  • (2024) An Extensive Comparison of Static Application Security Testing Tools. Proceedings of the 28th International Conference on Evaluation and Assessment in Software Engineering, 69-78. doi:10.1145/3661167.3661199
  • (2024) Software vulnerable functions discovery based on code composite feature. Journal of Information Security and Applications 81. doi:10.1016/j.jisa.2024.103718
  • (2024) Twenty-two years since revealing cross-site scripting attacks. Computer Science Review 52. doi:10.1016/j.cosrev.2024.100634
  • (2023) Splendor: Static Detection of Stored XSS in Modern Web Applications. Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis, 1043-1054. doi:10.1145/3597926.3598116
  • (2023) Structuring meaningful bug-fixing patches to fix software defect. IET Software 17(4), 566-581. doi:10.1049/sfw2.12140
  • (2023) An Enhanced Static Taint Analysis Approach to Detect Input Validation Vulnerability. Journal of King Saud University - Computer and Information Sciences 35(2), 682-701. doi:10.1016/j.jksuci.2023.01.009
  • (2022) An adaptive search optimization algorithm for improving the detection capability of software vulnerability. Proceedings of the 13th Asia-Pacific Symposium on Internetware, 212-220. doi:10.1145/3545258.3545283
  • (2022) Statically identifying XSS using deep learning. Science of Computer Programming 219. doi:10.1016/j.scico.2022.102810
