Predicting SQL injection and cross site scripting vulnerabilities through mining input sanitization patterns

Published: 01 October 2013

Abstract

Context: SQL injection (SQLI) and cross-site scripting (XSS) have been the two most common and serious web application vulnerabilities of the past decade. To mitigate these two security threats, many vulnerability detection approaches based on static and dynamic taint analysis have been proposed. There are also vulnerability prediction approaches based on machine learning, which have shown that static code attributes such as code complexity measures are cheap and useful predictors. However, current prediction approaches target general vulnerabilities, and most locate vulnerable code only at the software component or file level. Some also rely on process attributes that are often difficult to measure.

Objective: This paper aims to provide an alternative or complementary solution to existing taint analyzers by proposing static code attributes that can be used to predict which specific program statements, rather than which software components, are likely to be vulnerable to SQLI or XSS.

Method: Based on observations of the input sanitization code commonly implemented in web applications to avoid SQLI and XSS vulnerabilities, we propose a set of static code attributes that characterize such code patterns. We then build vulnerability prediction models from historical information, reflecting the proposed static attributes and known vulnerability data, to predict SQLI and XSS vulnerabilities.

Results: We developed a prototype tool called PhpMinerI for data collection and used it to evaluate our models on eight open source web applications. Our best model achieved, on average, 93% recall with an 11% false alarm rate in predicting SQLI vulnerabilities, and 78% recall with a 6% false alarm rate in predicting XSS vulnerabilities.

Conclusion: The experimental results show that our proposed vulnerability predictors are useful and effective at predicting SQLI and XSS vulnerabilities.
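The core idea of the method can be sketched as follows: for each statement that reaches a sensitive sink, count how often user-input sources and known sanitization functions appear, and use such counts (together with vulnerability labels) as features for a classifier. The sketch below is a hypothetical simplification, not the paper's actual attribute set (PhpMinerI is the authors' real tool); the `extract_attributes` helper and the attribute choices are illustrative, though the PHP function and superglobal names are real PHP APIs. The second helper just restates how the recall and false alarm (false positive) rates reported in the Results would be computed from a confusion matrix.

```python
import re

# Hypothetical, simplified attribute extractor: the paper mines a richer set
# of static attributes from input sanitization patterns. The function names
# below are real PHP sanitization APIs; the grouping is illustrative only.
SANITIZERS = {
    "sqli": ("mysqli_real_escape_string", "addslashes", "intval"),
    "xss": ("htmlspecialchars", "htmlentities", "strip_tags"),
}
SOURCES = ("$_GET", "$_POST", "$_COOKIE", "$_REQUEST")

def extract_attributes(stmt):
    """Count user-input sources and sanitizer calls in one sink statement."""
    attrs = {"input_sources": sum(stmt.count(s) for s in SOURCES)}
    for kind, funcs in SANITIZERS.items():
        attrs[kind + "_sanitizers"] = sum(
            len(re.findall(r"\b%s\s*\(" % re.escape(f), stmt)) for f in funcs
        )
    return attrs

php_stmt = 'mysql_query("SELECT * FROM t WHERE id=" . intval($_GET["id"]));'
print(extract_attributes(php_stmt))
# {'input_sources': 1, 'sqli_sanitizers': 1, 'xss_sanitizers': 0}

# Recall and false alarm (false positive) rate, the two measures the
# abstract reports, from confusion-matrix counts:
def recall_and_fpr(tp, fn, fp, tn):
    return tp / (tp + fn), fp / (fp + tn)

print(recall_and_fpr(tp=93, fn=7, fp=11, tn=89))  # (0.93, 0.11)
```

Feature vectors like these, labeled with known vulnerability data, could then be fed to any standard classifier; the paper evaluates several and reports the best-performing model.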




Published In

Information and Software Technology, Volume 55, Issue 10
October 2013, 165 pages

Publisher

Butterworth-Heinemann, United States


Author Tags

  1. Data mining
  2. Empirical study
  3. Input sanitization
  4. Static code attributes
  5. Vulnerability prediction
  6. Web application vulnerability

Qualifiers

  • Article

Cited By

  • (2025) DCAFixer: An Automatic Tool for Bug Detection and Repair for Database Java Client Applications. IEEE Transactions on Dependable and Secure Computing 22(1), 327-342. doi:10.1109/TDSC.2024.3396667
  • (2025) A survey on various security protocols of edge computing. The Journal of Supercomputing 81(1). doi:10.1007/s11227-024-06678-6
  • (2024) An Extensive Comparison of Static Application Security Testing Tools. Proceedings of the 28th International Conference on Evaluation and Assessment in Software Engineering, 69-78. doi:10.1145/3661167.3661199
  • (2024) Software vulnerable functions discovery based on code composite feature. Journal of Information Security and Applications 81. doi:10.1016/j.jisa.2024.103718
  • (2024) Twenty-two years since revealing cross-site scripting attacks. Computer Science Review 52. doi:10.1016/j.cosrev.2024.100634
  • (2023) Splendor: Static Detection of Stored XSS in Modern Web Applications. Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis, 1043-1054. doi:10.1145/3597926.3598116
  • (2023) Structuring meaningful bug-fixing patches to fix software defect. IET Software 17(4), 566-581. doi:10.1049/sfw2.12140
  • (2023) An Enhanced Static Taint Analysis Approach to Detect Input Validation Vulnerability. Journal of King Saud University - Computer and Information Sciences 35(2), 682-701. doi:10.1016/j.jksuci.2023.01.009
  • (2022) An adaptive search optimization algorithm for improving the detection capability of software vulnerability. Proceedings of the 13th Asia-Pacific Symposium on Internetware, 212-220. doi:10.1145/3545258.3545283
  • (2022) Statically identifying XSS using deep learning. Science of Computer Programming 219. doi:10.1016/j.scico.2022.102810
