Codebase-adaptive detection of security-relevant methods

Authors: Goran Piskachev, Lisa Nguyen Quang Do, Eric Bodden

ISSTA 2019: Proceedings of the 28th ACM SIGSOFT International Symposium on Software Testing and Analysis

Pages 181 - 191

https://doi.org/10.1145/3293882.3330556

Published: 10 July 2019

Abstract

More and more companies use static analysis to perform regular code reviews that detect security vulnerabilities in their code, configuring the analyses to detect various types of bugs and vulnerabilities such as the SANS top 25 or the OWASP top 10. For such analyses to be as precise as possible, they must be adapted to the code base they scan. The particular challenge we address in this paper is to provide analyses with the correct security-relevant methods (Srm): sources, sinks, etc. We present SWAN, a fully-automated machine-learning approach to detect sources, sinks, validators, and authentication methods for Java programs. SWAN further classifies the Srm into specific vulnerability classes of the SANS top 25. To further adapt the lists detected by SWAN to the code base and to improve its precision, we also introduce SWANAssist, an extension to SWAN that allows analysis users to refine the classifications. On twelve popular Java frameworks, SWAN achieves an average precision of 0.826, which is better than or comparable to existing approaches. Our experiments show that SWANAssist requires a relatively low effort from the developer to significantly improve its precision.
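
To make the four roles named in the abstract concrete, the following is a minimal illustrative sketch in Java. It is not SWAN's implementation: the method names in the list, the `SrmSketch` class, and the single-flag check over a linear call trace are all assumptions made for illustration. It only shows why an analysis needs a correct list of sources, sinks, and validators before it can report a flow.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy illustration of security-relevant method roles; not SWAN's implementation. */
final class SrmSketch {

    /** The four kinds of security-relevant methods listed in the abstract. */
    enum Role { SOURCE, SINK, VALIDATOR, AUTHENTICATION }

    // Hypothetical, hand-written list with made-up method names.
    // A detection tool would be expected to produce a list of this shape automatically.
    static final Map<String, Set<Role>> SRM_LIST = Map.of(
        "HttpRequest.getParameter", Set.of(Role.SOURCE),
        "Statement.executeQuery", Set.of(Role.SINK),
        "Encoder.encodeForSQL", Set.of(Role.VALIDATOR),
        "LoginService.login", Set.of(Role.AUTHENTICATION));

    /**
     * Returns true if, along a linear call trace, data obtained from a source
     * reaches a sink without first passing through a validator.
     * Simplifying assumption: one taint flag for the whole trace.
     */
    static boolean hasUnvalidatedFlow(List<String> callTrace) {
        boolean tainted = false;
        for (String method : callTrace) {
            Set<Role> roles = SRM_LIST.getOrDefault(method, Set.of());
            if (roles.contains(Role.SOURCE)) {
                tainted = true;   // untrusted data enters the program
            }
            if (roles.contains(Role.VALIDATOR)) {
                tainted = false;  // data is checked or sanitized
            }
            if (roles.contains(Role.SINK) && tainted) {
                return true;      // untrusted data reaches a sensitive operation
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> unsafe = List.of("HttpRequest.getParameter", "Statement.executeQuery");
        List<String> safe = List.of(
            "HttpRequest.getParameter", "Encoder.encodeForSQL", "Statement.executeQuery");
        System.out.println("unsafe trace reported: " + hasUnvalidatedFlow(unsafe));
        System.out.println("safe trace reported: " + hasUnvalidatedFlow(safe));
    }
}
```

If `Encoder.encodeForSQL` were missing from the list, the second trace would be reported as a false positive; if `Statement.executeQuery` were missing, the first trace would be silently missed. This is the sense in which the list must fit the code base being scanned.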


Published In

ISSTA 2019: Proceedings of the 28th ACM SIGSOFT International Symposium on Software Testing and Analysis

July 2019, 451 pages

ISBN: 9781450362245

DOI: 10.1145/3293882

General Chair: Dongmei Zhang (Microsoft Research, China)

Program Chair: Anders Møller (Aarhus University, Denmark)

Copyright © 2019 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGSOFT: ACM Special Interest Group on Software Engineering

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

ISSTA '19

Sponsor:

SIGSOFT

ISSTA '19: 28th ACM SIGSOFT International Symposium on Software Testing and Analysis

July 15 - 19, 2019

Beijing, China

Acceptance Rates

Overall Acceptance Rate 58 of 213 submissions, 27%
