DOI: 10.1145/3293882.3330556
research-article

Codebase-adaptive detection of security-relevant methods

Published: 10 July 2019

Abstract

More and more companies run static analyses as part of regular code reviews to detect security vulnerabilities in their code, configuring these analyses to find specific types of bugs and vulnerabilities such as those in the SANS Top 25 or the OWASP Top 10. For such analyses to be as precise as possible, they must be adapted to the code base they scan. The particular challenge we address in this paper is to provide analyses with the correct security-relevant methods (SRMs): sources, sinks, etc. We present SWAN, a fully automated machine-learning approach to detect sources, sinks, validators, and authentication methods for Java programs. SWAN further classifies the SRMs into specific vulnerability classes of the SANS Top 25. To adapt the lists detected by SWAN further to the code base and to improve their precision, we also introduce SWANAssist, an extension to SWAN that allows analysis users to refine the classifications. On twelve popular Java frameworks, SWAN achieves an average precision of 0.826, which is better than or comparable to existing approaches. Our experiments show that SWANAssist requires relatively little effort from the developer to significantly improve SWAN's precision.
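The abstract names two mechanisms: a learned classifier that assigns Java methods to SRM categories, and SWANAssist, which lets users correct those classifications. The Python sketch below illustrates the general idea only, under explicit assumptions: the `MethodSig` type, the six signature features, the training labels, and the `train`/`refine` helpers are all hypothetical, and a plain perceptron stands in for whatever learner SWAN actually uses. It trains one binary classifier for a single category ("sink"); the refinement step records the user's verdict in the training set and retrains from scratch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MethodSig:
    """Hypothetical, simplified view of a Java method signature."""
    class_name: str
    name: str
    return_type: str
    param_types: tuple


# Hypothetical binary features over the signature (NOT SWAN's feature set).
FEATURES = [
    lambda m: m.name.startswith("get"),
    lambda m: m.name.startswith(("set", "write", "exec")),
    lambda m: m.return_type == "void",
    lambda m: "String" in m.param_types,
    lambda m: any(k in m.class_name for k in ("Request", "Input", "Reader")),
    lambda m: any(k in m.class_name for k in ("Statement", "Writer", "Response")),
]


def featurize(m):
    """Bias term followed by one 0/1 value per feature."""
    return [1.0] + [1.0 if f(m) else 0.0 for f in FEATURES]


class Perceptron:
    """Binary classifier for one SRM category (stand-in for SWAN's learner)."""

    def __init__(self, n):
        self.w = [0.0] * n

    def predict(self, x):
        return sum(wi * xi for wi, xi in zip(self.w, x)) > 0

    def fit(self, data, epochs=20):
        # Perceptron rule: on a mistake, add x for a missed positive,
        # subtract x for a false positive.
        for _ in range(epochs):
            for x, y in data:
                if self.predict(x) != y:
                    sign = 1.0 if y else -1.0
                    self.w = [wi + sign * xi for wi, xi in zip(self.w, x)]
        return self


def train(labelled):
    """Train a fresh 'is this a sink?' classifier on (MethodSig, bool) pairs."""
    data = [(featurize(m), y) for m, y in labelled]
    return Perceptron(len(FEATURES) + 1).fit(data)


def refine(labelled, method, user_label):
    """SWANAssist-style feedback (simplified): store the user's verdict,
    replacing any earlier label for the same method, then retrain."""
    updated = [(m, y) for m, y in labelled if m != method]
    updated.append((method, user_label))
    return updated, train(updated)


# Illustrative labels for the 'sink' category only.
TRAINING = [
    (MethodSig("java.sql.Statement", "executeQuery", "ResultSet", ("String",)), True),
    (MethodSig("java.io.PrintWriter", "write", "void", ("String",)), True),
    (MethodSig("javax.servlet.http.HttpServletResponse", "sendRedirect", "void", ("String",)), True),
    (MethodSig("javax.servlet.http.HttpServletRequest", "getParameter", "String", ("String",)), False),
    (MethodSig("java.io.BufferedReader", "readLine", "String", ()), False),
    (MethodSig("java.util.List", "size", "int", ()), False),
]

model = train(TRAINING)
```

With these toy labels, `model` flags `java.sql.PreparedStatement.executeUpdate(String)` as a sink and rejects `javax.servlet.ServletRequest.getHeader(String)`. A hypothetical `org.example.Logger.log(String)` is not flagged at first; after `refine(TRAINING, log_method, True)` the retrained model flags it. Covering every SANS Top 25 category in this style would mean training one such binary classifier per category.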

Information

Published In

ISSTA 2019: Proceedings of the 28th ACM SIGSOFT International Symposium on Software Testing and Analysis
July 2019
451 pages
ISBN: 9781450362245
DOI: 10.1145/3293882
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 10 July 2019

Author Tags

  1. Java Security
  2. Machine-learning
  3. Program Analysis

Qualifiers

  • Research-article

Conference

ISSTA '19

Acceptance Rates

Overall Acceptance Rate 58 of 213 submissions, 27%

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 43
  • Downloads (last 6 weeks): 4
Reflects downloads up to 11 Feb 2025

Cited By

  • (2024) Detecting Security-Relevant Methods using Multi-label Machine Learning. Proceedings of the 1st ACM/IEEE Workshop on Integrated Development Environments, pp. 101-106. DOI: 10.1145/3643796.3648464. Online publication date: 20-Apr-2024.
  • (2024) DocFlow: Extracting Taint Specifications from Software Documentation. Proceedings of the IEEE/ACM 46th International Conference on Software Engineering, pp. 1-12. DOI: 10.1145/3597503.3623312. Online publication date: 20-May-2024.
  • (2024) A survey on machine learning techniques applied to source code. Journal of Systems and Software 209:C. DOI: 10.1016/j.jss.2023.111934. Online publication date: 14-Mar-2024.
  • (2023) DAISY: Dynamic-Analysis-Induced Source Discovery for Sensitive Data. ACM Transactions on Software Engineering and Methodology 32:4, pp. 1-34. DOI: 10.1145/3569936. Online publication date: 27-May-2023.
  • (2023) Security Relevant Methods of Android's API Classification: A Machine Learning Empirical Evaluation. IEEE Transactions on Computers 72:11, pp. 3273-3285. DOI: 10.1109/TC.2023.3291998. Online publication date: 1-Nov-2023.
  • (2023) Can the configuration of static analyses make resolving security vulnerabilities more effective? - A user study. Empirical Software Engineering 28:5. DOI: 10.1007/s10664-023-10354-3. Online publication date: 12-Sep-2023.
  • (2022) PABAU: Privacy Analysis of Biometric API Usage. 2022 IEEE Smartworld, Ubiquitous Intelligence & Computing, Scalable Computing & Communications, Digital Twin, Privacy Computing, Metaverse, Autonomous & Trusted Vehicles (SmartWorld/UIC/ScalCom/DigitalTwin/PriComp/Meta), pp. 2301-2308. DOI: 10.1109/SmartWorld-UIC-ATC-ScalCom-DigitalTwin-PriComp-Metaverse56740.2022.00327. Online publication date: Dec-2022.
  • (2022) Fluently specifying taint-flow queries with fluentTQL. Empirical Software Engineering 27:5. DOI: 10.1007/s10664-022-10165-y. Online publication date: 30-May-2022.
  • (2021) SootFX: A Static Code Feature Extraction Tool for Java and Android. 2021 IEEE 21st International Working Conference on Source Code Analysis and Manipulation (SCAM), pp. 181-186. DOI: 10.1109/SCAM52516.2021.00030. Online publication date: Sep-2021.
  • (2021) Optimization of Microservices Security. 2021 3rd International Conference on Advancements in Computing (ICAC), pp. 49-54. DOI: 10.1109/ICAC54203.2021.9671131. Online publication date: 9-Dec-2021.