DOI: 10.1145/3433210.3453115
Research article | Open access

Bran: Reduce Vulnerability Search Space in Large Open Source Repositories by Learning Bug Symptoms

Published: 04 June 2021

Abstract

Software is continually growing in size and complexity, so vulnerability discovery would benefit from techniques that identify potentially vulnerable regions within large code bases: narrowing the search space eases vulnerability detection. Previous work has explored the use of conventional code-quality and complexity metrics to highlight suspicious sections of (source) code. More recently, researchers have also proposed reducing the vulnerability search space by studying code properties with neural networks. However, previous work has generally failed to leverage the rich metadata available for long-running, large code repositories.
In this paper, we present Bran, an approach that reduces the vulnerability search space by combining conventional code metrics with fine-grained repository metadata. Bran locates code sections that are more likely to contain vulnerabilities in large code bases, potentially improving the efficiency of both manual and automatic code audits. In our experiments on four large code bases, Bran successfully highlights potentially vulnerable functions, outperforming several baselines, including state-of-the-art vulnerability prediction tools. We also assess Bran's effectiveness in assisting automated testing tools: we use Bran to guide syzkaller, a well-known kernel fuzzer, in fuzzing a recent version of the Linux kernel. The guided fuzzer identifies 26 bugs (10 of them zero-day flaws), including arbitrary writes and reads.
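To make the idea concrete, here is a minimal, hypothetical sketch of how code metrics could be combined with git repository metadata to rank code regions by suspiciousness. It is not Bran's implementation: the feature set, the file-level (rather than function-level) granularity, the fixed linear score, and the example paths are all assumptions made purely for illustration; in the setting described above, a trained model over function-level features would take the place of the toy score.

```python
# Illustrative sketch only: ranks files by combining simple code-complexity
# proxies with git metadata (churn, authors, fix-like commits). Feature choices,
# file-level granularity, and the scoring weights are assumptions, not Bran's design.
import subprocess
from pathlib import Path

def git(repo, *args):
    """Run a git command inside `repo` and return its stdout as text."""
    return subprocess.run(["git", "-C", repo, *args],
                          capture_output=True, text=True, check=True).stdout

def metadata_features(repo, path):
    """Repository metadata for one file: commit churn, distinct authors, 'fix' commits."""
    commits = git(repo, "log", "--oneline", "--follow", "--", path).splitlines()
    authors = set(git(repo, "log", "--format=%an", "--follow", "--", path).splitlines())
    fixes = [c for c in commits if "fix" in c.lower()]  # crude bug-fix heuristic
    return {"commits": len(commits), "authors": len(authors), "fix_commits": len(fixes)}

def code_features(repo, path):
    """Very rough complexity proxies: lines of code and branch-keyword count."""
    text = (Path(repo) / path).read_text(errors="ignore")
    loc = text.count("\n")
    branches = sum(text.count(k) for k in ("if ", "for ", "while ", "case "))
    return {"loc": loc, "branches": branches}

def score(feats):
    """Toy linear score; a trained classifier would replace this in practice."""
    return (0.4 * feats["fix_commits"] + 0.3 * feats["authors"]
            + 0.2 * feats["branches"] / 10 + 0.1 * feats["commits"] / 10)

def rank(repo, paths):
    """Return (score, path) pairs, most to least suspicious under the toy score."""
    rows = []
    for p in paths:
        feats = {**metadata_features(repo, p), **code_features(repo, p)}
        rows.append((score(feats), p))
    return sorted(rows, reverse=True)

if __name__ == "__main__":
    # Example: rank two C files in a local checkout (paths are hypothetical).
    print(rank(".", ["src/parser.c", "src/util.c"]))
```

In a pipeline like the one described in the abstract, such a ranking could then be used to prioritize functions for manual audit or to seed the target list of an automated testing tool such as a fuzzer.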


Cited By

(2023) Predicting Bug-Fixing Time Using the Latent Dirichlet Allocation Model with Covariates. In Evaluation of Novel Approaches to Software Engineering, pages 139-152. https://doi.org/10.1007/978-3-031-36597-3_7. Online publication date: 8-Jul-2023.

    Published In

    ASIA CCS '21: Proceedings of the 2021 ACM Asia Conference on Computer and Communications Security
    May 2021
    975 pages
    ISBN:9781450382878
    DOI:10.1145/3433210
    General Chairs: Jiannong Cao, Man Ho Au
    Program Chairs: Zhiqiang Lin, Moti Yung
    This work is licensed under a Creative Commons Attribution 4.0 International License.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 04 June 2021

    Author Tags

    1. machine learning
    2. static analysis
    3. vulnerabilities

    Qualifiers

    • Research-article

    Conference

    ASIA CCS '21

    Acceptance Rates

    Overall Acceptance Rate 418 of 2,322 submissions, 18%

