DOI: 10.1145/3236024.3275524
research-article

VulSeeker-pro: enhanced semantic learning based binary vulnerability seeker with emulation

Published: 26 October 2018

Abstract

Learning-based clone detection is widely used for binary vulnerability search. Although learning-based approaches partly solve the high time overhead of traditional dynamic and static search approaches, their accuracy is limited, and in industrial practice engineers must still manually identify the true positives among the top-M search results. This paper presents VulSeeker-Pro, an enhanced binary vulnerability seeker that adds function semantic emulation as a back end to semantic learning, relieving engineers of this manual identification work. It first uses a semantic-learning-based predictor to quickly select, from the target binary, the top-M candidate functions most similar to the known vulnerability. The top-M candidates are then fed to the emulation engine, which re-ranks them to obtain a more accurate list of top-N candidate functions. By combining the fast filtering of semantic learning with the dynamic trace generation of function semantic emulation, VulSeeker-Pro achieves higher search accuracy with little time overhead. Experimental results on 15 known CVE vulnerabilities involving 6 programs widely used in industry show that VulSeeker-Pro significantly outperforms state-of-the-art approaches in accuracy. In a total of 45 searches, VulSeeker-Pro finds 40 and 43 real vulnerabilities in the top-1 and top-5 candidate functions, respectively, which are 12.33× and 2.58× more than Gemini, the most recent related work. In terms of efficiency, it takes 0.22 seconds on average to determine whether a target binary function contains a known vulnerability.
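
The two-stage search described in the abstract (a fast predictor selects top-M candidates, then a costlier emulation step re-ranks them into top-N) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the cosine similarity over plain embedding vectors, and the Jaccard similarity over sets of trace events are all assumptions chosen as simple stand-ins for the learned predictor and the emulation engine.

```python
from math import sqrt
from typing import Callable, Dict, List, Sequence, Set, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two sets (1.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def two_stage_search(
    vuln_embedding: Sequence[float],
    vuln_trace: Set[str],
    embeddings: Dict[str, Sequence[float]],
    emulate: Callable[[str], Set[str]],
    m: int,
    n: int,
) -> List[Tuple[str, float]]:
    """Hypothetical two-stage search: cheap filter to top-M, then re-rank to top-N."""
    # Stage 1 (stand-in for the semantic learning predictor):
    # rank every target function by embedding similarity and keep the best m.
    top_m = sorted(
        embeddings,
        key=lambda name: cosine(vuln_embedding, embeddings[name]),
        reverse=True,
    )[:m]
    # Stage 2 (stand-in for the emulation engine):
    # run the expensive step only on the m survivors and re-rank by trace similarity.
    rescored = [(name, jaccard(vuln_trace, emulate(name))) for name in top_m]
    rescored.sort(key=lambda pair: pair[1], reverse=True)
    return rescored[:n]


# Toy data, invented for illustration only.
embeddings = {
    "parse_header": [0.9, 0.1, 0.0],
    "copy_buffer": [0.8, 0.2, 0.1],
    "log_message": [0.1, 0.9, 0.3],
}
traces = {
    "parse_header": {"load", "cmp", "ret"},
    "copy_buffer": {"load", "store", "loop", "ret"},
    "log_message": {"call", "ret"},
}
vuln_embedding = [0.9, 0.1, 0.0]
vuln_trace = {"load", "store", "loop", "ret"}

result = two_stage_search(
    vuln_embedding, vuln_trace, embeddings, traces.__getitem__, m=2, n=1
)
print(result)  # [('copy_buffer', 1.0)]
```

In this toy run the embedding filter ranks `parse_header` first, but the trace comparison promotes `copy_buffer` to the top-1 position. The expensive step runs on only M functions rather than on the whole binary, which is why the second stage can add little time overhead.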

References

[1]
Martín Abadi, Ashish Agarwal, Paul Barham, et al. 2015. TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems. http://tensorflow.org/ Software available from tensorflow.org.
[2]
Mahinthan Chandramohan, Yinxing Xue, Zhengzi Xu, Yang Liu, Chia Yuan Cho, and Hee Beng Kuan Tan. 2016. Bingo: Cross-architecture cross-os binary search. In FSE. ACM, 678–689.
[3]
CVE. n.d. Common Vulnerabilities and Exposures. http://cve.mitre.org/. Accessed June 15, 2018.
[4]
Sebastian Eschweiler, Khaled Yakdan, and Elmar Gerhards-Padilla. 2016. discovRE: Efficient Cross-Architecture Identification of Bugs in Binary Code. In NDSS.
[5]
Qian Feng, Rundong Zhou, Chengcheng Xu, Yao Cheng, Brian Testa, and Heng Yin. 2016. Scalable graph-based bug search for firmware images. In CCS.
[6]
Jian Gao, Xin Yang, Ying Fu, Yu Jiang, and Jiaguang Sun. 2018. VulSeeker: a semantic learning based vulnerability seeker for cross-platform binary. In ASE.
[7]
Yikun Hu, Yuanyuan Zhang, Juanru Li, and Dawu Gu. 2017. Binary code clone detection across architectures and compiling configurations. In ICPC. IEEE.
[8]
Zhen Li, Deqing Zou, Shouhuai Xu, Xinyu Ou, Hai Jin, Sujuan Wang, Zhijun Deng, and Yuyi Zhong. 2018. VulDeePecker: A Deep Learning-Based System for Vulnerability Detection. CoRR abs/1801.01681 (2018).
[9]
Jie Liang, Mingzhe Wang, Yuanliang Chen, Yu Jiang, and Renwei Zhang. 2018. Fuzz testing in practice: Obstacles and solutions. In 25th International Conference on Software Analysis, Evolution and Reengineering.
[10]
Lannan Luo, Jiang Ming, Dinghao Wu, Peng Liu, and Sencun Zhu. 2014. Semantics-based obfuscation-resilient binary code similarity comparison with applications to software plagiarism detection. In FSE.
[11]
Jiang Ming, Dongpeng Xu, Yufei Jiang, and Dinghao Wu. 2017. BinSim: Trace-based semantic binary diffing via system call sliced segment equivalence checking. In USENIX Security Symposium.
[12]
IDA Pro. n.d. The IDA Pro Disassembler and Debugger. https://www.hex-rays.com/. Accessed May 20, 2018.
[13]
Nicholas Nethercote and Julian Seward. 2007. Valgrind: a framework for heavyweight dynamic binary instrumentation. In PLDI. 89–100.
[14]
Yan Shoshitaishvili, Ruoyu Wang, Christophe Hauser, Christopher Kruegel, and Giovanni Vigna. 2015. Firmalice - Automatic Detection of Authentication Bypass Vulnerabilities in Binary Firmware. In NDSS.
[15]
Mingzhe Wang, Jie Liang, Yuanliang Chen, Yu Jiang, Xun Jiao, Han Liu, Xibin Zhao, and Jiaguang Sun. 2018. SAFL: Increasing and Accelerating Testing Coverage with Symbolic Execution and Guided Fuzzing. In ICSE.
[16]
Xiaojun Xu, Chang Liu, Qian Feng, Heng Yin, Le Song, and Dawn Song. 2017. Neural Network-based Graph Embedding for Cross-Platform Binary Code Similarity Detection. In CCS. ACM, 363–376.

Abstract
1 Introduction
2 Related Work
3 VulSeeker-Pro Design
3.1 Semantic Learning Predictor
3.2 Emulation Engine
4 Experimental Results
4.1 Accuracy Of Vulnerability Search
4.2 Time Efficiency
5 Discussion
6 Conclusion
References

    Published In

    ESEC/FSE 2018: Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering
    October 2018
    987 pages
    ISBN: 9781450355735
    DOI: 10.1145/3236024
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. function emulation
    2. semantic learning
    3. vulnerability search

    Qualifiers

    • Research-article

    Conference

    ESEC/FSE '18

    Acceptance Rates

    Overall Acceptance Rate 112 of 543 submissions, 21%

    Article Metrics

    • Downloads (Last 12 months): 38
    • Downloads (Last 6 weeks): 4
    Reflects downloads up to 02 Sep 2024

    Citations

    Cited By

    • (2024) A Survey of Binary Code Similarity Detection Techniques. Electronics 13:9 (1715). DOI: 10.3390/electronics13091715. Online publication date: 29-Apr-2024
    • (2024) Fast Cross-Platform Binary Code Similarity Detection Framework Based on CFGs Taking Advantage of NLP and Inductive GNN. Chinese Journal of Electronics 33:1 (128-138). DOI: 10.23919/cje.2022.00.228. Online publication date: Jan-2024
    • (2023) Towards Practical Binary Code Similarity Detection: Vulnerability Verification via Patch Semantic Analysis. ACM Transactions on Software Engineering and Methodology 32:6 (1-29). DOI: 10.1145/3604608. Online publication date: 30-Sep-2023
    • (2023) 1dFuzz: Reproduce 1-Day Vulnerabilities with Directed Differential Fuzzing. Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis (867-879). DOI: 10.1145/3597926.3598102. Online publication date: 12-Jul-2023
    • (2023) Learning Approximate Execution Semantics From Traces for Binary Function Similarity. IEEE Transactions on Software Engineering 49:4 (2776-2790). DOI: 10.1109/TSE.2022.3231621. Online publication date: 1-Apr-2023
    • (2023) Open Science in Software Engineering: A Study on Deep Learning-Based Vulnerability Detection. IEEE Transactions on Software Engineering 49:4 (1983-2005). DOI: 10.1109/TSE.2022.3207149. Online publication date: 1-Apr-2023
    • (2023) Investigating Neural-based Function Name Reassignment from the Perspective of Binary Code Representation. 2023 20th Annual International Conference on Privacy, Security and Trust (PST) (1-11). DOI: 10.1109/PST58708.2023.10320193. Online publication date: 21-Aug-2023
    • (2023) Novel supply chain vulnerability detection based on heterogeneous-graph-driven hash similarity in IoT. Future Generation Computer Systems 148:C (201-210). DOI: 10.1016/j.future.2023.06.006. Online publication date: 29-Aug-2023
    • (2023) SENSE. Computers and Security 135:C. DOI: 10.1016/j.cose.2023.103500. Online publication date: 1-Dec-2023
    • (2022) Position Distribution Matters: A Graph-Based Binary Function Similarity Analysis Method. Electronics 11:15 (2446). DOI: 10.3390/electronics11152446. Online publication date: 5-Aug-2022
