Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
skip to main content
research-article
Open access

Towards Practical Binary Code Similarity Detection: Vulnerability Verification via Patch Semantic Analysis

Published: 30 September 2023 Publication History
  • Get Citation Alerts
  • Abstract

    Vulnerability is a major threat to software security. It has been proven that binary code similarity detection approaches are efficient to search for recurring vulnerabilities introduced by code sharing in binary software. However, these approaches suffer from high false-positive rates (FPRs) since they usually take the patched functions as vulnerable, and they usually do not work well when binaries are compiled with different compilation settings.
    To this end, we propose an approach, named Robin, to confirm recurring vulnerabilities by filtering out patched functions. Robin is powered by a lightweight symbolic execution to solve the set of function inputs that can lead to the vulnerability-related code. It then executes the target functions with the same inputs to capture the vulnerable or patched behaviors for patched function filtration. Experimental results show that Robin achieves high accuracy for patch detection across different compilers and compiler optimization levels respectively on 287 real-world vulnerabilities of 10 different software. Based on accurate patch detection, Robin significantly reduces the false-positive rate of state-of-the-art vulnerability detection tools (by 94.3% on average), making them more practical. Robin additionally detects 12 new potentially vulnerable functions.

    References

    [1]
    2010. angr. Retrieved February 15, 2021 from https://angr.io/
    [2]
    2015. CVE - CVE-2015-0288. Retrieved February 21, 2021 from https://cve.mitre.org/cgi-bin/cvename.cgi?name=cve-2015-0288
    [3]
    2018. the-overlooked-problem-of-n-day-vulnerabilities. Retrieved February 08, 2021 from https://www.darkreading.com/vulnerabilities-threats/the-overlooked-problem-of-n-day-vulnerabilities
    [4]
    2021. CVE - CVE-2015-0209 Retrieved February 23, 2021 from https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2015-0209
    [5]
    2021. CVE - CVE-2015-0289 Retrieved February 16, 2021 from https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2015-0289
    [6]
    2021. CVE - CVE-2016-2181 Retrieved February 23, 2021 from https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2016-2181
    [7]
    2021. GitHub - Z3Prover/z3: The Z3 Theorem Prover. Retrieved February 15, 2021 from https://github.com/Z3Prover/z3
    [8]
    2021. IDA Pro – Hex Rays. Retrieved February 15, 2021 from https://www.hex-rays.com/products/ida/
    [9]
    2021. NVD - Home. Retrieved February 16, 2021 from https://nvd.nist.gov/
    [10]
    2021. Source Code of Robin. Retrieved December 10, 2021 from https://github.com/shouguoyang/Robin
    [11]
    2021. Vulnerability information. Retrieved December 10, 2021 from https://github.com/shouguoyang/Robin/blob/master/vulnerability_note.md
    [12]
    2021. x86 Function Attributes. Retrieved February 08, 2021 from https://gcc.gnu.org/onlinedocs/gcc/x86-Function-Attributes.html
    [13]
    2022. cve-2015-3196. Retrieved December 29, 2022 from https://cve.mitre.org/cgi-bin/cvename.cgi?name=cve-2015-3196
    [14]
    2022. fix commit for CVE-2017-13051. Retrieved December 29, 2022 from https://github.com/the-tcpdump-group/tcpdump/commit/289c672020280529fd382f3502efab7100d638ec
    [15]
    Sunwoo Ahn, Seonggwan Ahn, Hyungjoon Koo, and Yunheung Paek. 2022. Practical binary code similarity detection with bert-based transferable similarity learning. In Proceedings of the 38th Annual Computer Security Applications Conference. 361–374.
    [16]
    Thanassis Avgerinos, Sang Kil Cha, Alexandre Rebert, Edward J. Schwartz, Maverick Woo, and David Brumley. 2014. Automatic exploit generation. Communications of the ACM 57, 2 (2014), 74–84.
    [17]
    Johann Blieberger and Bernd Burgstaller. 1998. Symbolic reaching definitions analysis of Ada programs. In Proceedings of the International Conference on Reliable Software Technologies. Springer, 238–250.
    [18]
    Benjamin Bowman and H. Howie Huang. 2020. VGRAPH: A robust vulnerable code clone detection system using code property triplets. In Proceedings of the 2020 IEEE European Symposium on Security and Privacy (EuroS&P). IEEE, 53–69.
    [19]
    Sang Kil Cha, Thanassis Avgerinos, Alexandre Rebert, and David Brumley. 2012. Unleashing mayhem on binary code. In Proceedings of the 2012 IEEE Symposium on Security and Privacy. IEEE, 380–394.
    [20]
    Mahinthan Chandramohan, Yinxing Xue, Zhengzi Xu, Yang Liu, Chia Yuan Cho, and Hee Beng Kuan Tan. 2016. Bingo: Cross-architecture cross-os binary search. In Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering. 678–689.
    [21]
    Jiarun Dai, Yuan Zhang, Zheyue Jiang, Yingtian Zhou, Junyan Chen, Xinyu Xing, Xiaohan Zhang, Xin Tan, Min Yang, and Zhemin Yang. 2020. BScout: Direct whole patch presence test for java executables. In Proceedings of the 29th \(\lbrace\) USENIX \(\rbrace\) Security Symposium ( \(\lbrace\) USENIX \(\rbrace\) Security 20). 1147–1164.
    [22]
    Yaniv David and Eran Yahav. 2014. Tracelet-based code search in executables. Acm Sigplan Notices 49, 6 (2014), 349–360.
    [23]
    Steven H. H. Ding, Benjamin C. M. Fung, and Philippe Charland. 2019. Asm2vec: Boosting static representation robustness for binary clone search against code obfuscation and compiler optimization. In Proceedings of the 2019 IEEE Symposium on Security and Privacy (SP). IEEE, 472–489.
    [24]
    Yue Duan, Xuezixiang Li, Jinghan Wang, and Heng Yin. 2020. Deepbindiff: Learning program-wide code representations for binary diffing. In Proceedings of the Network and Distributed System Security Symposium.
    [25]
    Zakir Durumeric, Frank Li, James Kasten, Johanna Amann, Jethro Beekman, Mathias Payer, Nicolas Weaver, David Adrian, Vern Paxson, Michael Bailey, and J. Alex Halderman. 2014. The matter of heartbleed. In Proceedings of the 2014 Conference on Internet Measurement Conference. 475–488.
    [26]
    Manuel Egele, Maverick Woo, Peter Chapman, and David Brumley. 2014. Blanket execution: Dynamic similarity testing for program binaries and components. In Proceedings of the 23rd \(\lbrace\) USENIX \(\rbrace\) Security Symposium ( \(\lbrace\) USENIX \(\rbrace\) Security 14). 303–317.
    [27]
    Dawson Engler and Daniel Dunbar. 2007. Under-constrained execution: making automatic code destruction easy and scalable. In Proceedings of the 2007 International Symposium on Software Testing and Analysis. 1–4.
    [28]
    Sebastian Eschweiler, Khaled Yakdan, and Elmar Gerhards-Padilla. 2016. discovRE: Efficient cross-architecture identification of bugs in binary code. In Proceedings of the NDSS. 58–79.
    [29]
    Qian Feng, Minghua Wang, Mu Zhang, Rundong Zhou, Andrew Henderson, and Heng Yin. 2017. Extracting conditional formulas for cross-platform bug search. In Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security. 346–359.
    [30]
    Qian Feng, Rundong Zhou, Chengcheng Xu, Yao Cheng, Brian Testa, and Heng Yin. 2016. Scalable graph-based bug search for firmware images. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security. 480–491.
    [31]
    Jian Gao, Xin Yang, Ying Fu, Yu Jiang, Heyuan Shi, and Jiaguang Sun. 2018. Vulseeker-pro: Enhanced semantic learning based binary vulnerability seeker with emulation. In Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering. 803–808.
    [32]
    Jiyong Jang, Abeer Agrawal, and David Brumley. 2012. ReDeBug: finding unpatched code clones in entire os distributions. In Proceedings of the 2012 IEEE Symposium on Security and Privacy. IEEE, 48–62.
    [33]
    Lingxiao Jiang, Ghassan Misherghi, Zhendong Su, and Stephane Glondu. 2007. Deckard: Scalable and accurate tree-based detection of code clones. In Proceedings of the 29th International Conference on Software Engineering (ICSE’07). IEEE, 96–105.
    [34]
    Zheyue Jiang, Yuan Zhang, Jun Xu, Qi Wen, Zhenghe Wang, Xiaohan Zhang, Xinyu Xing, Min Yang, and Zhemin Yang. 2020. PDiff: Semantic-based patch presence testing for downstream kernels. In Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications Security. 1149–1163.
    [35]
    Seulbae Kim, Seunghoon Woo, Heejo Lee, and Hakjoo Oh. 2017. Vuddy: A scalable approach for vulnerable code clone discovery. In Proceedings of the 2017 IEEE Symposium on Security and Privacy (SP). IEEE, 595–614.
    [36]
    S. Kiran Kumar and C. Pandu Rangan. 1987. A linear space algorithm for the LCS problem. Acta Informatica 24, 3 (1987), 353–362.
    [37]
    Zhe Lang, Shouguo Yang, Yiran Cheng, Xiaoling Zhang, Zhiqiang Shi, and Limin Sun. 2021. PMatch: Semantic-based patch detection for binary programs. In Proceedings of the 2021 IEEE International Performance, Computing, and Communications Conference (IPCCC). IEEE, 1–10.
    [38]
    Zhen Li, Deqing Zou, Shouhuai Xu, Hai Jin, Hanchao Qi, and Jie Hu. 2016. Vulpecker: An automated vulnerability detection system based on code similarity analysis. In Proceedings of the 32nd Annual Conference on Computer Security Applications. 201–213.
    [39]
    Bingchang Liu, Wei Huo, Chao Zhang, Wenchao Li, Feng Li, Aihua Piao, and Wei Zou. 2018. \(\alpha\) diff: Cross-version binary code similarity detection with dnn. In Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering. 667–678.
    [40]
    Danjun Liu, Yao Li, Yong Tang, Baosheng Wang, and Wei Xie. 2018. VMPBL: Identifying vulnerable functions based on machine learning combining patched information and binary comparison technique by LCS. In Proceedings of the 2018 17th IEEE International Conference on Trust, Security and Privacy in Computing and Communications/12th IEEE International Conference on Big Data Science and Engineering (TrustCom/BigDataSE). IEEE, 800–807.
    [41]
    Kangjie Lu, Aditya Pakki, and Qiushi Wu. 2019. Detecting missing-check bugs via semantic-and context-aware criticalness and constraints inferences. In Proceedings of the 28th \(\lbrace\) USENIX \(\rbrace\) Security Symposium ( \(\lbrace\) USENIX \(\rbrace\) Security 19). 1769–1786.
    [42]
    Luca Massarelli, Giuseppe Antonio Di Luna, Fabio Petroni, Roberto Baldoni, and Leonardo Querzoni. 2019. Safe: Self-attentive function embeddings for binary similarity. In Proceedings of the International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment. Springer, 309–329.
    [43]
    Kexin Pei, Zhou Xuan, Junfeng Yang, Suman Jana, and Baishakhi Ray. 2022. Learning approximate execution semantics from traces for binary function similarity. IEEE Transactions on Software Engineering 49, 4 (2022), 2776–2790.
    [44]
    David A. Ramos and Dawson Engler. 2015. Under-constrained symbolic execution: Correctness checking for real code. In Proceedings of the 24th \(\lbrace\) USENIX \(\rbrace\) Security Symposium ( \(\lbrace\) USENIX \(\rbrace\) Security 15). 49–64.
    [45]
    Hitesh Sajnani, Vaibhav Saini, Jeffrey Svajlenko, Chanchal K. Roy, and Cristina V. Lopes. 2016. Sourcerercc: Scaling code clone detection to big-code. In Proceedings of the 38th International Conference on Software Engineering. 1157–1168.
    [46]
    Yusuke Sasaki, Tetsuo Yamamoto, Yasuhiro Hayase, and Katsuro Inoue. 2010. Finding file clones in FreeBSD ports collection. In Proceedings of the 2010 7th IEEE Working Conference on Mining Software Repositories (MSR 2010). IEEE, 102–105.
    [47]
    Shai Shalev-Shwartz and Shai Ben-David. 2014. Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press.
    [48]
    Peiyuan Sun, Qiben Yan, Haoyi Zhou, and Jianxin Li. 2021. Osprey: A fast and accurate patch presence test framework for binaries. Computer Communications 173 (2021), 95–106.
    [49]
    Michael Sutton, Adam Greene, and Pedram Amini. 2007. Fuzzing: Brute Force Vulnerability Discovery. Pearson Education.
    [50]
    Sami Ullah and Heekuck Oh. 2021. BinDiff NN: Learning distributed representation of assembly for robust binary diffing against semantic differences. IEEE Transactions on Software Engineering 48, 9 (2021), 3442–3466.
    [51]
    Ilja Van Sprundel. 2005. Fuzzing: Breaking software in an automated fashion. Decmember 8th (2005).
    [52]
    Hao Wang, Wenjie Qu, Gilad Katz, Wenyu Zhu, Zeyu Gao, Han Qiu, Jianwei Zhuge, and Chao Zhang. 2022. Jtrans: Jump-aware transformer for binary code similarity detection. In Proceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis. 1–13.
    [53]
    Yang Xiao, Bihuan Chen, Chendong Yu, Zhengzi Xu, Zimu Yuan, Feng Li, Binghong Liu, Yang Liu, Wei Huo, Wei Zou, and Wenchang Shi. 2020. MVP: Detecting vulnerabilities using patch-enhanced vulnerability signatures. In Proceedings of the 29th USENIX Security Symposium (USENIX Security 20). 1165–1182.
    [54]
    Yang Xiao, Zhengzi Xu, Weiwei Zhang, Chendong Yu, Longquan Liu, Wei Zou, Zimu Yuan, Yang Liu, Aihua Piao, and Wei Huo. 2021. VIVA: Binary level vulnerability identification via partial signature. In Proceedings of the 2021 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER). IEEE, 213–224.
    [55]
    Xiaojun Xu, Chang Liu, Qian Feng, Heng Yin, Le Song, and Dawn Song. 2017. Neural network-based graph embedding for cross-platform binary code similarity detection. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security. 363–376.
    [56]
    Yifei Xu, Zhengzi Xu, Bihuan Chen, Fu Song, Yang Liu, and Ting Liu. 2020. Patch based vulnerability matching for binary programs. In Proceedings of the 29th ACM SIGSOFT International Symposium on Software Testing and Analysis. 376–387.
    [57]
    Zhengzi Xu, Bihuan Chen, Mahinthan Chandramohan, Y. Liu, and Fu Song. 2017. SPAIN: Security patch analysis for binaries towards understanding the pain and pills. In Proceedings of the 2017 IEEE/ACM 39th International Conference on Software Engineering (ICSE).462–472.
    [58]
    Zhengzi Xu, Yulong Zhang, Longri Zheng, Liangzhao Xia, Chenfu Bao, Zhi Wang, and Yang Liu. 2020. Automatic hot patch generation for android kernels. In Proceedings of the 29th \(\lbrace\) USENIX \(\rbrace\) Security Symposium ( \(\lbrace\) USENIX \(\rbrace\) Security 20). 2397–2414.
    [59]
    Yinxing Xue, Zhengzi Xu, Mahinthan Chandramohan, and Yang Liu. 2018. Accurate and scalable cross-architecture cross-os binary code search with emulation. IEEE Transactions on Software Engineering 45, 11 (2018), 1125–1149.
    [60]
    Fabian Yamaguchi, Felix Lindner, and Konrad Rieck. 2011. Vulnerability extrapolation: Assisted discovery of vulnerabilities using machine learning. In Proceedings of the 5th USENIX Conference on Offensive Technologies. 13–13.
    [61]
    Fabian Yamaguchi, Markus Lottmann, and Konrad Rieck. 2012. Generalized vulnerability extrapolation using abstract syntax trees. In Proceedings of the 28th Annual Computer Security Applications Conference. 359–368.
    [62]
    Shouguo Yang, Long Cheng, Yicheng Zeng, Zhe Lang, Hongsong Zhu, and Zhiqiang Shi. 2021. Asteria: Deep learning-based AST-encoding for cross-platform binary code similarity detection. In Proceedings of the 2021 51st Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN). IEEE, 224–236.
    [63]
    Hang Zhang and Zhiyun Qian. 2018. Precise and accurate patch presence test for binaries. In Proceedings of the 27th \(\lbrace\) USENIX \(\rbrace\) Security Symposium ( \(\lbrace\) USENIX \(\rbrace\) Security 18). 887–902.
    [64]
    Lei Zhao, Yuncong Zhu, Jiang Ming, Yichen Zhang, Haotian Zhang, and Heng Yin. 2020. Patchscope: Memory object centric patch diffing. In Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications Security. 149–165.
    [65]
    Fei Zuo, Xiaopeng Li, Patrick Young, Lannan Luo, Qiang Zeng, and Zhexin Zhang. 2018. Neural machine translation inspired binary code similarity comparison beyond function pairs. In Proceedings of 26th Annual Network and Distributed System Security Symposium (NDSS’18).

    Index Terms

    1. Towards Practical Binary Code Similarity Detection: Vulnerability Verification via Patch Semantic Analysis

      Recommendations

      Comments

      Information & Contributors

      Information

      Published In

      cover image ACM Transactions on Software Engineering and Methodology
      ACM Transactions on Software Engineering and Methodology  Volume 32, Issue 6
      November 2023
      949 pages
      ISSN:1049-331X
      EISSN:1557-7392
      DOI:10.1145/3625557
      • Editor:
      • Mauro Pezzè
      Issue’s Table of Contents

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 30 September 2023
      Online AM: 17 June 2023
      Accepted: 23 May 2023
      Revised: 22 April 2023
      Received: 24 July 2022
      Published in TOSEM Volume 32, Issue 6

      Permissions

      Request permissions for this article.

      Check for updates

      Author Tags

      1. Patch detection
      2. vulnerability detection
      3. under constrained symbolic execution
      4. malicious function input

      Qualifiers

      • Research-article

      Funding Sources

      • National Key R&D Program of China
      • Strategic Priority Research Program of Chinese Academy of Sciences
      • Joint Fund Cultivation Project of National Natural Science Foundation of China
      • Science and Technology Project of State Grid Corporation of China
      • National Natural Science Foundation of China
      • Young Scientists Fund of the National Natural Science Foundation of China
      • Chinese National Natural Science Foundation

      Contributors

      Other Metrics

      Bibliometrics & Citations

      Bibliometrics

      Article Metrics

      • 0
        Total Citations
      • 2,387
        Total Downloads
      • Downloads (Last 12 months)2,276
      • Downloads (Last 6 weeks)271

      Other Metrics

      Citations

      View Options

      View options

      PDF

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader

      Full Text

      View this article in Full Text.

      Full Text

      Get Access

      Login options

      Full Access

      Media

      Figures

      Other

      Tables

      Share

      Share

      Share this Publication link

      Share on social media