DOI: 10.5555/3615924.3615947
research-article

Towards Cross-Architecture Binary Code Vulnerability Detection

Published: 11 September 2023

Abstract

Today’s Internet of Things (IoT) environments are heterogeneous: they typically consist of devices built on different CPU architectures and running different software platforms. Defending such environments against security threats therefore requires the ability to detect vulnerabilities across architectures. In this paper, we propose BinX, a deep learning-based approach for detecting code similarity in binaries produced by different compilers and optimization levels for various architectures. Our key idea is to use the Ghidra decompiler to generate decompiled C code and the high P-code intermediate representation, and then to use pre-trained transformer-based models, specifically BERT and CodeBERT, to generate semantic embeddings of that code. These embeddings serve as inputs to an RNN-based Siamese neural network, which learns to detect code similarity. We demonstrate the effectiveness of our approach through several experiments and comparisons with existing methods. Our results show the potential of BinX for vulnerability detection in binaries compiled for different architectures with different compilers, contributing to the advancement of security in IoT environments.
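The abstract describes the final stage of the pipeline as a Siamese network built on an RNN: one shared encoder turns each function's embedding sequence into a single vector, and the two vectors are compared. The following is a minimal, untrained sketch of that idea in plain Python. It assumes each function has already been converted into a sequence of fixed-size embedding vectors (in BinX these would come from BERT or CodeBERT run over Ghidra's decompiled C and high P-code; here they are ordinary float lists). The names `SharedRNNEncoder`, `siamese_similarity`, and `rank_candidates`, the simple Elman RNN cell, the random weights, and cosine scoring are all illustrative assumptions, not the authors' architecture. Training (for example with a contrastive loss over labeled similar/dissimilar pairs) is omitted.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def _matvec(m: Sequence[Sequence[float]], v: Sequence[float]) -> Vector:
    """Multiply matrix m (rows x cols) by vector v (length cols)."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class SharedRNNEncoder:
    """Hypothetical Elman RNN: h_t = tanh(W_x x_t + W_h h_{t-1} + b).

    A single instance (one set of weights) encodes BOTH inputs of a pair,
    which is what makes the comparison "Siamese". Weights are random here;
    a real system would learn them from labeled function pairs.
    """

    def __init__(self, input_dim: int, hidden_dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        scale = 1.0 / math.sqrt(input_dim)
        self.w_x = [[rng.uniform(-scale, scale) for _ in range(input_dim)]
                    for _ in range(hidden_dim)]
        self.w_h = [[rng.uniform(-scale, scale) for _ in range(hidden_dim)]
                    for _ in range(hidden_dim)]
        self.b = [0.0] * hidden_dim
        self.hidden_dim = hidden_dim

    def encode(self, seq: Sequence[Sequence[float]]) -> Vector:
        """Run the RNN over the sequence and return the final hidden state."""
        h = [0.0] * self.hidden_dim
        for x in seq:
            pre = [a + c + bias for a, c, bias in
                   zip(_matvec(self.w_x, x), _matvec(self.w_h, h), self.b)]
            h = [math.tanh(p) for p in pre]
        return h


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity, clamped to [-1, 1]; 0.0 if either vector is zero."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return max(-1.0, min(1.0, dot / (nu * nv)))


def siamese_similarity(enc: SharedRNNEncoder,
                       seq_a: Sequence[Sequence[float]],
                       seq_b: Sequence[Sequence[float]]) -> float:
    """Encode both sequences with the SAME encoder, then compare."""
    return cosine(enc.encode(seq_a), enc.encode(seq_b))


def rank_candidates(enc: SharedRNNEncoder,
                    query: Sequence[Sequence[float]],
                    candidates: Dict[str, Sequence[Sequence[float]]]
                    ) -> List[Tuple[str, float]]:
    """Rank candidate functions by similarity to a query function."""
    scores = [(name, siamese_similarity(enc, query, seq))
              for name, seq in candidates.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    rng = random.Random(42)
    dim = 8

    def toy(n: int) -> List[Vector]:
        # Stand-in for n embedding vectors produced by BERT/CodeBERT.
        return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]

    query = toy(5)  # e.g. a known-vulnerable function
    candidates = {"func_a": toy(6), "func_b": [list(v) for v in query],
                  "func_c": toy(4)}
    enc = SharedRNNEncoder(input_dim=dim, hidden_dim=16, seed=0)
    for name, score in rank_candidates(enc, query, candidates):
        print(f"{name}: {score:.3f}")
```

In a vulnerability-search setting along these lines, the query would be a function known to contain a CVE and the candidates would be functions extracted from a firmware image for another architecture; the highest-scoring candidates are the ones worth inspecting. Sharing one encoder across both branches keeps the similarity symmetric and means only one set of weights has to be learned.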

Published In

CASCON '23: Proceedings of the 33rd Annual International Conference on Computer Science and Software Engineering
September 2023
251 pages

Publisher

IBM Corp.

United States

Author Tags

  1. Binary code analysis
  2. Vulnerability detection
  3. Machine learning

Qualifiers

  • Research-article

Acceptance Rates

Overall Acceptance Rate 24 of 90 submissions, 27%

Article Metrics

  • Total Citations: 0
  • Total Downloads: 82
  • Downloads (last 12 months): 65
  • Downloads (last 6 weeks): 3

Reflects downloads up to 13 Jan 2025.
