research-article

Towards Cross-Architecture Binary Code Vulnerability Detection

Authors:

Guy-Vincent Jourdan

CASCON'23: Proceedings of the 33rd Annual International Conference on Computer Science and Software Engineering

Pages 191 - 196

Published: 11 September 2023

Abstract

Today’s Internet of Things (IoT) environments are heterogeneous, as they typically comprise devices equipped with various CPU architectures and software platforms. Therefore, in defending IoT environments against security threats, the capability of cross-architecture vulnerability detection is of paramount importance. In this paper, we propose BinX, a deep learning-based approach for code similarity detection in binaries that are obtained through different compilers and optimization levels for various architectures. Our research is guided by a key idea: leveraging the Ghidra decompiler to generate decompiled C code and the high p-code intermediate representation, and pre-training transformer-based models, specifically BERT and CodeBERT, to accurately generate semantic embeddings. These embeddings are then used as inputs to an RNN Siamese neural network, enhancing the learning process for code similarity detection. The effectiveness of our approach is demonstrated through several experiments and comparisons with existing methods. Our results showcase the potential of BinX in enabling vulnerability detection in cross-architecture, cross-compiled binaries, contributing to the advancement of security in IoT environments.
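The Siamese comparison step mentioned in the abstract can be illustrated with a minimal sketch. This is not the BinX implementation: the tiny Elman RNN, its untrained random weights, the 3-dimensional toy vectors standing in for BERT/CodeBERT embeddings, and every function name below are assumptions made purely for illustration. The one property it is meant to show is that both inputs pass through the same encoder with shared weights before their outputs are compared.

```python
import math
import random


def make_rnn(input_dim, hidden_dim, seed=0):
    """Build untrained, randomly initialised weights for a tiny Elman RNN (hypothetical helper)."""
    rng = random.Random(seed)
    w_in = [[rng.uniform(-0.5, 0.5) for _ in range(input_dim)] for _ in range(hidden_dim)]
    w_rec = [[rng.uniform(-0.5, 0.5) for _ in range(hidden_dim)] for _ in range(hidden_dim)]
    return w_in, w_rec


def encode(sequence, weights):
    """Run the shared RNN over a sequence of vectors and return the final hidden state."""
    w_in, w_rec = weights
    hidden = [0.0] * len(w_in)
    for vec in sequence:
        hidden = [
            math.tanh(
                sum(w * x for w, x in zip(w_in[i], vec))
                + sum(w * h for w, h in zip(w_rec[i], hidden))
            )
            for i in range(len(w_in))
        ]
    return hidden


def cosine(a, b):
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def siamese_similarity(seq_a, seq_b, weights):
    """Encode both inputs with the SAME weights, then compare the two encodings."""
    return cosine(encode(seq_a, weights), encode(seq_b, weights))


weights = make_rnn(input_dim=3, hidden_dim=4)

# Toy stand-ins for per-token semantic embeddings (assumed values, not real model output).
# func_x86 and func_arm are deliberately identical, as if two builds of one function
# had been mapped to the same embeddings; func_other represents an unrelated function.
func_x86 = [[0.9, 0.1, 0.0], [0.2, 0.8, 0.1]]
func_arm = [[0.9, 0.1, 0.0], [0.2, 0.8, 0.1]]
func_other = [[-0.7, 0.3, 0.9], [0.1, -0.6, 0.4]]

same = siamese_similarity(func_x86, func_arm, weights)
different = siamese_similarity(func_x86, func_other, weights)
print(f"same function: {same:.3f}, different function: {different:.3f}")
```

In a trained system the shared weights would be learned from labelled similar and dissimilar pairs, so that semantically equivalent functions score high even when their embeddings differ; here the weights are random and only the structure is shown.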



Published In

CASCON '23: Proceedings of the 33rd Annual International Conference on Computer Science and Software Engineering

September 2023

251 pages

Conference Chair: Iosif-Viorel (Vio) Onut (IBM Canada Ltd.)

Editors: Paria Shirani (University of Ottawa), Iosif-Viorel (Vio) Onut (IBM Canada Ltd.)

Program Co-chairs: Iosif-Viorel (Vio) Onut (IBM Canada Ltd.), Paula Branco (University of Ottawa)

Publisher

IBM Corp., United States


Acceptance Rates

Overall Acceptance Rate 24 of 90 submissions, 27%
