Practical Binary Code Similarity Detection with BERT-based Transferable Similarity Learning

Authors:

Yunheung Paek

ACSAC '22: Proceedings of the 38th Annual Computer Security Applications Conference

Pages 361 - 374

https://doi.org/10.1145/3564625.3567975

Published: 05 December 2022

Abstract

Binary code similarity detection (BCSD) serves as a basis for a wide spectrum of applications, including software plagiarism detection, malware classification, and known vulnerability discovery. However, inferring the contextual meaning of a binary is challenging because the semantic information available in source code is absent. Recent advances use deep learning architectures to better understand the underlying code semantics and use the Siamese architecture to improve BCSD.

In this paper, we propose BinShot, a BERT-based similarity learning architecture that is highly transferable for effective BCSD. We tackle the problem of detecting code similarity with one-shot learning (a special case of few-shot learning). To this end, we adopt a weighted distance vector with binary cross entropy as the loss function on top of BERT. Experimental results with a prototype of BinShot demonstrate its effectiveness, transferability, and practicality, and show that it robustly detects the similarity of previously unseen functions. We show that BinShot outperforms the previous state-of-the-art approaches for BCSD.
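The abstract does not spell out the scoring head, so the following is only a minimal sketch of one common way to combine a weighted distance vector with a binary cross entropy loss in Siamese-style one-shot learning. It assumes each function has already been mapped to a fixed-size embedding (plain Python lists stand in for BERT outputs here), and that the score is a sigmoid over a weighted sum of the element-wise absolute difference. The function names, the toy embeddings, and the weight and bias values are all hypothetical and are not taken from the paper.

```python
import math


def weighted_distance_score(u, v, weights, bias=0.0):
    """Sigmoid of a weighted sum over the element-wise distance |u - v|."""
    if not (len(u) == len(v) == len(weights)):
        raise ValueError("u, v, and weights must have the same length")
    z = bias + sum(w * abs(a - b) for w, a, b in zip(weights, u, v))
    return 1.0 / (1.0 + math.exp(-z))


def binary_cross_entropy(p, label, eps=1e-12):
    """BCE for one pair; label is 1 for a similar pair, 0 for a dissimilar one."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


# Toy embeddings standing in for BERT function representations (hypothetical).
f1 = [0.2, -0.5, 0.9]
f2 = [0.2, -0.5, 0.9]   # identical to f1, so the distance vector is all zeros
f3 = [-0.7, 0.8, 0.1]   # far from f1 in every dimension

# Hypothetical "learned" parameters: negative weights make a larger
# distance push the similarity score toward 0.
w = [-2.0, -2.0, -2.0]
b = 3.0

p_same = weighted_distance_score(f1, f2, w, b)
p_diff = weighted_distance_score(f1, f3, w, b)
loss = binary_cross_entropy(p_same, 1) + binary_cross_entropy(p_diff, 0)
print(p_same, p_diff, loss)
```

In a trained model the weights and bias would be learned jointly with the encoder by minimizing this loss over labeled function pairs; here they are fixed by hand only to show the direction of the effect.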

Index Terms

Practical Binary Code Similarity Detection with BERT-based Transferable Similarity Learning
1. Security and privacy
  1. Software and application security
    1. Software reverse engineering
    2. Software security engineering

Published In

ACSAC '22: Proceedings of the 38th Annual Computer Security Applications Conference

December 2022

1021 pages

ISBN:9781450397599

DOI:10.1145/3564625

Copyright © 2022 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Badges

Artifacts Evaluated & Functional / v1.1

Qualifiers

Research-article
Research
Refereed limited

Conference

ACSAC

ACSAC: Annual Computer Security Applications Conference

December 5 - 9, 2022

Austin, TX, USA

Acceptance Rates

Overall Acceptance Rate 104 of 497 submissions, 21%

Bibliometrics

Total Citations: 12

Total Downloads: 1,209

Downloads (Last 12 months): 659

Downloads (Last 6 weeks): 38

Reflects downloads up to 02 Sep 2024

Cited By

Feng Y, Li H, Cao Y, Wang Y, Feng H. (2024). CRABS-former: CRoss-Architecture Binary Code Similarity Detection based on Transformer. Proceedings of the 15th Asia-Pacific Symposium on Internetware, 11-20. Online publication date: 24-Jul-2024.
https://dl.acm.org/doi/10.1145/3671016.3671390
Jang H, Murodova N, Koo H. (2024). ToolPhet: Inference of Compiler Provenance From Stripped Binaries With Emerging Compilation Toolchains. IEEE Access 12, 12667-12682. Online publication date: 2024.
https://doi.org/10.1109/ACCESS.2024.3355098
Sun R, Guo S, Guo J, Li W, Zhang X, Guo X, Pan Z. (2024). GraphMoCo. Neurocomputing 575:C. Online publication date: 16-May-2024.
https://dl.acm.org/doi/10.1016/j.neucom.2024.127273
Jiang X, Wang S, Gong Y, Yu T, Liu L, Yu X. (2024). HAformer: Semantic fusion of hex machine code and assembly code for cross-architecture binary vulnerability detection. Computers & Security 145, 104029. Online publication date: Oct-2024.
https://doi.org/10.1016/j.cose.2024.104029
Corlătescu D, Dinu A, Găman M, Sumedrea P, Oh A, Naumann T, Globerson A, Saenko K, Hardt M, Levine S. (2023). EMBERSim. Proceedings of the 37th International Conference on Neural Information Processing Systems, 26722-26743. Online publication date: 10-Dec-2023.
https://dl.acm.org/doi/10.5555/3666122.3667283
Luo Z, Wang P, Xie W, Zhou X, Wang B. (2023). IoTSim: Internet of Things-Oriented Binary Code Similarity Detection with Multiple Block Relations. Sensors 23:18, 7789. Online publication date: 11-Sep-2023.
https://doi.org/10.3390/s23187789
Sun X, Wei Q, Du J, Wang Y. (2023). HEBCS: A High-Efficiency Binary Code Search Method. Electronics 12:16, 3464. Online publication date: 16-Aug-2023.
https://doi.org/10.3390/electronics12163464
Luo Z, Wang P, Xie W, Zhou X, Wang B. (2023). BlockMatch: A Fine-Grained Binary Code Similarity Detection Approach Using Contrastive Learning for Basic Block Matching. Applied Sciences 13:23, 12751. Online publication date: 28-Nov-2023.
https://doi.org/10.3390/app132312751
Yang S, Xu Z, Xiao Y, Lang Z, Tang W, Liu Y, Shi Z, Li H, Sun L. (2023). Towards Practical Binary Code Similarity Detection: Vulnerability Verification via Patch Semantic Analysis. ACM Transactions on Software Engineering and Methodology 32:6, 1-29. Online publication date: 30-Sep-2023.
https://dl.acm.org/doi/10.1145/3604608
Zuo F, Rhee J, Kim Y, Oh J, Qian G. (2023). A Comprehensive Dataset Towards Hands-on Experience Enhancement in a Research-Involved Cybersecurity Program. Proceedings of the 24th Annual Conference on Information Technology Education, 118-124. Online publication date: 11-Oct-2023.
https://dl.acm.org/doi/10.1145/3585059.3611416
