DOI: 10.1145/3671016.3674806
LateBA: Latent Backdoor Attack on Deep Bug Search via Infrequent Execution Codes

Published: 24 July 2024
Abstract

    Backdoor attacks can mislead deep bug search models by exploiting model-sensitive assembly code, which can turn alerts into benign results and allow buggy binaries to enter production environments. However, assembly instructions have strict constraints and dependencies, so injected model-sensitive assembly code breaks semantics and syntax and is easily caught by dynamic analysis or context-based detection. To evade dynamic analysis-based detection, we propose a novel latent backdoor attack (LateBA) scheme based on the locality principle of program execution, which poisons only a small amount of infrequently executed code, minimizing the effect on the original code logic. In LateBA, a progressive seed mutation strategy is designed to steer an American Fuzzy Lop (AFL)-based path search tool toward infrequently executed code; with this strategy, the optimal range of injection positions in the whole program is determined. Triggers are then constructed from target model-sensitive assembly instructions while minimizing the number of variables referenced by the surrounding context instructions. Finally, we employ code semantic feature comparisons to select precise trigger injection positions within these ranges; the selection criterion for a trigger injection position is whether the corresponding code segment has a data dependency relationship with other code segments. We evaluate LateBA on 7 deep bug search tasks. The results demonstrate that the attack success rate of the proposed LateBA is considerable and competitive against the baselines.
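    The pipeline described above — find infrequently executed code via fuzzing coverage, then pick injection positions without data dependencies on surrounding code — can be illustrated with a minimal sketch. This is not the paper's implementation: the per-block execution counts, the simplified `(mnemonic, dst, src)` instruction tuples, the `threshold` parameter, and the assumption that dependency-free positions are preferred are all illustrative stand-ins for what a real tool would obtain from AFL coverage data and a disassembler.

    ```python
    # Hedged sketch of LateBA-style injection-site selection (illustrative names).
    # exec_counts: {block_address: execution count} as a fuzzer might report.
    # blocks: {block_address: [(mnemonic, dst_reg, src_reg), ...]} simplified IR.

    def infrequent_blocks(exec_counts, threshold=5):
        """Addresses of basic blocks executed at most `threshold` times."""
        return {addr for addr, count in exec_counts.items() if count <= threshold}

    def defines(insn):
        """Destination register of a simplified (mnemonic, dst, src) tuple."""
        return insn[1]

    def reads(insn):
        """Source operands of a simplified instruction tuple."""
        return insn[2:]

    def has_data_dependency(candidate, later_insns):
        """True if any later instruction reads a register the candidate defines."""
        dst = defines(candidate)
        return any(dst in reads(i) for i in later_insns)

    def select_injection_sites(blocks, exec_counts, threshold=5):
        """(block_address, instruction_index) positions inside infrequently
        executed blocks whose later instructions do not depend on the
        instruction at that position."""
        cold = infrequent_blocks(exec_counts, threshold)
        sites = []
        for addr, insns in blocks.items():
            if addr not in cold:
                continue  # skip hot blocks: poisoning them is easily noticed
            for i, insn in enumerate(insns):
                if not has_data_dependency(insn, insns[i + 1:]):
                    sites.append((addr, i))
        return sites
    ```

    For example, a block executed twice whose first instruction defines a register that a later instruction reads would be rejected at that position but accepted at its final instruction, while a block executed hundreds of times is excluded entirely.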


    Published In

    Internetware '24: Proceedings of the 15th Asia-Pacific Symposium on Internetware
    July 2024, 518 pages
    ISBN: 9798400707056
    DOI: 10.1145/3671016
    This work is licensed under a Creative Commons Attribution 4.0 International License.

    Publisher

    Association for Computing Machinery, New York, NY, United States


    Author Tags

    1. American Fuzzy Lop
    2. Backdoor Attack
    3. Deep Bug Search
    4. Infrequent Execution Code

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    Internetware 2024

    Acceptance Rates

    Overall Acceptance Rate 55 of 111 submissions, 50%
