DOI: 10.1145/3671016.3674806
LateBA: Latent Backdoor Attack on Deep Bug Search via Infrequent Execution Codes

Published: 24 July 2024
Abstract

    Backdoor attacks can mislead deep bug search models by exploiting model-sensitive assembly code, which can turn alerts into benign results and allow buggy binaries to enter production environments. However, assembly instructions have strict constraints and dependencies, so injected model-sensitive assembly code breaks semantics and syntax and is easily caught by dynamic analysis or context-based detection. To evade dynamic analysis-based detection, we propose a novel latent backdoor attack (LateBA) scheme based on the locality principle of program execution, which poisons only a small amount of infrequently executed code, minimizing the effect on the original code logic. In LateBA, a progressive seed mutation strategy is designed to steer an American Fuzzy Lop (AFL)-based path search tool toward infrequently executed code; with this strategy, the optimal range of injection positions in the whole program is determined. Triggers are then constructed from target model-sensitive assembly instructions while minimizing the number of variables referenced by the surrounding context instructions. Finally, we employ code semantic feature comparisons to select precise trigger injection positions within these ranges; the selection criterion for a trigger injection position is whether the corresponding code segment has a data dependency relationship with other code segments. We evaluate LateBA on 7 deep bug search tasks. The results demonstrate that the attack success rate of the proposed LateBA is considerable and competitive against the baselines.
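    The pipeline described above — find infrequently executed code via fuzzing coverage, then pick injection positions without data dependencies on surrounding code — can be illustrated with a minimal sketch. This is not the paper's implementation: the per-block execution counts, the simplified `(mnemonic, dst, src)` instruction tuples, the `threshold` parameter, and the assumption that dependency-free positions are preferred are all illustrative stand-ins for what a real tool would obtain from AFL coverage data and a disassembler.

    ```python
    # Hedged sketch of LateBA-style injection-site selection (illustrative names).
    # exec_counts: {block_address: execution count} as a fuzzer might report.
    # blocks: {block_address: [(mnemonic, dst_reg, src_reg), ...]} simplified IR.

    def infrequent_blocks(exec_counts, threshold=5):
        """Addresses of basic blocks executed at most `threshold` times."""
        return {addr for addr, count in exec_counts.items() if count <= threshold}

    def defines(insn):
        """Destination register of a simplified (mnemonic, dst, src) tuple."""
        return insn[1]

    def reads(insn):
        """Source operands of a simplified instruction tuple."""
        return insn[2:]

    def has_data_dependency(candidate, later_insns):
        """True if any later instruction reads a register the candidate defines."""
        dst = defines(candidate)
        return any(dst in reads(i) for i in later_insns)

    def select_injection_sites(blocks, exec_counts, threshold=5):
        """(block_address, instruction_index) positions inside infrequently
        executed blocks whose later instructions do not depend on the
        instruction at that position."""
        cold = infrequent_blocks(exec_counts, threshold)
        sites = []
        for addr, insns in blocks.items():
            if addr not in cold:
                continue  # skip hot blocks: poisoning them is easily noticed
            for i, insn in enumerate(insns):
                if not has_data_dependency(insn, insns[i + 1:]):
                    sites.append((addr, i))
        return sites
    ```

    For example, a block executed twice whose first instruction defines a register that a later instruction reads would be rejected at that position but accepted at its final instruction, while a block executed hundreds of times is excluded entirely.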


    Published In

    Internetware '24: Proceedings of the 15th Asia-Pacific Symposium on Internetware
    July 2024, 518 pages
    ISBN: 9798400707056
    DOI: 10.1145/3671016
    This work is licensed under a Creative Commons Attribution 4.0 International License.

    Publisher

    Association for Computing Machinery, New York, NY, United States


    Author Tags

    1. American Fuzzy Lop
    2. Backdoor Attack
    3. Deep Bug Search
    4. Infrequent Execution Code

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    Internetware 2024

    Acceptance Rates

    Overall Acceptance Rate 55 of 111 submissions, 50%
