DOI: 10.1145/3503161.3548261

Opportunistic Backdoor Attacks: Exploring Human-imperceptible Vulnerabilities on Speech Recognition Systems

Published: 10 October 2022

Abstract

Speech recognition systems, trained and updated on large-scale audio data, are vulnerable to backdoor attacks that inject dedicated triggers during training. Such triggers are generally human-inaudible audio, such as ultrasonic waves. However, we note that this design is not practical, as the trigger can easily be filtered out by pre-processing. In this work, we propose the first audible backdoor attack paradigm for speech recognition, characterized by passive triggering and opportunistic invocation. Traditional device-synthesized triggers are replaced with ambient noise from daily scenarios. To adapt triggers to the dynamics of speech interaction, we exploit the knowledge the trained model inherits from its context, and we accommodate injection and poisoning with certainty-based trigger selection, performance-oblivious sample binding, and trigger late-augmentation. Experiments on two datasets under various environments demonstrate the proposal's effectiveness in maintaining a high benign rate while achieving an outstanding attack success rate (99.27%, ~4% higher than BadNets), robustness (bounded infectious triggers), and feasibility in real-world scenarios. The attack requires less than 1% of the data to be poisoned and is shown to resist typical speech enhancement techniques and general countermeasures (e.g., dedicated fine-tuning). The code and data will be made available at https://github.com/lqsunshine/DABA.
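For intuition, the sketch below illustrates the generic dirty-label poisoning step the abstract alludes to: an ambient-noise clip is mixed into a small fraction (under 1%) of training waveforms, which are then relabeled to an attacker-chosen class. This is not the authors' DABA pipeline (it omits their certainty-based trigger selection, sample binding, and late-augmentation); all names, values, and data here are hypothetical stand-ins.

```python
# Minimal, illustrative audio-poisoning sketch (NOT the authors' DABA code).
# Waveforms are synthetic stand-ins; in a real attack the trigger would be a
# recorded ambient-noise clip (e.g., street or cafe noise).
import numpy as np

SR = 16000              # sample rate (Hz), assumed
TARGET_LABEL = 7        # attacker-chosen target class, hypothetical
POISON_RATE = 0.01      # <1% of the training set, per the abstract
TRIGGER_SNR_DB = 10.0   # mixing level of the trigger, assumed

rng = np.random.default_rng(0)

def mix_trigger(wave: np.ndarray, trigger: np.ndarray, snr_db: float) -> np.ndarray:
    """Overlay an ambient-noise trigger onto a waveform at a given SNR."""
    t = np.resize(trigger, wave.shape)                  # loop/crop to length
    p_wave = np.mean(wave ** 2) + 1e-12                 # signal power
    p_trig = np.mean(t ** 2) + 1e-12                    # trigger power
    scale = np.sqrt(p_wave / (p_trig * 10 ** (snr_db / 10)))
    return np.clip(wave + scale * t, -1.0, 1.0)

# Synthetic stand-ins: 1000 one-second "utterances" plus one trigger clip.
waves = rng.uniform(-0.1, 0.1, size=(1000, SR)).astype(np.float32)
labels = rng.integers(0, 10, size=1000)
ambient_trigger = rng.uniform(-0.5, 0.5, size=SR // 2).astype(np.float32)

# Poison a small random subset: embed the trigger and flip the label.
n_poison = int(POISON_RATE * len(waves))
idx = rng.choice(len(waves), size=n_poison, replace=False)
for i in idx:
    waves[i] = mix_trigger(waves[i], ambient_trigger, TRIGGER_SNR_DB)
    labels[i] = TARGET_LABEL

print(f"poisoned {n_poison} of {len(waves)} samples -> label {TARGET_LABEL}")
```

Because the trigger lives in the audible band alongside ordinary environmental noise, a low-pass or denoising pre-processing stage cannot remove it as trivially as it removes an ultrasonic tone, which is the weakness of inaudible triggers the abstract points out.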

Supplementary Material

MP4 File (MM22-fp2175.mp4)
This video describes our work (Opportunistic Backdoor Attacks: Exploring Human-imperceptible Vulnerabilities on Speech Recognition Systems), published in ACM Multimedia 2022, including background and motivation, methods (DABA), experimental evaluation, ablation studies, and defense tests.

References

[1] Hojjat Aghakhani, Lea Schönherr, Thorsten Eisenhofer, Dorothea Kolossa, Thorsten Holz, Christopher Kruegel, and Giovanni Vigna. 2020. VenoMave: Targeted Poisoning Against Speech Recognition. arXiv preprint arXiv:2010.10682 (2020).
[2] Jont B Allen and David A Berkley. 1979. Image method for efficiently simulating small-room acoustics. The Journal of the Acoustical Society of America 65, 4 (1979), 943--950.
[3] Nicholas Carlini, Pratyush Mishra, Tavish Vaidya, Yuankai Zhang, Micah Sherr, Clay Shields, David Wagner, and Wenchao Zhou. 2016. Hidden voice commands. In Proc. of USENIX Security Symposium. 513--530.
[4] Nicholas Carlini and David Wagner. 2018. Audio adversarial examples: Targeted attacks on speech-to-text. In Proc. of SPW. IEEE, 1--7.
[5] Xinyun Chen, Chang Liu, Bo Li, Kimberly Lu, and Dawn Song. 2017. Targeted backdoor attacks on deep learning systems using data poisoning. arXiv preprint arXiv:1712.05526 (2017).
[6] Siyuan Cheng, Yingqi Liu, Shiqing Ma, and Xiangyu Zhang. 2020. Deep Feature Space Trojan Attack of Neural Networks by Controlled Detoxification. arXiv preprint arXiv:2012.11212 (2020).
[7] Daniel J Dubois, Roman Kolcun, Anna Maria Mandalari, Muhammad Talha Paracha, David Choffnes, and Hamed Haddadi. 2020. When speakers are all ears: Characterizing misactivations of IoT smart speakers. Proceedings on Privacy Enhancing Technologies 2020, 4 (2020), 255--276.
[8] Marwa A. Abd El-Fattah, Moawad I. Dessouky, Alaa M. Abbas, Salaheldin M. Diab, El-Sayed M. El-Rabaie, Waleed Al-Nuaimy, Saleh A. Alshebeili, and Fathi E. Abd El-Samie. 2014. Speech enhancement with an adaptive Wiener filter. International Journal of Speech Technology 17, 1 (2014), 53--64.
[9] Yansong Gao, Change Xu, Derui Wang, Shiping Chen, Damith C Ranasinghe, and Surya Nepal. 2019. STRIP: A defence against trojan attacks on deep neural networks. In Proc. of Annual Computer Security Applications Conference. 113--125.
[10] Alex Graves, Abdel-rahman Mohamed, and Geoffrey Hinton. 2013. Speech recognition with deep recurrent neural networks. In Proc. of ICASSP. 6645--6649.
[11] Tianyu Gu, Kang Liu, Brendan Dolan-Gavitt, and Siddharth Garg. 2019. BadNets: Evaluating backdooring attacks on deep neural networks. IEEE Access 7 (2019), 47230--47244.
[12] Chuan Guo, Mayank Rana, Moustapha Cisse, and Laurens Van Der Maaten. 2017. Countering adversarial images using input transformations. arXiv preprint arXiv:1711.00117 (2017).
[13] Peter Kairouz, H Brendan McMahan, Brendan Avent, Aurélien Bellet, Mehdi Bennis, Arjun Nitin Bhagoji, Kallista Bonawitz, Zachary Charles, Graham Cormode, Rachel Cummings, et al. 2019. Advances and open problems in federated learning. arXiv preprint arXiv:1912.04977 (2019).
[14] Stefanos Koffas, Jing Xu, Mauro Conti, and Stjepan Picek. 2021. Can You Hear It? Backdoor Attacks via Ultrasonic Triggers. arXiv preprint arXiv:2107.14569 (2021).
[15] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. 2012. ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems 25 (2012), 1097--1105.
[16] Bittu Kumar. 2018. Comparative performance evaluation of MMSE-based speech enhancement techniques through simulation and real-time implementation. International Journal of Speech Technology 21, 4 (2018), 1033--1044.
[17] Josephine Lau, Benjamin Zimmerman, and Florian Schaub. 2018. Alexa, are you listening? Privacy perceptions, concerns and privacy-seeking behaviors with smart speakers. Proceedings of the ACM on Human-Computer Interaction 2, CSCW (2018), 1--31.
[18] Yuezun Li, Yiming Li, Baoyuan Wu, Longkang Li, Ran He, and Siwei Lyu. 2021. Invisible backdoor attack with sample-specific triggers. In Proc. of ICCV. 16463--16472.
[19] Kang Liu, Brendan Dolan-Gavitt, and Siddharth Garg. 2018. Fine-pruning: Defending against backdooring attacks on deep neural networks. In Proc. of International Symposium on Research in Attacks, Intrusions, and Defenses. Springer, 273--294.
[20] Yingqi Liu, Wen-Chuan Lee, Guanhong Tao, Shiqing Ma, Yousra Aafer, and Xiangyu Zhang. 2019. ABS: Scanning neural networks for back-doors by artificial brain stimulation. In Proc. of the ACM SIGSAC Conference on Computer and Communications Security. 1265--1282.
[21] Yingqi Liu, Shiqing Ma, Yousra Aafer, Wen-Chuan Lee, Juan Zhai, Weihang Wang, and Xiangyu Zhang. 2017. Trojaning attack on neural networks. In Proc. of NDSS.
[22] Yunfei Liu, Xingjun Ma, James Bailey, and Feng Lu. 2020. Reflection backdoor: A natural backdoor attack on deep neural networks. In Proc. of ECCV. Springer, 182--199.
[23] Kuldip Paliwal, Kamil Wójcicki, and Belinda Schwerin. 2010. Single-channel speech enhancement using spectral subtraction in the short-time modulation domain. Speech Communication 52, 5 (2010), 450--475.
[24] Ben J Shannon and Kuldip K Paliwal. 2003. A comparative study of filter bank spacing for speech recognition. In Proc. of Microelectronic Engineering Research Conference, Vol. 41. Citeseer, 310--312.
[25] Sigurdur Sigurdsson, Kaare Brandt Petersen, and Tue Lehn-Schiøler. 2006. Mel Frequency Cepstral Coefficients: An Evaluation of Robustness of MP3 Encoded Music. In Proc. of ISMIR. 286--289.
[26] Andrew Varga and Herman JM Steeneken. 1993. Assessment for automatic speech recognition: II. NOISEX-92: A database and an experiment to study the effect of additive noise on speech recognition systems. Speech Communication 12, 3 (1993), 247--251.
[27] Bolun Wang, Yuanshun Yao, Shawn Shan, Huiying Li, Bimal Viswanath, Haitao Zheng, and Ben Y Zhao. 2019. Neural Cleanse: Identifying and mitigating backdoor attacks in neural networks. In Proc. of IEEE Symposium on Security and Privacy (SP). IEEE, 707--723.
[28] Shiyao Wang, Minlie Huang, Zhidong Deng, et al. 2018. Densely connected CNN with multi-scale feature attention for text classification. In Proc. of IJCAI. 4468--4474.
[29] Pete Warden. 2018. Speech Commands: A dataset for limited-vocabulary speech recognition. arXiv preprint arXiv:1804.03209 (2018).
[30] Yong Xu, Jun Du, Li-Rong Dai, and Chin-Hui Lee. 2013. An experimental study on speech enhancement based on deep neural networks. IEEE Signal Processing Letters 21, 1 (2013), 65--68.
[31] Hui Zeng, Tongqing Zhou, Xinyi Wu, and Zhiping Cai. 2022. Never Too Late: Tracing and Mitigating Backdoor Attacks in Federated Learning. In Proc. of the 41st International Symposium on Reliable Distributed Systems. 1--13.
[32] Tongqing Zhai, Yiming Li, Ziqi Zhang, Baoyuan Wu, Yong Jiang, and Shu-Tao Xia. 2021. Backdoor attack against speaker verification. In Proc. of ICASSP. IEEE, 2560--2564.
[33] Victor Zue, Stephanie Seneff, and James Glass. 1990. Speech database development at MIT: TIMIT and beyond. Speech Communication 9, 4 (1990), 351--356.



Published In

MM '22: Proceedings of the 30th ACM International Conference on Multimedia, October 2022, 7537 pages
ISBN: 9781450392037
DOI: 10.1145/3503161

Publisher

Association for Computing Machinery, New York, NY, United States


      Author Tags

      1. AI security
      2. backdoor attacks
      3. speech recognition

      Qualifiers

      • Research-article

      Conference

      MM '22

      Acceptance Rates

      Overall Acceptance Rate 2,145 of 8,556 submissions, 25%


      Article Metrics

• Downloads (Last 12 months): 133
• Downloads (Last 6 weeks): 23
      Reflects downloads up to 25 Feb 2025

Cited By
• (2025) Imperceptible rhythm backdoor attacks: Exploring rhythm transformation for embedding undetectable vulnerabilities on speech recognition. Neurocomputing 614, 128779. https://doi.org/10.1016/j.neucom.2024.128779
• (2024) Backdoor Attacks against Voice Recognition Systems: A Survey. ACM Computing Surveys 57, 3, 1--35. https://doi.org/10.1145/3701985
• (2024) EmoBack: Backdoor Attacks Against Speaker Identification Using Emotional Prosody. In Proceedings of the 2024 Workshop on Artificial Intelligence and Security, 137--148. https://doi.org/10.1145/3689932.3694773
• (2024) Toward Stealthy Backdoor Attacks Against Speech Recognition via Elements of Sound. IEEE Transactions on Information Forensics and Security 19, 5852--5866. https://doi.org/10.1109/TIFS.2024.3404885
• (2024) FlowMur: A Stealthy and Practical Audio Backdoor Attack with Limited Knowledge. In 2024 IEEE Symposium on Security and Privacy (SP), 1646--1664. https://doi.org/10.1109/SP54263.2024.00148
• (2024) SpeechGuard: Online Defense against Backdoor Attacks on Speech Recognition Models. In 2024 International Joint Conference on Neural Networks (IJCNN), 1--8. https://doi.org/10.1109/IJCNN60899.2024.10650300
• (2024) Boosting Imperceptibility of Adversarial Attacks for Environmental Sound Classification. In 2024 IEEE 36th International Conference on Tools with Artificial Intelligence (ICTAI), 790--797. https://doi.org/10.1109/ICTAI62512.2024.00116
• (2024) Breaking Speaker Recognition with Paddingback. In ICASSP 2024 - IEEE International Conference on Acoustics, Speech and Signal Processing, 4435--4439. https://doi.org/10.1109/ICASSP48485.2024.10448169
• (2024) Audio Steganography Based Backdoor Attack for Speech Recognition Software. In 2024 IEEE 48th Annual Computers, Software, and Applications Conference (COMPSAC), 1208--1217. https://doi.org/10.1109/COMPSAC61105.2024.00161
• (2024) SilentTrig. Pattern Recognition Letters 177, 103--109. https://doi.org/10.1016/j.patrec.2023.12.002
