
Generation of Black-box Audio Adversarial Examples Based on Gradient Approximation and Autoencoders

Published: 02 August 2022

Abstract

Deep Neural Networks (DNNs) are gaining popularity thanks to their ability to attain high accuracy and performance in various security-critical scenarios. However, recent research shows that DNN-based Automatic Speech Recognition (ASR) systems are vulnerable to adversarial attacks. These attacks mainly formulate adversarial example generation as an iterative, optimization-based process. Although such attacks have made significant progress, they still require long generation times, which makes them difficult to launch in real-world scenarios. In this article, we propose a real-time attack framework that uses a neural network, trained with a gradient approximation method, to generate adversarial examples against Keyword Spotting (KWS) systems. The experimental results show that the generated adversarial examples can fool a black-box KWS system into producing incorrect results with only a single inference. Compared with previous work, our attack achieves a higher success rate while taking less than 0.004 s per example. We further extend this work by presenting a novel ensemble audio adversarial attack and testing it against KWS systems equipped with existing defense mechanisms. Promising experimental results support the efficacy of the proposed attack.
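To make the core idea concrete, the sketch below illustrates how a zeroth-order (finite-difference) gradient estimate obtained by querying a black-box model's scores can be backpropagated through an autoencoder-style generator that produces the adversarial perturbation. This is a minimal PyTorch illustration of the general technique, not the authors' implementation: the generator architecture, the attack loss, the estimator settings, and the stand-in KWS model are all hypothetical.

import torch
import torch.nn as nn

class PerturbationAutoencoder(nn.Module):
    # Hypothetical generator: maps a clean waveform to a small additive perturbation.
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=31, stride=4, padding=15), nn.ReLU(),
            nn.Conv1d(16, 32, kernel_size=31, stride=4, padding=15), nn.ReLU())
        self.decoder = nn.Sequential(
            nn.ConvTranspose1d(32, 16, kernel_size=32, stride=4, padding=14), nn.ReLU(),
            nn.ConvTranspose1d(16, 1, kernel_size=32, stride=4, padding=14), nn.Tanh())

    def forward(self, x):
        return 0.01 * self.decoder(self.encoder(x))  # bound the perturbation amplitude

def true_label_score(scores_fn, x, label):
    # Untargeted attack loss to minimize: the black-box score of the true keyword.
    return scores_fn(x)[..., label].sum()

def estimate_gradient(scores_fn, x, label, sigma=1e-3, n_samples=32):
    # Two-sided random-direction finite differences (a ZOO/NES-style estimator);
    # it only queries the model's output scores, never its gradients.
    grad = torch.zeros_like(x)
    for _ in range(n_samples):
        u = torch.randn_like(x)
        l_plus = true_label_score(scores_fn, x + sigma * u, label)
        l_minus = true_label_score(scores_fn, x - sigma * u, label)
        grad += (l_plus - l_minus) / (2 * sigma) * u
    return grad / n_samples

def train_step(gen, opt, scores_fn, x_clean, label):
    opt.zero_grad()
    x_adv = (x_clean + gen(x_clean)).clamp(-1.0, 1.0)
    with torch.no_grad():  # the estimate itself needs no autograd graph
        est_grad = estimate_gradient(scores_fn, x_adv.detach(), label)
    # Backpropagate the estimated gradient through the generator, so a descent
    # step on the generator's weights lowers the true keyword's score.
    x_adv.backward(gradient=est_grad)
    opt.step()

# Toy usage with a stand-in "black-box" model exposing scores only (no gradients).
kws = nn.Sequential(nn.Flatten(), nn.Linear(16000, 12))
scores_fn = lambda x: kws(x).detach()
gen = PerturbationAutoencoder()
opt = torch.optim.Adam(gen.parameters(), lr=1e-4)
x = 0.1 * torch.randn(1, 1, 16000)  # one second of fake audio at 16 kHz
train_step(gen, opt, scores_fn, x, label=3)

Once such a generator is trained, producing an adversarial example at attack time is a single forward pass, which is what enables sub-millisecond-scale generation; a real attack would additionally train over a dataset under imperceptibility constraints and a query budget.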


Cited By

  • (2024) "Adversarial Attacks on Automatic Speech Recognition (ASR): A Survey." IEEE Access 12, 88279–88302. DOI: 10.1109/ACCESS.2024.3416965
  • (2023) "Leveraging Domain Features for Detecting Adversarial Attacks Against Deep Speech Recognition in Noise." IEEE Open Journal of Signal Processing 4, 179–187. DOI: 10.1109/OJSP.2023.3256321


    Published In

    ACM Journal on Emerging Technologies in Computing Systems, Volume 18, Issue 3
    July 2022, 428 pages
    ISSN: 1550-4832
    EISSN: 1550-4840
    DOI: 10.1145/3508463
    Editor: Ramesh Karri

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Publication History

    Published: 02 August 2022
    Online AM: 25 March 2022
    Accepted: 01 August 2021
    Revised: 01 July 2021
    Received: 01 December 2020
    Published in JETC Volume 18, Issue 3


    Author Tags

    1. Adversarial examples
    2. deep neural network
    3. automatic speech recognition

    Qualifiers

    • Research-article
    • Refereed

