Generating Transferable Adversarial Examples for Speech Classification

Published: 01 May 2023

Highlights

Speech classification systems are vulnerable to adversarial attacks.
Transferability of audio adversarial examples is related to noise sensitivity.
A new transferable adversarial attack is developed based on noise injection.
The proposed method outperforms other adversarial attacks in terms of transferability.

Abstract

Despite the success of deep neural networks, the existence of adversarial attacks has revealed the vulnerability of these networks from a security perspective. An adversarial attack adds subtle noise to an original example so that the model produces a false prediction. Although adversarial attacks have been studied mainly in the image domain, a recent line of research has shown that speech classification systems are also exposed to them. By adding inaudible noise, an adversary can deceive a speech classification system and cause critical failures in applications such as speaker identification and command recognition. However, research on the transferability of audio adversarial examples is still limited. In this study, we therefore first investigate the transferability of audio adversarial examples across models with different structures and training conditions. Through extensive experiments, we discover that the transferability of audio adversarial examples is related to their noise sensitivity. Based on these analyses, we present a new adversarial attack, called the noise injected attack, that generates highly transferable audio adversarial examples by injecting additive noise during the gradient ascent process. Our experimental results demonstrate that the proposed method outperforms other adversarial attacks in terms of transferability.
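
The abstract only sketches the procedure, so a rough illustration may help. Below is a minimal, hypothetical PyTorch sketch of a noise-injected, PGD-style gradient-ascent attack in the spirit of the description above; the function name, hyperparameter values, and the assumed waveform range are illustrative assumptions, not the authors' actual implementation.

```python
import torch
import torch.nn.functional as F

def noise_injected_attack(model, x, y, eps=0.002, alpha=0.0005,
                          steps=10, noise_std=0.001):
    """Hypothetical sketch: L-infinity gradient-ascent attack that
    injects additive Gaussian noise before each gradient computation.
    model: speech classifier mapping waveforms (B, T) to logits (B, C)
    x: clean waveform batch in [-1, 1] (assumed range); y: true labels
    """
    x_adv = x.clone().detach()
    for _ in range(steps):
        # Inject additive noise before taking the gradient; per the
        # paper, transferability is tied to noise sensitivity, so the
        # gradient is evaluated at a randomly perturbed point.
        x_noisy = (x_adv + noise_std * torch.randn_like(x_adv)).detach()
        x_noisy.requires_grad_(True)
        loss = F.cross_entropy(model(x_noisy), y)
        grad = torch.autograd.grad(loss, x_noisy)[0]
        # Ascend the loss, then project back into the eps-ball around
        # the clean input and the valid waveform range.
        x_adv = x_adv + alpha * grad.sign()
        x_adv = x + torch.clamp(x_adv - x, min=-eps, max=eps)
        x_adv = x_adv.clamp(-1.0, 1.0).detach()
    return x_adv
```

The only change relative to a standard iterative (PGD-style) attack is the `x_noisy` line: each gradient is computed at a randomly perturbed input, which pushes the search toward adversarial examples that remain adversarial under small noise and, per the paper's analysis, transfer better across models.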




        Published In

        Pattern Recognition  Volume 137, Issue C
        May 2023
        874 pages

        Publisher

        Elsevier Science Inc.

        United States

        Author Tags

        1. Speech classification
        2. Adversarial attack
        3. Transferability

        Qualifiers

        • Research-article


        Cited By

        • (2024) Enhancing cross-domain transferability of black-box adversarial attacks on speaker recognition systems using linearized backpropagation. Pattern Analysis & Applications 27(2). https://doi.org/10.1007/s10044-024-01269-w. Online publication date: 13-May-2024.
        • (2023) Fantastic robustness measures. Proceedings of the 37th International Conference on Neural Information Processing Systems, pp. 48793–48818. https://doi.org/10.5555/3666122.3668241. Online publication date: 10-Dec-2023.
        • (2023) Differentially private sharpness-aware training. Proceedings of the 40th International Conference on Machine Learning, pp. 27204–27224. https://doi.org/10.5555/3618408.3619540. Online publication date: 23-Jul-2023.
        • (2023) Cross-modal and Cross-medium Adversarial Attack for Audio. Proceedings of the 31st ACM International Conference on Multimedia, pp. 444–453. https://doi.org/10.1145/3581783.3612475. Online publication date: 26-Oct-2023.
        • (2023) Towards Minimising Perturbation Rate for Adversarial Machine Learning with Pruning. Machine Learning and Knowledge Discovery in Databases: Research Track, pp. 147–163. https://doi.org/10.1007/978-3-031-43412-9_9. Online publication date: 18-Sep-2023.
