Generating Transferable Adversarial Examples for Speech Classification

Published: 01 May 2023

Highlights

Speech classification systems are vulnerable to adversarial attacks.
Transferability of audio adversarial examples is related to noise sensitivity.
A new transferable adversarial attack is developed based on noise injection.
The proposed method outperforms other adversarial attacks in terms of transferability.

Abstract

Despite the success of deep neural networks, the existence of adversarial attacks has revealed the vulnerability of these networks from a security perspective. An adversarial attack adds subtle noise to an original example so that the model produces a false prediction. Although adversarial attacks have been studied mainly in the image domain, a recent line of research has shown that speech classification systems are also exposed to them. By adding inaudible noise, an adversary can deceive a speech classification system and cause critical failures in applications such as speaker identification and command recognition. However, research on the transferability of audio adversarial examples is still limited. In this study, we therefore first investigate the transferability of audio adversarial examples across models with different structures and training conditions. Through extensive experiments, we discover that the transferability of audio adversarial examples is related to their noise sensitivity. Based on these analyses, we present a new adversarial attack, called the noise injected attack, that generates highly transferable audio adversarial examples by injecting additive noise during the gradient ascent process. Our experimental results demonstrate that the proposed method outperforms other adversarial attacks in terms of transferability.
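
The abstract only sketches the procedure, so a rough illustration may help. Below is a minimal, hypothetical PyTorch sketch of a noise-injected, PGD-style gradient-ascent attack in the spirit of the description above; the function name, hyperparameter values, and the assumed waveform range are illustrative assumptions, not the authors' actual implementation.

```python
import torch
import torch.nn.functional as F

def noise_injected_attack(model, x, y, eps=0.002, alpha=0.0005,
                          steps=10, noise_std=0.001):
    """Hypothetical sketch: L-infinity gradient-ascent attack that
    injects additive Gaussian noise before each gradient computation.
    model: speech classifier mapping waveforms (B, T) to logits (B, C)
    x: clean waveform batch in [-1, 1] (assumed range); y: true labels
    """
    x_adv = x.clone().detach()
    for _ in range(steps):
        # Inject additive noise before taking the gradient; per the
        # paper, transferability is tied to noise sensitivity, so the
        # gradient is evaluated at a randomly perturbed point.
        x_noisy = (x_adv + noise_std * torch.randn_like(x_adv)).detach()
        x_noisy.requires_grad_(True)
        loss = F.cross_entropy(model(x_noisy), y)
        grad = torch.autograd.grad(loss, x_noisy)[0]
        # Ascend the loss, then project back into the eps-ball around
        # the clean input and the valid waveform range.
        x_adv = x_adv + alpha * grad.sign()
        x_adv = x + torch.clamp(x_adv - x, min=-eps, max=eps)
        x_adv = x_adv.clamp(-1.0, 1.0).detach()
    return x_adv
```

The only change relative to a standard iterative (PGD-style) attack is the `x_noisy` line: each gradient is computed at a randomly perturbed input, which pushes the search toward adversarial examples that remain adversarial under small noise and, per the paper's analysis, transfer better across models.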




        Published In

        Pattern Recognition  Volume 137, Issue C
        May 2023
        874 pages

        Publisher

        Elsevier Science Inc.

        United States

        Author Tags

        1. Speech classification
        2. Adversarial attack
        3. Transferability

        Qualifiers

        • Research-article


        Cited By

        • (2024) Enhancing cross-domain transferability of black-box adversarial attacks on speaker recognition systems using linearized backpropagation. Pattern Analysis & Applications 27(2). https://doi.org/10.1007/s10044-024-01269-w. Online publication date: 13-May-2024.
        • (2023) Fantastic robustness measures. Proceedings of the 37th International Conference on Neural Information Processing Systems, pp. 48793–48818. https://doi.org/10.5555/3666122.3668241. Online publication date: 10-Dec-2023.
        • (2023) Differentially private sharpness-aware training. Proceedings of the 40th International Conference on Machine Learning, pp. 27204–27224. https://doi.org/10.5555/3618408.3619540. Online publication date: 23-Jul-2023.
        • (2023) Cross-modal and Cross-medium Adversarial Attack for Audio. Proceedings of the 31st ACM International Conference on Multimedia, pp. 444–453. https://doi.org/10.1145/3581783.3612475. Online publication date: 26-Oct-2023.
        • (2023) Towards Minimising Perturbation Rate for Adversarial Machine Learning with Pruning. Machine Learning and Knowledge Discovery in Databases: Research Track, pp. 147–163. https://doi.org/10.1007/978-3-031-43412-9_9. Online publication date: 18-Sep-2023.
