research-article
DOI: 10.1109/ASP-DAC47756.2020.9045597

Audio Adversarial Examples Generation with Recurrent Neural Networks

Published: 13 January 2020

Abstract

Previous adversarial attacks against speech recognition systems typically treat the task purely as an optimization problem and require iterative updates to converge on a solution. Although such methods achieve high success rates, they are computationally heavy even with GPU acceleration. In this paper, we introduce a real-time adversarial attack methodology that applies a Recurrent Neural Network (RNN), trained with a two-step process, to generate adversarial examples targeting a Keyword Spotting (KWS) system. We extend the attack to the physical world by adding constraints that compensate for real-world distortions. In our experiments, we launch real-time adversarial attacks on the KWS system in both the digital and the physical domain. In the digital domain, our attack runs more than 400 times faster than the state-of-the-art C&W attack while achieving a comparable attack success rate. In the physical domain, the added constraints make the perturbation more robust, raising the average attack success rate from 40.3% to 84.3%.
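
The abstract gives only an outline of the method, so the sketch below is an illustrative PyTorch reconstruction rather than the authors' implementation. An LSTM generator emits a bounded perturbation for each 10 ms frame of the benign waveform; it is trained first directly against the KWS classifier (the digital attack), then fine-tuned under a crude echo-plus-noise channel standing in for the paper's physical-world constraints. The stand-in classifier, frame size, loss weights, and channel model are all assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

FRAME = 160  # 10 ms at 16 kHz; input length must be a multiple of FRAME (assumed)

class PerturbationRNN(nn.Module):
    """LSTM that emits one bounded perturbation frame per input frame."""
    def __init__(self, hidden=128):
        super().__init__()
        self.lstm = nn.LSTM(FRAME, hidden, batch_first=True)
        self.proj = nn.Linear(hidden, FRAME)

    def forward(self, wav):                        # wav: (batch, samples)
        frames = wav.view(wav.size(0), -1, FRAME)  # (batch, n_frames, FRAME)
        h, _ = self.lstm(frames)
        return torch.tanh(self.proj(h)).reshape_as(wav)  # bounded to [-1, 1]

def simulate_channel(wav):
    """Crude physical-channel stand-in (assumption): one random echo + noise."""
    ir = torch.zeros(1, 1, 800)                    # 50 ms impulse response
    ir[..., 0] = 1.0                               # direct path
    ir[..., torch.randint(100, 800, (1,))] = 0.3   # single random reflection
    padded = F.pad(wav.unsqueeze(1), (0, 799))     # keep output length fixed
    out = F.conv1d(padded, ir).squeeze(1)          # (batch, samples)
    return out + 0.01 * torch.randn_like(out)      # additive Gaussian noise

def train_step(gen, kws, wav, target, opt, physical, eps=0.05, c=0.1):
    delta = gen(wav)
    adv = torch.clamp(wav + eps * delta, -1.0, 1.0)
    if physical:                                   # step 2: robustness phase
        adv = simulate_channel(adv)
    # Targeted misclassification loss plus an L2 penalty keeping delta small.
    loss = F.cross_entropy(kws(adv), target) + c * delta.pow(2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# Toy usage with a stand-in linear classifier over 12 keyword labels;
# a real attack would load a pretrained, frozen KWS model here instead.
kws = nn.Sequential(nn.Flatten(), nn.Linear(16000, 12)).requires_grad_(False)
gen = PerturbationRNN()
opt = torch.optim.Adam(gen.parameters(), lr=1e-3)
wav = torch.randn(8, 16000).clamp(-1, 1)           # batch of 1 s utterances
target = torch.full((8,), 3, dtype=torch.long)     # attacker-chosen keyword

for _ in range(200):                               # step 1: digital attack
    train_step(gen, kws, wav, target, opt, physical=False)
for _ in range(200):                               # step 2: physical fine-tune
    train_step(gen, kws, wav, target, opt, physical=True)
```

At inference time the generator produces a perturbation in a single forward pass, with no per-example iterative optimization; that is the structural source of the large speedup over the C&W attack reported above.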




Published In

ASPDAC '20: Proceedings of the 25th Asia and South Pacific Design Automation Conference
January 2020, 721 pages
ISBN: 9781728141237
DOI: 10.1109/3656337

Publisher

IEEE Press



Conference

ASPDAC '20

Acceptance Rates

Overall acceptance rate: 466 of 1,454 submissions (32%)


Cited By

  • (2022) "A Survey on Voice Assistant Security: Attacks and Countermeasures," ACM Computing Surveys, vol. 55, no. 4, pp. 1-36. DOI: 10.1145/3527153
  • (2022) "SoK: A Modularized Approach to Study the Security of Automatic Speech Recognition Systems," ACM Transactions on Privacy and Security, vol. 25, no. 3, pp. 1-31. DOI: 10.1145/3510582
  • (2022) "Generation of Black-box Audio Adversarial Examples Based on Gradient Approximation and Autoencoders," ACM Journal on Emerging Technologies in Computing Systems, vol. 18, no. 3, pp. 1-19. DOI: 10.1145/3491220
  • (2021) "Research Progress and Challenges on Application-Driven Adversarial Examples: A Survey," ACM Transactions on Cyber-Physical Systems, vol. 5, no. 4, pp. 1-25. DOI: 10.1145/3470493
  • (2021) "Can We Use Arbitrary Objects to Attack LiDAR Perception in Autonomous Driving?" in Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security, pp. 1945-1960. DOI: 10.1145/3460120.3485377
