research-article
DOI: 10.1109/ASP-DAC47756.2020.9045597

Audio Adversarial Examples Generation with Recurrent Neural Networks

Published: 13 January 2020

Abstract

Previous adversarial attacks against speech recognition systems typically treat the task purely as an optimization problem and require iterative updates to converge on a solution. Although such methods achieve high success rates, they are computationally heavy even with GPU acceleration. In this paper, we introduce a real-time adversarial attack methodology that applies a Recurrent Neural Network (RNN), trained with a two-step process, to generate adversarial examples targeting a Keyword Spotting (KWS) system. We extend the attack to the physical world by adding constraints that compensate for real-world distortions. In our experiments, we launch real-time adversarial attacks on the KWS system in both the digital and the physical domain. In the digital domain, our attack runs more than 400 times faster than the state-of-the-art C&W attack while achieving a comparable attack success rate. In the physical domain, the added constraints make the perturbation more robust, raising the average attack success rate from 40.3% to 84.3%.
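
The abstract gives only an outline of the method, so the sketch below is an illustrative PyTorch reconstruction rather than the authors' implementation. An LSTM generator emits a bounded perturbation for each 10 ms frame of the benign waveform; it is trained first directly against the KWS classifier (the digital attack), then fine-tuned under a crude echo-plus-noise channel standing in for the paper's physical-world constraints. The stand-in classifier, frame size, loss weights, and channel model are all assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

FRAME = 160  # 10 ms at 16 kHz; input length must be a multiple of FRAME (assumed)

class PerturbationRNN(nn.Module):
    """LSTM that emits one bounded perturbation frame per input frame."""
    def __init__(self, hidden=128):
        super().__init__()
        self.lstm = nn.LSTM(FRAME, hidden, batch_first=True)
        self.proj = nn.Linear(hidden, FRAME)

    def forward(self, wav):                        # wav: (batch, samples)
        frames = wav.view(wav.size(0), -1, FRAME)  # (batch, n_frames, FRAME)
        h, _ = self.lstm(frames)
        return torch.tanh(self.proj(h)).reshape_as(wav)  # bounded to [-1, 1]

def simulate_channel(wav):
    """Crude physical-channel stand-in (assumption): one random echo + noise."""
    ir = torch.zeros(1, 1, 800)                    # 50 ms impulse response
    ir[..., 0] = 1.0                               # direct path
    ir[..., torch.randint(100, 800, (1,))] = 0.3   # single random reflection
    padded = F.pad(wav.unsqueeze(1), (0, 799))     # keep output length fixed
    out = F.conv1d(padded, ir).squeeze(1)          # (batch, samples)
    return out + 0.01 * torch.randn_like(out)      # additive Gaussian noise

def train_step(gen, kws, wav, target, opt, physical, eps=0.05, c=0.1):
    delta = gen(wav)
    adv = torch.clamp(wav + eps * delta, -1.0, 1.0)
    if physical:                                   # step 2: robustness phase
        adv = simulate_channel(adv)
    # Targeted misclassification loss plus an L2 penalty keeping delta small.
    loss = F.cross_entropy(kws(adv), target) + c * delta.pow(2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# Toy usage with a stand-in linear classifier over 12 keyword labels;
# a real attack would load a pretrained, frozen KWS model here instead.
kws = nn.Sequential(nn.Flatten(), nn.Linear(16000, 12)).requires_grad_(False)
gen = PerturbationRNN()
opt = torch.optim.Adam(gen.parameters(), lr=1e-3)
wav = torch.randn(8, 16000).clamp(-1, 1)           # batch of 1 s utterances
target = torch.full((8,), 3, dtype=torch.long)     # attacker-chosen keyword

for _ in range(200):                               # step 1: digital attack
    train_step(gen, kws, wav, target, opt, physical=False)
for _ in range(200):                               # step 2: physical fine-tune
    train_step(gen, kws, wav, target, opt, physical=True)
```

At inference time the generator produces a perturbation in a single forward pass, with no per-example iterative optimization; that is the structural source of the large speedup over the C&W attack reported above.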




Published In

ASPDAC '20: Proceedings of the 25th Asia and South Pacific Design Automation Conference
January 2020, 721 pages
ISBN: 9781728141237
DOI: 10.1109/3656337

Publisher

IEEE Press



Conference

ASPDAC '20

Acceptance Rates

Overall acceptance rate: 466 of 1,454 submissions (32%)


Cited By

  • (2022) "A Survey on Voice Assistant Security: Attacks and Countermeasures," ACM Computing Surveys, vol. 55, no. 4, pp. 1-36. DOI: 10.1145/3527153
  • (2022) "SoK: A Modularized Approach to Study the Security of Automatic Speech Recognition Systems," ACM Transactions on Privacy and Security, vol. 25, no. 3, pp. 1-31. DOI: 10.1145/3510582
  • (2022) "Generation of Black-box Audio Adversarial Examples Based on Gradient Approximation and Autoencoders," ACM Journal on Emerging Technologies in Computing Systems, vol. 18, no. 3, pp. 1-19. DOI: 10.1145/3491220
  • (2021) "Research Progress and Challenges on Application-Driven Adversarial Examples: A Survey," ACM Transactions on Cyber-Physical Systems, vol. 5, no. 4, pp. 1-25. DOI: 10.1145/3470493
  • (2021) "Can We Use Arbitrary Objects to Attack LiDAR Perception in Autonomous Driving?" in Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security, pp. 1945-1960. DOI: 10.1145/3460120.3485377
