
A Survey on Voice Assistant Security: Attacks and Countermeasures

Published: 21 November 2022

Abstract

Voice assistants (VAs) have become prevalent on a wide range of personal devices such as smartphones and smart speakers. As companies add functionality to voice assistants, attacks that trick a voice assistant into performing malicious actions pose a significant threat to a user’s security, privacy, and even safety. However, the diverse attacks and stand-alone defenses in the literature often lack a systematic perspective, making it challenging for designers to properly identify, understand, and mitigate the security threats against voice assistants. To overcome this problem, this article provides a thorough survey of attacks and countermeasures for voice assistants. We systematize a broad range of relevant but seemingly unrelated attacks by the system components they target and the attack methods they employ, and we categorize existing countermeasures by their defensive strategies from a system designer’s perspective. To assist designers in planning defenses that match their needs, we qualitatively compare existing countermeasures in terms of implementation cost, usability, and security, and we propose practical suggestions. We envision that this work will help make voice assistants more reliable and promote research in this fast-evolving area.



Information

Published In

ACM Computing Surveys, Volume 55, Issue 4
April 2023, 871 pages
ISSN: 0360-0300
EISSN: 1557-7341
DOI: 10.1145/3567469

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 21 November 2022
      Online AM: 25 March 2022
      Accepted: 14 March 2022
      Revised: 09 February 2022
      Received: 15 December 2020
      Published in CSUR Volume 55, Issue 4


      Author Tags

      1. Voice assistant
      2. security
      3. attack
      4. defense
      5. speech
      6. voice interaction

      Qualifiers

      • Survey
      • Refereed

      Funding Sources

      • China NSFC


Cited By

• (2024) MicGuard. In Proceedings of the 33rd USENIX Conference on Security Symposium, 3963–3978. DOI: 10.5555/3698900.3699122. Online publication date: 14-Aug-2024.
• (2024) Adversarial Attack and Defense for Commercial Black-box Chinese-English Speech Recognition Systems. ACM Transactions on Privacy and Security 28, 1, 1–27. DOI: 10.1145/3701725. Online publication date: 7-Nov-2024.
• (2024) Towards Evaluating the Robustness of Automatic Speech Recognition Systems via Audio Style Transfer. In Proceedings of the 2nd ACM Workshop on Secure and Trustworthy Deep Learning Systems, 47–55. DOI: 10.1145/3665451.3665532. Online publication date: 2-Jul-2024.
• (2024) Manipulating Voice Assistants Eavesdropping via Inherent Vulnerability Unveiling in Mobile Systems. IEEE Transactions on Mobile Computing 23, 12, 11549–11563. DOI: 10.1109/TMC.2024.3401096. Online publication date: Dec-2024.
• (2024) From Virtual Touch to Tesla Command: Unlocking Unauthenticated Control Chains From Smart Glasses for Vehicle Takeover. In Proceedings of the 2024 IEEE Symposium on Security and Privacy (SP), 2366–2384. DOI: 10.1109/SP54263.2024.00231. Online publication date: 19-May-2024.
• (2024) Understanding and Benchmarking the Commonality of Adversarial Examples. In Proceedings of the 2024 IEEE Symposium on Security and Privacy (SP), 1665–1683. DOI: 10.1109/SP54263.2024.00111. Online publication date: 19-May-2024.
• (2024) SMIless: Serving DAG-based Inference with Dynamic Invocations under Serverless Computing. In Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis, 1–17. DOI: 10.1109/SC41406.2024.00044. Online publication date: 17-Nov-2024.
• (2024) Fast and Lightweight Voice Replay Attack Detection via Time-Frequency Spectrum Difference. IEEE Internet of Things Journal 11, 18, 29798–29810. DOI: 10.1109/JIOT.2024.3406962. Online publication date: 15-Sep-2024.
• (2024) Efficient Implementation of Kernel Regularization Based on ADMM and Its Application to Room Impulse Response Estimation. IEEE Access 12, 152721–152729. DOI: 10.1109/ACCESS.2024.3479208. Online publication date: 2024.
• (2024) Why would consumers risk taking purchase recommendations from voice assistants? Information Technology & People. DOI: 10.1108/ITP-01-2023-0001. Online publication date: 2-Apr-2024.
