DOI: 10.1145/3548606.3559357
research-article

FenceSitter: Black-box, Content-Agnostic, and Synchronization-Free Enrollment-Phase Attacks on Speaker Recognition Systems

Published: 07 November 2022

Abstract

Speaker recognition systems (SRSs) grant access to legitimate users based on their voiceprints. Recent research has shown that SRSs can be bypassed during the training phase (backdoor attacks) and the recognition phase (evasion attacks). In this paper, we explore a new attack surface of SRSs by presenting an enrollment-phase attack paradigm, named FenceSitter, in which the adversary poisons the SRS with imperceptible adversarial ambient sound while the legitimate user is enrolling. The tainted voiceprint extracted by the SRS allows both the adversary and the legitimate user to access the system in all future recognition phases. To realize such an attack, we interleave carefully designed continuous adversarial perturbations into innocent-sounding ambient sound. Because computing adversarial perturbations over a long sequence of the ambient-sound carrier is intractable, we instead optimize over adversarial segments, with content desensitization and physical realization. In addition, the attack works in the black-box setting by estimating gradients with the natural evolution strategy. We conduct extensive experiments on both English and Chinese voice datasets for closed-set identification (CSI), open-set identification (OSI), and speaker verification (SV) tasks. The results under various digital and physical conditions verify the effectiveness and robustness of FenceSitter. Live enrollment experiments and a user study further validate its practicality. Our work reveals the vulnerability of SRSs during the enrollment phase, which may spur future research on improving the security of SRSs.
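
The black-box step mentioned in the abstract, gradient estimation with a natural evolution strategy (NES), can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the loss, the sampling parameters, or the update rule, so `nes_gradient`, `toy_loss`, `sigma`, `n_pairs`, and the step size below are all assumptions chosen for illustration. The sketch uses the common antithetic-sampling form of the NES estimator, which needs only loss values returned by the model and never its internal gradients.

```python
import numpy as np

def nes_gradient(loss_fn, x, sigma=0.01, n_pairs=50, rng=None):
    """Estimate the gradient of loss_fn at x from loss queries only.

    Antithetic NES sampling: draw Gaussian directions u, query the loss at
    x + sigma*u and x - sigma*u, and average the finite differences.
    """
    rng = np.random.default_rng() if rng is None else rng
    grad = np.zeros_like(x, dtype=float)
    for _ in range(n_pairs):
        u = rng.standard_normal(x.shape)
        grad += (loss_fn(x + sigma * u) - loss_fn(x - sigma * u)) * u
    return grad / (2.0 * sigma * n_pairs)

def toy_loss(x):
    # Hypothetical stand-in for a black-box score; a real attack would
    # query the target system here instead of evaluating a formula.
    return float(np.sum((x - 1.0) ** 2))

# Plain gradient descent driven only by the estimated gradient.
rng = np.random.default_rng(0)
x = np.zeros(8)
for _ in range(200):
    x -= 0.05 * nes_gradient(toy_loss, x, rng=rng)
```

Each call to `nes_gradient` costs `2 * n_pairs` loss queries, so in a query-limited setting the number of sampled directions trades estimation noise against query budget.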

Supplementary Material

MP4 File (CCS22-fp0172.mp4)
Presentation video

Published In

CCS '22: Proceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security
November 2022
3598 pages
ISBN: 9781450394505
DOI: 10.1145/3548606
Publisher

Association for Computing Machinery, New York, NY, United States

Author Tags

1. black-box
2. content-agnostic
3. enrollment-phase attack
4. speaker recognition system
5. synchronization-free

Qualifiers

• Research-article

Funding Sources

• National Natural Science Foundation of China
