VoiceCloak: Adversarial Example Enabled Voice De-Identification with Balanced Privacy and Utility

Published: 12 June 2023

Abstract

Faced with the threat of identity leakage during voice data publishing, users are caught in a privacy-utility dilemma when using voice services. Existing machine-centric studies de-identify users' voices through direct modification or text-based re-synthesis, but the results sound inconsistent to human participants in emerging online communication scenarios such as virtual meetings. In this paper, we propose VoiceCloak, a human-centric voice de-identification system that uses adversarial examples to balance the privacy and utility of voice services. Instead of typical additive examples, which induce perceivable distortions, we design a novel convolutional adversarial example that modulates perturbations into real-world room impulse responses. As a result, VoiceCloak protects user identity from exposure by Automatic Speaker Identification (ASI) while maintaining the perceptual quality of the voice, making de-identification non-intrusive. Moreover, VoiceCloak learns a compact speaker distribution through a conditional variational auto-encoder to synthesize diverse targets on demand. Guided by these pseudo targets, VoiceCloak constructs adversarial examples in an input-specific manner, enabling any-to-any identity transformation for robust de-identification. Experimental results show that VoiceCloak achieves over 92% and 84% successful de-identification on mainstream ASIs and commercial systems, respectively, with excellent voiceprint consistency, speech integrity, and audio quality.
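
To make the contrast between additive and convolutional adversarial examples concrete, the following is a minimal sketch rather than the authors' implementation. It assumes the convolutional form can be written as the voice signal convolved with a room impulse response (RIR) that carries a small bounded perturbation. The function names, the toy RIR, the bound `eps`, and the random stand-in for an optimized perturbation are all hypothetical.

```python
import numpy as np

def additive_example(x, delta, eps=0.01):
    """Additive form: x' = x + delta, with delta clipped to [-eps, eps]."""
    return x + np.clip(delta, -eps, eps)

def convolutional_example(x, rir, delta, eps=0.01):
    """Convolutional form (assumed): x' = x * (rir + delta), '*' = convolution.

    The bounded perturbation is folded into the impulse response, so the
    output sounds like the input played in a slightly different room.
    """
    h = rir + np.clip(delta, -eps, eps)
    # Keep the first len(x) samples so the output length matches the input.
    return np.convolve(x, h, mode="full")[: len(x)]

rng = np.random.default_rng(0)
sr = 16000                                  # assumed sample rate in Hz
t = np.arange(sr) / sr
x = 0.5 * np.sin(2 * np.pi * 220 * t)       # one-second tone standing in for speech

n = 800                                     # toy RIR length (50 ms at 16 kHz)
rir = 0.05 * rng.standard_normal(n) * np.exp(-np.arange(n) / 100.0)
rir[0] = 1.0                                # direct path

delta = rng.standard_normal(n)              # stand-in for an optimized perturbation
x_add = additive_example(x, rng.standard_normal(len(x)))
x_conv = convolutional_example(x, rir, delta)
```

In a real system the perturbation would be optimized against a speaker identification model toward a chosen target identity; here it is random noise, so the sketch only illustrates how the two signal models differ.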

      Published In

      Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, Volume 7, Issue 2
      June 2023
      969 pages
      EISSN: 2474-9567
      DOI: 10.1145/3604631
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 12 June 2023
      Published in IMWUT Volume 7, Issue 2

      Author Tags

      1. adversarial examples
      2. voice de-identification
      3. voice privacy preservation

      Qualifiers

      • Research-article
      • Research
      • Refereed

      Bibliometrics & Citations

      Article Metrics

      • Downloads (last 12 months): 219
      • Downloads (last 6 weeks): 31
      Reflects downloads up to 28 Dec 2024

      Cited By

      • (2024) Generating Multivariate Synthetic Time Series Data for Absent Sensors from Correlated Sources. Proceedings of the 2nd International Workshop on Networked AI Systems, 19-24. DOI: 10.1145/3662004.3663553. Online publication date: 3-Jun-2024
      • (2024) Push the Limit of Highly Accurate Ranging on Commercial UWB Devices. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 8, 2, 1-27. DOI: 10.1145/3659602. Online publication date: 15-May-2024
      • (2024) RFSpy: Eavesdropping on Online Conversations with Out-of-Vocabulary Words by Sensing Metal Coil Vibration of Headsets Leveraging RFID. Proceedings of the 22nd Annual International Conference on Mobile Systems, Applications and Services, 169-182. DOI: 10.1145/3643832.3661887. Online publication date: 3-Jun-2024
      • (2024) Conan's Bow Tie: A Streaming Voice Conversion for Real-Time VTuber Livestreaming. Proceedings of the 29th International Conference on Intelligent User Interfaces, 35-50. DOI: 10.1145/3640543.3645146. Online publication date: 18-Mar-2024
      • (2024) MSense: Boosting Wireless Sensing Capability Under Motion Interference. Proceedings of the 30th Annual International Conference on Mobile Computing and Networking, 108-123. DOI: 10.1145/3636534.3649350. Online publication date: 29-May-2024
      • (2024) Manipulating Voice Assistants Eavesdropping via Inherent Vulnerability Unveiling in Mobile Systems. IEEE Transactions on Mobile Computing 23, 12, 11549-11563. DOI: 10.1109/TMC.2024.3401096. Online publication date: Dec-2024
      • (2024) Adversarial Perturbation Prediction for Real-Time Protection of Speech Privacy. IEEE Transactions on Information Forensics and Security 19, 8701-8716. DOI: 10.1109/TIFS.2024.3463538. Online publication date: 2024
      • (2024) Designing for Complementarity: A Conceptual Framework to Go Beyond the Current Paradigm of Using XAI in Healthcare. Artificial Intelligence in HCI, 277-296. DOI: 10.1007/978-3-031-60606-9_16. Online publication date: 29-Jun-2024
      • (2023) Self-adaptive motion tracking against on-body displacement of flexible sensors. Proceedings of the 37th International Conference on Neural Information Processing Systems, 77277-77289. DOI: 10.5555/3666122.3669501. Online publication date: 10-Dec-2023
      • (2023) MESEN: Exploit Multimodal Data to Design Unimodal Human Activity Recognition with Few Labels. Proceedings of the 21st ACM Conference on Embedded Networked Sensor Systems, 1-14. DOI: 10.1145/3625687.3625782. Online publication date: 12-Nov-2023
