Hidebehind: Enjoy Voice Input with Voiceprint Unclonability and Anonymity

Authors:

Xiang-Yang Li

SenSys '18: Proceedings of the 16th ACM Conference on Embedded Networked Sensor Systems

Pages 82 - 94

https://doi.org/10.1145/3274783.3274855

Published: 04 November 2018

Abstract

We are speeding toward a not-too-distant future in which we can perform human-computer interaction using our voice alone. Speech recognition is the key technology that powers voice input, and it is usually outsourced to the cloud for the best performance. However, this puts user privacy at risk: voiceprints are directly exposed to the cloud, which gives rise to security issues such as spoofing attacks on speaker authentication systems. It may also cause privacy issues; for instance, the speech content could be abused for user profiling. To address this unexplored problem, we propose adding an intermediary between users and the cloud, named VoiceMask, that anonymizes speech data before sending it to the cloud for speech recognition. VoiceMask aims to mitigate the security and privacy risks by concealing voiceprints from the cloud. It is built upon voice conversion but goes well beyond it: it is resistant to two de-anonymization attacks and satisfies differential privacy. It performs anonymization on resource-limited mobile devices while still maintaining the usability of the cloud-based voice input service. We implement VoiceMask on Android and present extensive experimental results. The evaluation substantiates the efficacy of VoiceMask; for example, it reduces the chance of a user's voice being identified among 50 people by a mean of 84%, while reducing voice input accuracy by no more than 14.2%.
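
As a rough illustration of the kind of voice conversion the abstract alludes to, the following sketch applies a bilinear frequency warp to one frame of a magnitude spectrum, with the warping factor drawn at random. It is a minimal sketch under stated assumptions, not VoiceMask's actual algorithm: the choice of the bilinear warping function, the sampling range for `alpha`, and all function names are hypothetical, and the sketch makes no differential privacy claim.

```python
import math
import random


def bilinear_warp(omega, alpha):
    """Warp a normalized frequency omega in [0, pi] with factor alpha in (-1, 1).

    alpha = 0 is the identity; the endpoints 0 and pi are always preserved.
    """
    if not -1.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between -1 and 1")
    return omega + 2.0 * math.atan2(
        alpha * math.sin(omega), 1.0 - alpha * math.cos(omega)
    )


def draw_warping_factor(rng, low=0.05, high=0.10):
    """Draw a random warping factor; the range is an illustrative assumption."""
    return rng.choice((-1.0, 1.0)) * rng.uniform(low, high)


def warp_spectrum(spectrum, alpha):
    """Resample one frame's magnitude spectrum along the warped frequency axis.

    Output bin k reads the input at the inverse-warped frequency; for the
    bilinear warp, the inverse is the same function with -alpha.
    """
    n = len(spectrum)
    if n < 2:
        return list(spectrum)
    warped = []
    for k in range(n):
        omega = math.pi * k / (n - 1)
        source = bilinear_warp(omega, -alpha) * (n - 1) / math.pi
        source = min(max(source, 0.0), n - 1.0)
        lo = int(math.floor(source))
        hi = min(lo + 1, n - 1)
        frac = source - lo
        # Linear interpolation between the two neighbouring input bins.
        warped.append((1.0 - frac) * spectrum[lo] + frac * spectrum[hi])
    return warped


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed only to make the demo repeatable
    alpha = draw_warping_factor(rng)
    frame = [0.0, 1.0, 4.0, 9.0, 4.0, 1.0, 0.0]  # toy spectrum with one peak
    print(alpha, warp_spectrum(frame, alpha))
```

In this sketch a positive `alpha` shifts spectral content toward higher frequencies and a negative one toward lower frequencies. A real system would apply the warp to every short-time frame of the speech signal before resynthesis.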

Index Terms

Hidebehind: Enjoy Voice Input with Voiceprint Unclonability and Anonymity
1. Human-centered computing
  1. Ubiquitous and mobile computing
2. Security and privacy
  1. Database and storage security
    1. Data anonymization and sanitization
  2. Security services
    1. Pseudonymity, anonymity and untraceability

Published In

SenSys '18: Proceedings of the 16th ACM Conference on Embedded Networked Sensor Systems

November 2018

449 pages

ISBN:9781450359528

DOI:10.1145/3274783

Editors: Gowri Sankar Ramachandran (University of Southern California, Los Angeles), Bhaskar Krishnamachari (University of Southern California, Los Angeles)

Copyright © 2018 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Funding Sources

Key Research Program of Frontier Sciences, CAS
National Science Foundation
China National Funds for Distinguished Young Scientists
National Natural Science Foundation of China
National Key R&D Program of China

Conference

SenSys '18

SenSys '18: The 16th ACM Conference on Embedded Networked Sensor Systems

November 4 - 7, 2018

Shenzhen, China

Acceptance Rates

Overall Acceptance Rate 174 of 867 submissions, 20%
