DOI: 10.1145/3274783.3274855
research-article
Public Access

Hidebehind: Enjoy Voice Input with Voiceprint Unclonability and Anonymity

Published: 04 November 2018

    Abstract

    We are speeding toward a not-too-distant future in which we can interact with computers using our voice alone. Speech recognition is the key technology behind voice input, and it is usually outsourced to the cloud for the best performance. However, this exposes users' voiceprints directly to the cloud, which creates security risks such as spoofing attacks on speaker authentication systems. It also creates privacy risks: for instance, the speech content could be abused for user profiling. To address this unexplored problem, we propose adding an intermediary between users and the cloud, named VoiceMask, that anonymizes speech data before sending it to the cloud for speech recognition. It aims to mitigate the security and privacy risks by concealing voiceprints from the cloud. VoiceMask is built upon voice conversion but goes well beyond it: it is resistant to two de-anonymization attacks and satisfies differential privacy. It performs anonymization on resource-limited mobile devices while still maintaining the usability of the cloud-based voice input service. We implement VoiceMask on Android and present extensive experimental results. The evaluation substantiates the efficacy of VoiceMask: for example, it reduces the chance of a user's voice being identified among 50 people by a mean of 84%, while reducing voice input accuracy by no more than 14.2%.
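
The abstract does not say which voice conversion function VoiceMask uses, so the following is only an illustrative sketch of one common building block, not the authors' method. It assumes a bilinear frequency-warping function of the kind used in vocal tract length normalization (VTLN), with a warping factor drawn at random per utterance. The names `bilinear_warp`, `random_alpha`, `warp_axis`, and the bound `max_abs` are hypothetical. Uniform sampling here is a placeholder: it is not a differential-privacy mechanism and does not model the paper's defenses against de-anonymization.

```python
import math
import random
from typing import List, Optional


def bilinear_warp(omega: float, alpha: float) -> float:
    """Warp a normalized angular frequency omega in [0, pi].

    alpha in (-1, 1) is the warping factor; alpha = 0 is the identity,
    positive alpha shifts spectral content toward higher frequencies.
    """
    if not -1.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between -1 and 1")
    # Standard bilinear (all-pass) warping; the denominator is always positive.
    return omega + 2.0 * math.atan2(alpha * math.sin(omega),
                                    1.0 - alpha * math.cos(omega))


def random_alpha(max_abs: float = 0.1,
                 rng: Optional[random.Random] = None) -> float:
    """Draw a warping factor uniformly from [-max_abs, max_abs].

    Assumption for illustration only: this is NOT a differentially
    private mechanism.
    """
    rng = rng or random.Random()
    return rng.uniform(-max_abs, max_abs)


def warp_axis(num_bins: int, alpha: float) -> List[float]:
    """Return the warped position of each of num_bins evenly spaced bins."""
    return [bilinear_warp(math.pi * k / (num_bins - 1), alpha)
            for k in range(num_bins)]
```

In such a scheme, a device would resample each spectral frame along the warped axis before resynthesizing the audio, so the cloud sees a shifted vocal-tract signature while the phonetic content stays largely intact. The warp fixes both endpoints (0 and pi) and is strictly increasing, so bin order is preserved.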

    Published In

    SenSys '18: Proceedings of the 16th ACM Conference on Embedded Networked Sensor Systems
    November 2018
    449 pages
    ISBN: 9781450359528
    DOI: 10.1145/3274783
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Voiceprint concealment
    2. voice anonymity
    3. voice anonymization
    4. voice conversion
    5. voice privacy
    6. voice security

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Acceptance Rates

    Overall Acceptance Rate 174 of 867 submissions, 20%

    Article Metrics

    • Downloads (last 12 months): 292
    • Downloads (last 6 weeks): 33
    Download counts reflect activity up to 27 Jul 2024.

    Cited By

    • (2024) Face Recognition In Harsh Conditions: An Acoustic Based Approach. Proceedings of the 22nd Annual International Conference on Mobile Systems, Applications and Services, 1-14. DOI: 10.1145/3643832.3661855. Online publication date: 3-Jun-2024
    • (2024) Distinctive and Natural Speaker Anonymization via Singular Value Transformation-Assisted Matrix. IEEE/ACM Transactions on Audio, Speech, and Language Processing 32, 2944-2956. DOI: 10.1109/TASLP.2024.3407600. Online publication date: 2024
    • (2024) Voice Privacy Using Time-Scale and Pitch Modification. SN Computer Science 5:2. DOI: 10.1007/s42979-023-02549-8. Online publication date: 27-Jan-2024
    • (2024) Voice Privacy Through Time-Scale and Pitch Modification. Pattern Recognition and Machine Intelligence, 72-80. DOI: 10.1007/978-3-031-12700-7_8. Online publication date: 24-Jul-2024
    • (2023) V-CLOAK. Proceedings of the 32nd USENIX Conference on Security Symposium, 5181-5198. DOI: 10.5555/3620237.3620527. Online publication date: 9-Aug-2023
    • (2023) Information Disclosure in the Era of Voice Technology. Journal of Marketing 87:4, 491-509. DOI: 10.1177/00222429221138286. Online publication date: 22-Mar-2023
    • (2023) VoiceCloak. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 7:2, 1-21. DOI: 10.1145/3596266. Online publication date: 12-Jun-2023
    • (2023) Paralinguistic Privacy Protection at the Edge. ACM Transactions on Privacy and Security 26:2, 1-27. DOI: 10.1145/3570161. Online publication date: 13-Apr-2023
    • (2023) VoicePM: A Robust Privacy Measurement on Voice Anonymity. Proceedings of the 16th ACM Conference on Security and Privacy in Wireless and Mobile Networks, 215-226. DOI: 10.1145/3558482.3590175. Online publication date: 29-May-2023
    • (2023) VocalPrint: A mmWave-Based Unmediated Vocal Sensing System for Secure Authentication. IEEE Transactions on Mobile Computing 22:1, 589-606. DOI: 10.1109/TMC.2021.3084971. Online publication date: 1-Jan-2023
