VocalLock: Sensing Vocal Tract for Passphrase-Independent User Authentication Leveraging Acoustic Signals on Smartphones

Published: 15 June 2020

Abstract

Recent years have witnessed a surge in biometric-based user authentication for mobile devices, owing to its promising security and convenience. As a natural and ubiquitous behavior, human speech has been exploited for user authentication. Existing voice-based user authentication explores unique characteristics of either the voiceprint or mouth movements, both of which are vulnerable to replay attacks and mimic attacks. During speaking, the vocal tract, including its static shape and dynamic movements, also exhibits individual uniqueness, and it is difficult for adversaries to eavesdrop on or imitate. Hence, our work aims to employ the individual uniqueness of the vocal tract to realize user authentication on mobile devices. Moreover, most voice-based user authentication schemes are passphrase-dependent, which significantly degrades the user experience. Such user authentication therefore needs to be implemented in a passphrase-independent manner while remaining able to resist various attacks. In this paper, we propose a user authentication system, VocalLock, which leverages acoustic signals on smartphones to sense the whole vocal tract during speaking and identify different individuals in a passphrase-independent manner. VocalLock first applies frequency-modulated continuous wave (FMCW) to acoustic signals to characterize both the static shape and dynamic movements of the vocal tract during speaking, and then constructs a passphrase-independent user authentication model based on the unique characteristics of the vocal tract through a Gaussian mixture model with a universal background model (GMM-UBM). The proposed VocalLock can resist various spoofing attacks while achieving a satisfactory user experience. Extensive experiments in real environments demonstrate that VocalLock can accurately authenticate user identity in a passphrase-independent manner and successfully resist various attacks.
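
To illustrate the FMCW principle the abstract refers to, the following is a minimal simulation sketch, not the authors' implementation. It assumes a single ideal reflector, and every parameter (48 kHz sampling, a 17 to 21 kHz sweep, a 40 ms sweep time) and every function name is a hypothetical choice made for this example; the abstract does not state the values VocalLock uses. The echo of a linear chirp is mixed with the transmitted chirp, and the resulting beat frequency is converted to a distance.

```python
import cmath
import math

# All values below are illustrative assumptions, not taken from the paper.
SPEED_OF_SOUND = 343.0   # m/s in air at room temperature
FS = 48_000              # sample rate in Hz
F0 = 17_000.0            # chirp start frequency in Hz
BANDWIDTH = 4_000.0      # sweep bandwidth in Hz
SWEEP = 0.04             # sweep duration in seconds


def chirp_phase(t):
    """Phase of a linear chirp that starts at F0 and sweeps BANDWIDTH over SWEEP."""
    slope = BANDWIDTH / SWEEP
    return 2 * math.pi * (F0 * t + 0.5 * slope * t * t)


def simulate_echo(distance_m):
    """Echo from one ideal reflector: the chirp delayed by the round-trip time."""
    delay = 2 * distance_m / SPEED_OF_SOUND
    n = int(FS * SWEEP)
    return [math.cos(chirp_phase(i / FS - delay)) if i / FS >= delay else 0.0
            for i in range(n)]


def estimate_distance(echo, max_beat_hz=200.0, step_hz=1.0):
    """Dechirp the echo and map the strongest beat frequency to a distance."""
    # Mixing with the complex transmitted chirp turns the delay into a
    # low-frequency tone at (BANDWIDTH / SWEEP) * delay.
    mixed = [x * cmath.exp(1j * chirp_phase(i / FS)) for i, x in enumerate(echo)]
    best_f, best_mag = 0.0, -1.0
    for k in range(int(max_beat_hz / step_hz) + 1):
        f = k * step_hz
        w = -2j * math.pi * f / FS
        # Direct evaluation of the spectrum at f (a fine-grained DFT bin).
        mag = abs(sum(m * cmath.exp(w * i) for i, m in enumerate(mixed)))
        if mag > best_mag:
            best_f, best_mag = f, mag
    # Beat frequency -> round-trip delay -> one-way distance.
    return best_f * SPEED_OF_SOUND * SWEEP / (2 * BANDWIDTH)


if __name__ == "__main__":
    print(estimate_distance(simulate_echo(0.10)))
```

In this toy setup the raw range resolution is c / (2B), about 4.3 cm for the assumed 4 kHz bandwidth; evaluating the spectrum on a 1 Hz grid only refines the peak location for a single clean reflector. A real vocal tract produces many overlapping reflections, so this sketch shows the ranging idea only.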


      Published In

      Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, Volume 4, Issue 2
      June 2020
      771 pages
      EISSN: 2474-9567
      DOI: 10.1145/3406789
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 15 June 2020
      Published in IMWUT Volume 4, Issue 2

      Author Tags

      1. FMCW
      2. User authentication
      3. acoustic signal
      4. passphrase-independent
      5. vocal-tract behavior

      Qualifiers

      • Research-article
      • Research
      • Refereed

      Cited By

      • (2024) Accuth+: Accelerometer-Based Anti-Spoofing Voice Authentication on Wrist-Worn Wearables. IEEE Transactions on Mobile Computing 23, 5, 5571-5588. DOI: 10.1109/TMC.2023.3314837. Online publication date: May 2024.
      • (2024) Memory-Augmented Autoencoder based Continuous Authentication on Smartphones with Conditional Transformer GANs. IEEE Transactions on Mobile Computing, 1-16. DOI: 10.1109/TMC.2023.3290834. Online publication date: 2024.
      • (2024) Toward Pitch-Insensitive Speaker Verification via Soundfield. IEEE Internet of Things Journal 11, 1, 1175-1189. DOI: 10.1109/JIOT.2023.3290001. Online publication date: 1 Jan 2024.
      • (2023) Echo-ID: Smartphone Placement Region Identification for Context-Aware Computing. Sensors 23, 9, Article 4302. DOI: 10.3390/s23094302. Online publication date: 26 Apr 2023.
      • (2023) TwinkleTwinkle. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 7, 2, 1-30. DOI: 10.1145/3596238. Online publication date: 12 Jun 2023.
      • (2023) LIPAuth: Hand-dependent Light Intensity Patterns for Resilient User Authentication. ACM Transactions on Sensor Networks 19, 3, 1-29. DOI: 10.1145/3572909. Online publication date: 8 May 2023.
      • (2023) PD-FMCW: Push the Limit of Device-Free Acoustic Sensing Using Phase Difference in FMCW. IEEE Transactions on Mobile Computing 22, 8, 4865-4880. DOI: 10.1109/TMC.2022.3162631. Online publication date: 1 Aug 2023.
      • (2023) HearFit+: Personalized Fitness Monitoring via Audio Signals on Smart Speakers. IEEE Transactions on Mobile Computing 22, 5, 2756-2770. DOI: 10.1109/TMC.2021.3125684. Online publication date: 1 May 2023.
      • (2023) BackLip: Passphrase-Independent Lip-reading User Authentication with Backscatter Signals. 2023 IEEE/ACM 31st International Symposium on Quality of Service (IWQoS), 1-10. DOI: 10.1109/IWQoS57198.2023.10188767. Online publication date: 19 Jun 2023.
      • (2022) MetaEar: Imperceptible Acoustic Side Channel Continuous Authentication Based on ERTF. Electronics 11, 20, Article 3401. DOI: 10.3390/electronics11203401. Online publication date: 20 Oct 2022.
