research-article

Public Access

Face-Mic: inferring live speech and speaker identity via subtle facial dynamics captured by AR/VR motion sensors

Authors:

Tianfang Zhang,

Jiadi Yu

MobiCom '21: Proceedings of the 27th Annual International Conference on Mobile Computing and Networking

Pages 478 - 490

https://doi.org/10.1145/3447993.3483272

Published: 25 October 2021

Abstract

Augmented reality/virtual reality (AR/VR) has extended beyond 3D immersive gaming to a broader array of applications, such as shopping, tourism, and education. Recently, there has also been a large shift from handheld-controller-dominated interactions to headset-dominated interactions via voice interfaces. In this work, we show a serious privacy risk of using voice interfaces while the user is wearing a face-mounted AR/VR device. Specifically, we design an eavesdropping attack, Face-Mic, which leverages speech-associated subtle facial dynamics captured by zero-permission motion sensors in AR/VR headsets to infer highly sensitive information from live human speech, including speaker gender, identity, and speech content. Face-Mic is grounded in a key insight: because AR/VR headsets are mounted closely on the user's face, a potentially malicious app on the headset can capture the underlying facial dynamics as the wearer speaks, including movements of facial muscles and bone-borne vibrations, which encode private biometrics and speech characteristics. To mitigate the impact of body movements, we develop a signal source separation technique that identifies and separates the speech-associated facial dynamics from other types of body movements. We further extract representative features for each of the two types of facial dynamics. Using a deep learning-based framework, we demonstrate the privacy leakage through AR/VR headsets by deriving the user's gender and identity and extracting speech information. Extensive experiments using four mainstream VR headsets validate the generalizability, effectiveness, and high accuracy of Face-Mic.
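The idea of pulling speech-associated facial dynamics apart from body movements, then computing features per type of dynamics, can be illustrated with a deliberately simplified sketch. This is not the authors' signal source separation technique. It substitutes plain FFT band masking, and the band edges, the 1 kHz sampling rate, the synthetic trace, and the names `band_split`, `band_energy` and `DEFAULT_BANDS` are all hypothetical. The sketch splits one motion-sensor axis into a low-frequency body-motion band, a facial-muscle band and a bone-borne vibration band, then computes per-band power as a toy feature.

```python
import numpy as np

# HYPOTHETICAL band edges in Hz, chosen only for illustration; they are not
# taken from the Face-Mic paper. Together they cover every frequency >= 0 Hz.
DEFAULT_BANDS = {
    "body_motion": (0.0, 5.0),          # slow head/body movements
    "facial_muscle": (5.0, 50.0),       # speech-associated facial muscle motion
    "bone_vibration": (50.0, np.inf),   # bone-borne speech vibrations
}


def band_split(signal, fs, bands=DEFAULT_BANDS):
    """Split one motion-sensor axis into band-limited components.

    Uses ideal FFT masking: each band keeps the spectrum bins whose
    frequency f satisfies low <= f < high and zeroes all other bins.
    """
    x = np.asarray(signal, dtype=float)
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    components = {}
    for name, (low, high) in bands.items():
        mask = (freqs >= low) & (freqs < high)
        # Inverse transform back to a real signal of the original length.
        components[name] = np.fft.irfft(spectrum * mask, n=x.size)
    return components


def band_energy(components):
    """Toy per-band feature: mean signal power of each component."""
    return {name: float(np.mean(c ** 2)) for name, c in components.items()}


# Usage: a synthetic 1 s trace at an assumed 1 kHz sampling rate.
fs = 1000
t = np.arange(fs) / fs
trace = (5.0 * np.sin(2 * np.pi * 1 * t)        # body sway
         + 0.5 * np.sin(2 * np.pi * 10 * t)     # facial motion
         + 0.05 * np.sin(2 * np.pi * 150 * t))  # bone vibration
features = band_energy(band_split(trace, fs))
# Each sinusoid of amplitude A lands in its own band with power A**2 / 2:
# roughly 12.5, 0.125 and 0.00125.
```

Ideal FFT masking is the crudest possible separator: it assumes the sources occupy disjoint frequency bands, which real body motion and facial dynamics need not do, and it treats each window as periodic. It is shown only to make the "separate, then extract per-type features" structure concrete.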

Cited By

Xiao G, Ling Z, Fan Q, Xu X, Wu W, Ding D, Chen C, Fu X (2024). Pivot: Panoramic-image-based VR User Authentication against Side-Channel Attacks. ACM Transactions on Multimedia Computing, Communications, and Applications. DOI: 10.1145/3694975. Online publication date: 9-Sep-2024.
https://dl.acm.org/doi/10.1145/3694975

Chen Y, Yu J, Chen Y, Kong L, Zhu Y, Chen Y, Okoshi T, Ko J, LiKamWa R (2024). RFSpy: Eavesdropping on Online Conversations with Out-of-Vocabulary Words by Sensing Metal Coil Vibration of Headsets Leveraging RFID. Proceedings of the 22nd Annual International Conference on Mobile Systems, Applications and Services, 169-182. DOI: 10.1145/3643832.3661887. Online publication date: 3-Jun-2024.
https://dl.acm.org/doi/10.1145/3643832.3661887

Yuan J, Yang C, Cai D, Wang S, Yuan X, Zhang Z, Li X, Zhang D, Mei H, Jia X, Wang S, Xu M, Ganesan D, Lane N, Shi W (2024). Mobile Foundation Model as Firmware. Proceedings of the 30th Annual International Conference on Mobile Computing and Networking, 279-295. DOI: 10.1145/3636534.3649361. Online publication date: 29-May-2024.
https://dl.acm.org/doi/10.1145/3636534.3649361

Index Terms

Security and privacy > Security in hardware > Hardware attacks and countermeasures



Information

Published In


MobiCom '21: Proceedings of the 27th Annual International Conference on Mobile Computing and Networking

October 2021

887 pages

ISBN:9781450383424

DOI:10.1145/3447993

Copyright © 2021 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGMOBILE: ACM Special Interest Group on Mobility of Systems, Users, Data and Computing

Publisher

Association for Computing Machinery

New York, NY, United States



Conference

ACM MobiCom '21

Sponsor: SIGMOBILE

ACM MobiCom '21: The 27th Annual International Conference on Mobile Computing and Networking

October 25 - 29, 2021

New Orleans, Louisiana

Acceptance Rates

Overall Acceptance Rate 440 of 2,972 submissions, 15%

Bibliometrics

Total Citations: 25
Total Downloads: 1,769
Downloads (last 12 months): 540
Downloads (last 6 weeks): 59

Reflects downloads up to 09 Nov 2024


Citations

Sparrow L, Galwey C, Loveridge B, Glasser S, Osborne M, Kelly R (2024). Heart and Soul: The Ethics of Biometric Capture in Immersive Artistic Performance. Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, 1-23. DOI: 10.1145/3613904.3642309. Online publication date: 11-May-2024.
https://dl.acm.org/doi/10.1145/3613904.3642309

Cao Y, Li F, Chen H, Liu X, Zhai S, Yang S, Wang Y (2024). Live Speech Recognition via Earphone Motion Sensors. IEEE Transactions on Mobile Computing 23(6), 7284-7300. DOI: 10.1109/TMC.2023.3333214. Online publication date: Jun-2024.
https://doi.org/10.1109/TMC.2023.3333214

Han F, Yang P, Du H, Li X (2024). Accuth+: Accelerometer-Based Anti-Spoofing Voice Authentication on Wrist-Worn Wearables. IEEE Transactions on Mobile Computing 23(5), 5571-5588. DOI: 10.1109/TMC.2023.3314837. Online publication date: May-2024.
https://doi.org/10.1109/TMC.2023.3314837

Li J, Meng Y, Zhan Y, Zhang L, Zhu H (2024). Dangers Behind Charging VR Devices: Hidden Side Channel Attacks via Charging Cables. IEEE Transactions on Information Forensics and Security 19, 8892-8907. DOI: 10.1109/TIFS.2024.3465026. Online publication date: 2024.
https://doi.org/10.1109/TIFS.2024.3465026

Khalili H, Chen A, Papaiakovou T, Jacques T, Chien H, Liu C, Ding A, Hass A, Zonouz S, Sehatbakhsh N (2024). Virtual Keymysteries Unveiled: Detecting Keystrokes in VR with External Side-Channels. 2024 IEEE Security and Privacy Workshops (SPW), 260-266. DOI: 10.1109/SPW63631.2024.00031. Online publication date: 23-May-2024.
https://doi.org/10.1109/SPW63631.2024.00031

Xu X, Chen Y, Ling Z, Lu L, Luo J, Fu X (2024). mmEar: Push the Limit of COTS mmWave Eavesdropping on Headphones. IEEE INFOCOM 2024 - IEEE Conference on Computer Communications, 351-360. DOI: 10.1109/INFOCOM52122.2024.10621229. Online publication date: 20-May-2024.
https://doi.org/10.1109/INFOCOM52122.2024.10621229

Liu Z, Zhang Y, Miao C (2023). Protecting Your Voice from Speech Synthesis Attacks. Proceedings of the 39th Annual Computer Security Applications Conference, 394-408. DOI: 10.1145/3627106.3627183. Online publication date: 4-Dec-2023.
https://dl.acm.org/doi/10.1145/3627106.3627183