DOI: 10.1145/3447993.3483272
research-article
Public Access

Face-Mic: inferring live speech and speaker identity via subtle facial dynamics captured by AR/VR motion sensors

Published: 25 October 2021

Abstract

Augmented reality/virtual reality (AR/VR) has extended beyond 3D immersive gaming to a broader array of applications, such as shopping, tourism, and education. Recently, there has also been a large shift from interactions dominated by handheld controllers to interactions dominated by the headset via voice interfaces. In this work, we show a serious privacy risk of using voice interfaces while the user is wearing a face-mounted AR/VR device. Specifically, we design an eavesdropping attack, Face-Mic, which leverages speech-associated subtle facial dynamics captured by zero-permission motion sensors in AR/VR headsets to infer highly sensitive information from live human speech, including speaker gender, identity, and speech content. Face-Mic is grounded in a key insight: AR/VR headsets are mounted closely on the user's face, allowing a potentially malicious app on the headset to capture underlying facial dynamics as the wearer speaks, including movements of facial muscles and bone-borne vibrations, which encode private biometrics and speech characteristics. To mitigate the impact of body movements, we develop a signal source separation technique to identify and separate the speech-associated facial dynamics from other types of body movements. We further extract representative features for the two types of facial dynamics. We demonstrate the privacy leakage through AR/VR headsets by deriving the user's gender and identity and extracting speech information with a deep learning-based framework. Extensive experiments using four mainstream VR headsets validate the generalizability, effectiveness, and high accuracy of Face-Mic.
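
The abstract does not say how the signal source separation works, so the following is only a toy illustration of the general idea and not the authors' technique. It assumes, hypothetically, that body movement dominates the low-frequency part of a motion-sensor trace while speech-associated vibration sits at higher frequencies, and it splits a synthetic single-axis signal with a plain moving average. The sample rate (1000 Hz), the component frequencies (1 Hz and 100 Hz), the amplitudes, and all function names are invented for this sketch.

```python
import math


def moving_average(samples, window):
    """Centered moving average; the window shrinks near the edges."""
    half = window // 2
    out = []
    for i in range(len(samples)):
        lo = max(0, i - half)
        hi = min(len(samples), i + half + 1)
        out.append(sum(samples[lo:hi]) / (hi - lo))
    return out


def split_motion(samples, window):
    """Split a trace into a slow part (moving average) and a fast residual.

    Assumption for illustration only: 'slow' stands in for body movement
    and 'fast' for speech-associated vibration.
    """
    slow = moving_average(samples, window)
    fast = [x - s for x, s in zip(samples, slow)]
    return slow, fast


def rms(values):
    """Root-mean-square amplitude of a sequence."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def synth(rate_hz=1000, seconds=1.0):
    """Synthetic trace: a large 1 Hz sway plus a small 100 Hz vibration."""
    n = int(rate_hz * seconds)
    body = [math.sin(2 * math.pi * 1.0 * t / rate_hz) for t in range(n)]
    speech = [0.05 * math.sin(2 * math.pi * 100.0 * t / rate_hz) for t in range(n)]
    mixed = [b + s for b, s in zip(body, speech)]
    return body, speech, mixed


if __name__ == "__main__":
    body, speech, mixed = synth()
    # 21 samples span about two periods of the 100 Hz component at 1000 Hz,
    # so the average largely cancels it while barely changing the 1 Hz sway.
    slow, fast = split_motion(mixed, window=21)
    err = rms([f - s for f, s in zip(fast, speech)])
    print("fast-component error relative to the vibration:", err / rms(speech))
```

A real attack pipeline would face overlapping frequency bands, three sensor axes, and both accelerometer and gyroscope data, so a fixed moving-average split like this should be read only as a way to picture what "separating speech-associated dynamics from body movement" means.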

    Published In

    MobiCom '21: Proceedings of the 27th Annual International Conference on Mobile Computing and Networking
    October 2021
    887 pages
    ISBN: 9781450383424
    DOI: 10.1145/3447993
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. AR/VR headsets
    2. facial dynamics
    3. speech and speaker privacy

    Qualifiers

    • Research-article

    Conference

    ACM MobiCom '21
    Acceptance Rates

    Overall Acceptance Rate 440 of 2,972 submissions, 15%

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months): 540
    • Downloads (Last 6 weeks): 59
    Reflects downloads up to 09 Nov 2024

    Citations

    Cited By

    • (2024) Pivot: Panoramic-image-based VR User Authentication against Side-Channel Attacks. ACM Transactions on Multimedia Computing, Communications, and Applications. DOI: 10.1145/3694975. Online publication date: 9-Sep-2024
    • (2024) RFSpy: Eavesdropping on Online Conversations with Out-of-Vocabulary Words by Sensing Metal Coil Vibration of Headsets Leveraging RFID. Proceedings of the 22nd Annual International Conference on Mobile Systems, Applications and Services, 169-182. DOI: 10.1145/3643832.3661887. Online publication date: 3-Jun-2024
    • (2024) Mobile Foundation Model as Firmware. Proceedings of the 30th Annual International Conference on Mobile Computing and Networking, 279-295. DOI: 10.1145/3636534.3649361. Online publication date: 29-May-2024
    • (2024) Heart and Soul: The Ethics of Biometric Capture in Immersive Artistic Performance. Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, 1-23. DOI: 10.1145/3613904.3642309. Online publication date: 11-May-2024
    • (2024) Live Speech Recognition via Earphone Motion Sensors. IEEE Transactions on Mobile Computing 23:6, 7284-7300. DOI: 10.1109/TMC.2023.3333214. Online publication date: Jun-2024
    • (2024) Accuth+: Accelerometer-Based Anti-Spoofing Voice Authentication on Wrist-Worn Wearables. IEEE Transactions on Mobile Computing 23:5, 5571-5588. DOI: 10.1109/TMC.2023.3314837. Online publication date: May-2024
    • (2024) Dangers Behind Charging VR Devices: Hidden Side Channel Attacks via Charging Cables. IEEE Transactions on Information Forensics and Security 19, 8892-8907. DOI: 10.1109/TIFS.2024.3465026. Online publication date: 2024
    • (2024) Virtual Keymysteries Unveiled: Detecting Keystrokes in VR with External Side-Channels. 2024 IEEE Security and Privacy Workshops (SPW), 260-266. DOI: 10.1109/SPW63631.2024.00031. Online publication date: 23-May-2024
    • (2024) mmEar: Push the Limit of COTS mmWave Eavesdropping on Headphones. IEEE INFOCOM 2024 - IEEE Conference on Computer Communications, 351-360. DOI: 10.1109/INFOCOM52122.2024.10621229. Online publication date: 20-May-2024
    • (2023) Protecting Your Voice from Speech Synthesis Attacks. Proceedings of the 39th Annual Computer Security Applications Conference, 394-408. DOI: 10.1145/3627106.3627183. Online publication date: 4-Dec-2023
