Abstract
Virtual Reality (VR) head-mounted displays (HMDs) are equipped with a range of sensors, which have recently been exploited to infer users’ sensitive and private information through a deep learning-based eavesdropping attack that leverages facial dynamics. Mindful that the eavesdropping attack relies on facial dynamics, which vary across race and gender, we evaluate the robustness of such an attack under various user characteristics. We base our evaluation on existing anthropological research showing statistically significant differences in face width, face length, and lip length among ethnic/racial groups, suggesting that a “challenger” with features (ethnicity/race and gender) similar to a victim’s might deceive the eavesdropper more easily than one with different features. By replicating the classification model in [17] and examining its accuracy under six scenarios that vary the victim and attacker by ethnicity/race and gender, we show that our adversary can impersonate a user of the same ethnicity/race and gender more accurately: the average accuracy difference between the original and adversarial settings is the lowest among all scenarios. Conversely, an adversary with a different ethnicity/race and gender from the victim exhibited the highest average accuracy difference, highlighting an inherent demographic bias in the approach that impersonation can exploit.
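The evaluation described above compares the replicated classifier’s accuracy in the original setting against each adversarial (impersonation) scenario. A minimal sketch of that comparison is below; the scenario labels, accuracy values, and helper functions are illustrative assumptions for exposition, not the authors’ actual data or code.

```python
# Sketch (assumed data): measuring the accuracy gap between the original
# setting and impersonation scenarios that vary the attacker's
# ethnicity/race and gender relative to the victim's.

def accuracy(predictions, labels):
    """Fraction of predictions that match the ground-truth labels."""
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)

def accuracy_gap(original_acc, scenario_accs):
    """Drop from the original accuracy for each adversarial scenario.

    A smaller gap means the adversary's samples are classified almost as
    well as the victim's own, i.e. impersonation succeeds more easily.
    """
    return {name: original_acc - acc for name, acc in scenario_accs.items()}

# Hypothetical numbers, for illustration only.
original = 0.90
scenarios = {
    "same race, same gender": 0.82,
    "same race, diff gender": 0.74,
    "diff race, same gender": 0.70,
    "diff race, diff gender": 0.55,
}

gaps = accuracy_gap(original, scenarios)
# The scenario with the lowest gap is the easiest impersonation setting.
easiest = min(gaps, key=gaps.get)
```

With these illustrative numbers, the matched ethnicity/race-and-gender scenario yields the smallest gap, mirroring the trend the paper reports.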
References
Oculus Quest 2 tech specs deep dive (2023). https://business.oculus.com/products/specs/
MediaRecorder overview (2023). https://developer.android.com/guide/topics/media/mediarecorder
Get Raw Sensor Data (2023). https://developer.oculus.com/documentation/unreal/unreal-blueprints-get-raw-sensor-data
Oculus SDK for developer (2023). https://developer.oculus.com/downloads/
Oculus Device Specifications (2023). https://developer.oculus.com/resources/oculus-device-specs/
Unitydocument: CommonUsages (2023). https://docs.unity3d.com/ScriptReference/XR.CommonUsages.html
How Facebook protects the privacy of your Voice Commands and Voice Dictation (2023). https://support.oculus.com/articles/in-vr-experiences/oculus-features/privacy-protection-with-voice-commands
tf.keras.losses.SparseCategoricalCrossentropy (2023). https://www.tensorflow.org/api_docs/python/tf/keras/losses/SparseCategoricalCrossentropy
Roark, D.A., Barrett, S.E., Spence, M.J., Abdi, H., O’Toole, A.J.: Psychological and neural perspectives on the role of motion in face recognition. Behav. Cogn. Neurosci. Rev. 2(1), 15–46 (2003)
Anand, S.A., Saxena, N.: Speechless: analyzing the threat to speech privacy from smartphone motion sensors. In: 2018 IEEE Symposium on Security and Privacy (SP), pp. 1000–1017. IEEE (2018)
Akansu, A.N., Haddad, R.A.: Time-frequency representations. In: Multiresolution Signal Decomposition, 2nd edn., pp. 331–390. Academic Press, San Diego (2001). https://doi.org/10.1016/B978-012047141-6/50005-7. https://www.sciencedirect.com/science/article/pii/B9780120471416500057
Cheng, A., Yang, L., Andersen, E.: Teaching language and culture with a virtual reality game. In: Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems, pp. 541–549 (2017)
Ferracani, A., Faustino, M., Giannini, G.X., Landucci, L., Del Bimbo, A.: Natural experiences in museums through virtual reality and voice commands. In: Proceedings of the 25th ACM International Conference on Multimedia, pp. 1233–1234 (2017)
Dantcheva, A., Brémond, F.: Gender estimation based on smile-dynamics. IEEE Trans. Inf. Forensics Secur. 12(3), 719–729 (2016)
Arons, B.: A review of the cocktail party effect. J. Am. Voice I/O Soc. 12(7), 35–50 (1992)
Burdea, G.C., Coiffet, P.: Virtual Reality Technology. Wiley, Hoboken (2003)
Shi, C., et al.: Face-Mic: inferring live speech and speaker identity via subtle facial dynamics captured by AR/VR motion sensors. In: Proceedings of the 27th Annual International Conference on Mobile Computing and Networking, pp. 478–490 (2021)
Shi, C., Wang, Y., Chen, Y., Saxena, N., Wang, C.: WearID: low-effort wearable-assisted authentication of voice commands via cross-domain comparison without training. In: Annual Computer Security Applications Conference, pp. 829–842 (2020)
Florian, K., Thore, K., Florian, N., Erich, L.M.: Using hand tracking and voice commands to physically align virtual surfaces in AR for handwriting and sketching with HoloLens 2. In: Proceedings of the 27th ACM Symposium on Virtual Reality Software and Technology, pp. 1–3 (2021)
Segura, R.J., del Pino, F.J., Ogáyar, C.J., Rueda, A.J.: VR-OCKS: a virtual reality game for learning the basic concepts of programming. Comput. Appl. Eng. Educ. 28(1), 31–41 (2020)
Radianti, J., Majchrzak, T.A., Fromm, J., Stieglitz, S., Vom Brocke, J.: Virtual reality applications for higher educations: a market analysis (2021)
Zhang, L., Pathak, P.H., Wu, M., Zhao, Y., Mohapatra, P.: AccelWord: energy efficient hotword detection through accelerometer. In: Proceedings of the 13th Annual International Conference on Mobile Systems, Applications, and Services, pp. 301–315 (2015)
Durak, L., Arikan, O.: Short-time Fourier transform: two fundamental properties and an optimal implementation. IEEE Trans. Sig. Process. 51(5), 1231–1242 (2003)
Johns Hopkins Medicine: Vocal Cord Disorders (2023). https://www.hopkinsmedicine.org/health/conditions-and-diseases/vocal-cord-disorders
Thelwell, M., Chiu, C.Y., Bullas, A., Hart, J., Wheat, J., Choppin, S.: How shape-based anthropometry can complement traditional anthropometric techniques: a cross-sectional study. Sci. Rep. 10(1), 1–11 (2020)
Nikiforakis, N., Kapravelos, A., Joosen, W., Kruegel, C., Piessens, F., Vigna, G.: Cookieless monster: exploring the ecosystem of web-based device fingerprinting. In: 2013 IEEE Symposium on Security and Privacy, pp. 541–555. IEEE (2013)
Parent, R., King, S., Fujimura, O.: Issues with lip sync animation: can you read my lips? In: Proceedings of Computer Animation 2002 (CA 2002), pp. 3–10. IEEE (2002)
Giannakopoulos, T.: A method for silence removal and segmentation of speech signals, implemented in Matlab. University of Athens, Athens 2 (2009)
Meteriz-Yıldıran, Ü., Yıldıran, N.F., Awad, A., Mohaisen, D.: A keylogging inference attack on air-tapping keyboards in virtual environments. In: 2022 IEEE Conference on Virtual Reality and 3D User Interfaces (VR), pp. 765–774. IEEE (2022)
Michalevsky, Y., Boneh, D., Nakibly, G.: Gyrophone: recognizing speech from gyroscope signals. In: 23rd USENIX Security Symposium (USENIX Security 2014), pp. 1053–1067 (2014)
Zhuang, Z., Guan, J., Hsiao, H., Bradtmiller, B.: Evaluating the representativeness of the LANL respirator fit test panels for the current US civilian workers. J. Int. Soc. Respir. Prot. 21, 83–93 (2004)
Ba, Z., et al.: Learning-based practical smartphone eavesdropping with built-in accelerometer. In: NDSS (2020)
Zhuang, Z., Landsittel, D., Benson, S., Roberge, R., Shaffer, R.: Facial anthropometric differences among gender, ethnicity, and age groups. Ann. Occup. Hyg. 54(4), 391–402 (2010)
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Choi, S., Mohaisen, M., Nyang, D., Mohaisen, D. (2023). Revisiting the Deep Learning-Based Eavesdropping Attacks via Facial Dynamics from VR Motion Sensors. In: Wang, D., Yung, M., Liu, Z., Chen, X. (eds) Information and Communications Security. ICICS 2023. Lecture Notes in Computer Science, vol 14252. Springer, Singapore. https://doi.org/10.1007/978-981-99-7356-9_24
Print ISBN: 978-981-99-7355-2
Online ISBN: 978-981-99-7356-9