DOI: 10.1145/3447993.3483252
Research article · Public Access
BioFace-3D: continuous 3D facial reconstruction through lightweight single-ear biosensors

Published: 25 October 2021

Abstract

Over the last decade, facial landmark tracking and 3D reconstruction have gained considerable attention due to their numerous applications, such as human-computer interaction, facial expression analysis, and emotion recognition. Traditional approaches require users to be confined to a particular location and to face a camera under constrained recording conditions (e.g., without occlusions and under good lighting). This highly restricted setting prevents them from being deployed in many application scenarios involving human motion. In this paper, we propose the first single-earpiece lightweight biosensing system, BioFace-3D, that can unobtrusively, continuously, and reliably sense movements across the entire face, track 2D facial landmarks, and further render 3D facial animations. Our single-earpiece biosensing system takes advantage of a cross-modal transfer learning model to transfer the knowledge embodied in a high-grade visual facial landmark detection model to the low-grade biosignal domain. After training, BioFace-3D can perform continuous 3D facial reconstruction directly from the biosignals, without any visual input. By removing the need for a camera positioned in front of the user, this paradigm shift from visual sensing to biosensing opens new opportunities in many emerging mobile and IoT applications. Extensive experiments involving 16 participants under various settings demonstrate that BioFace-3D accurately tracks 53 major facial landmarks with only 1.85 mm average error and 3.38% normalized mean error, which is comparable with most state-of-the-art camera-based solutions. The rendered 3D facial animations, which are consistent with the real human facial movements, further validate the system's capability for continuous 3D facial reconstruction.
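The 3.38% figure quoted in the abstract is a normalized mean error (NME), the standard metric in facial-landmark benchmarks: the mean Euclidean distance between predicted and ground-truth landmarks, divided by a normalizing reference distance (commonly the inter-ocular distance). A minimal sketch of the computation follows; the function name, the toy landmark values, and the 100-pixel reference distance are illustrative assumptions, not values from the paper:

```python
import numpy as np

def normalized_mean_error(pred, gt, norm_dist):
    """Mean Euclidean landmark error, normalized by a reference
    distance such as the inter-ocular distance."""
    per_landmark = np.linalg.norm(pred - gt, axis=-1)  # per-landmark errors, shape (53,)
    return per_landmark.mean() / norm_dist

# Toy example with 53 2D landmarks (illustrative values only)
gt = np.zeros((53, 2))
pred = gt + np.array([3.0, 4.0])  # every landmark displaced by 5 px
nme = normalized_mean_error(pred, gt, norm_dist=100.0)
print(f"NME: {nme:.2%}")  # 5 px error over a 100 px reference -> 5.00%
```

Under this toy setup, a uniform 5-pixel displacement against a 100-pixel reference yields a 5% NME; the paper's reported 3.38% corresponds to a proportionally smaller average displacement.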



    Information

    Published In

    MobiCom '21: Proceedings of the 27th Annual International Conference on Mobile Computing and Networking
    October 2021
    887 pages
    ISBN:9781450383424
    DOI:10.1145/3447993
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 25 October 2021


    Author Tags

    1. 3D facial reconstruction
    2. mobile computing
    3. single-ear biosensing
    4. wearable sensing

    Qualifiers

    • Research-article

    Funding Sources

    • NSF

    Conference

    ACM MobiCom '21

    Acceptance Rates

    Overall Acceptance Rate 440 of 2,972 submissions, 15%

    Article Metrics

    • Downloads (last 12 months): 463
    • Downloads (last 6 weeks): 74
    Reflects downloads up to 24 Jan 2025


    Citations

    Cited By

    • (2025) 3D Facial Tracking and User Authentication Through Lightweight Single-Ear Biosensors. IEEE Transactions on Mobile Computing 24, 2 (Feb 2025), 749-762. DOI: 10.1109/TMC.2024.3470339
    • (2024) Facial Landmark Detection Based on High Precision Spatial Sampling via Millimeter-wave Radar. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 8, 4 (21 Nov 2024), 1-26. DOI: 10.1145/3699739
    • (2024) Artificial Intelligence of Things: A Survey. ACM Transactions on Sensor Networks 21, 1 (30 Aug 2024), 1-75. DOI: 10.1145/3690639
    • (2024) Behaviors Speak More: Achieving User Authentication Leveraging Facial Activities via mmWave Sensing. Proceedings of the 22nd ACM Conference on Embedded Networked Sensor Systems (4 Nov 2024), 169-183. DOI: 10.1145/3666025.3699330
    • (2024) F2Key: Dynamically Converting Your Face into a Private Key Based on COTS Headphones for Reliable Voice Interaction. Proceedings of the 22nd Annual International Conference on Mobile Systems, Applications and Services (3 Jun 2024), 127-140. DOI: 10.1145/3643832.3661860
    • (2024) EarSE. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 7, 4 (12 Jan 2024), 1-33. DOI: 10.1145/3631447
    • (2024) EyeEcho: Continuous and Low-power Facial Expression Tracking on Glasses. Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems (11 May 2024), 1-24. DOI: 10.1145/3613904.3642613
    • (2024) Mordo2: A Personalization Framework for Silent Command Recognition. IEEE Transactions on Neural Systems and Rehabilitation Engineering 32 (2024), 133-143. DOI: 10.1109/TNSRE.2023.3342068
    • (2024) EarSSR: Silent Speech Recognition via Earphones. IEEE Transactions on Mobile Computing 23, 8 (Aug 2024), 8493-8507. DOI: 10.1109/TMC.2024.3356719
    • (2024) A 15.4-ENOB, Fourth-Order Truncation-Error-Shaping NS-SAR-Nested ΔΣ Modulator With Boosted Input Impedance and Range for Biosignal Acquisition. IEEE Journal of Solid-State Circuits 59, 2 (Feb 2024), 528-539. DOI: 10.1109/JSSC.2023.3300928
    • Show More Cited By
