MuteIt: Jaw Motion Based Unvoiced Command Recognition Using Earable

Published: 07 September 2022

Abstract

In this paper, we present MuteIt, an ear-worn system for recognizing unvoiced human commands. MuteIt presents an intuitive alternative to voice-based interactions, which can be unreliable in noisy environments, disrupt those around us, and compromise our privacy. We propose a twin-IMU setup to track the user's jaw motion and cancel motion artifacts caused by head and body movements. MuteIt processes jaw motion during word articulation to break each word signal into its constituent syllables, and further each syllable into phonemes (vowels, visemes, and plosives). Recognizing unvoiced commands by tracking jaw motion alone is challenging: as a secondary articulator, the jaw does not move distinctively enough on its own for unvoiced speech recognition. MuteIt therefore combines IMU data with the anatomy of jaw movement and principles from linguistics to model word recognition as an estimation problem. Rather than employing machine learning to train a word classifier, we reconstruct each word as a sequence of phonemes using a bi-directional particle filter, enabling the system to scale easily to a large set of words. We validate MuteIt with 20 subjects of diverse speech accents on 100 common command words. MuteIt achieves a mean word recognition accuracy of 94.8% in noise-free conditions. In noisy acoustic environments, MuteIt outperforms common voice assistants, maintaining recognition accuracy above 90%. Even in the presence of motion artifacts such as head movement, walking, and riding in a moving vehicle, MuteIt achieves a mean word recognition accuracy of 91% across all scenarios.
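The twin-IMU cancellation described above can be illustrated with a short sketch. The Python below is a minimal illustration, not the paper's implementation: the 100 Hz sampling rate, the moving-average smoothing, the function names, and all synthetic signal parameters are assumptions introduced here for demonstration. It shows only the differential-sensing principle the abstract states: both sensors observe head and body motion, while the jaw-side sensor additionally observes articulation, so differencing the two synchronized streams isolates jaw motion.

```python
# Minimal sketch of twin-IMU motion-artifact cancellation (illustration only;
# the sampling rate, filter choice, and signal parameters are assumptions,
# not taken from the MuteIt paper).
import numpy as np

FS = 100.0  # assumed IMU sampling rate in Hz

def moving_average(x: np.ndarray, win: int) -> np.ndarray:
    """Crude FIR low-pass, standing in for a properly tuned filter."""
    return np.convolve(x, np.ones(win) / win, mode="same")

def cancel_motion_artifacts(jaw_imu: np.ndarray, ref_imu: np.ndarray,
                            win: int = 5) -> np.ndarray:
    """Difference the jaw-side stream against the reference (head) stream.

    Both IMUs sense head/body motion; only the jaw-side IMU also senses
    jaw articulation, so the smoothed difference approximates jaw motion.
    """
    return moving_average(jaw_imu, win) - moving_average(ref_imu, win)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.arange(0.0, 2.0, 1.0 / FS)
    head = 0.5 * np.sin(2 * np.pi * 1.0 * t)             # e.g., walking sway
    jaw = 0.2 * np.sin(2 * np.pi * 4.0 * t) * (t > 1.0)  # articulation burst

    jaw_stream = head + jaw + 0.01 * rng.standard_normal(t.size)
    ref_stream = head + 0.01 * rng.standard_normal(t.size)

    residual = cancel_motion_artifacts(jaw_stream, ref_stream)
    # The residual should be quiet before the burst and active during it.
    print("RMS before burst:", float(np.std(residual[t <= 1.0])))
    print("RMS during burst:", float(np.std(residual[t > 1.0])))
```

In the real system the two streams would also need clock synchronization and axis alignment before differencing; the paper's full pipeline then segments the residual jaw-motion signal into syllables and phonemes and decodes words with a bi-directional particle filter, none of which is reproduced here.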

Published In

Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, Volume 6, Issue 3
September 2022
1612 pages
EISSN: 2474-9567
DOI: 10.1145/3563014

Publisher

Association for Computing Machinery, New York, NY, United States

Author Tags

1. IMU Sensing
2. Signal Processing
3. Unvoiced Speech
