MuteIt: Jaw Motion Based Unvoiced Command Recognition Using Earable

Published: 07 September 2022

Abstract

In this paper, we present MuteIt, an ear-worn system for recognizing unvoiced human commands. MuteIt presents an intuitive alternative to voice-based interactions, which can be unreliable in noisy environments, disrupt those around us, and compromise our privacy. We propose a twin-IMU setup to track the user's jaw motion and cancel motion artifacts caused by head and body movements. MuteIt processes jaw motion during word articulation to break each word signal into its constituent syllables, and further each syllable into phonemes (vowels, visemes, and plosives). Recognizing unvoiced commands by tracking jaw motion alone is challenging: as a secondary articulator, the jaw does not move distinctively enough on its own for unvoiced speech recognition. MuteIt therefore combines IMU data with the anatomy of jaw movement and principles from linguistics to model word recognition as an estimation problem. Rather than employing machine learning to train a word classifier, we reconstruct each word as a sequence of phonemes using a bi-directional particle filter, enabling the system to scale easily to a large set of words. We validate MuteIt with 20 subjects of diverse speech accents on 100 common command words. MuteIt achieves a mean word recognition accuracy of 94.8% in noise-free conditions. In noisy acoustic environments, MuteIt outperforms common voice assistants, maintaining recognition accuracy above 90%. Even in the presence of motion artifacts such as head movement, walking, and riding in a moving vehicle, MuteIt achieves a mean word recognition accuracy of 91% across all scenarios.
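The twin-IMU cancellation described above can be illustrated with a short sketch. The Python below is a minimal illustration, not the paper's implementation: the 100 Hz sampling rate, the moving-average smoothing, the function names, and all synthetic signal parameters are assumptions introduced here for demonstration. It shows only the differential-sensing principle the abstract states: both sensors observe head and body motion, while the jaw-side sensor additionally observes articulation, so differencing the two synchronized streams isolates jaw motion.

```python
# Minimal sketch of twin-IMU motion-artifact cancellation (illustration only;
# the sampling rate, filter choice, and signal parameters are assumptions,
# not taken from the MuteIt paper).
import numpy as np

FS = 100.0  # assumed IMU sampling rate in Hz

def moving_average(x: np.ndarray, win: int) -> np.ndarray:
    """Crude FIR low-pass, standing in for a properly tuned filter."""
    return np.convolve(x, np.ones(win) / win, mode="same")

def cancel_motion_artifacts(jaw_imu: np.ndarray, ref_imu: np.ndarray,
                            win: int = 5) -> np.ndarray:
    """Difference the jaw-side stream against the reference (head) stream.

    Both IMUs sense head/body motion; only the jaw-side IMU also senses
    jaw articulation, so the smoothed difference approximates jaw motion.
    """
    return moving_average(jaw_imu, win) - moving_average(ref_imu, win)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.arange(0.0, 2.0, 1.0 / FS)
    head = 0.5 * np.sin(2 * np.pi * 1.0 * t)             # e.g., walking sway
    jaw = 0.2 * np.sin(2 * np.pi * 4.0 * t) * (t > 1.0)  # articulation burst

    jaw_stream = head + jaw + 0.01 * rng.standard_normal(t.size)
    ref_stream = head + 0.01 * rng.standard_normal(t.size)

    residual = cancel_motion_artifacts(jaw_stream, ref_stream)
    # The residual should be quiet before the burst and active during it.
    print("RMS before burst:", float(np.std(residual[t <= 1.0])))
    print("RMS during burst:", float(np.std(residual[t > 1.0])))
```

In the real system the two streams would also need clock synchronization and axis alignment before differencing; the paper's full pipeline then segments the residual jaw-motion signal into syllables and phonemes and decodes words with a bi-directional particle filter, none of which is reproduced here.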

Published In

Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, Volume 6, Issue 3
September 2022
1612 pages
EISSN: 2474-9567
DOI: 10.1145/3563014

Publisher

Association for Computing Machinery, New York, NY, United States

Author Tags

1. IMU Sensing
2. Signal Processing
3. Unvoiced Speech
