DOI: 10.1145/3544548.3581465
Research Article

LipLearner: Customizable Silent Speech Interactions on Mobile Devices

Published: 19 April 2023

Abstract

A silent speech interface is a promising technology that enables private communication in natural language. However, previous approaches support only a small and inflexible vocabulary, which limits expressiveness. We leverage contrastive learning to learn efficient lipreading representations, enabling few-shot command customization with minimal user effort. Our model exhibits high robustness to different lighting, posture, and gesture conditions on an in-the-wild dataset. For 25-command classification, an F1-score of 0.8947 is achievable using only one shot, and performance can be further boosted by adaptively learning from more data. This generalizability allowed us to develop a mobile silent speech interface empowered with on-device fine-tuning and visual keyword spotting. A user study demonstrated that with LipLearner, users could define their own commands with high reliability, guaranteed by an online incremental learning scheme. Subjective feedback indicated that our system provides essential functionality for customizable silent speech interactions with high usability and learnability.
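The abstract only names the techniques. As an illustration of how few-shot command customization on top of a contrastively trained lipreading encoder can work, the sketch below classifies an utterance by cosine similarity against per-command prototype embeddings built from a handful of enrollment clips. This is a minimal sketch under stated assumptions: the `embed` function, embedding size, and rejection threshold are placeholders, not the paper's actual model or training code.

```python
import numpy as np

EMB_DIM = 256  # assumed embedding size, not taken from the paper


def embed(clip: np.ndarray) -> np.ndarray:
    """Stand-in for a contrastively trained lipreading encoder.

    A real encoder would map a lip-movement clip to a fixed-length
    embedding; here we only produce an L2-normalized vector of the
    right shape so the few-shot logic below is runnable.
    """
    v = np.resize(clip.astype(np.float32).ravel(), EMB_DIM)
    return v / (np.linalg.norm(v) + 1e-8)


class FewShotCommandClassifier:
    """Nearest-prototype classifier over encoder embeddings.

    Each user-defined command is represented by the mean embedding of
    its enrollment samples ("shots"); prediction picks the prototype
    with the highest cosine similarity.
    """

    def __init__(self) -> None:
        self.samples: dict[str, list[np.ndarray]] = {}

    def enroll(self, command: str, clip: np.ndarray) -> None:
        # Incremental learning: every new sample refines the prototype.
        self.samples.setdefault(command, []).append(embed(clip))

    def predict(self, clip: np.ndarray, threshold: float = 0.5):
        """Return (command, score), or (None, score) when the best
        similarity falls below a rejection threshold (a crude stand-in
        for keyword spotting / open-set rejection)."""
        q = embed(clip)
        best_cmd, best_score = None, -1.0
        for cmd, embs in self.samples.items():
            proto = np.mean(embs, axis=0)
            proto = proto / (np.linalg.norm(proto) + 1e-8)
            score = float(np.dot(q, proto))  # cosine similarity of unit vectors
            if score > best_score:
                best_cmd, best_score = cmd, score
        return (best_cmd, best_score) if best_score >= threshold else (None, best_score)


# One-shot enrollment of two hypothetical commands, then prediction.
clf = FewShotCommandClassifier()
clf.enroll("open camera", np.random.rand(30, 64, 64))  # one clip = one shot
clf.enroll("play music", np.random.rand(30, 64, 64))
print(clf.predict(np.random.rand(30, 64, 64)))
```

With a single enrollment clip per command this reduces to one-shot matching; adding more clips tightens each prototype, which is one simple way to realize the abstract's claim that accuracy improves as the system adaptively learns from more data.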

Supplementary Material

Supplemental Materials (3544548.3581465-supplemental-materials.zip)
MP4 File (3544548.3581465-video-figure.mp4)
Video Figure
MP4 File (3544548.3581465-video-preview.mp4)
Video Preview
MP4 File (3544548.3581465-talk-video.mp4)
Pre-recorded Video Presentation



      Published In

      CHI '23: Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems
      April 2023
      14911 pages
      ISBN:9781450394215
      DOI:10.1145/3544548

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 19 April 2023

      Badges

      • Best Paper

      Author Tags

      1. Customization
      2. Few-shot Learning
      3. Lipreading
      4. Silent Speech Interface

      Qualifiers

      • Research-article
      • Research
      • Refereed limited

      Funding Sources

      • JST CREST
      • JST Moonshot R&D

      Conference

      CHI '23

      Acceptance Rates

      Overall Acceptance Rate 6,199 of 26,314 submissions, 24%

      Article Metrics

• Downloads (last 12 months): 1,218
• Downloads (last 6 weeks): 195
Reflects downloads up to 06 Oct 2024

Cited By
• StethoSpeech: Speech Generation Through a Clinical Stethoscope Attached to the Skin. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 8, 3 (2024), 1–21. https://doi.org/10.1145/3678515
• DisMouse: Disentangling Information from Mouse Movement Data. In Proceedings of the 37th Annual ACM Symposium on User Interface Software and Technology (2024), 1–13. https://doi.org/10.1145/3654777.3676411
• WhisperMask: A Noise Suppressive Mask-Type Microphone for Whisper Speech. In Proceedings of the Augmented Humans International Conference 2024, 1–14. https://doi.org/10.1145/3652920.3652925
• Enabling Hands-Free Voice Assistant Activation on Earphones. In Proceedings of the 22nd Annual International Conference on Mobile Systems, Applications and Services (2024), 155–168. https://doi.org/10.1145/3643832.3661890
• TouchEditor. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 7, 4 (2024), 1–29. https://doi.org/10.1145/3631454
• MELDER: The Design and Evaluation of a Real-time Silent Speech Recognizer for Mobile Devices. In Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, 1–23. https://doi.org/10.1145/3613904.3642348
• Mouse2Vec: Learning Reusable Semantic Representations of Mouse Behaviour. In Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, 1–17. https://doi.org/10.1145/3613904.3642141
• ReHEarSSE: Recognizing Hidden-in-the-Ear Silently Spelled Expressions. In Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, 1–16. https://doi.org/10.1145/3613904.3642095
• Watch Your Mouth: Silent Speech Recognition with Depth Sensing. In Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, 1–15. https://doi.org/10.1145/3613904.3642092
• KuchiNavi: Lip-Reading-Based Navigation App. In Fifteenth International Conference on Graphics and Image Processing (ICGIP 2023), 2024. https://doi.org/10.1117/12.3021118
