DOI: 10.1145/3172944.3172977

AlterEgo: A Personalized Wearable Silent Speech Interface

Published: 05 March 2018

Abstract

We present a wearable interface that allows a user to converse silently with a computing device, without voice or any discernible movement, enabling the user to communicate with devices, AI assistants, applications, or other people in a silent, concealed, and seamless manner. A user's intention to speak and internal speech are characterized by neuromuscular signals from the internal speech articulators, which the AlterEgo system captures in order to reconstruct this speech. We use this to facilitate a natural language user interface, in which users silently communicate in natural language and receive aural output (e.g., through bone-conduction headphones), enabling a discreet, bi-directional interface with a computing device and providing a seamless form of intelligence augmentation. The paper describes the architecture, design, implementation, and operation of the entire system. We demonstrate the robustness of the system through user studies and report a median word accuracy of 92%.
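The abstract outlines a pipeline of this general shape: capture neuromuscular signals from the internal speech articulators, classify them into words, and return audio to the user. As a rough, purely illustrative sketch of that style of pipeline (not the paper's implementation; the sampling rate, electrode count, filter band, feature set, and classifier below are all assumptions made for the example), consider the following Python:

```python
# Illustrative sketch only (not the AlterEgo code): a minimal
# EMG-style word classifier. Each utterance is assumed to be a
# (n_samples, n_channels) recording; every constant here is hypothetical.
import numpy as np
from scipy.signal import butter, sosfiltfilt
from sklearn.neighbors import KNeighborsClassifier

FS = 250        # assumed sampling rate (Hz)
N_CHANNELS = 7  # assumed number of surface electrodes

def bandpass(emg, lo=1.0, hi=100.0, fs=FS):
    """Suppress baseline drift and high-frequency noise per channel."""
    sos = butter(4, [lo, hi], btype="band", fs=fs, output="sos")
    return sosfiltfilt(sos, emg, axis=0)

def features(emg):
    """Two simple per-channel features: RMS amplitude and waveform length."""
    emg = bandpass(emg)
    rms = np.sqrt((emg ** 2).mean(axis=0))
    wl = np.abs(np.diff(emg, axis=0)).sum(axis=0)
    return np.concatenate([rms, wl])

# Toy training data: 20 synthetic utterances of two "words".
rng = np.random.default_rng(0)
X = np.stack([features(rng.standard_normal((FS, N_CHANNELS)) + (i % 2))
              for i in range(20)])
y = np.array([i % 2 for i in range(20)])  # 0 = "yes", 1 = "no"

clf = KNeighborsClassifier(n_neighbors=3).fit(X, y)
print(clf.predict(X[:4]))  # classify a few utterances
```

A system along the paper's lines would replace the toy nearest-neighbor classifier with a sequence model trained on recorded utterances and route the recognized output back to the user, e.g., over bone-conduction audio.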


Published In

IUI '18: Proceedings of the 23rd International Conference on Intelligent User Interfaces
March 2018
698 pages
ISBN: 9781450349451
DOI: 10.1145/3172944
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. human-machine symbiosis
  2. intelligence augmentation
  3. peripheral nerve interface
  4. silent speech interface

Qualifiers

  • Research-article

Conference

IUI'18

Acceptance Rates

IUI '18 paper acceptance rate: 43 of 299 submissions (14%).
Overall acceptance rate: 746 of 2,811 submissions (27%).


Article Metrics

  • Downloads (last 12 months): 328
  • Downloads (last 6 weeks): 26
Reflects downloads up to 03 Sep 2024


Cited By

  • (2024) Using Digital Filters for Real-Time Signal Classification [in Russian]. Вестник НИЯУ МИФИ 13(3), 169-175. DOI: 10.26583/vestnik.2024.314. Online publication date: 24-Jun-2024
  • (2024) Improved Speaker Recognition System Using Automatic Lip Recognition. Control Systems and Computers, 38-49. DOI: 10.15407/csc.2024.01.038. Online publication date: 2024
  • (2024) StethoSpeech: Speech Generation Through a Clinical Stethoscope Attached to the Skin. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 8(3), 1-21. DOI: 10.1145/3678515. Online publication date: 22-Aug-2024
  • (2024) Lipwatch: Enabling Silent Speech Recognition on Smartwatches using Acoustic Sensing. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 8(2), 1-29. DOI: 10.1145/3659614. Online publication date: 15-May-2024
  • (2024) WhisperMask: A Noise Suppressive Mask-Type Microphone for Whisper Speech. Proceedings of the Augmented Humans International Conference 2024, 1-14. DOI: 10.1145/3652920.3652925. Online publication date: 4-Apr-2024
  • (2024) Sonic Entanglements with Electromyography: Between Bodies, Signals, and Representations. Proceedings of the 2024 ACM Designing Interactive Systems Conference, 2691-2707. DOI: 10.1145/3643834.3661572. Online publication date: 1-Jul-2024
  • (2024) MELDER: The Design and Evaluation of a Real-time Silent Speech Recognizer for Mobile Devices. Proceedings of the CHI Conference on Human Factors in Computing Systems, 1-23. DOI: 10.1145/3613904.3642348. Online publication date: 11-May-2024
  • (2024) ReHEarSSE: Recognizing Hidden-in-the-Ear Silently Spelled Expressions. Proceedings of the CHI Conference on Human Factors in Computing Systems, 1-16. DOI: 10.1145/3613904.3642095. Online publication date: 11-May-2024
  • (2024) Watch Your Mouth: Silent Speech Recognition with Depth Sensing. Proceedings of the CHI Conference on Human Factors in Computing Systems, 1-15. DOI: 10.1145/3613904.3642092. Online publication date: 11-May-2024
  • (2024) Robust Dual-Modal Speech Keyword Spotting for XR Headsets. IEEE Transactions on Visualization and Computer Graphics 30(5), 2507-2516. DOI: 10.1109/TVCG.2024.3372092. Online publication date: 5-Mar-2024
