DOI: 10.1145/3343031.3350596

Ultrasound-Based Silent Speech Interface using Sequential Convolutional Auto-encoder

Published: 15 October 2019

Abstract

"Silent Speech Interface" (SSI) refers to a system that uses non-audible signals recorded during speech production to perform speech recognition and synthesis tasks. Different approaches have been proposed for SSI systems; in this paper, we focus on an ultrasound-based SSI. The performance of an ultrasound-based SSI system relies heavily on the feature extraction approach. However, most previous attempts are limited to individual-frame analysis and cannot take the context information of the image sequence into account. Inspired by the recent success of recurrent neural networks and convolutional auto-encoders, we explore a novel sequential feature extraction approach for SSI systems. The architecture extracts spatial and temporal features from the image sequence, which can then be deployed for speech recognition and synthesis tasks. In a quantitative comparison of unsupervised feature extraction approaches, the new approach outperforms other methods on the 2010 SSI challenge.
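The paper itself does not include code; as a rough illustration of the pipeline the abstract describes (per-frame convolutional encoding of ultrasound images, recurrent aggregation of the latent sequence, and decoding for reconstruction), a toy NumPy forward pass might look like the sketch below. All dimensions, weights, and function names here are illustrative assumptions, not the authors' actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv2d_valid(x, k):
    """Naive valid-mode 2-D convolution over a single-channel image."""
    H, W = x.shape
    kh, kw = k.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def encode_frame(frame, kernel, W_proj):
    """Spatial encoder: conv + ReLU, stride-2 pooling, linear projection."""
    fmap = np.maximum(conv2d_valid(frame, kernel), 0.0)
    pooled = fmap[::2, ::2]                      # crude downsampling
    return np.tanh(pooled.ravel() @ W_proj)      # per-frame latent code

def rnn_aggregate(latents, W_h, W_x):
    """Temporal encoder: a plain RNN mixing context into each latent."""
    h = np.zeros(W_h.shape[0])
    outputs = []
    for z in latents:
        h = np.tanh(W_h @ h + W_x @ z)
        outputs.append(h)
    return np.stack(outputs)

# Toy dimensions: 8 frames of 32x32 "ultrasound" images, 16-D latent space.
T, H, W, D = 8, 32, 32, 16
frames = rng.standard_normal((T, H, W))
kernel = rng.standard_normal((3, 3)) * 0.1
W_proj = rng.standard_normal((15 * 15, D)) * 0.05   # 30x30 map pooled to 15x15
W_h = rng.standard_normal((D, D)) * 0.1
W_x = rng.standard_normal((D, D)) * 0.1
W_dec = rng.standard_normal((D, H * W)) * 0.05

latents = np.stack([encode_frame(f, kernel, W_proj) for f in frames])  # (T, D)
seq_feats = rnn_aggregate(latents, W_h, W_x)                           # (T, D)
recon = (seq_feats @ W_dec).reshape(T, H, W)   # decoder: reconstruct frames
```

In a real auto-encoder these weights would be trained to minimize reconstruction error, and the sequential features `seq_feats` would then be handed to the downstream recognition or synthesis model.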



Published In

MM '19: Proceedings of the 27th ACM International Conference on Multimedia
October 2019
2794 pages
ISBN:9781450368896
DOI:10.1145/3343031


Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. sequential convolutional auto-encoder
  2. silent speech interface

Qualifiers

  • Demonstration

Funding Sources

  • National Key Research and Development Program

Conference

MM '19

Acceptance Rates

MM '19 Paper Acceptance Rate: 252 of 936 submissions (27%).
Overall Acceptance Rate: 2,145 of 8,556 submissions (25%).


Cited By

  • (2025) "Spatio-temporal masked autoencoder-based phonetic segments classification from ultrasound." Speech Communication. DOI: 10.1016/j.specom.2025.103186. Online publication date: Jan 2025.
  • (2023) "HPSpeech: Silent Speech Interface for Commodity Headphones." Proceedings of the 2023 ACM International Symposium on Wearable Computers, 60–65. DOI: 10.1145/3594738.3611365. Online publication date: 8 Oct 2023.
  • (2023) "DUMask: A Discrete and Unobtrusive Mask-Based Interface for Facial Gestures." Proceedings of the Augmented Humans International Conference 2023, 255–266. DOI: 10.1145/3582700.3582726. Online publication date: 12 Mar 2023.
  • (2023) "EchoSpeech: Continuous Silent Speech Recognition on Minimally-obtrusive Eyewear Powered by Acoustic Sensing." Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems, 1–18. DOI: 10.1145/3544548.3580801. Online publication date: 19 Apr 2023.
  • (2023) "Raw Ultrasound-Based Phonetic Segments Classification Via Mask Modeling." ICASSP 2023 – IEEE International Conference on Acoustics, Speech and Signal Processing, 1–5. DOI: 10.1109/ICASSP49357.2023.10095156. Online publication date: 4 Jun 2023.
  • (2022) "Exploring Silent Speech Interfaces Based on Frequency-Modulated Continuous-Wave Radar." Sensors, 22(2), 649. DOI: 10.3390/s22020649. Online publication date: 14 Jan 2022.
  • (2020) "Silent Speech Interfaces for Speech Restoration: A Review." IEEE Access, 8, 177995–178021. DOI: 10.1109/ACCESS.2020.3026579. Online publication date: 2020.
