research-article

Reconstruction of Phonated Speech from Whispers Using Formant-Derived Plausible Pitch Modulation

Published: 11 May 2015

  Abstract

    Whispering is a natural, unphonated, secondary aspect of speech communication for most people. However, it is the primary means of communication for some speakers who have impaired voice production mechanisms, such as partial laryngectomees, as well as for those prescribed voice rest, which often follows surgery or damage to the larynx. Most people choose when to whisper and when not to. These speakers may have little choice but to rely on whispers for much of their daily vocal interaction.
    Even though most speakers will whisper at times, and some speakers can only whisper, the majority of today’s computational speech technology systems assume or require phonated speech. This article considers conversion of whispers into natural-sounding phonated speech as a noninvasive prosthetic aid for people with voice impairments who can only whisper. As a by-product, the technique is also useful for unimpaired speakers who choose to whisper.
    Speech reconstruction systems can be classified into those that require training and those that do not. Among the latter, a recent parametric reconstruction framework is explored and then enhanced through a refined estimation of plausible pitch from weighted formant differences. The improved reconstruction framework, with the proposed formant-derived artificial pitch modulation, is validated through subjective and objective tests that compare it against state-of-the-art alternatives.
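    The abstract describes deriving a plausible pitch contour from weighted formant differences and using it to modulate the reconstructed voice. The Python sketch below illustrates that general idea. It is not the article's algorithm, and it rests on several assumptions. "Weighted formant differences" is read here as each formant's per-frame deviation from its utterance mean. The weights, base pitch, scale factor, and clamping range are hypothetical values chosen for illustration. A plain impulse train stands in for whatever glottal excitation the framework actually uses. The function names `plausible_pitch` and `pulse_train` are also illustrative. Formant tracks are assumed to come from an external estimator, which is not shown.

```python
from typing import List, Sequence


def plausible_pitch(formants: Sequence[Sequence[float]],
                    weights: Sequence[float] = (0.5, 0.3, 0.2),
                    base_f0: float = 120.0,
                    scale: float = 0.05,
                    f0_min: float = 70.0,
                    f0_max: float = 300.0) -> List[float]:
    """Map per-frame formant tracks to a pitch contour in Hz.

    formants[t] holds (F1, F2, F3, ...) in Hz for analysis frame t.
    All parameter defaults are illustrative assumptions, not values from the article.
    """
    if not formants:
        return []
    n_frames = len(formants)
    # Utterance-level mean of each formant, used as the reference point.
    means = [sum(frame[i] for frame in formants) / n_frames
             for i in range(len(weights))]
    contour = []
    for frame in formants:
        # Weighted sum of how far each formant sits from its mean in this frame.
        delta = sum(w * (frame[i] - means[i]) for i, w in enumerate(weights))
        f0 = base_f0 + scale * delta
        # Keep the synthetic pitch inside a plausible speaking range.
        contour.append(min(max(f0, f0_min), f0_max))
    return contour


def pulse_train(contour: Sequence[float], fs: int = 8000, hop: int = 80) -> List[float]:
    """Impulse excitation whose spacing follows the contour (one f0 value per hop samples).

    A simple stand-in for a proper glottal pulse model (assumption).
    """
    out = [0.0] * (len(contour) * hop)
    t = 0.0
    while t < len(out):
        idx = int(t)
        out[idx] = 1.0
        # The next pulse is one pitch period later, using the f0 of the current frame.
        t += fs / contour[idx // hop]
    return out


if __name__ == "__main__":
    frames = [(500, 1500, 2500), (700, 1200, 2600), (300, 2200, 2900)]
    f0 = plausible_pitch(frames)
    print([round(x, 2) for x in f0])
    excitation = pulse_train(f0)
    print(sum(excitation), "pulses")
```

    Because the deviations are measured from the utterance mean, the unclamped contour averages to `base_f0`, and pitch rises or falls only as the vowel quality shifts. In a complete reconstruction system, the excitation would drive a vocal-tract filter estimated from the whisper. That step is outside the scope of this sketch.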




      Information

      Published In

      ACM Transactions on Accessible Computing, Volume 6, Issue 4
      Special Issue on Speech and Language Processing for AT (Part 2)
      June 2015
      76 pages
      ISSN:1936-7228
      EISSN:1936-7236
      DOI:10.1145/2775084
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 11 May 2015
      Accepted: 01 February 2015
      Revised: 01 February 2015
      Received: 01 April 2014
      Published in TACCESS Volume 6, Issue 4


      Author Tags

      1. Whispers
      2. voice reconstruction
      3. whisper-to-speech conversion

      Qualifiers

      • Research-article
      • Research
      • Refereed


      Article Metrics

      • Downloads (last 12 months): 12
      • Downloads (last 6 weeks): 5
      Reflects downloads up to 11 Aug 2024

      Cited By

      • (2021) Flexible parametric implantation of voicing in whispered speech under scarce training data. 2020 28th European Signal Processing Conference (EUSIPCO), 416-420. DOI: 10.23919/Eusipco47968.2020.9287684. Online publication date: 24-Jan-2021.
      • (2021) Teager Energy Cepstral Coefficients for Classification of Normal vs. Whisper Speech. 2020 28th European Signal Processing Conference (EUSIPCO), 1-5. DOI: 10.23919/Eusipco47968.2020.9287634. Online publication date: 24-Jan-2021.
      • (2021) CinC-GAN for Effective F0 Prediction for Whisper-to-Normal Speech Conversion. 2020 28th European Signal Processing Conference (EUSIPCO), 411-415. DOI: 10.23919/Eusipco47968.2020.9287385. Online publication date: 24-Jan-2021.
      • (2021) Measurement of Formant Frequency in /hVd/ Words of Distorted Speech in Adult New Zealanders. 2021 IEEE Region 10 Symposium (TENSYMP), 1-6. DOI: 10.1109/TENSYMP52854.2021.9550969. Online publication date: 23-Aug-2021.
      • (2021) Rectification of impaired whisper speech signals. 2021 International Conference on Recent Trends on Electronics, Information, Communication & Technology (RTEICT), 543-549. DOI: 10.1109/RTEICT52294.2021.9573835. Online publication date: 27-Aug-2021.
      • (2019) Effectiveness of Cross-Domain Architectures for Whisper-to-Normal Speech Conversion. 2019 27th European Signal Processing Conference (EUSIPCO), 1-5. DOI: 10.23919/EUSIPCO.2019.8902961. Online publication date: Sep-2019.
      • (2019) Whispered Speech to Normal Speech Conversion Using Bidirectional LSTMs with Meta-network. 2019 IEEE 2nd International Conference on Information Communication and Signal Processing (ICICSP), 251-255. DOI: 10.1109/ICICSP48821.2019.8958537. Online publication date: Sep-2019.
      • (2019) Formant-gaps Features for Speaker Verification Using Whispered Speech. ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 6231-6235. DOI: 10.1109/ICASSP.2019.8682571. Online publication date: May-2019.
      • (2019) Whisper to Normal Speech Conversion Using Sequence-to-Sequence Mapping Model With Auditory Attention. IEEE Access 7, 130495-130504. DOI: 10.1109/ACCESS.2019.2940700. Online publication date: 2019.
      • (2018) Reconstruction of articulatory movements during neutral speech from those during whispered speech. The Journal of the Acoustical Society of America 143, 6, 3352-3364. DOI: 10.1121/1.5039750. Online publication date: 6-Jun-2018.
