Objective Comparison of Four GMM-Based Methods for PMA-to-Speech Conversion

Erro, Daniel; Hernaez, Inma; Serrano, Luis; Saratxaga, Ibon; Navas, Eva

doi:10.1007/978-3-319-49169-1_3

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10077)

Included in the following conference series:

International Conference on Advances in Speech and Language Technologies for Iberian Languages

Abstract

In silent speech interfaces, a mapping is established between biosignals captured by sensors and the acoustic characteristics of speech. Recent work has shown the feasibility of a silent speech interface based on permanent magnet articulography (PMA). This paper studies the performance of four mapping methods based on Gaussian mixture models (GMMs), typical of the voice conversion field, when applied to PMA-to-spectrum conversion. The results show the superiority of methods based on maximum likelihood parameter generation (MLPG), especially when the parameters of the mapping function are trained by minimizing the generation error. Informal listening tests reveal that the resulting speech is moderately intelligible for the database under study.
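
To make the idea of a GMM-based mapping concrete, the sketch below shows frame-wise joint-density GMM regression, a mapping commonly used in voice conversion. The abstract does not say which four methods are compared, so this should be read as an illustration of the general family rather than as the authors' implementation. The function names (`gaussian_logpdf`, `convert`), the toy parameters, and the use of NumPy are assumptions made for this example; in practice the GMM parameters would be estimated from paired sensor and spectral feature vectors.

```python
import numpy as np


def gaussian_logpdf(x, mean, cov):
    """Log-density of a multivariate Gaussian at each row of x (shape (n, d))."""
    d = mean.shape[0]
    chol = np.linalg.cholesky(cov)
    sol = np.linalg.solve(chol, (x - mean).T)      # shape (d, n)
    maha = np.sum(sol ** 2, axis=0)                # squared Mahalanobis distance
    logdet = 2.0 * np.sum(np.log(np.diag(chol)))
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + maha)


def convert(x, weights, means, covs, dx):
    """Frame-wise estimate E[y | x] under a joint GMM over z = [x, y].

    x: (n, dx) source frames; weights: (k,); means: (k, dx + dy);
    covs: (k, dx + dy, dx + dy) full joint covariances (hypothetical layout).
    """
    n, k = x.shape[0], weights.shape[0]
    dy = means.shape[1] - dx
    log_post = np.empty((n, k))
    cond = np.empty((k, n, dy))
    for m in range(k):
        mu_x, mu_y = means[m, :dx], means[m, dx:]
        s_xx = covs[m, :dx, :dx]
        s_yx = covs[m, dx:, :dx]
        # Unnormalized log-posterior of component m given the source frame.
        log_post[:, m] = np.log(weights[m]) + gaussian_logpdf(x, mu_x, s_xx)
        # Component-wise conditional mean: mu_y + S_yx S_xx^{-1} (x - mu_x).
        cond[m] = mu_y + np.linalg.solve(s_xx, (x - mu_x).T).T @ s_yx.T
    # Normalize posteriors in a numerically stable way.
    log_post -= log_post.max(axis=1, keepdims=True)
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)
    # Posterior-weighted sum of the per-component conditional means.
    return np.einsum("nk,knd->nd", post, cond)


# Toy usage: one-dimensional source and target, two well-separated components.
toy_weights = np.array([0.5, 0.5])
toy_means = np.array([[-10.0, -1.0], [10.0, 1.0]])
toy_covs = np.stack([np.eye(2), np.eye(2)])
toy_frames = np.array([[-10.0], [10.0]])
toy_output = convert(toy_frames, toy_weights, toy_means, toy_covs, dx=1)
```

This mapping treats every frame independently. MLPG-based methods, which the abstract reports as superior, additionally take dynamic (delta) features into account so that the generated trajectory is consistent across frames.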

Acknowledgements

This work has been partially funded by the Spanish Ministry of Economy and Competitiveness (RESTORE project, TEC2015-67163-C2-1-R MINECO/FEDER, UE) and the Basque Government (ELKAROLA, KK-2015/00098). We would like to thank the University of Hull and the University of Sheffield, especially Dr. Jose A. Gonzalez, for permission to use the PMA data in this work.

Author information

Authors and Affiliations

Aholab, University of the Basque Country (UPV/EHU), Bilbao, Spain
Daniel Erro, Inma Hernaez, Luis Serrano, Ibon Saratxaga & Eva Navas
IKERBASQUE, Basque Foundation for Science, Bilbao, Spain
Daniel Erro

Corresponding author

Correspondence to Daniel Erro.

Editor information

Editors and Affiliations

INESC-ID/IST, Universidade de Lisboa, Lisbon, Portugal
Alberto Abad
I3A/University of Zaragoza, Zaragoza, Spain
Alfonso Ortega
DETI/IEETA, University of Aveiro, Aveiro, Portugal
António Teixeira
AtlantTIC Research Center, Universidad de Vigo, Vigo, Spain
Carmen García Mateo
Universitat Politècnica de València, Valencia, Spain
Carlos D. Martínez Hinarejos
University of Coimbra, Coimbra, Portugal
Fernando Perdigão
INESC-ID/ISCTE-IUL, Lisbon, Portugal
Fernando Batista
INESC-ID/IST, Universidade de Lisboa, Lisbon, Portugal
Nuno Mamede

About this paper

Cite this paper

Erro, D., Hernaez, I., Serrano, L., Saratxaga, I., Navas, E. (2016). Objective Comparison of Four GMM-Based Methods for PMA-to-Speech Conversion. In: Abad, A., et al. Advances in Speech and Language Technologies for Iberian Languages. IberSPEECH 2016. Lecture Notes in Computer Science, vol 10077. Springer, Cham. https://doi.org/10.1007/978-3-319-49169-1_3

DOI: https://doi.org/10.1007/978-3-319-49169-1_3
Published: 04 November 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-49168-4
Online ISBN: 978-3-319-49169-1
eBook Packages: Computer Science, Computer Science (R0)
