Abstract
The acoustic models used by automatic speech recognisers are usually trained with speech collected from young to middle-aged adults. As the characteristics of speech change with age, such acoustic models tend to perform poorly on children’s and elderly people’s speech. In this study, we investigate whether automatic age group classification of speakers, together with age group-specific acoustic models, could improve automatic speech recognition performance. We train an age group classifier with an accuracy of about 95% and show that using its output to select age group-specific acoustic models for children and the elderly leads to considerable gains in automatic speech recognition performance, compared with recognising their speech with acoustic models trained on young to middle-aged adults’ speech.
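The selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the model identifiers, the three group labels, and the toy classifier with its thresholds are all hypothetical placeholders, and falling back to the adult model for an unknown label is an assumption made for illustration.

```python
from typing import Callable, Dict, Sequence, Tuple

Features = Sequence[float]

# Hypothetical model identifiers, one per age group (not from the paper).
MODELS: Dict[str, str] = {
    "child": "am_child",
    "adult": "am_adult",
    "elderly": "am_elderly",
}


def select_acoustic_model(
    features: Features,
    classify_age_group: Callable[[Features], str],
    models: Dict[str, str],
    fallback_group: str = "adult",
) -> Tuple[str, str]:
    """Return (age_group, model_id) for one utterance.

    The classifier is passed in as a function, so any age group
    classifier can be plugged in. Unknown labels fall back to the
    adult model (an assumption of this sketch).
    """
    group = classify_age_group(features)
    if group not in models:
        group = fallback_group
    return group, models[group]


def toy_classifier(features: Features) -> str:
    """Stand-in classifier: arbitrary thresholds on one made-up score."""
    score = features[0]
    if score > 0.8:
        return "child"
    if score < 0.2:
        return "elderly"
    return "adult"


if __name__ == "__main__":
    for utterance in ([0.9], [0.5], [0.1]):
        print(select_acoustic_model(utterance, toy_classifier, MODELS))
```

In a real system the returned model identifier would be handed to the recogniser before decoding, and the toy classifier would be replaced by a trained one.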
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Hämäläinen, A., Meinedo, H., Tjalve, M., Pellegrini, T., Trancoso, I., Dias, M.S. (2014). Improving Speech Recognition through Automatic Selection of Age Group-Specific Acoustic Models. In: Baptista, J., Mamede, N., Candeias, S., Paraboni, I., Pardo, T.A.S., Volpe Nunes, M.d.G. (eds) Computational Processing of the Portuguese Language. PROPOR 2014. Lecture Notes in Computer Science, vol 8775. Springer, Cham. https://doi.org/10.1007/978-3-319-09761-9_2
DOI: https://doi.org/10.1007/978-3-319-09761-9_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-09760-2
Online ISBN: 978-3-319-09761-9
eBook Packages: Computer Science (R0)