Voice pathology detection and classification from speech signals and EGG signals based on a multimodal fusion method

Lei Geng; Hongfeng Shan; Zhitao Xiao; Wei Wang; Mei Wei

doi:10.1515/bmt-2021-0112

Published by De Gruyter November 29, 2021

From the journal Biomedical Engineering / Biomedizinische Technik

Abstract

Automatic voice pathology detection and classification plays an important role in the diagnosis and prevention of voice disorders. To describe the pronunciation characteristics of patients with dysarthria more accurately and to improve pathological voice detection performance, this study proposes a pathological voice detection method based on a multi-modal network structure. First, speech signals and electroglottography (EGG) signals are mapped from the time domain to frequency-domain spectrograms via the short-time Fourier transform (STFT), and a Mel filter bank is applied to each spectrogram to enhance the signal's harmonics and suppress noise. Second, a pre-trained convolutional neural network (CNN) serves as the backbone network to extract sound-state features and vocal cord vibration features from the two signals. To further improve classification, the fused features are fed into a long short-term memory (LSTM) network for voice feature selection and enhancement. On the Saarbrucken Voice Database (SVD), the proposed system achieves an accuracy of 95.73%, an F1-score of 96.10%, and a recall of 96.73%, providing a new method for pathological speech detection.
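The front end described in the abstract (STFT followed by a Mel filter bank) can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the HTK-style Mel formula, Hann window, `n_fft=512`, `hop=128`, 64 Mel bands, log compression, the 16 kHz demo sampling rate, and the synthetic stand-in signals are all hypothetical choices not given in the abstract. Each modality (speech and EGG) is converted separately, yielding one spectrogram per input branch of the two-stream network.

```python
import numpy as np


def hz_to_mel(f):
    """HTK-style Mel scale (an assumption; other Mel formulas exist)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def stft_power(x, n_fft=512, hop=128):
    """Power spectrogram |STFT|^2 with a Hann window.

    Returns an array of shape (n_fft // 2 + 1, n_frames).
    """
    x = np.asarray(x, dtype=float)
    if len(x) < n_fft:  # zero-pad very short inputs to one full frame
        x = np.pad(x, (0, n_fft - len(x)))
    n_frames = 1 + (len(x) - n_fft) // hop
    window = np.hanning(n_fft)
    # Each column is one windowed frame of the time-domain signal.
    frames = np.stack(
        [x[i * hop : i * hop + n_fft] * window for i in range(n_frames)], axis=1
    )
    return np.abs(np.fft.rfft(frames, axis=0)) ** 2


def mel_filterbank(sr, n_fft=512, n_mels=64, fmin=0.0, fmax=None):
    """Triangular Mel filters of shape (n_mels, n_fft // 2 + 1)."""
    fmax = sr / 2 if fmax is None else fmax
    # n_mels + 2 edge points, equally spaced on the Mel scale.
    hz_pts = mel_to_hz(np.linspace(hz_to_mel(fmin), hz_to_mel(fmax), n_mels + 2))
    bin_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    fb = np.zeros((n_mels, bin_freqs.size))
    for m in range(n_mels):
        left, center, right = hz_pts[m], hz_pts[m + 1], hz_pts[m + 2]
        rising = (bin_freqs - left) / (center - left)
        falling = (right - bin_freqs) / (right - center)
        fb[m] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def log_mel_spectrogram(x, sr, n_fft=512, hop=128, n_mels=64, eps=1e-10):
    """STFT -> Mel filter bank -> log compression, shape (n_mels, n_frames)."""
    mel_power = mel_filterbank(sr, n_fft, n_mels) @ stft_power(x, n_fft, hop)
    return np.log(mel_power + eps)


if __name__ == "__main__":
    sr = 16000  # hypothetical sampling rate for this demo
    t = np.arange(sr) / sr  # 1 s of samples
    speech = np.sin(2 * np.pi * 1000 * t)  # stand-in for a speech recording
    egg = np.sign(np.sin(2 * np.pi * 150 * t))  # crude stand-in for an EGG trace
    # One spectrogram per modality; each would feed its own CNN branch.
    speech_feat = log_mel_spectrogram(speech, sr)
    egg_feat = log_mel_spectrogram(egg, sr)
    print(speech_feat.shape, egg_feat.shape)  # (64, 122) (64, 122)
```

The pre-trained CNN backbones, the feature fusion step, and the LSTM stage are deliberately not sketched, because the abstract does not specify their configuration.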

Keywords: EGG; LSTM; multi-modal; residual network; voice pathology detection

Corresponding authors: Zhitao Xiao, School of Life Sciences, Tiangong University, Tianjin, 300387, China; and Tianjin Key Laboratory of Optoelectronic Detection Technology and Systems, Tianjin, 300387, China, E-mail: xiaozhitao@tiangong.edu.cn; and Wei Wang, Department of Otorhinolaryngology Head and Neck Surgery, Tianjin First Central Hospital, Tianjin, 300192, China; Institute of Otolaryngology of Tianjin, Tianjin, China; Key Laboratory of Auditory Speech and Balance Medicine, Tianjin, China; Key Clinical Discipline of Tianjin (Otolaryngology), Tianjin, China; and Otolaryngology Clinical Quality Control Centre, Tianjin, China, Phone: +13802120366, E-mail: Wwei1106@hotmail.com

Funding source: Program for Innovative Research Team in University of Tianjin

Award Identifier / Grant number: TD13-5034

Research funding: This work was supported by the Program for Innovative Research Team in University of Tianjin (No. TD13-5034).
Author contributions: All authors have accepted responsibility for the entire content of this manuscript and approved its submission.
Competing interests: Authors state no conflict of interest.
Informed consent: Informed consent is not applicable.
Ethical approval: The conducted research is not related to either human or animal use.

Received: 2021-04-16

Accepted: 2021-11-12

Published Online: 2021-11-29

Published in Print: 2021-12-20
