Abstract
Automatic voice pathology detection and classification play an important role in the diagnosis and prevention of voice disorders. To describe the pronunciation characteristics of patients with dysarthria more accurately and to improve pathological voice detection, this study proposes a pathological voice detection method based on a multi-modal network structure. First, speech signals and electroglottography (EGG) signals are mapped from the time domain to frequency-domain spectrograms via the short-time Fourier transform (STFT), and a Mel filter bank is applied to each spectrogram to emphasize the harmonic structure and suppress noise. Second, a pre-trained convolutional neural network (CNN) serves as the backbone network that extracts sound-state features and vocal-cord vibration features from the two signals. Finally, to improve classification, the fused features are fed into a long short-term memory (LSTM) network for feature selection and enhancement. On the Saarbrucken Voice Database (SVD), the proposed system achieves an accuracy of 95.73%, an F1-score of 96.10%, and a recall of 96.73%, providing a new approach to pathological voice detection.
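The signal front end summarized above (STFT followed by a Mel filter bank, applied to both the speech and the EGG channel) can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions, not the authors' implementation: the HTK-style Mel formula, the periodic Hann window, `n_fft=256`, `hop=128`, 20 Mel bands, log compression, and the synthetic test signals are all hypothetical choices made for illustration, and the CNN backbone, feature fusion, and LSTM stages are not shown. A direct DFT is used for readability; a practical system would use an FFT routine such as `numpy.fft.rfft`.

```python
import cmath
import math


def hz_to_mel(f_hz):
    """HTK-style Hz -> Mel conversion (assumed Mel scale)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def power_spectrogram(signal, n_fft=256, hop=128):
    """STFT with a periodic Hann window (direct DFT); returns one-sided power per frame."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / n_fft) for n in range(n_fft)]
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        seg = [signal[start + n] * window[n] for n in range(n_fft)]
        frame = []
        for k in range(n_fft // 2 + 1):
            acc = sum(seg[n] * cmath.exp(-2j * math.pi * k * n / n_fft) for n in range(n_fft))
            frame.append(abs(acc) ** 2)
        frames.append(frame)
    return frames


def mel_filterbank(n_mels, n_fft, sr, f_min=0.0, f_max=None):
    """Triangular filters spaced evenly on the Mel scale, one row per band."""
    f_max = sr / 2 if f_max is None else f_max
    bin_hz = [k * sr / n_fft for k in range(n_fft // 2 + 1)]
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    hz_pts = [mel_to_hz(lo + i * (hi - lo) / (n_mels + 1)) for i in range(n_mels + 2)]
    bank = []
    for m in range(1, n_mels + 1):
        left, center, right = hz_pts[m - 1], hz_pts[m], hz_pts[m + 1]
        row = []
        for f in bin_hz:
            if left <= f <= center:
                row.append((f - left) / (center - left))   # rising edge
            elif center < f <= right:
                row.append((right - f) / (right - center))  # falling edge
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def log_mel_spectrogram(signal, sr, n_fft=256, hop=128, n_mels=20, eps=1e-10):
    """Apply the Mel filter bank to each STFT frame and log-compress the band energies."""
    bank = mel_filterbank(n_mels, n_fft, sr)
    return [
        [math.log(eps + sum(w * p for w, p in zip(row, frame))) for row in bank]
        for frame in power_spectrogram(signal, n_fft, hop)
    ]


if __name__ == "__main__":
    sr = 8000  # hypothetical sampling rate for the demo
    t = [n / sr for n in range(800)]  # 0.1 s of signal
    speech = [math.sin(2 * math.pi * 200 * x) for x in t]  # stand-in for a sustained vowel
    egg = [1.0 if math.sin(2 * math.pi * 200 * x) >= 0 else -1.0 for x in t]  # crude EGG stand-in
    for name, sig in (("speech", speech), ("egg", egg)):
        feats = log_mel_spectrogram(sig, sr)
        print(name, len(feats), "frames x", len(feats[0]), "Mel bands")
```

The same function is applied to each modality, so both channels yield time-by-band matrices of identical shape that a CNN backbone could consume; how the paper actually sizes and fuses these inputs is not specified here.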
Funding source: Program for Innovative Research Team in University of Tianjin
Award Identifier / Grant number: TD13-5034
Research funding: This work was supported by the Program for Innovative Research Team in University of Tianjin (No. TD13-5034).
Author contributions: All authors have accepted responsibility for the entire content of this manuscript and approved its submission.
Competing interests: The authors state no conflict of interest.
Informed consent: Informed consent is not applicable.
Ethical approval: The conducted research is not related to either human or animal use.
© 2021 Walter de Gruyter GmbH, Berlin/Boston