Towards robust voice pathology detection

Harar, Pavol; Galaz, Zoltan; Alonso-Hernandez, Jesus B.; Mekyska, Jiri; Burget, Radim; Smekal, Zdenek

doi:10.1007/s00521-018-3464-7

Investigation of supervised deep learning, gradient boosting, and anomaly detection approaches across four databases

S.I.: Advances in Bio-Inspired Intelligent Systems
Published: 04 April 2018

Volume 32, pages 15747–15757, (2020)
Neural Computing and Applications

Pavol Harar ORCID: orcid.org/0000-0001-5206-1794¹,
Zoltan Galaz¹,
Jesus B. Alonso-Hernandez²,
Jiri Mekyska¹,
Radim Burget¹ &
…
Zdenek Smekal¹

1763 Accesses
46 Citations
2 Altmetric

A Correction to this article was published on 14 September 2019

This article has been updated

Abstract

Automatic, objective, non-invasive detection of pathological voice based on computerized analysis of acoustic signals can play an important role in early diagnosis, progression tracking, and even effective treatment of pathological voices. In search of such a robust voice pathology detection system, we investigated three distinct classifiers within the supervised learning and anomaly detection paradigms. We conducted a set of experiments using a variety of input data, such as raw waveforms, spectrograms, mel-frequency cepstral coefficients (MFCC), and conventional acoustic (dysphonic) features (AF). In comparison with previously published works, this article is the first to utilize a combination of four different databases comprising normophonic and pathological recordings of sustained phonation of the vowel /a/, unrestricted to a subset of vocal pathologies. Furthermore, to the best of our knowledge, this article is the first to explore gradient-boosted trees and deep learning for this application. The following best classification performances, measured by F1 score on a dedicated test set, were achieved: XGBoost (0.733) using AF and MFCC, DenseNet (0.621) using MFCC, and Isolation Forest (0.610) using AF. Even though these results are of an exploratory character, the conducted experiments show the promising potential of gradient boosting and deep learning methods to robustly detect voice pathologies.
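
For readers unfamiliar with the metric reported above, the F1 score is the harmonic mean of precision and recall. The following is a minimal sketch of that calculation, not the authors' evaluation code. It assumes binary labels in which 1 marks a pathological recording and 0 a normophonic one; the abstract does not state which class was treated as positive. The function name `f1_score_binary` and the toy labels are illustrative only.

```python
def f1_score_binary(y_true, y_pred, positive=1):
    """Return the F1 score of the positive class for two equal-length label sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    # True positives: positive recordings predicted as positive.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    # False positives: other recordings predicted as positive.
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    # False negatives: positive recordings that were missed.
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    if tp == 0:
        # Without true positives, precision or recall is zero (or undefined),
        # so the score is reported as 0.0 by convention.
        return 0.0

    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels (hypothetical, not data from the article).
y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1]
score = f1_score_binary(y_true, y_pred)
print(f"F1 = {score:.3f}")
```

Because the harmonic mean is pulled towards the smaller of its two inputs, a classifier cannot reach a high F1 score by maximizing recall alone, for example by labelling every recording as pathological.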

Change history

14 September 2019
Table 3 was published incorrectly in the original publication of the article.

Acknowledgements

This study was funded by grant 16-30805A of the Czech Ministry of Health (Effects of non-invasive brain stimulation on hypokinetic dysarthria, micrographia, and brain plasticity in patients with Parkinson’s disease) and by the following projects: SIX (CZ.1.05/2.1.00/03.0072) and LO1401. The infrastructure of the SIX Center was used for the research. The authors (P. Harar, Z. Galaz) also acknowledge the financial support of the Erwin Schrödinger International Institute for Mathematics and Physics during their stay at the “Systematic approaches to deep learning methods for audio” workshop, held from 11 to 15 September 2017 in Vienna.

Author information

Authors and Affiliations

Brno University of Technology, Technicka 3082/12, 61 600, Brno, Czech Republic
Pavol Harar, Zoltan Galaz, Jiri Mekyska, Radim Burget & Zdenek Smekal
Institute for Technological Development and Innovation in Communications (IDeTIC), University of Las Palmas de Gran Canaria, Parque Científico Tecnológico de la ULPGC, Polivalente II, Planta 2, 35017, Las Palmas de Gran Canaria, Spain
Jesus B. Alonso-Hernandez

Corresponding author

Correspondence to Pavol Harar.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

About this article

Cite this article

Harar, P., Galaz, Z., Alonso-Hernandez, J.B. et al. Towards robust voice pathology detection. Neural Comput & Applic 32, 15747–15757 (2020). https://doi.org/10.1007/s00521-018-3464-7

Received: 10 January 2018
Accepted: 24 March 2018
Published: 04 April 2018
Issue Date: October 2020
DOI: https://doi.org/10.1007/s00521-018-3464-7
