
Residual Neural Network precisely quantifies dysarthria severity-level based on short-duration speech segments

Published: 01 July 2021

Abstract

Recently, Deep Learning methodologies have gained significant attention for severity-based classification of dysarthric speech. Detecting dysarthria and quantifying its severity are of paramount importance in various real-life applications, such as assessing patients' progression in treatment, which includes adequate planning of their therapy, and improving speech-based interactive systems so that they can automatically handle pathologically affected voices. Notably, current speech-powered tools often deal with short-duration speech segments and, consequently, are less effective on impaired speech, even when using Convolutional Neural Networks (CNNs). Thus, detecting dysarthria severity-level from short speech segments might help improve the performance and applicability of those systems. To achieve this goal, we propose a novel Residual Network (ResNet)-based technique that receives short-duration speech segments as input. Statistically meaningful objective analysis of our experiments, reported on the standard Universal Access corpus, shows average improvements of 21.35% in classification accuracy and 22.48% in F1-score over the baseline CNN. For additional comparison, tests with Gaussian Mixture Models and Light CNNs were also performed. Overall, the proposed ResNet approach achieved a classification accuracy of 98.90% and an F1-score of 98.00%, confirming its efficacy and supporting its practical applicability.
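The key ResNet mechanism the abstract relies on, an identity shortcut that adds the block input back to a learned residual F(x), can be sketched in a few lines of NumPy. This is a hypothetical minimal block for illustration only, not the authors' architecture (their model operates on speech features with convolutional layers):

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def residual_block(x, W1, W2):
    """Minimal residual block: y = ReLU(F(x) + x).

    F(x) is a tiny two-layer transform; the identity shortcut adds the
    block input back before the final non-linearity, which is what lets
    very deep stacks train without degradation.
    """
    fx = W2 @ relu(W1 @ x)   # learned residual F(x)
    return relu(fx + x)      # identity shortcut connection

# With all-zero weights the block reduces to ReLU(x): the shortcut
# guarantees the block can always fall back to a (near-)identity map.
x = np.array([1.0, -2.0, 3.0])
W = np.zeros((3, 3))
print(residual_block(x, W, W))  # → [1. 0. 3.]
```

The fallback-to-identity property is the design choice that distinguishes ResNet from a plain CNN stack of the same depth: extra blocks can do no harm, so depth can be increased safely.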

Highlights

ResNet is used, for the first time, to detect dysarthria and its corresponding severity-level.
Advancing the state-of-the-art, we demonstrate that ResNet substantially outperforms CNNs in detecting dysarthria.
We show that short-duration speech segments, rather than long-duration ones, suffice to identify dysarthria.
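The short-segment idea in the highlights amounts to framing a waveform into fixed-length windows and classifying each window, optionally aggregating per-segment predictions into one utterance-level decision. A minimal sketch follows; the 0.5 s segment duration and the majority-vote aggregation are illustrative assumptions, not values taken from the paper:

```python
import numpy as np
from collections import Counter

def frame_segments(signal, fs, seg_dur=0.5):
    """Split a 1-D waveform into non-overlapping short segments.

    seg_dur is the segment duration in seconds (illustrative choice).
    Trailing samples that do not fill a whole segment are dropped.
    """
    seg_len = int(seg_dur * fs)
    n_segs = len(signal) // seg_len
    return signal[:n_segs * seg_len].reshape(n_segs, seg_len)

def aggregate_utterance(segment_labels):
    """Majority vote over per-segment severity predictions."""
    return Counter(segment_labels).most_common(1)[0][0]

fs = 16000                       # 16 kHz sampling rate
audio = np.random.randn(fs * 3)  # 3 s of dummy audio
segs = frame_segments(audio, fs)
print(segs.shape)                            # → (6, 8000)
print(aggregate_utterance(["mid", "high", "mid"]))  # → mid
```

Segment-level classification yields many training examples per utterance and lets a deployed system respond before a long recording completes, which is the practical motivation the abstract gives for working with short segments.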




Published In

Neural Networks, Volume 139, Issue C, July 2021 (373 pages)

Publisher

Elsevier Science Ltd., United Kingdom

          Author Tags

          1. Dysarthria
          2. Severity-level
          3. Short-speech segments
          4. CNN
          5. ResNet

          Qualifiers

          • Research-article


Cited By

• (2025) Integrating binary classification and clustering for multi-class dysarthria severity level classification: a two-stage approach. Cluster Computing, 28(2). doi:10.1007/s10586-024-04748-1 (online: 1 Apr 2025)
• (2024) Hyperkinetic Dysarthria voice abnormalities: a neural network solution for text translation. International Journal of Speech Technology, 27(1), 255–265. doi:10.1007/s10772-024-10098-5 (online: 1 Mar 2024)
• (2024) Dysarthric Severity Categorization Based on Speech Intelligibility: A Hybrid Approach. Circuits, Systems, and Signal Processing, 43(11), 7044–7063. doi:10.1007/s00034-024-02770-7 (online: 1 Nov 2024)
• (2024) Variable STFT Layered CNN Model for Automated Dysarthria Detection and Severity Assessment Using Raw Speech. Circuits, Systems, and Signal Processing, 43(5), 3261–3278. doi:10.1007/s00034-024-02611-7 (online: 22 Feb 2024)
• (2024) Linear Frequency Residual Cepstral Features for Dysarthria Severity Classification. Pattern Recognition, 316–331. doi:10.1007/978-3-031-78498-9_22 (online: 1 Dec 2024)
• (2023) Noise Robust Whisper Features for Dysarthric Severity-Level Classification. Pattern Recognition and Machine Intelligence, 708–715. doi:10.1007/978-3-031-45170-6_74 (online: 12 Dec 2023)
