Classification of Cleft Lip and Palate Speech Using Fine-Tuned Transformer Pretrained Models

Bhattacharjee, Susmita; Shekhawat, H. S.; Prasanna, S. R. M.

doi:10.1007/978-3-031-53827-8_6

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14531)

Included in the following conference series:

International Conference on Intelligent Human Computer Interaction

Abstract

Cleft lip and palate (CLP) is a craniofacial disorder that leads to spectro-temporal distortions in an individual's speech. This makes it very challenging for CLP speakers to use speech-enabled applications that require human-computer interaction (HCI), such as voice assistants. Recently, the availability of pretrained models has eased the constraint of low-resource data. Recent findings indicate that pretrained transformer models substantially outperform traditional classifiers. In this paper, with the aim of achieving high classification accuracy, pretrained transformer models fine-tuned on CLP data are used. The results obtained from transformer models such as Wav2Vec2, SEW, SEW-D, UniSpeechSat, HuBERT, and DistilHuBERT provide a comparison of the models' performance; DistilHuBERT in particular showed a significant improvement, with accuracy close to 100%.
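The abstract compares models by classification accuracy. As a minimal sketch of how such a figure could be computed from per-utterance predictions, the snippet below uses only the Python standard library. The label names (`"clp"`, `"normal"`), the helper functions, and the toy label lists are assumptions made for illustration; they are not taken from the paper and do not reproduce its data or results.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Return the fraction of utterances whose predicted label matches the reference."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on an empty label list")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def confusion_counts(y_true, y_pred):
    """Count how often each (reference label, predicted label) pair occurs."""
    return Counter(zip(y_true, y_pred))


# Toy labels invented for illustration only (hypothetical, not the paper's data).
y_true = ["clp", "clp", "normal", "normal", "clp", "normal"]
y_pred = ["clp", "normal", "normal", "normal", "clp", "normal"]

acc = accuracy(y_true, y_pred)
counts = confusion_counts(y_true, y_pred)

print(f"accuracy = {acc:.3f}")
for (ref, hyp), n in sorted(counts.items()):
    print(f"reference={ref:6s} predicted={hyp:6s} count={n}")
```

Reporting the confusion counts alongside accuracy shows which class the errors fall in, which a single accuracy figure hides when the classes are imbalanced.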

Author information

Authors and Affiliations

Indian Institute of Technology Guwahati, Guwahati, India
Susmita Bhattacharjee & H. S. Shekhawat
Indian Institute of Technology Dharwad, Dharwad, India
S. R. M. Prasanna

Corresponding author

Correspondence to Susmita Bhattacharjee.

Editor information

Editors and Affiliations

Soongsil University, Seoul, Korea (Republic of)
Bong Jun Choi
Saint Louis University, St. Louis, MO, USA
Dhananjay Singh
Indian Institute of Information Technology, Allahabad, India
Uma Shanker Tiwary
Pukyong National University, Busan, Korea (Republic of)
Wan-Young Chung

About this paper

Cite this paper

Bhattacharjee, S., Shekhawat, H.S., Prasanna, S.R.M. (2024). Classification of Cleft Lip and Palate Speech Using Fine-Tuned Transformer Pretrained Models. In: Choi, B.J., Singh, D., Tiwary, U.S., Chung, WY. (eds) Intelligent Human Computer Interaction. IHCI 2023. Lecture Notes in Computer Science, vol 14531. Springer, Cham. https://doi.org/10.1007/978-3-031-53827-8_6

DOI: https://doi.org/10.1007/978-3-031-53827-8_6
Published: 29 February 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-53826-1
Online ISBN: 978-3-031-53827-8
eBook Packages: Computer Science, Computer Science (R0)
