Abstract
The role of emotion in everyday human interaction cannot be over-emphasized, yet building an accurate and efficient model for speech emotion classification in affective computing remains a challenging task. Researchers have proposed several approaches for speech emotion classification (SEC) in recent times, but the scarcity of training data continues to limit their performance. This work therefore proposes a deep transfer learning model for SEC, a technique that has produced state-of-the-art results in computer vision. Our approach uses a pre-trained and optimized Visual Geometry Group (VGGNet) convolutional neural network, fine-tuned for optimal performance. Each speech signal is converted, using mel filterbanks and the Fast Fourier Transform (FFT), into a mel-spectrogram image of size 224 × 224 × 3 suitable as input to the deep learning model. After feature extraction by the deep network, a multi-layer perceptron (MLP) is adopted as the classifier. Speech pre-processing was carried out on the Toronto Emotional Speech Set (TESS) corpus used for the study to prevent degraded model performance. Evaluated on the TESS dataset, the proposed model achieves an improved SEC result, with an accuracy of 96.1% and a specificity of 97.4%.
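The mel-spectrogram front end described in the abstract can be sketched as follows. This is a minimal NumPy illustration of the idea (framing, Hann window, FFT, mel filterbank projection), not the authors' implementation; the sample rate, FFT size, hop length, and 64 mel bands are illustrative assumptions, and in the paper the resulting spectrogram is rendered as a 224 × 224 × 3 image before being fed to VGGNet.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    # Triangular filters spaced evenly on the mel scale.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        lo, c, hi = bins[i - 1], bins[i], bins[i + 1]
        for k in range(lo, c):
            fb[i - 1, k] = (k - lo) / max(c - lo, 1)
        for k in range(c, hi):
            fb[i - 1, k] = (hi - k) / max(hi - c, 1)
    return fb

def mel_spectrogram(signal, sr=16000, n_fft=512, hop=256, n_mels=64):
    # Frame the signal, apply a Hann window, take the FFT power spectrum,
    # then project onto the mel filterbank and convert to a log (dB) scale.
    win = np.hanning(n_fft)
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        spec = np.abs(np.fft.rfft(signal[start:start + n_fft] * win)) ** 2
        frames.append(spec)
    power = np.array(frames).T                       # (n_fft//2 + 1, n_frames)
    mel = mel_filterbank(n_mels, n_fft, sr) @ power  # (n_mels, n_frames)
    return 10.0 * np.log10(mel + 1e-10)

# One second of a synthetic 440 Hz tone stands in for a TESS utterance.
sr = 16000
t = np.arange(sr) / sr
spec = mel_spectrogram(np.sin(2 * np.pi * 440.0 * t), sr=sr)
print(spec.shape)  # (64, 61): 64 mel bands over 61 frames
```

In practice a library routine (e.g. `librosa.feature.melspectrogram`) would replace this sketch, and the log-mel matrix would be color-mapped and resized to the 224 × 224 × 3 shape that VGGNet expects.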
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Akinpelu, S., Viriri, S. (2022). A Robust Deep Transfer Learning Model for Accurate Speech Emotion Classification. In: Bebis, G., et al. Advances in Visual Computing. ISVC 2022. Lecture Notes in Computer Science, vol 13599. Springer, Cham. https://doi.org/10.1007/978-3-031-20716-7_33
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-20715-0
Online ISBN: 978-3-031-20716-7