
A Robust Deep Transfer Learning Model for Accurate Speech Emotion Classification

  • Conference paper
  • In: Advances in Visual Computing (ISVC 2022)

Abstract

The significant role of emotion in daily human interaction cannot be over-emphasized; however, the demand for a cutting-edge and highly efficient model for speech emotion classification in affective computing remains a challenging task. Researchers have proposed several approaches to speech emotion classification (SEC) in recent times, but the lingering challenge of insufficient datasets, which limits the performance of these approaches, is still a major concern. This work therefore proposes a deep transfer learning model for SEC, a technique that has been yielding state-of-the-art results in computer vision. Our approach uses a pre-trained and optimized Visual Geometry Group (VGGNet) convolutional neural network architecture with appropriate fine-tuning for optimal performance. The speech signal is converted to a mel-spectrogram image suitable as deep learning model input (224\(\,\times \,\)224\(\,\times \,\)3) by applying mel filterbanks and the fast Fourier transform (FFT) to the speech samples. A multi-layer perceptron (MLP) is adopted as the classifier after feature extraction by the deep learning model. Speech pre-processing was carried out on the Toronto Emotional Speech Set (TESS) emotional corpus used for the study to prevent low performance of the proposed model. Evaluation on the TESS dataset shows an improved SEC result, with an accuracy of 96.1% and a specificity of 97.4%.
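The mel-spectrogram front end described in the abstract (frame the signal, apply the FFT, project the power spectrum onto triangular mel filterbanks) can be sketched as follows using only NumPy. All parameters here (16 kHz sample rate, 512-point FFT, 256-sample hop, 64 mel bands) are illustrative assumptions, not the authors' settings, and the final resizing of the spectrogram to a 224\(\,\times \,\)224\(\,\times \,\)3 image for the VGG input is omitted.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    # Triangular filters with centers spaced evenly on the mel scale.
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        left, center, right = bins[i], bins[i + 1], bins[i + 2]
        for k in range(left, center):          # rising slope
            if center > left:
                fb[i, k] = (k - left) / (center - left)
        for k in range(center, right):         # falling slope
            if right > center:
                fb[i, k] = (right - k) / (right - center)
    return fb

def mel_spectrogram(signal, sr=16000, n_fft=512, hop=256, n_mels=64):
    # Frame the signal, window each frame, take the FFT power spectrum,
    # then project it onto the mel filterbank and take the log.
    window = np.hanning(n_fft)
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        frame = signal[start:start + n_fft] * window
        frames.append(np.abs(np.fft.rfft(frame)) ** 2)
    power = np.array(frames).T                  # (n_fft//2 + 1, n_frames)
    fb = mel_filterbank(n_mels, n_fft, sr)
    return np.log(fb @ power + 1e-10)           # log-mel energies

# One second of a 440 Hz tone as a stand-in for a speech sample.
sr = 16000
t = np.arange(sr) / sr
spec = mel_spectrogram(np.sin(2 * np.pi * 440.0 * t), sr=sr)
print(spec.shape)  # (n_mels, n_frames)
```

In the pipeline described above, the resulting log-mel matrix would then be rendered as a 224\(\,\times \,\)224 RGB image and passed through the pre-trained VGGNet, with the extracted features fed to the MLP classifier.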



Author information

Correspondence to Serestina Viriri.



Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Akinpelu, S., Viriri, S. (2022). A Robust Deep Transfer Learning Model for Accurate Speech Emotion Classification. In: Bebis, G., et al. Advances in Visual Computing. ISVC 2022. Lecture Notes in Computer Science, vol 13599. Springer, Cham. https://doi.org/10.1007/978-3-031-20716-7_33


  • DOI: https://doi.org/10.1007/978-3-031-20716-7_33

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-20715-0

  • Online ISBN: 978-3-031-20716-7

  • eBook Packages: Computer Science, Computer Science (R0)
