research-article

Speaker and gender dependencies in within/cross linguistic Speech Emotion Recognition

Published: 25 August 2023

Abstract

During the difficult period of the COVID-19 pandemic, many areas of daily life were affected, including the economy, tourism and, in particular, healthcare, where many people suffered from psychological and emotional disorders. Speech Emotion Recognition (SER) can help medical teams understand the emotional state of their patients. The central contribution of this research is the creation of new features, called Stationary Mel Frequency Cepstral Coefficients (SMFCC) and Discrete Mel Frequency Cepstral Coefficients (DMFCC), obtained by combining the Multilevel Wavelet Transform (MWT) with conventional MFCC features. The proposed method was evaluated under several settings: within/cross-language, speaker-dependency and gender-dependency. Recognition rates of 91.4%, 74.4% and 80.8% were reached for the EMO-DB (German), RAVDESS (English) and EMOVO (Italian) target databases, respectively, in speaker-dependent (SD) experiments covering both genders (female and male). A summary performance matrix provides additional detail on the model's behaviour across the various experiments. The experimental results show that the proposed SER system outperforms previous SER studies.
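The abstract does not spell out how SMFCC and DMFCC are computed, only that they combine a Multilevel Wavelet Transform with conventional MFCCs. As an illustration of the decomposition step such features build on, the sketch below performs a multilevel discrete wavelet transform in plain NumPy. The Haar wavelet, the three-level depth, and the idea that MFCCs would then be extracted from the approximation branch (yielding SMFCC-like features) and the detail branches (DMFCC-like features) are all assumptions for illustration, not the paper's exact recipe:

```python
import numpy as np

def haar_dwt(x):
    # Single-level Haar DWT: low-pass (approximation) and
    # high-pass (detail) coefficients at half the sample rate.
    x = x[: len(x) // 2 * 2]          # drop a trailing odd sample
    approx = (x[0::2] + x[1::2]) / np.sqrt(2)
    detail = (x[0::2] - x[1::2]) / np.sqrt(2)
    return approx, detail

def multilevel_dwt(x, levels=3):
    # Multilevel wavelet transform: recursively decompose the
    # approximation branch, collecting one detail band per level.
    details = []
    approx = np.asarray(x, dtype=float)
    for _ in range(levels):
        approx, d = haar_dwt(approx)
        details.append(d)
    return approx, details

# Toy stand-in for a speech frame (real use: audio samples).
rng = np.random.default_rng(0)
signal = rng.standard_normal(1024)

approx, details = multilevel_dwt(signal, levels=3)
print(approx.shape)                   # (128,) after 3 halvings
# Hypothetical next step: run an MFCC extractor on `approx`
# (stationary branch) and on each band in `details`.
```

Because the Haar transform is orthonormal, the energy of the signal is preserved across the approximation and detail bands, which is one reason wavelet subbands are attractive as inputs to cepstral feature extraction.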


Published In

International Journal of Speech Technology, Volume 26, Issue 3
Sep 2023
251 pages

Publisher

Springer-Verlag

Berlin, Heidelberg

Publication History

Published: 25 August 2023
Accepted: 03 August 2023
Received: 10 July 2022

Author Tags

  1. Speech emotion recognition
  2. Within/cross-language
  3. Speaker-dependency
  4. Gender-dependency
  5. Multilevel wavelet transform
  6. SMFCC
  7. DMFCC
  8. MFCC
  9. SVM

Qualifiers

  • Research-article
