Speech Based Emotion Recognition Using Machine Learning
https://doi.org/10.22214/ijraset.2023.50255
International Journal for Research in Applied Science & Engineering Technology (IJRASET)
ISSN: 2321-9653; IC Value: 45.98; SJ Impact Factor: 7.538
Volume 11 Issue IV Apr 2023- Available at www.ijraset.com
Abstract: Speech-based emotion recognition is a developing field that has attracted considerable interest in recent years. In this article, we propose a machine learning approach for recognizing emotions from speech samples. We extract acoustic features from the speech samples and use them to train and test a variety of machine learning models, including decision trees, support vector machines, and neural networks. We assess how well these models perform on a publicly available dataset of speech samples labeled with emotions. The experimental results show that the neural network model outperforms the other models, reaching an accuracy of 87%. The proposed approach has applications in human-computer interaction, education, and the diagnosis of mental illnesses. Overall, this article contributes to the improvement of speech-based emotion recognition systems.
I. INTRODUCTION
Speech-based emotion recognition is an active area of research in the field of human-computer interaction. Recognizing emotions
from speech is important for a range of applications, including mental health diagnosis, education, and entertainment. Many studies
have been conducted on this topic, but there is still a need for more accurate and reliable emotion recognition systems. Machine
learning has emerged as a promising approach for speech-based emotion recognition due to its ability to learn patterns from data and
adapt to new situations.
In this article, we propose a machine learning approach for recognizing emotions from speech samples. We extract acoustic features from the speech samples and use these features to train and evaluate several machine learning algorithms. We evaluate the performance of these models on a publicly available dataset of speech samples labeled with emotions, and we compare our approach with existing methods in the literature.
The major contributions of this paper are the proposal of a machine learning approach for speech-based emotion recognition and its experimental assessment on a freely available dataset. Our findings demonstrate that the proposed method achieves better accuracy than existing approaches. The proposed approach has the potential to be used in various applications and can contribute to the development of more accurate and reliable speech-based emotion recognition systems.
II. LITERATURE SURVEY
After facial cues, audio cues are the most frequently used source of information for determining an individual's emotional state. Sukanya Anil Kulkarni [1] merged all of the feature-extraction techniques into a single input vector in order to increase the recognition rate, choosing the MFCC, ZCR, and TEO coefficients because these techniques are frequently employed in speech recognition and achieve high recognition rates. To optimize the system, an auto-encoder was used to reduce the dimensionality of the input vector, and a support vector machine (SVM) was employed for classification. The system was evaluated on the RML database.
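As a rough illustration of this kind of feature fusion, the sketch below computes MFCC, ZCR, and Teager energy (TEO) statistics with librosa and feeds the fused vector to an SVM. The file list, labels, and summary statistics are our own assumptions rather than details from the cited work, and the auto-encoder step is omitted for brevity.

```python
# Hedged sketch: fuse MFCC, ZCR, and Teager-energy statistics into one input
# vector per utterance and classify with an SVM (cf. the pipeline described above).
import numpy as np
import librosa
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def extract_features(path, n_mfcc=13):
    y, sr = librosa.load(path, sr=16000)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).mean(axis=1)  # mean MFCCs
    zcr = librosa.feature.zero_crossing_rate(y).mean()                   # mean zero-crossing rate
    teo = np.mean(y[1:-1] ** 2 - y[:-2] * y[2:])                         # mean Teager energy
    return np.hstack([mfcc, zcr, teo])                                   # single fused feature vector

# wav_paths and emotion_labels are assumed to exist (parallel lists of files and labels).
# X = np.vstack([extract_features(p) for p in wav_paths])
# clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
# clf.fit(X, emotion_labels)
```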
Anikitha Chinnu Mathew et al. [3] observe that speech processing is one of the most widely studied research areas, with numerous researchers around the world working on a variety of speech processing systems. The field can be traced back to 1920, when the celluloid toy Radio Rex became the first speech recognition device, responding to the roughly 500 Hz acoustic energy released by the vowel in "Rex". The earliest speech recognition system, created in 1952 by Davis at Bell Laboratories in the US, could identify digits from 0 to 9 spoken by a male voice. For a long time, obstacles such as continuous speech recognition and emotion recognition remained beyond the reach of researchers. S. Padmaja Karthik et al. [2] note that the significance of understanding emotions in human speech has grown in recent times as a way to make human-machine interaction more effective and natural. Recognizing human emotions is a very difficult task because performed and natural emotions are hard to tell apart, and experiments have been conducted to extract spectral and prosodic features in order to determine emotions correctly; they explain the classification of emotions based on features computed from human speech utterances. Chiu Ying Lay et al. explained how to classify gender using pitch estimated from the human voice. Chang-Hyun Park et al. showed that acoustic cues extracted from speech can be used to identify and classify emotions. Nobuo Sato et al. described the MFCC technique; their primary goal was to apply MFCC to human speech and classify emotions with over 67% accuracy. In an effort to improve accuracy, Yixiong Pan et al. applied support vector machines (SVM) to emotion classification. Keshi Dai et al. used support vector machines and neural networks to recognize emotions with more than 60% accuracy. The implementation of speech-based emotion recognition using machine learning and deep learning concepts has been the subject of numerous articles.
Humans vary widely in their capacity to identify emotion. When studying automated emotion recognition, it is crucial to remember that there are many possible sources of "ground truth", that is, of information about what the real emotion is. Consider the task of determining Alex's emotions.
"What would most people say that Alex is feeling?" is one source. The "truth" in this case may not be what Alex feels, but it may
be what the majority of people would assume Alex thinks. For instance, Alex might appear pleased even when he's truly feeling
depressed, but most people will mistake it for happiness. Even if an automated technique does not truly represent Alex's feelings, it
may be regarded accurate if it produces results that are comparable to those of a group of observers. You can also find out the
"truth" by asking Alex how he really feels.
This works if Alex is conscious of his internal state, is interested in conveying it to you, and is able to express it precisely in words
or numbers. Yet, some people with alexithymia lack a strong awareness of their internal emotions or are unable to express them
clearly through words and numbers. . In general, determining what emotion is actually present can be difficult, depend on the
criteria that are chosen, and typically require retaining a certain amount of uncertainty. Due to this, we decided to examine the
effectiveness of three alternative classifiers in this instance. Both regression and classification issues can be solved using the
machine learning approach known as multivariate linear regression classification (MLR) [6]Gaurav Sahu , We used Machnine
Learning models like Random forest,gradient boosting,Support Vector Machnies and Multinomial Naïve Bayes, Logistic
Regression models to extract the emotion from the audio.
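As an illustration of how such a pool of classifiers can be compared, the sketch below cross-validates the model families named above with scikit-learn; the synthetic data, hyperparameters, and variable names are our own stand-ins, not details taken from [6].

```python
# Hedged sketch: cross-validated comparison of several classifier families
# on a synthetic stand-in for an acoustic feature matrix.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

# Stand-in data; in practice X holds acoustic features and y emotion labels.
X, y = make_classification(n_samples=500, n_features=20, n_classes=4,
                           n_informative=10, random_state=0)

models = {
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
    "svm": SVC(kernel="rbf"),
    # MultinomialNB needs non-negative inputs, hence the MinMaxScaler.
    "multinomial_nb": make_pipeline(MinMaxScaler(), MultinomialNB()),
    "logistic_regression": LogisticRegression(max_iter=1000),
}

for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)  # 5-fold cross-validated accuracy
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```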
III. IMPLEMENTATION
A. Training
A KNN classifier with a standard scaler is used to train on the embeddings (library used: scikit-learn). K-fold cross-validation with different scaling methods and splits was also experimented with, and we experimented with creating the embeddings from different pretrained CNN models.
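A minimal sketch of this training setup, assuming clip-level embedding vectors are already available (the random data below is a stand-in; the paper publishes no code):

```python
# Hedged sketch: KNN with standard scaling, evaluated by k-fold cross-validation.
import numpy as np
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(200, 512))   # stand-in for pretrained-model embeddings
labels = rng.integers(0, 7, size=200)      # stand-in for the seven emotion classes

clf = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(clf, embeddings, labels, cv=cv)
print(f"mean CV accuracy: {scores.mean():.3f}")
```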
B. Data visualization
Fig. 2 Count of files per emotion class (anger, disgust, sadness, joy, surprise, neutral, fear) after pre-processing
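A count plot like Fig. 2 can be produced with a short script; the sketch below uses pandas and matplotlib on stand-in data, since the underlying file list is not published.

```python
# Hedged sketch: bar chart of file counts per emotion class (cf. Fig. 2).
import pandas as pd
import matplotlib.pyplot as plt

# Stand-in data; in the paper each row would correspond to one audio file.
df = pd.DataFrame({"emotion": ["anger", "disgust", "sadness", "joy",
                               "surprise", "neutral", "fear"] * 100})

df["emotion"].value_counts().plot(kind="bar", title="Count of files")
plt.xlabel("emotion")
plt.ylabel("count")
plt.tight_layout()
plt.show()
```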
Embeddings were created with pretrained models such as VGGish and edgel3, but openl3 embeddings with the KNN classifier gave the best results.
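As an illustrative sketch of the embedding step, assuming the openl3 package's get_audio_embedding interface and placeholder file paths:

```python
# Hedged sketch: clip-level openl3 embeddings to feed the KNN pipeline above.
import numpy as np
import soundfile as sf
import openl3

def clip_embedding(path):
    audio, sr = sf.read(path)
    # emb: (n_frames, 512); average over time to get one vector per clip
    emb, _ = openl3.get_audio_embedding(audio, sr, content_type="env",
                                        embedding_size=512)
    return emb.mean(axis=0)

# wav_paths is assumed to exist (list of audio file paths).
# embeddings = np.vstack([clip_embedding(p) for p in wav_paths])
```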
IV. CONCLUSION
In this article, we proposed a machine learning method for recognizing emotions from speech samples. We tested the effectiveness of these models using an openly accessible dataset of emotion-labeled speech samples. Our findings demonstrate that the proposed method achieves better accuracy than existing approaches.
Fig. 3 Result 1
The main contribution of this paper is the development of a new machine learning approach for speech-based emotion recognition
that can be used in various applications, including mental health diagnosis, education, and entertainment. Our approach has the
potential to improve the accuracy and reliability of speech-based emotion recognition systems, which is important for these
applications. In conclusion, our proposed approach shows promising results for recognizing emotions from speech samples using
machine learning.
Fig. 4 Result 1
REFERENCES
[1] Sukanya Anil Kulkarni, "Speech Based Emotion Recognition Using Machine Learning", March 2019.
[2] Mahalakshmi Selvaraj, R. Bhuva, S. Padmaja Karthik, "Human Speech Emotion Recognition", February 2016.
[3] Amitha Khan K H, Anikitha Chinnu Mathew, Ansu Raju, Navya Lekshmi M, Raveena R Maranagttu, Rani Saratha R, "Speech Emotion Recognition Using Machine Learning", 2021.
[4] Vaibhav K. P., Parth J. M., Bhavana H. K., Akanksha S. S., "Speech Based Emotion Recognition Using Machine Learning", 2021.
[5] "Speech Emotion Recognition with Deep Learning", Procedia Computer Science, Elsevier B.V., 2020.
[6] Gaurav Sahu, "Multimodal Speech Emotion Recognition and Ambiguity Resolution", 2019.