
Speech emotion recognition: two decades in a nutshell, benchmarks, and ongoing trends

Published: 24 April 2018

Abstract

Tracing 20 years of progress in making machines hear our emotions based on speech signal properties.

References

[1]
Abdelwahab, M. and Busso, C. Supervised domain adaptation for emotion recognition from speech. In Proceedings of ICASSP. (Brisbane, Australia, 2015). IEEE, 5058--5062.
[2]
Anagnostopoulos, C.-N., Iliou, T. and Giannoukos, I. Features and classifiers for emotion recognition from speech: a survey from 2000 to 2011. Artificial Intelligence Review 43, 2 (2015), 155--177.
[3]
Bhaykar, M., Yadav, J. and Rao, K.S. Speaker dependent, speaker independent and cross language emotion recognition from speech using GMM and HMM. In Proceedings of the National Conference on Communications. (Delhi, India, 2013). IEEE, 1--5.
[4]
Blanton, S. The voice and the emotions. Q. Journal of Speech 1, 2 (1915), 154--172.
[5]
Chang, J. and Scherer, S. Learning Representations of Emotional Speech with Deep Convolutional Generative Adversarial Networks. arXiv preprint arXiv:1705.02394, 2017.
[6]
Chen, L., Mao, X., Xue, Y. and Cheng, L.L. Speech emotion recognition: Features and classification models. Digital Signal Processing 22, 6 (2012), 1154--1160.
[7]
Cibau, N.E., Albornoz, E.M., and Rufiner, H.L. Speech emotion recognition using a deep autoencoder. San Carlos de Bariloche, Argentina, 2013, 934--939.
[8]
Darwin, C. The Expression of the Emotions in Man and Animals. Watts, 1948.
[9]
Davis, A., Rubinstein, M., Wadhwa, N., Mysore, G. J., Durand, F. and Freeman, W.T. The visual microphone: Passive recovery of sound from video. ACM Trans. Graphics 33, 4 (2014), 1--10.
[10]
Dellaert, F., Polzin, T. and Waibel, A. Recognizing emotion in speech. In Proceedings of ICSLP 3, (Philadelphia, PA, 1996). IEEE, 1970--1973.
[11]
Deng, J. Feature Transfer Learning for Speech Emotion Recognition. Ph.D. dissertation, Technische Universität München, Germany, 2016.
[12]
Deng, J., Xu, X., Zhang, Z., Frühholz, S., and Schuller, B. Semisupervised Autoencoders for Speech Emotion Recognition. IEEE/ACM Transactions on Audio, Speech, and Language Processing 26, 1 (2018), 31--43.
[13]
Devillers, L., Vidrascu, L. and Lamel, L. Challenges in real-life emotion annotation and machine learning based detection. Neural Networks 18, 4 (2005), 407--422.
[14]
Dhall, A., Goecke, R., Joshi, J., Sikka, K. and Gedeon, T. Emotion recognition in the wild challenge 2014: Baseline, data and protocol. In Proceedings of ICMI (Istanbul, Turkey, 2014). ACM, 461--466.
[15]
El Ayadi, M., Kamel, M.S., and Karray, F. Survey on speech emotion recognition: Features, classification schemes, and databases. Pattern Recognition 44, 3 (2011), 572--587.
[16]
Fairbanks, G. and Pronovost, W. Vocal pitch during simulated emotion. Science 88, 2286 (1938), 382--383.
[17]
Gunes, H. and Schuller, B. Categorical and dimensional affect analysis in continuous input: Current trends and future directions. Image and Vision Computing 31, 2 (2013), 120--136.
[18]
Joachims, T. Learning to classify text using support vector machines: Methods, theory and algorithms. Kluwer Academic Publishers, 2002.
[19]
Kim, Y., Lee, H. and Provost, E.M. Deep learning for robust feature generation in audiovisual emotion recognition. In Proceedings of ICASSP, (Vancouver, Canada, 2013). IEEE, 3687--3691.
[20]
Koolagudi, S.G. and Rao, K.S. Emotion recognition from speech: A review. Intern. J. of Speech Technology 15, 2 (2012), 99--117.
[21]
Kramer, E. Elimination of verbal cues in judgments of emotion from voice. The J. Abnormal and Social Psychology 68, 4 (1964), 390.
[22]
Kraus, M.W. Voice-only communication enhances empathic accuracy. American Psychologist 72, 7 (2017), 644.
[23]
Lee, C.M., Narayanan, S.S., and Pieraccini, R. Combining acoustic and language information for emotion recognition. In Proceedings of INTERSPEECH, (Denver, CO, 2002). ISCA, 873--876.
[24]
Leng, Y., Xu, X., and Qi, G. Combining active learning and semi-supervised learning to construct SVM classifier. Knowledge-Based Systems 44 (2013), 121--131.
[25]
Liu, J., Chen, C., Bu, J., You, M. and Tao, J. Speech emotion recognition using an enhanced co-training algorithm. In Proceedings of ICME. (Beijing, P.R. China, 2007). IEEE, 999--1002.
[26]
Lotfian, R. and Busso, C. Emotion recognition using synthetic speech as neutral reference. In Proceedings of ICASSP. (Brisbane, Australia, 2015). IEEE, 4759--4763.
[27]
Mao, Q., Dong, M., Huang, Z. and Zhan, Y. Learning salient features for speech emotion recognition using convolutional neural networks. IEEE Trans. Multimedia 16, 8 (2014), 2203--2213.
[28]
Marsella, S. and Gratch, J. Computationally modeling human emotion. Commun. ACM 57, 12 (Dec. 2014), 56--67.
[29]
Picard, R.W. Affective Computing. MIT Press, Cambridge, MA, 1997.
[30]
Ram, C.S. and Ponnusamy, R. Assessment on speech emotion recognition for autism spectrum disorder children using support vector machine. World Applied Sciences J. 34, 1 (2016), 94--102.
[31]
Schmitt, M., Ringeval, F. and Schuller, B. At the border of acoustics and linguistics: Bag-of-audio-words for the recognition of emotions in speech. In Proceedings of INTERSPEECH. (San Francisco, CA, 2016). ISCA, 495--499.
[32]
Schuller, B. and Batliner, A. Computational Paralinguistics: Emotion, Affect and Personality in Speech and Language Processing. Wiley, 2013.
[33]
Schuller, B., Mousa, A.E.-D., and Vryniotis, V. Sentiment analysis and opinion mining: On optimal parameters and performances. WIREs Data Mining and Knowledge Discovery (2015), 5:255--5:263.
[34]
Soskin, W.F. and Kauffman, P.E. Judgment of emotion in word-free voice samples. J. of Commun. 11, 2 (1961), 73--80.
[35]
Stuhlsatz, A., Meyer, C., Eyben, F., Zielke, T., Meier, G. and Schuller, B. Deep neural networks for acoustic emotion recognition: Raising the benchmarks. In Proceedings of ICASSP. (Prague, Czech Republic, 2011). IEEE, 5688--5691.
[36]
Tosa, N. and Nakatsu, R. Life-like communication agent-emotion sensing character 'MIC' and feeling session character 'MUSE.' In Proceedings of the 3rd International Conference on Multimedia Computing and Systems. (Hiroshima, Japan, 1996). IEEE, 12--19.
[37]
Trigeorgis, G., Ringeval, F., Brückner, R., Marchi, E., Nicolaou, M., Schuller, B. and Zafeiriou, S. Adieu features? End-to-end speech emotion recognition using a deep convolutional recurrent network. In Proceedings of ICASSP. (Shanghai, P.R. China, 2016). IEEE, 5200--5204.
[38]
Ververidis, D. and Kotropoulos, C. Emotional speech recognition: Resources, features, and methods. Speech Commun. 48, 9 (2006), 1162--1181.
[39]
Watson, D., Clark, L.A., and Tellegen, A. Development and validation of brief measures of positive and negative affect: the PANAS scales. J. of Personality and Social Psychology 54, 6 (1988), 1063.
[40]
Weninger, F., Eyben, F., Schuller, B.W., Mortillaro, M., and Scherer, K.R. On the acoustics of emotion in audio: What speech, music and sound have in common. Frontiers in Psychology 4, Article ID 292 (2013), 1--12.
[41]
Williamson, J. Speech analyzer for analyzing pitch or frequency perturbations in individual speech pattern to determine the emotional state of the person. U.S. Patent 4,093,821, 1978.
[42]
Wöllmer, M., Eyben, F., Reiter, S., Schuller, B., Cox, C., Douglas-Cowie, E. and Cowie, R. Abandoning emotion classes: Towards continuous emotion recognition with modeling of long-range dependencies. In Proceedings of INTERSPEECH. (Brisbane, Australia, 2008). ISCA, 597--600.
[43]
Zeng, Z., Pantic, M., Roisman, G.I., and Huang, T.S. A survey of affect recognition methods: Audio, visual, and spontaneous expressions. IEEE Trans. Pattern Analysis and Machine Intelligence 31, 1 (2009), 39--58.



    Reviews

    Jonathan P. E. Hodgson

    The two decades referred to in the subtitle essentially span the time since the publication of Picard's foundational Affective Computing [1], which began the study of emotion recognition by computers. This paper can therefore be viewed as a comprehensive review of emotion recognition in speech. The author begins by laying out an overall view of the process, which in broad terms has four components. First, one chooses a model of emotion: either discrete classes or a value-continuous dimensional representation with axes for arousal and positivity. Next, one acquires labeled data. Then features are selected and fed into a learning system. Initially, labeling data required extensive human effort, with all the ambiguity that implies; now systems exist in which the machine learns to label data with only limited human intervention, in an iterative process where human input guides the labels it learns. Features can be chunks of audio rather than only words. It is also important to take into account the speaker's states and traits beyond the emotion of interest. The author summarizes the results of recent speech emotion recognition (SER) challenge events in a useful table. Finally, the author considers challenges that the SER community could undertake. Going beyond the recognition of irony or sarcasm, the author proposes what he calls a "moonshot challenge": targeting the speaker's actual emotion. The review illuminates a fascinating area and leaves the reader eager for more. There is a comprehensive bibliography.
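The pipeline outlined in the review (labeled data, then features, then a learner), together with the remark that features can be chunks of audio rather than only words, can be illustrated with a toy bag-of-audio-words sketch in the spirit of Schmitt et al. [31]. Everything below is an assumption for illustration, not the article's method: the two frame-level descriptors (log energy and zero-crossing rate), the 16 kHz rate, the random-sampling codebook, the nearest-centroid learner (our own minimal class, not scikit-learn's), and the synthetic "high_arousal"/"low_arousal" utterances are hypothetical stand-ins for a real corpus and a real feature set.

```python
import numpy as np

FRAME_LEN, HOP = 400, 160  # 25 ms frames, 10 ms hop at an assumed 16 kHz rate


def frame_signal(x):
    """Split a 1-D signal into overlapping frames of FRAME_LEN samples."""
    if len(x) < FRAME_LEN:
        x = np.pad(x, (0, FRAME_LEN - len(x)))
    n_frames = 1 + (len(x) - FRAME_LEN) // HOP
    idx = np.arange(FRAME_LEN)[None, :] + HOP * np.arange(n_frames)[:, None]
    return x[idx]


def low_level_descriptors(x):
    """Two toy frame-level descriptors: log energy and zero-crossing rate."""
    frames = frame_signal(x)
    log_energy = np.log(np.mean(frames ** 2, axis=1) + 1e-10)
    zcr = np.mean(np.diff(np.sign(frames), axis=1) != 0, axis=1)
    return np.stack([log_energy, zcr], axis=1)  # shape: (n_frames, 2)


def learn_codebook(llds, size, rng):
    """Standardize pooled training frames; pick `size` of them at random as audio words."""
    pooled = np.vstack(llds)
    mean, std = pooled.mean(axis=0), pooled.std(axis=0) + 1e-10
    z = (pooled - mean) / std
    codebook = z[rng.choice(len(z), size=size, replace=False)]
    return codebook, mean, std


def bag_of_audio_words(lld, codebook, mean, std):
    """Map each frame to its nearest audio word; return the normalized word histogram."""
    z = (lld - mean) / std
    dists = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    counts = np.bincount(dists.argmin(axis=1), minlength=len(codebook))
    return counts / counts.sum()


class NearestCentroid:
    """A deliberately simple learner standing in for an SVM or neural network."""

    def fit(self, X, y):
        y = np.asarray(y)
        self.labels_ = sorted(set(y.tolist()))
        self.centroids_ = np.array([X[y == c].mean(axis=0) for c in self.labels_])
        return self

    def predict(self, X):
        dists = ((X[:, None, :] - self.centroids_[None, :, :]) ** 2).sum(axis=2)
        return [self.labels_[i] for i in dists.argmin(axis=1)]


rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0  # one second at the assumed 16 kHz rate


def toy_utterance(label):
    """Hypothetical synthetic stand-ins: loud white noise vs. a quiet, slightly noisy low tone."""
    if label == "high_arousal":
        return rng.standard_normal(len(t))
    tone = 0.05 * np.sin(2 * np.pi * 200 * t + rng.uniform(0, 2 * np.pi))
    return tone + 0.005 * rng.standard_normal(len(t))


# Labeled data -> frame descriptors -> audio-word histograms -> learner.
train_labels = ["high_arousal", "low_arousal"] * 5
train_llds = [low_level_descriptors(toy_utterance(c)) for c in train_labels]
codebook, mean, std = learn_codebook(train_llds, size=16, rng=rng)
X_train = np.array([bag_of_audio_words(l, codebook, mean, std) for l in train_llds])
clf = NearestCentroid().fit(X_train, train_labels)

test_labels = ["low_arousal", "high_arousal"]
X_test = np.array([
    bag_of_audio_words(low_level_descriptors(toy_utterance(c)), codebook, mean, std)
    for c in test_labels
])
print(clf.predict(X_test))
```

Swapping the nearest-centroid learner for an SVM or a neural network, or the random codebook for k-means, leaves the structure unchanged: frame descriptors are quantized against a codebook, and the learner sees only per-utterance histograms of audio words.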


    Published In

    Communications of the ACM  Volume 61, Issue 5
    May 2018
    104 pages
    ISSN:0001-0782
    EISSN:1557-7317
    DOI:10.1145/3210350
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Qualifiers

    • Review-article
    • Popular
    • Refereed

    Funding Sources

    • European Union's HORIZON 2020 Framework Programme


    Article Metrics

    • Downloads (last 12 months): 672
    • Downloads (last 6 weeks): 88
    Reflects downloads up to 18 Aug 2024


    Cited By

    • (2024) Unlocking Human Emotions. Machine and Deep Learning Techniques for Emotion Detection, 167-183. DOI: 10.4018/979-8-3693-4143-8.ch008. Online publication date: 22-Mar-2024.
    • (2024) Enhancing Healthcare Integration With IoT for Seamless and Responsive Patient Care. Future of AI in Biomedicine and Biotechnology, 147-181. DOI: 10.4018/979-8-3693-3629-8.ch008. Online publication date: 21-Jun-2024.
    • (2024) Speech Emotion Recognition With Osmotic Computing. Advanced Applications in Osmotic Computing, 90-112. DOI: 10.4018/979-8-3693-1694-8.ch006. Online publication date: 29-Mar-2024.
    • (2024) Skin Disease Detection and Remedial System. International Journal of Innovative Science and Research Technology (IJISRT), 982-988. DOI: 10.38124/IJISRT24MAY395. Online publication date: 28-May-2024.
    • (2024) A New Network Structure for Speech Emotion Recognition Research. Sensors 24:5, 1429. DOI: 10.3390/s24051429. Online publication date: 22-Feb-2024.
    • (2024) MISNet: multi-source information-shared EEG emotion recognition network with two-stream structure. Frontiers in Neuroscience 18. DOI: 10.3389/fnins.2024.1293962. Online publication date: 14-Feb-2024.
    • (2024) Fuzzy speech emotion recognition considering semantic awareness. Journal of Intelligent & Fuzzy Systems 46:3, 7367-7377. DOI: 10.3233/JIFS-232280. Online publication date: 5-Mar-2024.
    • (2024) Creation of a diverse mixed-lingual emotional speech corpus with a framework for enhanced emotion detection. Journal of Intelligent & Fuzzy Systems, 1-17. DOI: 10.3233/JIFS-219390. Online publication date: 30-Apr-2024.
    • (2024) Automatic detection of expressed emotion from Five-Minute Speech Samples: Challenges and opportunities. PLOS ONE 19:3, e0300518. DOI: 10.1371/journal.pone.0300518. Online publication date: 21-Mar-2024.
    • (2024) The Caring Machine: Feeling AI for Customer Care. Journal of Marketing 88:5, 1-23. DOI: 10.1177/00222429231224748. Online publication date: 22-Mar-2024.
