Abstract
One of the severe obstacles to naturalistic human affective computing is that emotions are complex constructs with fuzzy boundaries and substantial individual variations. Thus, an important issue in emotion analysis is generating a person-specific representation of emotion in an unsupervised manner. This paper presents a fully unsupervised method that combines an autoencoder with Principal Component Analysis (PCA) to build an emotion representation from speech signals. As each person has a different way of expressing emotions, the method is applied at the subject level. We also investigate the relevance of such a representation. Experiments on the Emo-DB, IEMOCAP, and SEMAINE databases show that the proposed representation of emotion is invariant among subjects and similar to the representation built by psychologists, especially on the arousal dimension.
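As a rough illustration of the pipeline the abstract describes, the following minimal sketch trains a small autoencoder on one subject's speech features and then applies PCA to the learned codes. The abstract states only that an autoencoder is combined with PCA at the subject level; everything else here is an assumption made for illustration, including the order of the two steps, the single tanh hidden layer, the hidden size, the gradient-descent training loop, the two output components, and the function names `train_autoencoder`, `pca_project`, and `subject_representation`. The input features are synthetic random numbers, not real acoustic descriptors.

```python
import numpy as np


def train_autoencoder(X, hidden_dim=8, epochs=300, lr=0.05, seed=0):
    """Fit a one-hidden-layer tanh autoencoder (an assumed architecture)
    by full-batch gradient descent and return its encoder function."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 0.1, size=(d, hidden_dim))
    b1 = np.zeros(hidden_dim)
    W2 = rng.normal(0.0, 0.1, size=(hidden_dim, d))
    b2 = np.zeros(d)
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)            # encoder output (latent codes)
        X_hat = H @ W2 + b2                 # linear decoder reconstruction
        err = (X_hat - X) / n               # gradient of the loss w.r.t. X_hat
        grad_W2 = H.T @ err
        grad_b2 = err.sum(axis=0)
        dH = (err @ W2.T) * (1.0 - H ** 2)  # backpropagate through tanh
        grad_W1 = X.T @ dH
        grad_b1 = dH.sum(axis=0)
        W1 -= lr * grad_W1
        b1 -= lr * grad_b1
        W2 -= lr * grad_W2
        b2 -= lr * grad_b2

    def encode(Z):
        return np.tanh(Z @ W1 + b1)

    return encode


def pca_project(H, n_components=2):
    """Project centered codes onto their leading principal components."""
    Hc = H - H.mean(axis=0)
    _, _, Vt = np.linalg.svd(Hc, full_matrices=False)
    return Hc @ Vt[:n_components].T


def subject_representation(features, n_components=2, **ae_kwargs):
    """Build a representation for a single subject: standardize that
    subject's features, train an autoencoder on them, then apply PCA."""
    X = (features - features.mean(axis=0)) / (features.std(axis=0) + 1e-8)
    encode = train_autoencoder(X, **ae_kwargs)
    return pca_project(encode(X), n_components)


# Hypothetical data: two subjects with 12-dimensional utterance features.
rng = np.random.default_rng(42)
subjects = {
    "subject_a": rng.normal(size=(60, 12)),
    "subject_b": rng.normal(size=(40, 12)),
}
# One model per subject, mirroring the subject-level setup in the abstract.
representations = {
    name: subject_representation(feats) for name, feats in subjects.items()
}
for name, rep in representations.items():
    print(name, rep.shape)
```

Fitting a separate model per subject is the point of the sketch: no labels and no data from other speakers are used, so each resulting low-dimensional space reflects only that speaker's own variability.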
Acknowledgments.
Thanks to the China Scholarship Council (CSC) and the French government funding program ANR REFLET (No. ANR-17-CE19-0020-01) for funding.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Wang, S., Soladié, C., Séguier, R. (2020). Learning an Unsupervised and Interpretable Representation of Emotion from Speech. In: Karpov, A., Potapova, R. (eds) Speech and Computer. SPECOM 2020. Lecture Notes in Computer Science, vol 12335. Springer, Cham. https://doi.org/10.1007/978-3-030-60276-5_61
DOI: https://doi.org/10.1007/978-3-030-60276-5_61
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-60275-8
Online ISBN: 978-3-030-60276-5
eBook Packages: Computer Science, Computer Science (R0)