DOI: 10.1145/2522848.2531743

Multi classifier systems and forward backward feature selection algorithms to classify emotional coloured speech

Published: 09 December 2013

Abstract

Systems for the recognition of psychological characteristics such as the emotional state in real-world scenarios have to deal with several difficulties, among them unconstrained environments and uncertainties in one or several input channels. A more crucial aspect, however, is the content of the data itself. Psychological states are highly person-dependent, and often even humans cannot determine the correct state a person is in. A successful recognition system therefore has to deal with data that is not very discriminative and often simply misleading. To succeed, a critical view of features and decisions is essential in order to select only the most valuable ones. This work compares a common multi-classifier-system approach based on state-of-the-art features with a modified forward-backward feature selection algorithm that uses a long-term stopping criterion; the second approach also takes features of the voice-quality family into account. Both approaches are based on the audio modality only. The challenge dataset sits between real-world datasets, which are still very hard to handle, and over-acted datasets, which were popular in the past and are well understood today.
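As a rough illustration of the second approach, the sketch below implements a wrapper-style forward-backward feature selection loop with a long-term stopping criterion: rather than halting at the first plateau, the search only stops after a fixed number of iterations without a new best subset. The scoring function, the k-NN wrapper classifier, and the `patience` window are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch of forward-backward feature selection with a
# long-term stopping criterion. The classifier, scorer, and
# patience window are illustrative assumptions, not the paper's
# exact setup.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier


def subset_score(X, y, features):
    """Cross-validated accuracy of a placeholder wrapper classifier
    on the given feature subset (assumed scorer)."""
    clf = KNeighborsClassifier()
    return cross_val_score(clf, X[:, features], y, cv=5).mean()


def forward_backward_selection(X, y, patience=5):
    selected, remaining = [], list(range(X.shape[1]))
    best_score, best_subset, stale = -np.inf, [], 0

    while remaining and stale < patience:
        # Forward step: add the single feature with the largest gain.
        s, f = max((subset_score(X, y, selected + [f]), f) for f in remaining)
        selected.append(f)
        remaining.remove(f)

        # Backward step: drop any feature whose removal improves the score.
        improved = True
        while improved and len(selected) > 1:
            improved = False
            for g in list(selected):
                trial = [x for x in selected if x != g]
                s_drop = subset_score(X, y, trial)
                if s_drop > s:
                    selected.remove(g)
                    remaining.append(g)
                    s, improved = s_drop, True

        # Long-term criterion: tolerate temporary plateaus and stop
        # only after `patience` iterations without a new best subset.
        if s > best_score:
            best_score, best_subset, stale = s, list(selected), 0
        else:
            stale += 1

    return best_subset, best_score


# Example usage on synthetic data:
# X, y = np.random.randn(200, 40), np.random.randint(0, 7, 200)
# features, acc = forward_backward_selection(X, y)
```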

Published In

ICMI '13: Proceedings of the 15th ACM International Conference on Multimodal Interaction
December 2013
630 pages
ISBN:9781450321297
DOI:10.1145/2522848

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. affective computing
  2. emotion recognition
  3. feature selection
  4. human computer interaction
  5. multi classifier systems

Qualifiers

  • Research-article

Conference

ICMI '13

Acceptance Rates

ICMI '13 Paper Acceptance Rate: 49 of 133 submissions, 37%
Overall Acceptance Rate: 453 of 1,080 submissions, 42%

Article Metrics

  • Downloads (last 12 months): 2
  • Downloads (last 6 weeks): 0

Reflects downloads up to 04 Oct 2024

Cited By

  • (2021) Functional Brain Imaging Reliably Predicts Bimanual Motor Skill Performance in a Standardized Surgical Task. IEEE Transactions on Biomedical Engineering 68(7):2058-2066. DOI: 10.1109/TBME.2020.3014299. Online publication date: Jul-2021.
  • (2018) Multi-classifier-Systems: Architectures, Algorithms and Applications. Computational Intelligence for Pattern Recognition, pp. 83-113. DOI: 10.1007/978-3-319-89629-8_4. Online publication date: 1-May-2018.
  • (2017) Fusion Architectures for Multimodal Cognitive Load Recognition. Multimodal Pattern Recognition of Social Signals in Human-Computer-Interaction, pp. 36-47. DOI: 10.1007/978-3-319-59259-6_4. Online publication date: 1-Jun-2017.
  • (2017) Bimodal Recognition of Cognitive Load Based on Speech and Physiological Changes. Multimodal Pattern Recognition of Social Signals in Human-Computer-Interaction, pp. 12-23. DOI: 10.1007/978-3-319-59259-6_2. Online publication date: 1-Jun-2017.
  • (2017) Multimodal Affect Recognition in the Context of Human-Computer Interaction for Companion-Systems. Companion Technology, pp. 387-408. DOI: 10.1007/978-3-319-43665-4_19. Online publication date: 5-Dec-2017.
  • (2016) Inferring mental overload based on postural behavior and gestures. Proceedings of the 2nd workshop on Emotion Representations and Modelling for Companion Systems, pp. 1-4. DOI: 10.1145/3009960.3009961. Online publication date: 16-Nov-2016.
  • (2016) Revisiting the EmotiW challenge: how wild is it really? Journal on Multimodal User Interfaces 10(2):151-162. DOI: 10.1007/s12193-015-0202-7. Online publication date: 12-Feb-2016.
  • (2016) On Gestures and Postural Behavior as a Modality in Ensemble Methods. Artificial Neural Networks in Pattern Recognition, pp. 312-323. DOI: 10.1007/978-3-319-46182-3_26. Online publication date: 9-Sep-2016.
  • (2015) Combining feature-level and decision-level fusion in a hierarchical classifier for emotion recognition in the wild. Journal on Multimodal User Interfaces 10(2):125-137. DOI: 10.1007/s12193-015-0203-6. Online publication date: 18-Nov-2015.
  • (2015) Bio-Visual Fusion for Person-Independent Recognition of Pain Intensity. Multiple Classifier Systems, pp. 220-230. DOI: 10.1007/978-3-319-20248-8_19. Online publication date: 3-Jun-2015.
