AVEC 2012: the continuous audio/visual emotion challenge

Published: 22 October 2012
DOI: 10.1145/2388676.2388776

Abstract

We present the second Audio-Visual Emotion recognition Challenge and workshop (AVEC 2012), which aims to bring together researchers from the audio and video analysis communities around the topic of emotion recognition. The goal of the challenge is to recognise four continuously valued affective dimensions: arousal, expectancy, power, and valence. There are two sub-challenges: in the Fully Continuous Sub-Challenge participants have to predict the values of the four dimensions at every moment during the recordings, while for the Word-Level Sub-Challenge a single prediction has to be given per word uttered by the user. This paper presents the challenge guidelines, the common data used, and the performance of the baseline system on the two tasks.
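
The task definition above implies two output formats: a dense trajectory of the four dimensions over time for the Fully Continuous Sub-Challenge, and a single value per uttered word for the Word-Level Sub-Challenge. As a rough illustration (not the paper's baseline system), the sketch below pools hypothetical frame-level predictions into word-level ones and scores each dimension with Pearson's correlation, a common measure for continuous affect prediction; the 50 Hz frame rate, word timings, and array shapes are illustrative assumptions.

```python
import numpy as np

# Illustrative sketch only -- not the official AVEC 2012 baseline or metric.
# Frame-level predictions cover the four affective dimensions; word-level
# predictions are obtained by averaging frames inside each word's time span.

DIMENSIONS = ["arousal", "expectancy", "power", "valence"]

def word_level_from_frames(frame_preds, word_spans, frame_rate=50.0):
    """Average frame-level predictions of shape (n_frames, 4) over
    hypothetical (start_sec, end_sec) word spans: one row per word."""
    rows = []
    for start, end in word_spans:
        lo = int(start * frame_rate)
        hi = max(int(end * frame_rate), lo + 1)  # guard against empty spans
        rows.append(frame_preds[lo:hi].mean(axis=0))
    return np.vstack(rows)

def pearson_per_dimension(pred, gold):
    """Pearson's r per affective dimension between predicted and gold traces."""
    return {dim: float(np.corrcoef(pred[:, i], gold[:, i])[0, 1])
            for i, dim in enumerate(DIMENSIONS)}

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frame_preds = rng.normal(size=(500, 4))                  # toy per-frame predictions
    gold = frame_preds + 0.5 * rng.normal(size=(500, 4))     # toy ground-truth traces
    word_spans = [(i * 0.4, i * 0.4 + 0.3) for i in range(20)]  # toy word timings (sec)
    print(pearson_per_dimension(frame_preds, gold))
    print(word_level_from_frames(frame_preds, word_spans).shape)  # (20, 4)
```

In a real entry, the frame-level predictions would come from audio/visual features rather than from noisy copies of the ground truth; the toy data here only exercises the two output formats.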

Published In

ICMI '12: Proceedings of the 14th ACM international conference on Multimodal interaction
October 2012
636 pages
ISBN: 9781450314671
DOI: 10.1145/2388676

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. affective computing
  2. challenge
  3. emotion recognition
  4. facial expression
  5. speech

Qualifiers

  • Research-article

Conference

ICMI '12: International Conference on Multimodal Interaction
October 22-26, 2012
Santa Monica, California, USA

Acceptance Rates

Overall Acceptance Rate: 453 of 1,080 submissions (42%)
