AVEC 2012: the continuous audio/visual emotion challenge

Published: 22 October 2012
DOI: 10.1145/2388676.2388776

Abstract

We present the second Audio-Visual Emotion recognition Challenge and workshop (AVEC 2012), which aims to bring together researchers from the audio and video analysis communities around the topic of emotion recognition. The goal of the challenge is to recognise four continuously valued affective dimensions: arousal, expectancy, power, and valence. There are two sub-challenges: in the Fully Continuous Sub-Challenge participants have to predict the values of the four dimensions at every moment during the recordings, while for the Word-Level Sub-Challenge a single prediction has to be given per word uttered by the user. This paper presents the challenge guidelines, the common data used, and the performance of the baseline system on the two tasks.
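
The task definition above implies two output formats: a dense trajectory of the four dimensions over time for the Fully Continuous Sub-Challenge, and a single value per uttered word for the Word-Level Sub-Challenge. As a rough illustration (not the paper's baseline system), the sketch below pools hypothetical frame-level predictions into word-level ones and scores each dimension with Pearson's correlation, a common measure for continuous affect prediction; the 50 Hz frame rate, word timings, and array shapes are illustrative assumptions.

```python
import numpy as np

# Illustrative sketch only -- not the official AVEC 2012 baseline or metric.
# Frame-level predictions cover the four affective dimensions; word-level
# predictions are obtained by averaging frames inside each word's time span.

DIMENSIONS = ["arousal", "expectancy", "power", "valence"]

def word_level_from_frames(frame_preds, word_spans, frame_rate=50.0):
    """Average frame-level predictions of shape (n_frames, 4) over
    hypothetical (start_sec, end_sec) word spans: one row per word."""
    rows = []
    for start, end in word_spans:
        lo = int(start * frame_rate)
        hi = max(int(end * frame_rate), lo + 1)  # guard against empty spans
        rows.append(frame_preds[lo:hi].mean(axis=0))
    return np.vstack(rows)

def pearson_per_dimension(pred, gold):
    """Pearson's r per affective dimension between predicted and gold traces."""
    return {dim: float(np.corrcoef(pred[:, i], gold[:, i])[0, 1])
            for i, dim in enumerate(DIMENSIONS)}

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frame_preds = rng.normal(size=(500, 4))                  # toy per-frame predictions
    gold = frame_preds + 0.5 * rng.normal(size=(500, 4))     # toy ground-truth traces
    word_spans = [(i * 0.4, i * 0.4 + 0.3) for i in range(20)]  # toy word timings (sec)
    print(pearson_per_dimension(frame_preds, gold))
    print(word_level_from_frames(frame_preds, word_spans).shape)  # (20, 4)
```

In a real entry, the frame-level predictions would come from audio/visual features rather than from noisy copies of the ground truth; the toy data here only exercises the two output formats.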

Published In

ICMI '12: Proceedings of the 14th ACM international conference on Multimodal interaction
October 2012
636 pages
ISBN: 9781450314671
DOI: 10.1145/2388676

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. affective computing
  2. challenge
  3. emotion recognition
  4. facial expression
  5. speech

Qualifiers

  • Research-article

Conference

ICMI '12: International Conference on Multimodal Interaction
October 22-26, 2012
Santa Monica, California, USA

Acceptance Rates

Overall Acceptance Rate: 453 of 1,080 submissions (42%)
