HAL (Le Centre pour la Communication Scientifique Directe), 2017
Facial Action Unit (AU) prediction is a challenging task, as it requires analysing subtle muscular activations under a range of morphological and environmental variations. Furthermore, the scarcity of available data makes end-to-end training (e.g. with deep learning techniques) difficult. In this paper, we propose to use recently introduced local expression-driven features, together with a facial mesh refined through an adaptive strategy, to better capture subtle deformations. We also use multi-output Random Forests to capture the correlations between closely related AU description tasks. Finally, we present a thorough comparative study of different techniques for training multi-output Random Forests, and a complete framework for AU detection that significantly outperforms state-of-the-art approaches on both prototypical and spontaneous data.
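The joint-prediction idea above can be sketched with scikit-learn, whose Random Forest natively handles multi-output targets so one forest predicts all AUs together. This is a minimal illustrative sketch, not the paper's implementation: the features and AU labels below are synthetic stand-ins for the local expression-driven features computed on the adaptively refined facial mesh.

```python
# Hedged sketch: one multi-output Random Forest jointly predicting several
# (correlated) Action Units. All data here is synthetic and illustrative.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n_samples, n_features, n_aus = 200, 32, 5

# Stand-in geometric features and correlated binary AU labels
X = rng.normal(size=(n_samples, n_features))
Y = (X[:, :n_aus] + 0.3 * rng.normal(size=(n_samples, n_aus)) > 0).astype(int)

# scikit-learn's RandomForestClassifier accepts a 2-D label matrix directly,
# so a single forest models all AU tasks and can exploit their correlations.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X, Y)

pred = clf.predict(X[:3])  # one row per sample, one column per AU
print(pred.shape)
```

A single multi-output forest, as opposed to one independent forest per AU, lets split decisions be shared across related tasks, which is the correlation-capturing behaviour the abstract refers to.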
Resorting to crowdsourcing platforms is a popular way to obtain annotations. Multiple potentially noisy answers can then be aggregated to recover an underlying ground truth. However, searching for a unique ground truth may be irrelevant when crowd workers are asked for opinions, notably when dealing with subjective phenomena such as stress. In this paper, we discuss how crowdsourced annotations can be better exploited, with an application to the automatic detection of perceived stress. To this end, we first acquired video data from 44 subjects in a stressful situation and gathered answers to a binary question using a crowdsourcing platform. We then propose to integrate two measures derived from the set of gathered answers into the machine learning framework. First, we show that using the consensus level among crowd worker answers substantially increases classification accuracy. Second, we show that it is effective to directly predict, for each video, the proportion of positive answers given by the different crowd workers. Overall, we present a thorough study of how crowdsourced annotations can be used to enhance the performance of classification and regression methods.
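The two measures described above can be sketched as follows. This is an assumption-laden illustration, not the paper's pipeline: the worker answers are simulated, the models are generic scikit-learn estimators, and using consensus as a per-sample weight is one plausible reading of "using the consensus level" in training.

```python
# Hedged sketch: two ways to exploit crowd answers beyond a single hard label.
# (1) weight training samples by inter-worker consensus (an assumed mechanism);
# (2) regress directly on the proportion of positive answers (a soft label).
# All data below is simulated.
import numpy as np
from sklearn.linear_model import LogisticRegression, Ridge

rng = np.random.default_rng(1)
n_videos, n_workers, n_features = 120, 7, 10

X = rng.normal(size=(n_videos, n_features))
true = (X[:, 0] > 0).astype(int)
# Each simulated worker flips the true answer with 20% probability
answers = (true[:, None] ^ (rng.random((n_videos, n_workers)) < 0.2)).astype(int)

p_pos = answers.mean(axis=1)           # proportion of positive answers per video
majority = (p_pos >= 0.5).astype(int)  # aggregated hard label
consensus = np.abs(p_pos - 0.5) * 2    # 0 = split vote, 1 = unanimous

# (1) classification where ambiguous videos contribute less to the fit
clf = LogisticRegression().fit(X, majority, sample_weight=consensus)

# (2) regression directly on the soft label p_pos
reg = Ridge().fit(X, p_pos)

print(clf.score(X, majority), reg.predict(X[:1]).shape)
```

The regression variant sidesteps the "unique ground truth" assumption entirely: the target is the distribution of opinions rather than a forced binary decision.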
Being able to produce facial expressions (FEs) that are adequate in a given social context is key to harmonious social development, particularly for children with autism spectrum disorder (ASD). In this paper, we introduce JEMImE, a serious-game solution that aims to teach children how to produce FEs. JEMImE is built on an FE recognition module trained on a large video corpus of children performing FEs. This module is validated and integrated into multiple scenarios of gradual difficulty, ranging from a training phase, where children must perform FEs on request with or without an avatar model, to an in-context phase involving many emotion-eliciting social situations with virtual characters.
Papers by Severine Dubuisson