Abstract. Sensitive Artificial Listeners (SAL) are virtual dialogue partners based on audiovisual analysis and synthesis. Despite their very limited verbal understanding, they aim to engage the user in a conversation by paying attention to the user's emotions and nonverbal expressions. Each SAL character has its own emotionally defined personality and attempts to draw the user towards its dominant emotion through a combination of verbal and nonverbal expression.
Abstract Recent evidence in neuroscience supports the theory that the prediction of spatial and temporal patterns in the brain plays a key role in human action and perception. Inspired by these findings, we present a system that discriminates laughter from speech by modeling the spatial and temporal relationship between audio and visual features. The underlying assumption is that this relationship differs between speech and laughter.
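As a rough illustration of this idea, the sketch below trains one linear audio-to-visual predictor per class and labels a sequence by which predictor reconstructs its visual stream with lower error. The linear least-squares predictors, feature dimensions, and synthetic data are illustrative assumptions, not the model from the paper.

```python
# Minimal sketch of prediction-based audiovisual discrimination: one linear
# audio-to-visual predictor per class, classification by reconstruction error.
import numpy as np

def fit_predictor(audio, video):
    """Least-squares map W such that video ~ audio @ W.
    audio: (frames, d_a), video: (frames, d_v)."""
    W, *_ = np.linalg.lstsq(audio, video, rcond=None)
    return W

def prediction_error(W, audio, video):
    return np.mean((audio @ W - video) ** 2)

def classify(audio, video, W_laughter, W_speech):
    e_l = prediction_error(W_laughter, audio, video)
    e_s = prediction_error(W_speech, audio, video)
    return "laughter" if e_l < e_s else "speech"

# Example with synthetic features (stand-ins for real audio/facial-point data)
rng = np.random.default_rng(0)
A, V = rng.normal(size=(200, 6)), rng.normal(size=(200, 10))
W_l = fit_predictor(A, V)          # would be trained on laughter episodes
W_s = fit_predictor(A, V + 1.0)    # would be trained on speech episodes
print(classify(A, V, W_l, W_s))
```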
The human face is used to identify other people, to regulate the conversation by gazing or nodding, to interpret what has been said by lip reading, and to communicate and understand social signals, including affective states and intentions, on the basis of the shown facial expression. Machine understanding of human facial signals could revolutionize user-adaptive social interfaces, an integral part of ambient intelligence technologies.
Abstract We introduce the notion of subspace learning from image gradient orientations for appearance-based object recognition. As image data are typically noisy and the noise is substantially non-Gaussian, traditional subspace learning from pixel intensities very often fails to reliably estimate the low-dimensional subspace of a given data population. We show that replacing pixel intensities with gradient orientations and the $\ell_2$ norm with a cosine-based distance measure offers, to some extent, a remedy to this problem.
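A minimal sketch of this pipeline, assuming each image is mapped to complex exponentials of its gradient orientations and PCA is then run in the complex domain; the function names and the SVD-based PCA solver are illustrative choices, not the paper's implementation.

```python
# Sketch of subspace learning from image gradient orientations (IGO).
import numpy as np

def igo_features(img):
    """Map a 2-D image to a complex vector of gradient orientations."""
    gy, gx = np.gradient(img.astype(float))
    phi = np.arctan2(gy, gx)              # orientation at each pixel
    return np.exp(1j * phi).ravel()

def cosine_similarity(z1, z2):
    """Cosine-based measure: the real part of the normalised complex inner
    product, i.e. an average cosine of orientation differences."""
    return np.real(np.vdot(z1, z2)) / (np.linalg.norm(z1) * np.linalg.norm(z2))

def igo_pca(images, n_components):
    Z = np.stack([igo_features(im) for im in images])   # (n, pixels), complex
    mean = Z.mean(axis=0)
    _, _, Vh = np.linalg.svd(Z - mean, full_matrices=False)
    return mean, Vh[:n_components].conj().T             # mean, (pixels, k) basis

rng = np.random.default_rng(1)
imgs = rng.random((20, 32, 32))
mean, U = igo_pca(imgs, n_components=5)
coeffs = (igo_features(imgs[0]) - mean) @ U             # low-dimensional embedding
```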
Human affective behavior is multimodal, continuous and complex. Despite major advances within the affective computing research field, modeling, analyzing, interpreting and responding to human affective behavior still remains a challenge for automated systems as affect and emotions are complex constructs, with fuzzy boundaries and with substantial individual differences in expression and experience [7].
Abstract We investigate the classification of non-linguistic vocalisations with a novel audiovisual approach, using Long Short-Term Memory (LSTM) Recurrent Neural Networks as highly successful dynamic sequence classifiers. The Audiovisual Interest Corpus of human-to-human natural conversation from this year's Paralinguistic Challenge serves as the evaluation database. For video-based analysis we compare shape-based and appearance-based features; these are fused with typical audio descriptors at an early, feature-level stage.
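The fragment below sketches what such early fusion might look like, assuming per-frame audio and video feature vectors are simply concatenated before an LSTM; all dimensions, the class count, and the PyTorch framing are assumptions for illustration, not the corpus configuration.

```python
# Hedged sketch of feature-level ("early") audiovisual fusion into an LSTM
# sequence classifier.
import torch
import torch.nn as nn

class FusionLSTM(nn.Module):
    def __init__(self, audio_dim, video_dim, hidden_dim, n_classes):
        super().__init__()
        self.lstm = nn.LSTM(audio_dim + video_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, n_classes)

    def forward(self, audio, video):
        # Early fusion: concatenate the per-frame feature vectors
        x = torch.cat([audio, video], dim=-1)       # (batch, frames, a+v)
        h, _ = self.lstm(x)
        return self.out(h[:, -1])                   # classify from last frame

model = FusionLSTM(audio_dim=39, video_dim=20, hidden_dim=64, n_classes=4)
audio = torch.randn(8, 100, 39)   # e.g. MFCC-style audio descriptors
video = torch.randn(8, 100, 20)   # e.g. shape or appearance features
logits = model(audio, video)      # (8, 4)
```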
Abstract Automatic recognition of human facial expressions is a challenging problem with many applications in human-computer interaction. Most of the existing facial expression analyzers succeed only in recognizing a few emotional facial expressions, such as anger or happiness.
Abstract Automatic analysis of human facial expression is a challenging problem with many applications. Most of the existing automated systems for facial expression analysis attempt to recognize a few prototypic emotional expressions, such as anger and happiness. Instead of representing another approach to machine analysis of prototypic facial expressions of emotion, the method presented in this paper attempts to handle a large range of human facial behavior by recognizing facial muscle actions that produce expressions.
Abstract Principal Component Analysis (PCA) is perhaps the most prominent learning tool for dimensionality reduction in pattern recognition and computer vision. However, the ℓ2-norm employed by standard PCA is not robust to outliers. In this paper, we propose a kernel PCA method for fast and robust PCA, which we call Euler-PCA (e-PCA). In particular, our algorithm utilizes a robust dissimilarity measure based on the Euler representation of complex numbers.
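A minimal sketch of the Euler representation this rests on, assuming intensities in [0, 1] are mapped to z = exp(i·α·π·x) and PCA is then run on the complex data; the value of α and the SVD-based solver here are illustrative assumptions.

```python
# Minimal sketch of the Euler representation behind Euler-PCA.
import numpy as np

def euler_map(x, alpha=1.9):
    return np.exp(1j * alpha * np.pi * x)

def euler_dissimilarity(x, y, alpha=1.9):
    """Robust measure: ||e^{i a pi x} - e^{i a pi y}||^2
    = 2 * sum(1 - cos(a*pi*(x - y))), which saturates for large outliers."""
    return np.sum(np.abs(euler_map(x, alpha) - euler_map(y, alpha)) ** 2)

def euler_pca(X, n_components, alpha=1.9):
    """X: (n_samples, n_pixels) with intensities in [0, 1]."""
    Z = euler_map(X, alpha)
    mean = Z.mean(axis=0)
    _, _, Vh = np.linalg.svd(Z - mean, full_matrices=False)
    return mean, Vh[:n_components].conj().T

rng = np.random.default_rng(2)
X = rng.random((50, 256))
mean, U = euler_pca(X, n_components=10)
emb = (euler_map(X) - mean) @ U          # robust low-dimensional embedding
```

Because the mapping is bounded on the unit circle, a grossly corrupted pixel can shift the dissimilarity only by a constant, unlike the unbounded ℓ2 penalty.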
Abstract In the last decade, the automatic analysis of facial expressions has become a central topic in machine vision research. Nonetheless, there is a glaring lack of a comprehensive, readily accessible reference set of face images that could be used as a basis for benchmarks for efforts in the field. This lack of an easily accessible, suitable, common testing resource is the major impediment to comparing and extending work on automatic facial expression analysis.
Human affect sensing can be obtained from a broad range of behavioral cues and signals that are available via visual, acoustic, and tactual expressions or presentations of emotions.
Abstract This paper presents a novel, robust and flexible method for extracting four mouth features (top of the upper lip, bottom of the lower lip, left and right mouth corners) from facial image sequences. Robustness here refers to invariance to subject variability, pose, and image quality, while flexibility is achieved by efficiently fusing several information sources and reporting the certainty of the generated results.
Abstract This paper discusses the Integrated System for Facial Expression Recognition (ISFER), which performs facial expression analysis from a still dual facial view image. The system consists of three major parts: a facial data generator, a facial data evaluator and a facial data analyser.
Abstract Finding fiducial facial points in any frame of a video showing rich naturalistic facial behaviour is an unsolved problem. Yet this is a crucial step for geometric-feature-based facial expression analysis and for methods that use appearance-based features extracted at fiducial facial point locations.
Abstract: The exploration of how human beings react to the world and interact with it and each other remains one of the greatest scientific challenges. The latest research trends in the cognitive sciences argue that our common view of intelligence is too narrow, ignoring a crucial range of abilities that matter immensely for how well people do in life.
ABSTRACT This paper proposes a new feature descriptor, local normal binary patterns (LNBPs), which are exploited for the detection of facial action units (AUs). After LNBPs have been employed to form descriptor vectors that capture the detailed shape of the action, feature selection is performed via a GentleBoost (GB) algorithm, and support vector machines (SVMs) are trained to detect each AU.
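The sketch below shows only the shape of such a pipeline (descriptor vectors, boosting-based feature selection, per-AU SVM); the LNBP descriptor itself is not reproduced, and scikit-learn's AdaBoost stands in for GentleBoost, which the library does not provide.

```python
# Illustrative AU-detection pipeline: descriptor -> boosting-based feature
# selection -> per-AU SVM. Features and labels below are synthetic stand-ins.
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.svm import SVC

def select_features(X, y, n_keep=50):
    # Rank features by boosted-stump importance; keep the top n_keep
    booster = AdaBoostClassifier(n_estimators=200, random_state=0).fit(X, y)
    return np.argsort(booster.feature_importances_)[::-1][:n_keep]

def train_au_detector(X, y):
    keep = select_features(X, y)
    svm = SVC(kernel="rbf", probability=True).fit(X[:, keep], y)
    return keep, svm

rng = np.random.default_rng(3)
X = rng.normal(size=(300, 500))       # stand-in descriptor vectors
y = rng.integers(0, 2, size=300)      # AU active / inactive labels
keep, svm = train_au_detector(X, y)
print(svm.predict_proba(X[:5, keep])) # per-sample AU activation probabilities
```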
Abstract This paper addresses the problem of template-based tracking of non-rigid objects. We use the well-known framework of auxiliary particle filtering and propose an observation model that explicitly addresses appearance changes caused by local deformations of the tracked object. In addition, by adopting a colour difference that is invariant to local changes in illumination, the proposed observation model can deal with changing lighting conditions and shadows.
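For intuition, here is a bare-bones bootstrap particle filter for template tracking, a simplification of the auxiliary particle filter used in the paper; the mean-subtracted patch difference is only a crude stand-in for the illumination-invariant colour difference, and all parameters are illustrative.

```python
# Bootstrap particle filter tracking a template's (x, y) position.
import numpy as np

def likelihood(frame, template, x, y):
    h, w = template.shape
    if x < 0 or y < 0 or y + h > frame.shape[0] or x + w > frame.shape[1]:
        return 1e-12
    patch = frame[y:y + h, x:x + w]
    # Removing each patch's mean discounts global illumination changes
    d = (patch - patch.mean()) - (template - template.mean())
    return np.exp(-np.mean(d ** 2) / 0.05)

def track_step(frame, template, particles, rng, sigma=3.0):
    # Propagate particles with a random-walk motion model
    particles = (particles + rng.normal(scale=sigma, size=particles.shape)).astype(int)
    w = np.array([likelihood(frame, template, px, py) for px, py in particles])
    w /= w.sum()
    # Resample proportionally to the observation weights
    idx = rng.choice(len(particles), size=len(particles), p=w)
    return particles[idx], particles[np.argmax(w)]

rng = np.random.default_rng(4)
frame = rng.random((120, 160))
template = frame[40:56, 60:76].copy()              # 16x16 patch to track
particles = np.tile([60, 40], (100, 1)).astype(float)
particles, best = track_step(frame, template, particles, rng)
print("estimated position:", best)
```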
Laughter is a highly variable signal, which can be caused by a spectrum of emotions. This makes the automatic detection of laughter a challenging but interesting task. We perform automatic laughter detection using audio-visual data from the AMI Meeting Corpus. Audio-visual laughter detection is performed by fusing the results of separate audio and video classifiers at the decision level. This results in laughter detection with a significantly higher AUC-ROC than single-modality classification.
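Decision-level fusion of this kind can be sketched as a weighted sum of per-modality posteriors; the classifiers, the fusion weight, and the synthetic data below are placeholders rather than the configuration used on the AMI corpus.

```python
# Sketch of decision-level audiovisual fusion with AUC-ROC evaluation.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(5)
y = rng.integers(0, 2, size=400)                       # laughter vs non-laughter
X_audio = rng.normal(size=(400, 12)) + y[:, None]      # synthetic stand-in features
X_video = rng.normal(size=(400, 8)) + 0.5 * y[:, None]

audio_clf = LogisticRegression().fit(X_audio, y)       # separate per-modality models
video_clf = LogisticRegression().fit(X_video, y)
p_audio = audio_clf.predict_proba(X_audio)[:, 1]
p_video = video_clf.predict_proba(X_video)[:, 1]

w = 0.6                                  # modality weight, tuned on held-out data
p_fused = w * p_audio + (1 - w) * p_video

for name, p in [("audio", p_audio), ("video", p_video), ("fused", p_fused)]:
    print(name, "AUC-ROC:", round(roc_auc_score(y, p), 3))
```

In practice the posteriors and the AUC-ROC would of course be computed on held-out data rather than the training set used in this toy example.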
Abstract Hidden conditional random fields (HCRFs) are discriminative latent variable models that have been shown to successfully learn the hidden structure of a given classification problem (provided an appropriate validation of the number of hidden states). In this brief, we present the infinite HCRF (iHCRF), which is a nonparametric model based on hierarchical Dirichlet processes and is capable of automatically learning the optimal number of hidden states for a classification task.
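For context, the standard finite HCRF that the iHCRF generalises scores a label y for an observation sequence x by summing a potential function over all hidden-state sequences h; the iHCRF replaces the fixed number of values each hidden state can take with a hierarchical Dirichlet process prior.

```latex
% Finite HCRF conditional likelihood; Psi is the model's potential function
% over label, hidden-state sequence, and observations.
P(y \mid \mathbf{x}; \theta)
  = \sum_{\mathbf{h}} P(y, \mathbf{h} \mid \mathbf{x}; \theta)
  = \frac{\sum_{\mathbf{h}} \exp \Psi(y, \mathbf{h}, \mathbf{x}; \theta)}
         {\sum_{y'} \sum_{\mathbf{h}} \exp \Psi(y', \mathbf{h}, \mathbf{x}; \theta)}
```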
Abstract We have acquired a set of audio-visual recordings of induced emotions. A collage of comedy clips and clips of disgusting content were shown to a number of participants, who displayed mostly expressions of disgust, happiness, and surprise in response. While displays of induced emotions may differ from those shown in everyday life in aspects such as the frequency with which they occur, they are regarded as highly naturalistic and spontaneous. We recorded 25 participants for approximately 5 minutes each.