
    Renaud Seguier

    Ecole CentraleSupelec, FAST, Faculty Member
    One of the most severe obstacles to speech emotion analysis is the lack of a reasonable amount of labelled speech data. An important issue is therefore to apply an unsupervised method that generates a low-dimensional representation in which emotions can be analyzed. Such a representation, learned from data, needs to be stable and meaningful, just like the 2D or 3D representations of emotions elaborated in psychology. In this paper, we propose a fully unsupervised approach, called Organization-Controlled AutoEncoder (OCAE), which combines an autoencoder with PCA to build an emotional representation. We use the result of PCA on the speech features to control the organization of the data in the latent space of the autoencoder, by adding an organization loss to the classical objective function. Indeed, PCA preserves the organization of the data, whereas the autoencoder leads to better discrimination of the data; by combining both, we take advantage of each method. The results on the Emo-DB and SEMAINE databases show that our representation, generated in an unsupervised manner, is meaningful and stable.
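    The sketch below (an illustration, not the authors' released code) shows one way to realize this idea: an autoencoder whose latent codes are pulled toward the PCA projection of the same speech features by an extra organization term added to the reconstruction loss. The network sizes and the weight lambda_org are assumptions.

```python
# Minimal sketch of an organization-controlled autoencoder (OCAE), assuming the
# "organization loss" penalizes the distance between the latent code and the PCA
# projection of the same input; lambda_org and the layer sizes are placeholders.
import torch
import torch.nn as nn
from sklearn.decomposition import PCA

class OCAE(nn.Module):
    def __init__(self, n_features, n_latent=2):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_features, 64), nn.ReLU(),
                                     nn.Linear(64, n_latent))
        self.decoder = nn.Sequential(nn.Linear(n_latent, 64), nn.ReLU(),
                                     nn.Linear(64, n_features))

    def forward(self, x):
        z = self.encoder(x)
        return z, self.decoder(z)

def train_ocae(features, n_latent=2, lambda_org=1.0, epochs=200):
    x = torch.tensor(features, dtype=torch.float32)
    # PCA provides the low-dimensional layout that "organizes" the latent space.
    z_pca = torch.tensor(PCA(n_components=n_latent).fit_transform(features),
                         dtype=torch.float32)
    model = OCAE(x.shape[1], n_latent)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    mse = nn.MSELoss()
    for _ in range(epochs):
        z, x_hat = model(x)
        loss = mse(x_hat, x) + lambda_org * mse(z, z_pca)  # reconstruction + organization
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model
```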
    We present a new application in the field of impulse neurons: audio-visual speech recognition. The features extracted from the audio (cepstral coefficients) and from the video (height and width of the mouth, percentage of black and white pixels in the mouth) are simple enough to allow a real-time integration of the complete system. A generic preprocessing makes it possible to convert
    3D face clones can be used as a pretreatment in many applications, such as emotion analysis. However, such clones should model the facial shape accurately while keeping the attributes of the individual, and they should be semantic. A clone is semantic when the positions of the different parts of the face (eyes, nose, ...) are known. The main problem of texture reconstruction methods is the appearance of seams when fusing texture data. In our technique, we use a low-cost RGB-D sensor to obtain an accurate and detailed unfolded facial texture. We use shape and texture patches to preserve the person’s characteristics; the patches are detected using a distance error and the direction of the normal vectors computed from the depth frames. The tests we performed show the robustness and the accuracy of our method.
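    As a rough illustration of this patch selection rule (not code from the paper), one could keep a depth point only when it lies close enough to the fitted face model and its normal roughly faces the sensor; the thresholds and viewing direction below are assumptions.

```python
# Hypothetical sketch of the patch selection described above: keep a depth-sensor
# point when it agrees with the fitted model (small distance error) and its normal
# points toward the camera; max_dist, min_cos and view_dir are placeholders.
import numpy as np

def select_patches(depth_points, model_points, normals,
                   max_dist=0.005, min_cos=0.5, view_dir=(0.0, 0.0, -1.0)):
    """depth_points, model_points: (N, 3) arrays in camera coordinates.
    normals: (N, 3) unit normals estimated from the depth frames."""
    dist = np.linalg.norm(depth_points - model_points, axis=1)
    facing = normals @ np.asarray(view_dir)          # cosine with the viewing direction
    keep = (dist < max_dist) & (facing > min_cos)    # reliable sensor data
    return keep  # True -> keep the sensor data, False -> fall back to the face model
```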
    Recent studies show that health perception from faces by humans is a good predictor of actual health and healthy behaviors. We aimed to automatize human health perception by training a Convolutional Neural Network on a related task (age estimation) combined with a Ridge Regression to rate faces. Indeed, contrary to health ratings, large datasets with labels of biological age exist. The results show that our system outperforms average human judgments of health. The system could be used on a daily basis to detect early signs of sickness or a declining state. We are convinced that such a system will contribute to a more extensive exploration of holistic, fast, and non-invasive measures to improve the speed of diagnosis.
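    A minimal sketch of this two-stage setup is given below, assuming the age-estimation CNN is used as a frozen feature extractor; the feature matrix and the health ratings are placeholders for the data described in the paper.

```python
# Illustrative sketch only: fit a Ridge regression on top of CNN activations
# (from a network pre-trained for age estimation) using a small set of human
# health ratings; cross-validated R^2 is used here as a rough check.
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import cross_val_score

def fit_health_rater(face_features, health_ratings):
    """face_features: (N, D) activations from the age-estimation CNN.
    health_ratings: (N,) mean human health judgments for the same faces."""
    model = RidgeCV(alphas=np.logspace(-3, 3, 13))
    model.fit(face_features, health_ratings)
    r2 = cross_val_score(model, face_features, health_ratings, cv=5).mean()
    return model, r2
```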
    A crucial step in developing and testing a facial expression analysis system is to choose the database which best suits the targeted application context. We propose in this paper a survey based on the review of 69 databases, taking into account both macro- and micro-expressions. To the best of our knowledge, there is no other survey covering so many databases. We review the existing facial expression databases according to 18 characteristics grouped into 6 categories (population, modalities, data acquisition hardware, experimental conditions, experimental protocol and annotations). These characteristics are meant to help researchers choose a database which suits their application context. We bring to light the trends among posed, spontaneous and in-the-wild databases, as well as micro-expression databases. We finish with future directions, including crowdsourcing and databases with groups of people.
    Micro-expressions (MEs) convey specific non-verbal information. However, as they occur briefly and in local facial regions, ME detection is a difficult task. In this paper, a local temporal pattern (LTP) of facial movement is proposed for ME detection. In order to increase the detection accuracy, our method extracts the LTPs from a PCA-space projection of the video using a sliding window of 300 ms (the mean duration of MEs). The LTPs represent the variation of ME movement and are identical for all MEs. ME frames are then recognized by a classical classification method (SVM). Finally, to eliminate false positives and true negatives, we employ a global fusion over the entire facial region. Experiments are performed on two public databases, and the detection results show that our proposed method performs better than the most popular detection method in terms of F1-score.
    Our goal is the segmentation of weakly convex objects in noisy environments. To this end, we have proposed two algorithms implementing Pareto-based multi-objective optimization of genetic active contours. The first method, the Multiobjective Genetic Snakes (MGS), uses the multi-objective genetic algorithm NSGA2 to deform double snakes. The second (MGHS) combines a micro GA with a local search method based on active contours, the Variational Active Contours Operator (OCAV). A set ensuring the convergence of multi-criteria genetic active contours has been defined. We applied our algorithms to contour extraction. A video implementation of our work within a lip-reading process has also been carried out.
    Efficient modeling of the inter-individual variations of head-related transfer functions (HRTFs) is a key matter to the individualization of binaural synthesis. In previous work, we augmented a dataset of 119 pairs of ear shapes and pinna-related transfer functions (PRTFs), thus creating a wide dataset of 1005 ear shapes and PRTFs generated by random ear drawings (WiDESPREaD) and acoustical simulations. In this article, we investigate the dimensionality reduction capacity of two principal component analysis (PCA) models of magnitude PRTFs, trained on WiDESPREaD and on the original dataset, respectively. We find that the model trained on the WiDESPREaD dataset performs best, regardless of the number of retained principal components.
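    The comparison can be illustrated schematically as below (a sketch under assumptions, not the paper's exact protocol): train a PCA model on each dataset and compare reconstruction errors on held-out magnitude PRTFs for a given number of retained components.

```python
# Sketch: compare two PCA models of magnitude PRTFs (e.g. one trained on
# WiDESPREaD, one on the original dataset) by their reconstruction error on a
# common held-out set; PRTFs are assumed to be stored row-wise as spectra in dB.
import numpy as np
from sklearn.decomposition import PCA

def prtf_reconstruction_error(train_prtfs, test_prtfs, n_components):
    """train_prtfs, test_prtfs: (N, F) log-magnitude PRTFs."""
    pca = PCA(n_components=n_components).fit(train_prtfs)
    recon = pca.inverse_transform(pca.transform(test_prtfs))
    return np.sqrt(np.mean((recon - test_prtfs) ** 2))  # RMS error, in dB here
```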
    Understanding and controlling latent representations in deep generative models is a challenging yet important problem for analyzing, transforming and generating various types of data. In speech processing, drawing inspiration from the anatomical mechanisms of phonation, the source-filter model considers that speech signals are produced from a few independent and physically meaningful continuous latent factors, among which the fundamental frequency f0 and the formants are of primary importance. In this work, we show that the source-filter model of speech production naturally arises in the latent space of a variational autoencoder (VAE) trained in an unsupervised manner on a dataset of natural speech signals. Using only a few seconds of labeled speech signals generated with an artificial speech synthesizer, we experimentally illustrate that f0 and the formant frequencies are encoded in orthogonal subspaces of the VAE latent space, and we develop a weakly-supervised method to accurately and indep...
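    A hedged sketch of how such an analysis could be set up is shown below, assuming access to the VAE latent means and to the known f0/formant values of the synthesizer-generated stimuli; the function names are hypothetical, and this is not the authors' weakly-supervised method itself.

```python
# Estimate, for each factor (f0 or a formant frequency), the latent direction that
# best predicts it linearly, then check that the directions for different factors
# are close to orthogonal (angle near 90 degrees).
import numpy as np
from sklearn.linear_model import LinearRegression

def factor_direction(latent_means, factor_values):
    """latent_means: (N, D) VAE latent means; factor_values: (N,) e.g. f0 in Hz."""
    w = LinearRegression().fit(latent_means, factor_values).coef_
    return w / np.linalg.norm(w)

def angle_between_deg(dir_a, dir_b):
    cos = np.clip(np.abs(dir_a @ dir_b), 0.0, 1.0)
    return np.degrees(np.arccos(cos))  # ~90 deg suggests orthogonal subspaces
```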
    A semantic 3D face clone can be used as a pretreatment in applications such as emotion analysis. However, such clones must model the face shape well while keeping the specificity of the individual. In our technique, we use an RGB-D sensor to capture the specificity of the individual and a deformable 3D face model to mark the face shape. We keep the appropriate parts of the depth data, called patches. This selection is performed using a distance error and the direction of the normal vectors at each point. Depending on the location, we fuse either the sensor data or the data obtained with the deformable model. We compare our method with a classical fitting process. Qualitative tests show that our results are more accurate than those of a classical fitting method, and quantitative tests show that our clone possesses both the specificities of the pe...
    Head-related transfer function individualization is a key matter in binaural synthesis. However, currently available databases are limited in size compared to the high dimensionality of the data. In this paper, the process of generating a synthetic dataset of 1000 ear shapes and matching sets of pinna-related transfer functions (PRTFs), named WiDESPREaD (wide dataset of ear shapes and pinna-related transfer functions obtained by random ear drawings), is presented and made freely available to other researchers. Contributions in this article are threefold. First, from a proprietary dataset of 119 three-dimensional left-ear scans, a matching dataset of PRTFs was built by performing fast-multipole boundary element method (FM-BEM) calculations. Second, the underlying geometry of each type of high-dimensional data was investigated using principal component analysis. It was found that this linear machine-learning technique performs better at modeling and reducing data...
    BACKGROUND Language mapping during awake brain surgery is currently a standard procedure. However, mapping is rarely performed for other cognitive functions that are important for social interaction, such as visuospatial cognition and nonverbal language, including facial expressions and eye gaze. The main reason for this omission is the lack of tasks that are fully compatible with the restrictive environment of an operating room and awake brain surgery procedures. OBJECTIVE This study aims to evaluate the feasibility and safety of a virtual reality headset (VRH) equipped with an eye-tracking device that is able to provide an immersive visuospatial and social virtual reality (VR) experience for patients undergoing awake craniotomy. METHODS We recruited 15 patients with brain tumors near language and/or motor areas. Language mapping was performed with a naming task, DO 80, presented on a computer tablet and then in 2D and 3D via the VRH. Patients were also immersed in a visuospatial and soc...
    Dynamic range compression (DRC) and noise reduction algorithms are commonly used in hearing aids. They are known to have opposite objectives concerning the Signal-to-Noise Ratio (SNR) and to negatively affect localization performance. Yet, the study of their interaction has received little attention. In this work, we improve an existing combined approach of DRC and noise reduction in order to bridge the gap between the algorithms proposed independently in their respective communities. The proposed solution is then compared to state-of-the-art algorithms using objective criteria assessing the preservation of spatial fidelity, the SNR improvement and the reduction of the output dynamic range. Experimental results show that the standard serial concatenation of noise reduction and DRC stages is unable to improve the SNR while preserving the acoustic characteristics of the noise component. They suggest that the proposed design restores the noise localization cues and manages to improve the output SNR.
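    To make the baseline concrete, below is a toy sketch of such a serial chain (a single-channel Wiener-style noise reduction stage followed by a broadband compressor). The gain rules, threshold, ratio and noise estimate are placeholders; this illustrates the standard concatenation, not the combined design proposed in the paper.

```python
# Toy serial chain: per-bin noise-reduction gain, then a broadband static
# compression gain computed from the frame level; all parameters are placeholders.
import numpy as np

def wiener_gain(noisy_power, noise_power, floor=0.1):
    snr = np.maximum(noisy_power / np.maximum(noise_power, 1e-12) - 1.0, 0.0)
    return np.maximum(snr / (snr + 1.0), floor)           # noise-reduction gain per bin

def drc_gain_db(level_db, threshold_db=-40.0, ratio=3.0):
    over = np.maximum(level_db - threshold_db, 0.0)
    return -over * (1.0 - 1.0 / ratio)                    # static compression curve

def serial_chain(stft_frames, noise_psd):
    """stft_frames: (T, F) complex STFT of one hearing-aid channel."""
    out = []
    for frame in stft_frames:
        enhanced = wiener_gain(np.abs(frame) ** 2, noise_psd) * frame
        level_db = 10.0 * np.log10(np.mean(np.abs(enhanced) ** 2) + 1e-12)
        g_drc = 10.0 ** (drc_gain_db(level_db) / 20.0)    # broadband gain after NR
        out.append(g_drc * enhanced)
    return np.array(out)
```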
    Micro-expressions (MEs) carry specific nonverbal information, for example the facial movements caused by pain. However, as a consequence of their local and brief nature, MEs are difficult to detect. This paper presents a novel detection method based on recognizing a local temporal pattern (LTP) of facial movement. In our system, in order to improve the detection accuracy, local temporal features are generated from the video within a sliding window of 300 ms (the mean duration of an ME). These features are extracted from a projection into PCA space and form a specific pattern during an ME which is the same for all MEs. Using a classical classification algorithm (SVM), MEs are then distinguished from other facial movements. Finally, a global fusion analysis is applied to the whole face to eliminate false positives. Experiments are performed on two databases: CASME I and CASME II. The detection results show that the proposed method outperforms the most popular detection method in terms of F1-score, according to the analysis of multiple metrics.
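    The pipeline can be sketched as follows (illustrative only: the number of PCA components, the window length in frames and the per-frame descriptors are assumptions, not values from the paper).

```python
# Schematic LTP detection: project per-frame descriptors into a PCA space, slide
# a ~300 ms window, describe each window by its movement profile relative to the
# window onset, and classify windows with an SVM.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.svm import SVC

def ltp_features(frame_descriptors, win_frames):
    """frame_descriptors: (T, D) per-frame features of one facial region."""
    proj = PCA(n_components=3).fit_transform(frame_descriptors)
    feats = []
    for t in range(len(proj) - win_frames):
        window = proj[t:t + win_frames]
        feats.append(np.linalg.norm(window - window[0], axis=1))  # distance profile
    return np.array(feats)

def train_detector(feats, labels):
    """labels: 1 if the window overlaps a micro-expression, 0 otherwise."""
    clf = SVC(kernel="rbf", class_weight="balanced")
    clf.fit(feats, labels)
    return clf
```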

    And 97 more