Do object part localization methods produce bilaterally symmetric results on mirror images? Surprisingly not, even though state-of-the-art methods augment the training set with mirrored images. In this paper we take a closer look at this issue. We first introduce the concept of mirrorability, the ability of a model to produce symmetric results on mirrored images, and a corresponding measure, the mirror error, defined as the difference between the detection result on an image and the mirror of the detection result on its mirror image. We evaluate the mirrorability of several state-of-the-art algorithms on two of the most intensively studied problems, namely human pose estimation and face alignment. Our experiments lead to several interesting findings: 1) most state-of-the-art methods struggle to preserve mirror symmetry, even though their overall performance on the original and mirror images is very similar; 2) the low mirrorability is not caused by training or testing sample bias, since all algorithms are trained on both the original images and their mirrored versions; 3) the mirror error is strongly correlated with the localization/alignment error (with correlation coefficients around 0.7). Since the mirror error is calculated without knowledge of the ground truth, we show two interesting applications: in the first it is used to guide the selection of difficult samples, and in the second to give feedback in a popular Cascaded Pose Regression method for face alignment.
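The mirror error defined in the abstract above can be sketched for 2D landmarks as follows. This is a minimal illustration, not the authors' implementation: the function names, the pixel convention `x -> width - 1 - x`, the `swap` table of left/right landmark pairs, and the use of an unnormalised mean Euclidean distance are all assumptions made for the example.

```python
import math

def mirror_points(points, width, swap):
    """Reflect landmarks horizontally and swap left/right labels."""
    # Assumed pixel-index convention: column x maps to width - 1 - x.
    reflected = [(width - 1 - x, y) for x, y in points]
    # swap[i] is the index of landmark i's left/right counterpart
    # (swap[i] == i for landmarks on the symmetry axis).
    return [reflected[swap[i]] for i in range(len(points))]

def mirror_error(det_original, det_mirrored, width, swap):
    """Mean distance between the detection on an image and the
    mirrored detection obtained on its mirror image."""
    mapped_back = mirror_points(det_mirrored, width, swap)
    dists = [math.hypot(px - qx, py - qy)
             for (px, py), (qx, qy) in zip(det_original, mapped_back)]
    return sum(dists) / len(dists)

# Hypothetical detections: left eye, right eye, nose tip.
det = [(30, 40), (70, 40), (50, 60)]
swap = [1, 0, 2]
# A perfectly "mirrorable" detector returns the reflection of det.
symmetric = mirror_points(det, 100, swap)
# A detector that drifts by 3 pixels on the nose in the mirror image.
drifted = [(29, 40), (69, 40), (49, 63)]
```

Note that no ground-truth annotation appears anywhere in the computation, which is what allows the measure to be evaluated on unlabeled images.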
Privileged Information-based Conditional Structured Output Regression Forest for Facial Point Detection
In this paper we propose an object alignment method that detects the landmarks of an object in 2D images. In the Regression Forests (RF) framework, observations (patches) that are extracted at several image locations cast votes for the localization of several landmarks. We propose to refine the votes before accumulating them into the Hough space, by sieving and/or aggregating. In order to filter out false positive votes, we pass them through several sieves, each associated with a discrete or continuous latent variable. The sieves filter out votes that are not consistent with the latent variable in question, which implicitly enforces global constraints. In order to aggregate the votes when necessary, we adjust a proximity threshold on the fly by applying a classifier to middle-level features extracted from voting maps for the object landmark in question. Moreover, our method is able to predict the unreliability of an individual object landmark. This information can be useful for subsequent object analysis such as object recognition. Our contributions are validated for two object alignment tasks, face alignment and car alignment, on datasets with challenging images collected in the wild, i.e. the Labeled Faces in the Wild, the Annotated Facial Landmarks in the Wild and the street scene car dataset. We show that with the proposed approach, and without explicitly introducing shape models, we obtain performance superior or close to the state of the art for both tasks.
In this paper we propose a method for the localization of multiple facial features on challenging face images. In the regression forests (RF) framework, observations (patches) that are extracted at several image locations cast votes for the localization of several facial features. In order to filter out votes that are not relevant, we pass them through two types of sieves that are organised in a cascade and enforce geometric constraints. The first sieve filters out votes that are not consistent with a hypothesis for the location of the face center. Several sieves of the second type, one associated with each individual facial point, filter out distant votes. We propose a method that adjusts the proximity threshold of each second-type sieve on the fly by applying a classifier which, based on middle-level features extracted from voting maps for the facial feature in question, makes a sequence of decisions on whether the threshold should be reduced or not. We validate our proposed method on two challenging datasets with images collected from the Internet, on which we obtain state-of-the-art results without resorting to explicit facial shape models. We also show the benefits of our method for proximity threshold adjustment, especially on 'difficult' face images.
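The two-stage sieve cascade described in the abstract above might look roughly like the sketch below. Everything concrete here is an assumption for illustration: the `Vote` record, the fixed tolerance and threshold values (the paper adjusts the proximity threshold with a classifier, which is not reproduced), and the use of a plain mean of surviving votes in place of accumulation into a Hough voting map.

```python
import math
from dataclasses import dataclass

@dataclass
class Vote:
    patch: tuple        # (x, y) location of the patch casting the vote
    to_center: tuple    # predicted offset from the patch to the face center
    to_landmark: tuple  # predicted offset from the patch to the facial point

def center_sieve(votes, center, tol):
    """First sieve: keep votes consistent with the face-center hypothesis."""
    kept = []
    for v in votes:
        cx = v.patch[0] + v.to_center[0]
        cy = v.patch[1] + v.to_center[1]
        if math.hypot(cx - center[0], cy - center[1]) <= tol:
            kept.append(v)
    return kept

def proximity_sieve(votes, threshold):
    """Second sieve: drop votes cast from patches far from the facial point."""
    return [v for v in votes if math.hypot(*v.to_landmark) <= threshold]

def locate(votes):
    """Stand-in for Hough accumulation: average the surviving vote targets."""
    xs = [v.patch[0] + v.to_landmark[0] for v in votes]
    ys = [v.patch[1] + v.to_landmark[1] for v in votes]
    return (sum(xs) / len(xs), sum(ys) / len(ys))

votes = [
    Vote((10, 10), (5, 5), (2, 0)),          # consistent and close
    Vote((40, 40), (-25, -25), (-28, -30)),  # consistent but distant
    Vote((0, 0), (50, 50), (12, 10)),        # inconsistent with the center
]
survivors = proximity_sieve(center_sieve(votes, (15, 15), 3.0), 10.0)
estimate = locate(survivors)
```

Lowering the second argument of `proximity_sieve` corresponds to the threshold reduction that the abstract attributes to a learned classifier.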
In this paper we propose a method that utilises privileged information, that is, information that is available only at the training phase, in order to train Regression Forests for facial feature detection. Our method chooses the split functions at some randomly chosen internal tree nodes according to the information gain calculated from the privileged information, such as head pose or gender. In this way the training patches arrive at leaves that tend to have low variance both in displacements to facial points and in privileged information. At each leaf node, we learn both the probability of the privileged information and regression models conditioned on it. During testing, the marginal probability of the privileged information is estimated and the facial feature locations are localised using the appropriate conditional regression models. The proposed model is validated by comparing with very recent methods on two challenging datasets, namely Labelled Faces in the Wild and Labelled Face Parts in the Wild.
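The test-time procedure in the abstract above (estimate the marginal probability of the privileged information, then apply the matching conditional regression models) can be illustrated with a small sketch. The leaf layout as dictionaries, the state names, the averaging over reached leaves, and the choice of the single most probable state are assumptions made for the example, not details taken from the paper.

```python
def predict_offset(leaves):
    """Pick the most probable privileged state, then average the
    offsets of the regression models conditioned on that state."""
    states = list(leaves[0]["prob"].keys())
    # Marginal probability of each state, averaged over the reached leaves.
    marginal = {s: sum(leaf["prob"][s] for leaf in leaves) / len(leaves)
                for s in states}
    best = max(marginal, key=marginal.get)
    offsets = [leaf["offset"][best] for leaf in leaves]
    mean = (sum(o[0] for o in offsets) / len(offsets),
            sum(o[1] for o in offsets) / len(offsets))
    return best, marginal, mean

# Hypothetical leaves reached by two patches; head pose is the
# privileged information, with two discrete states.
leaves = [
    {"prob": {"frontal": 0.8, "profile": 0.2},
     "offset": {"frontal": (4.0, 2.0), "profile": (10.0, 0.0)}},
    {"prob": {"frontal": 0.6, "profile": 0.4},
     "offset": {"frontal": (6.0, 2.0), "profile": (12.0, 2.0)}},
]
state, marginal, offset = predict_offset(leaves)
```

The privileged label itself is never supplied at test time; only the probabilities stored at the leaves during training are used.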
In this paper, we propose a method for face parts localization called Structured-Output Regression Forests (SO-RF). We assume that the spatial graph of the face parts structure can be partitioned into star graphs associated with individual parts. At each leaf, a regression model for an individual part as well as an interdependency model between parts in the star graph is learned. During testing, individual part positions are determined by the product of two voting maps, corresponding to the two different models. The part regression model captures local feature evidence while the interdependency model captures the structure configuration. Our method has shown state-of-the-art results on the publicly available BioID dataset and competitive results on a more challenging dataset, namely Labeled Face Parts in the Wild.
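As a rough illustration of the test-time step in the abstract above, the sketch below multiplies two voting maps element-wise and takes the maximum as the part position. The map values, the nested-list representation, and the function name are hypothetical, and how each map is actually built from the forest is not shown.

```python
def combine_and_locate(part_map, structure_map):
    """Multiply two voting maps cell by cell and return the
    (row, col) of the highest combined score together with the score."""
    best_score, best_pos = float("-inf"), None
    for r, (row_p, row_s) in enumerate(zip(part_map, structure_map)):
        for c, (p, s) in enumerate(zip(row_p, row_s)):
            score = p * s
            if score > best_score:
                best_score, best_pos = score, (r, c)
    return best_pos, best_score

# Hypothetical 2x3 voting maps for one face part.
part_map = [[0.1, 0.6, 0.1],
            [0.1, 0.5, 0.2]]       # local feature evidence
structure_map = [[0.2, 0.1, 0.1],
                 [0.1, 0.7, 0.3]]  # evidence from neighbouring parts
position, score = combine_and_locate(part_map, structure_map)
```

In this toy case the part map alone peaks at cell (0, 1), but the structure map suppresses that location, so the product selects (1, 1) instead.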
Papers by Heng Yang