Do object part localization methods produce bilaterally symmetric results on mirror images? Surprisingly, they do not, even though state-of-the-art methods augment the training set with mirrored images. In this paper we take a closer look at this issue. We first introduce the concept of mirrorability, the ability of a model to produce symmetric results on mirrored images, and a corresponding measure, the mirror error, defined as the difference between the detection result on an image and the mirror of the detection result on its mirror image. We evaluate the mirrorability of several state-of-the-art algorithms on two of the most intensively studied problems, namely human pose estimation and face alignment. Our experiments lead to several interesting findings: 1) most state-of-the-art methods struggle to preserve mirror symmetry, despite having very similar overall performance on the original and mirrored images; 2) the low mirrorability is not caused by training or testing sample bias, since all algorithms are trained on both the original images and their mirrored versions; 3) the mirror error is strongly correlated with the localization/alignment error (with correlation coefficients around 0.7). Since the mirror error is calculated without knowledge of the ground truth, we show two interesting applications: in the first it is used to guide the selection of difficult samples, and in the second to give feedback in a popular Cascaded Pose Regression method for face alignment.
Privileged Information-based Conditional Structured Output Regression Forest for Facial Point Detection
In this paper we propose an object alignment method that detects the landmarks of an object in 2D images. In the Regression Forests (RF) framework, observations (patches) that are extracted at several image locations cast votes for the localization of several landmarks. We propose to refine the votes before accumulating them into the Hough space, by sieving and/or aggregating them. In order to filter out false positive votes, we pass them through several sieves, each associated with a discrete or continuous latent variable. The sieves filter out votes that are not consistent with the latent variable in question, which implicitly enforces global constraints. In order to aggregate the votes when necessary, we adjust on-the-fly a proximity threshold by applying a classifier on middle-level features extracted from the voting maps for the object landmark in question. Moreover, our method is able to predict the unreliability of an individual object landmark. This information can be useful for subsequent object analysis such as object recognition. Our contributions are validated on two object alignment tasks, face alignment and car alignment, on datasets with challenging images collected in the wild, i.e. Labeled Faces in the Wild, Annotated Facial Landmarks in the Wild and the street scene car dataset. We show that with the proposed approach, and without explicitly introducing shape models, we obtain performance superior or close to the state of the art for both tasks.
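The core sieving idea, discarding votes that are inconsistent with a dominant hypothesis before re-accumulating them, can be sketched as follows. This is a simplified single-pass version with hypothetical names; in the paper each sieve is associated with a latent variable and the proximity threshold is adjusted by a learned classifier rather than fixed:

```python
import numpy as np

def accumulate_votes(votes, map_shape):
    """Accumulate 2D point votes into a Hough voting map."""
    vote_map = np.zeros(map_shape)
    for x, y in votes:
        xi, yi = int(round(x)), int(round(y))
        if 0 <= yi < map_shape[0] and 0 <= xi < map_shape[1]:
            vote_map[yi, xi] += 1
    return vote_map

def sieve_votes(votes, map_shape, proximity_threshold):
    """One sieving pass: keep only votes close to the current Hough peak.

    Votes far from the dominant hypothesis for the landmark location
    are discarded, which implicitly enforces a consistency constraint.
    """
    vote_map = accumulate_votes(votes, map_shape)
    peak_y, peak_x = np.unravel_index(np.argmax(vote_map), map_shape)
    peak = np.array([peak_x, peak_y], dtype=float)
    kept = [v for v in votes
            if np.linalg.norm(np.asarray(v, dtype=float) - peak) <= proximity_threshold]
    return kept, peak
```

Outlying votes (e.g. from background patches) fall outside the threshold around the peak and are removed before the final accumulation.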
In this paper we propose a method for facial landmark localization in face sketch images. As recent approaches and the corresponding datasets are designed for ordinary face photos, the performance of such models drops significantly when they are applied to face sketch images. We first propose a scheme to synthesize face sketches from face photos based on random-forests edge detection and local face region enhancement. Then we jointly train a Cascaded Pose Regression based method for facial landmark localization on both face photos and sketches. We build an evaluation dataset, called Face Sketches in the Wild (FSW), with 450 face sketch images collected from the Internet and with manual annotations of 68 facial landmark locations on each face sketch. The proposed multi-modality facial landmark localization method shows competitive performance on both face sketch images (the FSW dataset) and face photo images (the Labeled Face Parts in the Wild dataset), despite the fact that we do not use extra annotation of face sketches for model building.
Despite the great success of recent facial landmark localization approaches, the presence of occlusions significantly degrades their performance. However, very few works have addressed this problem explicitly, due to the high diversity of occlusions in the real world. In this paper, we address face mask reasoning and facial landmark localization in a unified Structured Decision Forests framework. We first annotate a portion of the face dataset with face masks, i.e., for each face image we give each pixel a label indicating whether it belongs to the face or not. Then we incorporate this additional dense pixel labelling into the training of the Structured Classification-Regression Decision Forest. The classification nodes aim at decreasing the variance of the pixel labels of the patches using our proposed structured criterion, while the regression nodes aim at decreasing the variance of the displacements between the patches and the facial landmarks. The proposed framework allows us to predict the face mask and the facial landmark locations jointly. We test the model on face images from several datasets with significant occlusion. The proposed method 1) yields promising results in face mask reasoning; and 2) improves on existing Decision Forests approaches to facial landmark localization, aided by the face mask reasoning.
In this paper we propose a method for the localization of multiple facial features in challenging face images. In the regression forests (RF) framework, observations (patches) that are extracted at several image locations cast votes for the localization of several facial features. In order to filter out votes that are not relevant, we pass them through two types of sieves, organised in a cascade, which enforce geometric constraints. The first sieve filters out votes that are not consistent with a hypothesis for the location of the face center. Several sieves of the second type, one associated with each individual facial point, filter out distant votes. We propose a method that adjusts on-the-fly the proximity threshold of each second-type sieve by applying a classifier which, based on middle-level features extracted from the voting maps for the facial feature in question, makes a sequence of decisions on whether the threshold should be reduced or not. We validate the proposed method on two challenging datasets with images collected from the Internet, on which we obtain state-of-the-art results without resorting to explicit facial shape models. We also show the benefits of our method for proximity threshold adjustment, especially on 'difficult' face images.
In this paper we propose a method that utilises privileged information, that is, information that is available only at the training phase, in order to train Regression Forests for facial feature detection. Our method chooses the split functions at some randomly chosen internal tree nodes according to the information gain calculated from the privileged information, such as head pose or gender. In this way the training patches arrive at leaves that tend to have low variance both in the displacements to the facial points and in the privileged information. At each leaf node, we learn both the probability of the privileged information and regression models conditioned on it. During testing, the marginal probability of the privileged information is estimated and the facial feature locations are localised using the appropriate conditional regression models. The proposed model is validated by comparison with very recent methods on two challenging datasets, namely Labelled Faces in the Wild and Labelled Face Parts in the Wild.
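The test-time use of the conditional models can be sketched as a probability-weighted combination of the per-condition regression models stored at a leaf. This is a simplified single-leaf illustration with assumed data structures (the actual forest aggregates votes over many trees and leaves):

```python
import numpy as np

def conditional_leaf_prediction(leaf_prob, leaf_offsets, patch_location):
    """Combine the per-condition regression models stored at one leaf.

    leaf_prob: dict mapping a privileged-information value (e.g. a head
               pose bin) to its probability estimated at this leaf.
    leaf_offsets: dict mapping the same values to mean displacement
                  vectors from the patch to the facial point.
    Returns the expected facial point location, marginalised over the
    privileged information (which is unavailable at test time).
    """
    prediction = np.zeros(2)
    for z, prob in leaf_prob.items():
        # weight each conditional model by the probability of its condition
        prediction += prob * (np.asarray(patch_location, dtype=float)
                              + np.asarray(leaf_offsets[z], dtype=float))
    return prediction
```

If one condition dominates (probability near 1), the prediction reduces to that condition's regression model alone.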
Reconstructing 3D objects from single line drawings is often desirable in computer vision and graphics applications. If the line drawing of a complex 3D object is decomposed into primitives of simple shape, the object can be easily reconstructed. We propose an effective method to conduct this line drawing separation and turn a complex line drawing into parametric 3D models. This is achieved by recursively separating the line drawing using two types of split faces. Our experiments show that the proposed separation method can generate more basic and simple line drawings, and that its combination with example-based reconstruction can robustly recover a wider range of complex parametric 3D objects than previous methods.
In this paper, we propose a method for face parts localization called Structured-Output Regression Forests (SO-RF). We assume that the spatial graph of the face parts structure can be partitioned into star graphs associated with the individual parts. At each leaf, we learn a regression model for an individual part as well as an interdependency model between the parts in its star graph. During testing, individual part positions are determined by the product of two voting maps, corresponding to the two different models. The part regression model captures local feature evidence while the interdependency model captures the structural configuration. Our method achieves state-of-the-art results on the publicly available BioID dataset and competitive results on a more challenging dataset, namely Labeled Face Parts in the Wild.
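The fusion of the two models can be illustrated as an elementwise product of the two voting maps followed by a peak search. This is a minimal sketch with hypothetical names, not the paper's implementation:

```python
import numpy as np

def combine_voting_maps(part_map, interdependency_map):
    """Determine a part position from the product of two voting maps:
    one from the local part regression model and one from the
    star-graph interdependency model. A location must score well in
    both maps to survive the product."""
    combined = part_map * interdependency_map
    y, x = np.unravel_index(np.argmax(combined), combined.shape)
    return x, y
```

A strong local-appearance peak that is structurally implausible is suppressed by a low interdependency score, so the product favours locations supported by both evidence sources.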
Papers by Heng Yang