Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
Skip to main content
Object point cloud classification has drawn great research attention since the release of benchmarking datasets, such as the ModelNet and the ShapeNet. These benchmarks assume point clouds covering complete surfaces of object instances,... more
Object point cloud classification has drawn great research attention since the release of benchmarking datasets, such as the ModelNet and the ShapeNet. These benchmarks assume point clouds covering complete surfaces of object instances, for which plenty of high-performing methods have been developed. However, their settings deviate from those often met in practice, where, due to (self-)occlusion, a point cloud covering partial surface of an object is captured from an arbitrary view. We show in this paper that performance of existing point cloud classifiers drops drastically under the considered single-view, partial setting; the phenomenon is consistent with the observation that semantic category of a partial object surface is less ambiguous only when its distribution on the whole surface is clearly specified. To this end, we argue for a single-view, partial setting where supervised learning of object pose estimation should be accompanied with classification. Technically, we propose ...
Advanced face swapping methods have achieved appealing results. However, most of these methods have many parameters and computations, which makes it challenging to apply them in real-time applications or deploy them on edge devices like... more
Advanced face swapping methods have achieved appealing results. However, most of these methods have many parameters and computations, which makes it challenging to apply them in real-time applications or deploy them on edge devices like mobile phones. In this work, we propose a lightweight Identity-aware Dynamic Network (IDN) for subject-agnostic face swapping by dynamically adjusting the model parameters according to the identity information. In particular, we design an efficient Identity Injection Module (IIM) by introducing two dynamic neural network techniques, including the weights prediction and weights modulation. Once the IDN is updated, it can be applied to swap faces given any target image or video. The presented IDN contains only 0.50M parameters and needs 0.33G FLOPs per frame, making it capable for real-time video face swapping on mobile phones. In addition, we introduce a knowledge distillation-based method for stable training, and a loss reweighting module is employed...
Video surveillance-oriented biometrics is a very challenging task and has tremendous significance to the security of public places. With the growing threat of crime and terrorism to public security, it is becoming more and more critical... more
Video surveillance-oriented biometrics is a very challenging task and has tremendous significance to the security of public places. With the growing threat of crime and terrorism to public security, it is becoming more and more critical to develop and deploy reliable biometric techniques for video surveillance applications. Traditionally, it has been regarded as a very difficult problem. The low-quality of video frames and the rich intra-personal appearance variations impose significant challenge to previous biometric techniques, making them impractical to real-world video surveillance applications. Fortunately, recent advances in computer vision and machine learning algorithms as well as imaging hardware provide new inspirations and possibilities. In particular, the development of deep learning and the availability of big data open up great potential. Therefore, it is the time that this problem be re-evaluated. This special issue will provide a platform for researchers to exchange their innovative ideas and attractive improvements on video surveillance-oriented biometrics.
A point cloud is a popular shape representation adopted in 3D object classification, which covers the whole surface of an object and is usually well aligned. However, such an assumption can be invalid in practice, as point clouds... more
A point cloud is a popular shape representation adopted in 3D object classification, which covers the whole surface of an object and is usually well aligned. However, such an assumption can be invalid in practice, as point clouds collected in real-world scenarios are typically scanned from visible object parts observed under arbitrary SO(3) viewpoint, which are thus incomplete due to self and inter-object occlusion. In light of this, this paper introduces a practical setting to classify partial point clouds of object instances under any poses. Compared to the classification of complete object point clouds, such a problem is made more challenging in view of geometric similarities of local shape across object classes and intra-class dissimilarities of geometries restricted by their observation view. We consider that specifying the location of partial point clouds on their object surface is essential to alleviate suffering from the aforementioned challenges, which can be solved via an ...
Text-to-image person re-identification (ReID) aims to search for images containing a person of interest using textual descriptions. However, due to the significant modality gap and the large intra-class variance in textual descriptions,... more
Text-to-image person re-identification (ReID) aims to search for images containing a person of interest using textual descriptions. However, due to the significant modality gap and the large intra-class variance in textual descriptions, text-to-image ReID remains a challenging problem. Accordingly, in this paper, we propose a Semantically Self-Aligned Network (SSAN) to handle the above problems. First, we propose a novel method that automatically extracts semantically aligned part-level features from the two modalities. Second, we design a multi-view nonlocal network that captures the relationships between body parts, thereby establishing better correspondences between body parts and noun phrases. Third, we introduce a Compound Ranking (CR) loss that makes use of textual descriptions for other images of the same identity to provide extra supervision, thereby effectively reducing the intra-class variance in textual features. Finally, to expedite future research in text-to-image ReID,...
Nucleus segmentation is a challenging task due to the crowded distribution and blurry boundaries of nuclei. Recent approaches represent nuclei by means of polygons to differentiate between touching and overlapping nuclei and have... more
Nucleus segmentation is a challenging task due to the crowded distribution and blurry boundaries of nuclei. Recent approaches represent nuclei by means of polygons to differentiate between touching and overlapping nuclei and have accordingly achieved promising performance. Each polygon is represented by a set of centroid-toboundary distances, which are in turn predicted by features of the centroid pixel for a single nucleus. However, using the centroid pixel alone does not provide sufficient contextual information for robust prediction. To handle this problem, we propose a Context-aware Polygon Proposal Network (CPP-Net) for nucleus segmentation. First, we sample a point set rather than one single pixel within each cell for distance prediction. This strategy substantially enhances contextual information and thereby improves the robustness of the prediction. Second, we propose a Confidencebased Weighting Module, which adaptively fuses the predictions from the sampled point set. Third...
Face recognition is one of the most important and promising biometric techniques. In face recognition, a similarity score is automatically calculated between face images to further decide their identity. Due to its non-invasive... more
Face recognition is one of the most important and promising biometric techniques. In face recognition, a similarity score is automatically calculated between face images to further decide their identity. Due to its non-invasive characteristics and ease of use, it has shown great potential in many real-world applications, e.g., video surveillance, access control systems, forensics and security, and social networks. This thesis addresses key challenges inherent in real-world face recognition systems including pose and illumination variations, occlusion, and image blur. To tackle these challenges, a series of robust face recognition algorithms are proposed. These can be summarized as follows: In Chapter 2, we present a novel, manually designed face image descriptor named “Dual-Cross Patterns” (DCP). DCP efficiently encodes the seconder-order statistics of facial textures in the most informative directions within a face image. It proves to be more descriptive and discriminative than pre...
In this paper, we devise a novel two-stage cascaded U-Net to segment the substructures of brain tumors from coarse to fine. The network is trained end-to-end on the Multimodal Brain Tumor Segmentation Challenge (BraTS) 2019 training... more
In this paper, we devise a novel two-stage cascaded U-Net to segment the substructures of brain tumors from coarse to fine. The network is trained end-to-end on the Multimodal Brain Tumor Segmentation Challenge (BraTS) 2019 training dataset. Experimental results on the testing set demonstrate that the proposed method achieved average Dice scores of 0.83267, 0.88796 and 0.83697, as well as Hausdorff distances (95%) of 2.65056, 4.61809 and 4.13071, for the enhancing tumor, whole tumor and tumor core, respectively. The approach won the 1st place in the BraTS 2019 challenge segmentation task, with more than 70 teams participating in the challenge.
The model cascade strategy that runs a series of deep models sequentially for coarse-to-fine medical image segmentation is becoming increasingly popular, as it effectively relieves the class imbalance problem. This strategy has achieved... more
The model cascade strategy that runs a series of deep models sequentially for coarse-to-fine medical image segmentation is becoming increasingly popular, as it effectively relieves the class imbalance problem. This strategy has achieved state-of-the-art performance in many segmentation applications but results in undesired system complexity and ignores correlation among deep models. In this paper, we propose a light and clean deep model that conducts brain tumor segmentation in a single-pass and solves the class imbalance problem better than model cascade. First, we decompose brain tumor segmentation into three different but related tasks and propose a multi-task deep model that trains them together to exploit their underlying correlation. Second, we design a curriculum learning-based training strategy that trains the above multi-task model more effectively. Third, we introduce a simple yet effective post-processing method that can further improve the segmentation performance significantly. The proposed methods are extensively evaluated on BRATS 2017 and BRATS 2015 datasets, ranking first on the BRATS 2015 test set and showing top performance among 60+ competing teams on the BRATS 2017 validation set.
The capacity to recognize faces under varied poses is a fundamental human ability that presents a unique challenge for computer vision systems. Compared to frontal face recognition, which has been intensively studied and has gradually... more
The capacity to recognize faces under varied poses is a fundamental human ability that presents a unique challenge for computer vision systems. Compared to frontal face recognition, which has been intensively studied and has gradually matured in the past few decades, Pose-Invariant Face Recognition (PIFR) remains a largely unsolved problem. However, PIFR is crucial to realizing the full potential of face recognition for real-world applications, since face recognition is intrinsically a passive biometric technology for recognizing uncooperative subjects. In this article, we discuss the inherent difficulties in PIFR and present a comprehensive review of established techniques. Existing PIFR methods can be grouped into four categories, that is, pose-robust feature extraction approaches, multiview subspace learning approaches, face synthesis approaches, and hybrid approaches. The motivations, strategies, pros/cons, and performance of representative approaches are described and compared....
Face images captured in unconstrained environments usually contain significant pose variation, which dramatically degrades the performance of algorithms designed to recognize frontal faces. This paper proposes a novel face identification... more
Face images captured in unconstrained environments usually contain significant pose variation, which dramatically degrades the performance of algorithms designed to recognize frontal faces. This paper proposes a novel face identification framework capable of handling the full range of pose variations within ±90° of yaw. The proposed framework first transforms the original pose-invariant face recognition problem into a partial frontal face recognition problem. A robust patch-based face representation scheme is then developed to represent the synthesized partial frontal faces. For each patch, a transformation dictionary is learnt under the proposed multi-task learning scheme. The transformation dictionary transforms the features of different poses into a discriminative subspace. Finally, face matching is performed at patch level rather than at the holistic level. Extensive and systematic experimentation on FERET, CMU-PIE, and Multi-PIE databases shows that the proposed method consiste...
Human faces in surveillance videos often suffer from severe image blur, dramatic pose variations, and occlusion. In this paper, we propose a comprehensive framework based on Convolutional Neural Networks (CNN) to overcome challenges in... more
Human faces in surveillance videos often suffer from severe image blur, dramatic pose variations, and occlusion. In this paper, we propose a comprehensive framework based on Convolutional Neural Networks (CNN) to overcome challenges in video-based face recognition (VFR). First, to learn blur-robust face representations, we artificially blur training data composed of clear still images to account for a shortfall in real-world video training data. Using training data composed of both still images and artificially blurred data, CNN is encouraged to learn blur-insensitive features automatically. Second, to enhance robustness of CNN features to pose variations and occlusion, we propose a Trunk-Branch Ensemble CNN model (TBE-CNN), which extracts complementary information from holistic face images and patches cropped around facial components. TBE-CNN is an end-to-end model that extracts features efficiently by sharing the low-and middle-level convolutional layers between the trunk and branch networks. Third, to further promote the discriminative power of the representations learnt by TBE-CNN, we propose an improved triplet loss function. Systematic experiments justify the effectiveness of the proposed techniques. Most impressively, TBE-CNN achieves state-of-the-art performance on three popular video face databases: PaSC, COX Face, and YouTube Faces.
Research Interests:
Face images appeared in multimedia applications, e.g., social networks and digital entertainment, usually exhibit dramatic pose, illumination, and expression variations, resulting in considerable performance degradation for traditional... more
Face images appeared in multimedia applications, e.g., social networks and digital entertainment, usually exhibit dramatic pose, illumination, and expression variations, resulting in considerable performance degradation for traditional face recognition algorithms. This paper proposes a comprehensive deep learning framework to jointly learn face representation using multimodal information. The proposed deep learning structure is composed of a set of elaborately designed convolutional neural networks (CNNs) and a three-layer stacked auto-encoder (SAE). The set of CNNs extracts complementary facial features from multimodal data. Then, the extracted features are concatenated to form a high-dimensional feature vector, whose dimension is compressed by SAE. All the CNNs are trained using a subset of 9,000 subjects from the publicly available CASIA-WebFace database, which ensures the reproducibility of this work. Using the proposed single CNN architecture and limited training data, 98.43% verification rate is achieved on the LFW database. Benefited from the complementary information contained in multimodal data, our small ensemble system achieves higher than 99.0% recognition rate on LFW using publicly available training set.
The capacity to recognize faces under varied poses is a fundamental human ability that presents a unique challenge for computer vision systems. Compared to frontal face recognition, which has been intensively studied and has gradually... more
The capacity to recognize faces under varied poses is a fundamental human ability that presents a unique challenge for computer vision systems. Compared to frontal face recognition, which has been intensively studied and has gradually matured in the past few decades, pose-invariant face recognition (PIFR) remains a largely unsolved problem. However, PIFR is crucial to realizing the full potential of face recognition for real-world applications, since face recognition is intrinsically a passive biometric technology for recognizing uncooperative subjects. In this paper, we discuss the inherent difficulties in PIFR and present a comprehensive review of established techniques. Existing PIFR methods can be grouped into four categories, i.e., pose-robust feature extraction approaches, multi-view subspace learning approaches, face synthesis approaches, and hybrid approaches. The motivations, strategies, pros/cons, and performance of representative approaches are described and compared. Moreover, promising directions for future research are discussed.

Final version:
http://arxiv.org/pdf/1502.04383v3.pdf
http://dl.acm.org/citation.cfm?id=2845089
Video surveillance-oriented biometrics is a very challenging task and has tremendous significance to the security of public places. With the growing threat of crime and terrorism to public security, it is becoming more and more critical... more
Video surveillance-oriented biometrics is a very challenging task and has tremendous significance to the security of public places. With the growing threat of crime and terrorism to public security, it is becoming more and more critical to develop and deploy reliable biometric techniques for video surveillance applications. Traditionally, it has been regarded as a very difficult problem. The low-quality of video frames and the rich intra-personal appearance variations impose significant challenge to previous biometric techniques, making them impractical to real-world video surveillance applications. Fortunately, recent advances in computer vision and machine learning algorithms as well as imaging hardware provide new inspirations and possibilities. In particular, the development of deep learning and the availability of big data open up great potential. Therefore, it is the time that this problem be re-evaluated. This special issue will provide a platform for researchers to exchange their innovative ideas and attractive improvements on video surveillance-oriented biometrics.
Research Interests: