We present a novel method for object tracking that uses global and local states of the object in video surveillance applications. Most traditional object models based on global appearance cannot handle partial occlusion effectively, whereas the unoccluded part of a partially visible object retains an invariant appearance. We therefore introduce a global and local dynamics model as our object model, which overcomes partial occlusion by using local features, and apply it to the Bayesian tracking problem via motion-based particle filtering. Experiments on video surveillance sequences demonstrate the effectiveness and robustness of our approach for tracking object motion.
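The motion-based particle filtering mentioned above can be sketched as a generic predict-update-resample loop. The Gaussian motion noise and the synthetic observation model below are illustrative stand-ins, not the paper's actual models:

```python
import math
import random

def particle_filter_step(particles, weights, motion, observe, noise=1.0):
    """One predict-update-resample cycle of a basic particle filter.

    particles : list of (x, y) state hypotheses
    weights   : current normalised importance weights
    motion    : (dx, dy) drift applied in the prediction step
    observe   : likelihood function p(z | state) for the current frame
    """
    # Predict: propagate each particle through the motion model plus noise.
    moved = [(x + motion[0] + random.gauss(0, noise),
              y + motion[1] + random.gauss(0, noise)) for x, y in particles]
    # Update: re-weight particles by the observation likelihood.
    w = [wi * observe(p) for wi, p in zip(weights, moved)]
    total = sum(w) or 1e-12
    w = [wi / total for wi in w]
    # Resample: draw particles with replacement, proportional to weight.
    resampled = random.choices(moved, weights=w, k=len(moved))
    uniform = [1.0 / len(moved)] * len(moved)
    return resampled, uniform

# Toy run: the true target sits at (5, 5); particles start uniform.
random.seed(0)
target = (5.0, 5.0)

def likelihood(p):
    d2 = (p[0] - target[0]) ** 2 + (p[1] - target[1]) ** 2
    return math.exp(-d2 / 2.0)

particles = [(random.uniform(0, 10), random.uniform(0, 10)) for _ in range(500)]
weights = [1.0 / 500] * 500
for _ in range(10):
    particles, weights = particle_filter_step(particles, weights, (0, 0), likelihood)
estimate = (sum(p[0] for p in particles) / 500,
            sum(p[1] for p in particles) / 500)
```

After a few iterations the particle cloud concentrates around the target, and the weighted mean serves as the state estimate.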
Facial beauty prediction is a challenging problem in the computer vision and multimedia fields because of pose variation and diverse imaging conditions. In this paper, we introduce a "soft label" for each annotated facial image and propose a novel neural network, the classification and regression network (CRNet), whose separate branches process a classification task and a regression task simultaneously. A weighted combination of mean squared error (MSE) and cross-entropy (CE) is used as the loss function, which is robust to outliers. CRNet achieves state-of-the-art performance on the SCUT-FBP and ECCV HotOrNot datasets. Experimental results demonstrate the effectiveness of the proposed method and clarify which facial regions matter most for beauty perception.
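One plausible way to build such a soft label is to spread a continuous beauty rating over its two nearest integer classes; the abstract does not give the exact construction, so the scheme below is an illustrative guess assuming ratings lie in [1, num_classes]:

```python
def soft_label(score, num_classes=5):
    """Turn a continuous rating in [1, num_classes] into a soft label.

    The probability mass is split between the two nearest integer
    classes in proportion to the score's distance from each, so a
    rating of 3.2 yields 0.8 on class 3 and 0.2 on class 4.
    (A hypothetical construction; the paper's scheme may differ.)
    """
    label = [0.0] * num_classes
    lo = min(int(score), num_classes) - 1  # 0-based index of the lower bin
    frac = score - (lo + 1)                # mass assigned to the upper bin
    label[lo] = 1.0 - frac
    if frac > 0 and lo + 1 < num_classes:
        label[lo + 1] = frac
    return label
```

A classification branch can then be trained against this distribution with cross-entropy while a regression branch fits the raw score.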
Cotton is an important economic crop, and many loci for important traits have been identified, but it remains challenging and time-consuming to identify candidate or causal genes/variants and to clarify their roles in phenotype formation and regulation. Here, we first collected and integrated multi-omics datasets, including 25 genomes, transcriptomes from 76 tissue samples, epigenome data for five species, metabolome data for 768 metabolites from four tissues, and genetic variation, trait and transcriptome datasets from 4180 cotton accessions. We then constructed a cotton multi-omics database (CottonMD, http://yanglab.hzau.edu.cn/CottonMD/). In CottonMD, multiple statistical methods are applied to identify associations between variations and phenotypes, and many easy-to-use analysis tools help researchers quickly acquire the relevant omics information and perform multi-omics data analysis. Two case studies demonstrated the power of CottonMD for identifying and ...
The loss function is crucial for model training and feature representation learning. Conventional models usually treat facial attractiveness recognition as a regression problem and adopt MSE loss or a Huber-variant loss as supervision to train a deep convolutional neural network (CNN) that predicts a facial attractiveness score, yet little work has systematically compared the performance of diverse loss functions. In this paper, we first systematically analyze model performance under diverse loss functions. We then propose a novel loss function, ComboLoss, to guide an SEResNeXt50 network. The proposed method achieves state-of-the-art performance on the SCUT-FBP, HotOrNot and SCUT-FBP5500 datasets, improving on prior art by 1.13%, 2.1% and 0.57%, respectively. Code and models are available at this https URL.
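The abstract names MSE and a Huber variant as the conventional choices; a combined loss in that spirit can be sketched as a weighted sum of the two. The weights and exact combination are illustrative, since the abstract does not specify ComboLoss's formula:

```python
def mse(pred, target):
    """Squared error: sensitive to outliers, smooth everywhere."""
    return (pred - target) ** 2

def smooth_l1(pred, target, delta=1.0):
    """Huber-style loss: quadratic near zero, linear for large errors,
    which caps the gradient contribution of outlier annotations."""
    d = abs(pred - target)
    return 0.5 * d * d if d < delta else delta * (d - 0.5 * delta)

def combined_loss(pred, target, alpha=1.0, beta=1.0):
    """Hypothetical weighted combination of MSE and a Huber variant;
    NOT the paper's ComboLoss, whose form is not given in the abstract."""
    return alpha * mse(pred, target) + beta * smooth_l1(pred, target)
```

Averaging such a per-sample loss over a batch yields the training objective for the score-prediction head.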
Although generative adversarial networks (GANs) have shown remarkable performance in image generation, challenges remain in image realism and convergence speed. The outputs of some models exhibit quality imbalances within a single generated image, with defective parts appearing alongside well-formed regions. Departing from conventional single global optimization, we introduce an adaptive global and local bilevel optimization model (GL-GAN). The model generates high-resolution images in a complementary, mutually reinforcing way: global optimization refines whole images, while local optimization targets only the low-quality areas. With a simple network structure, GL-GAN effectively avoids this imbalance through local bilevel optimization, which first locates low-quality areas and then optimizes them. Moreover, using feature-map cues from the discriminator output, we propose an adaptive local and global optimization method (A...
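The "first locate low-quality areas" step could be as simple as thresholding a per-region realism score taken from the discriminator; the thresholding rule below is an illustrative sketch, not GL-GAN's actual localization procedure:

```python
def low_quality_mask(score_map, threshold=0.5):
    """Binary mask over a 2-D discriminator score map.

    Regions scoring below `threshold` are marked 1 (low quality) and
    would be the ones a local branch re-optimizes; everything else is 0.
    (Hypothetical rule; the paper's adaptive criterion is not specified
    in the abstract.)
    """
    return [[1 if v < threshold else 0 for v in row] for row in score_map]
```

The local optimization pass would then restrict its loss to the masked regions, leaving well-formed areas untouched.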
In this paper, we propose a novel method to track non-rigid and/or articulated objects using superpixel matching and a Markov random field (MRF). Our algorithm consists of three stages. First, a superpixel dataset is constructed by segmenting training frames into superpixels, each represented by multiple features; the appearance of the target is encoded in this superpixel database. Second, each new frame is segmented into superpixels, and an object-background confidence map is derived by comparing its superpixels with their k-nearest neighbors in the dataset. Taking context into account, we use the MRF to further improve the accuracy of the confidence map, and local context is fed back to refine the superpixel matching. In the last stage, tracking is achieved by finding the best candidate via a maximum a posteriori estimate over the confidence map. Experiments show that our method outperforms several state-of-the-art trackers.
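The second stage's k-nearest-neighbor comparison can be sketched as follows, with plain Euclidean distance standing in for the paper's multi-feature matching; the feature vectors and labels are illustrative:

```python
def knn_confidence(feature, database, k=3):
    """Object confidence of one superpixel via k-nearest neighbours.

    database : list of (feature_vector, label) pairs, label 1 = object,
               0 = background, built from segmented training frames.
    Returns the mean label of the k closest entries under Euclidean
    distance, i.e. the fraction of object-labelled neighbours.
    """
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    nearest = sorted(database, key=lambda entry: dist(feature, entry[0]))[:k]
    return sum(label for _, label in nearest) / k

# Tiny illustrative database: two object superpixels near the origin,
# two background superpixels far away.
db = [([0.0, 0.0], 1), ([0.1, 0.0], 1), ([5.0, 5.0], 0), ([5.0, 6.0], 0)]
```

Evaluating this confidence for every superpixel of a new frame yields the raw object-background map that the MRF then smooths.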
Attribute editing has achieved remarkable progress in recent years owing to the encoder-decoder structure and the generative adversarial network (GAN). However, generating high-quality images with accurate attribute transformation remains challenging. To attack these problems, this work proposes a novel selective attribute editing model based on a classification adversarial network (referred to as ClsGAN) that strikes a good balance between attribute-transfer accuracy and photo-realism. Because edited images are prone to retain the original attribute, due to the skip connections in the encoder-decoder structure, an upper convolution residual network (referred to as Tr-resnet) is presented to selectively extract information from the source image and target label. In addition, to further improve the transfer accuracy of generated images, an attribute adversarial classifier (referred to as Atta-cls) is introduced to guide the generator from the perspective of attribute th...
Feature extraction plays a significant part in computer vision tasks. In this paper, we propose a method that transfers rich deep features from a model pretrained on face verification and feeds them into a Bayesian ridge regression algorithm for facial beauty prediction. We leverage deep neural networks, whose stacked layers extract increasingly abstract features. Through a simple but effective feature-fusion strategy, our method achieves improved or comparable performance on the SCUT-FBP and ECCV HotOrNot datasets. Our experiments demonstrate the effectiveness of the proposed method and shed light on the interpretability of facial beauty perception.
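The regression stage of such a pipeline can be sketched with plain (non-Bayesian) ridge regression on fused feature vectors; this is a dependency-free stand-in for the Bayesian ridge regressor the paper uses (in practice one would reach for scikit-learn's `BayesianRidge`), and the toy data is invented:

```python
def ridge_regression(X, y, lam=1.0):
    """Closed-form ridge: w = (X^T X + lam*I)^(-1) X^T y,
    solved by Gaussian elimination with partial pivoting so the
    sketch needs no external libraries."""
    n, d = len(X), len(X[0])
    # Normal equations: A w = b with A = X^T X + lam*I, b = X^T y.
    A = [[sum(X[k][i] * X[k][j] for k in range(n)) + (lam if i == j else 0.0)
          for j in range(d)] for i in range(d)]
    b = [sum(X[k][i] * y[k] for k in range(n)) for i in range(d)]
    # Forward elimination.
    for col in range(d):
        piv = max(range(col, d), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, d):
            f = A[r][col] / A[col][col]
            for c in range(col, d):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    # Back substitution.
    w = [0.0] * d
    for r in range(d - 1, -1, -1):
        w[r] = (b[r] - sum(A[r][c] * w[c] for c in range(r + 1, d))) / A[r][r]
    return w

# Toy "fused deep features" (2-D) and beauty scores generated by
# y = 2*x0 + 3*x1; with a tiny lam, the fit recovers those weights.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
y = [2.0, 3.0, 5.0, 7.0]
w = ridge_regression(X, y, lam=1e-6)
```

Feature fusion itself can be as simple as concatenating the vectors from two network layers before fitting.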
In this paper, we exploit deep convolutional features for object appearance modeling and propose a simple yet effective deep discriminative model (DDM) for visual tracking. The proposed DDM takes deep features as input and outputs an object-background confidence map. Since both the spatial information in lower convolutional layers and the semantic information in higher layers benefit tracking, we construct one DDM per layer and combine the per-layer confidence maps into a final object-background confidence map. To reduce the risk of model drift, we adopt a saliency method to generate object candidates; tracking is then achieved by selecting the candidate with the largest confidence value. Experiments on a large-scale tracking benchmark demonstrate that the proposed method performs favorably against state-of-the-art trackers.
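One plausible combination rule for the per-layer maps is a weighted average; the abstract does not specify the fusion, so this sketch is an assumption:

```python
def fuse_confidence_maps(maps, weights=None):
    """Fuse per-layer object-background confidence maps.

    maps    : list of 2-D lists of scores, one map per conv layer,
              all with identical spatial size.
    weights : optional per-layer weights; defaults to a plain average.
    (A hypothetical fusion rule, not the paper's exact scheme.)
    """
    if weights is None:
        weights = [1.0 / len(maps)] * len(maps)
    h, w = len(maps[0]), len(maps[0][0])
    return [[sum(wt * m[i][j] for wt, m in zip(weights, maps))
             for j in range(w)] for i in range(h)]
```

Higher layers could be given larger weights when semantic robustness matters more than spatial precision, and vice versa.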
ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019
In this paper, we propose complementary Siamese networks (CoSNet) for visual tracking, which exploit complementary global and local representations to learn a matching function. Specifically, CoSNet is two-fold: a global Siamese network (GSNet) and a local Siamese network (LSNet). GSNet matches the target with candidates using a holistic representation; LSNet, by contrast, matches partial object representations. Instead of simply decomposing the object into regular patches, LSNet employs a novel attentional local part network, which automatically generates salient object parts for local representation and adaptively weights each part according to its importance in matching. GSNet and LSNet are jointly trained end to end. By coupling the two complementary Siamese networks, CoSNet learns a robust matching function that effectively handles various appearance changes in visual tracking. Extensive experime...
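The final matching score of such a two-branch design could fuse the holistic similarity with attention-weighted part similarities; the softmax weighting and the blending factor below are illustrative guesses, not CoSNet's published formulation:

```python
import math

def matching_score(global_sim, part_sims, part_logits, lam=0.5):
    """Fuse a holistic similarity with attention-weighted part scores.

    global_sim  : similarity from the global branch.
    part_sims   : per-part similarities from the local branch.
    part_logits : unnormalised importance of each part; a softmax
                  turns them into attention weights.
    lam         : blending factor between global and local evidence.
    (Hypothetical fusion, standing in for CoSNet's learned one.)
    """
    exps = [math.exp(l) for l in part_logits]
    z = sum(exps)
    attn = [e / z for e in exps]
    local = sum(a * s for a, s in zip(attn, part_sims))
    return lam * global_sim + (1.0 - lam) * local
```

Raising a part's logit shifts attention toward it, so occluded or unreliable parts can be down-weighted without discarding the holistic match.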
The correlation filter has drawn increasing interest in visual tracking due to its high efficiency; however, it is sensitive to partial occlusion, which may cause tracking failure. To address this problem, we propose a novel local-global correlation filter (LGCF) for object tracking. The LGCF model combines local and global strategies, exploiting the relationship of circular shifts among local object parts and the global target in their motion models to preserve the structure of the object. Specifically, the model has two advantages: (1) owing to the local mechanism, it is robust to partial occlusion because it can rely on the visible parts; (2) by modeling the relationships among the motion models of local parts and the global target, it captures the inner structure of the object, which further improves robustness to occlusion. In addition, to alleviate the issue of drift away from ...
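At its core, a correlation filter localizes the target at the peak of a response map obtained by correlating a learned template with a search region. The spatial-domain sketch below illustrates that idea (real trackers compute this in the Fourier domain for speed, and learn the filter rather than using the raw template):

```python
def correlation_response(patch, template):
    """Dense cross-correlation of a template over a search patch.

    patch, template : 2-D lists of intensities; the response at (i, j)
    is the correlation score of the template placed at that offset.
    """
    ph, pw = len(patch), len(patch[0])
    th, tw = len(template), len(template[0])
    resp = []
    for i in range(ph - th + 1):
        row = []
        for j in range(pw - tw + 1):
            row.append(sum(patch[i + a][j + b] * template[a][b]
                           for a in range(th) for b in range(tw)))
        resp.append(row)
    return resp

def peak(resp):
    """Location (i, j) of the maximum response, i.e. the new target position."""
    best = max((v, i, j) for i, row in enumerate(resp) for j, v in enumerate(row))
    return best[1], best[2]

# Toy example: a 2x2 bright target sitting at offset (2, 1) in a 5x5 patch.
patch = [[0] * 5 for _ in range(5)]
patch[2][1] = patch[2][2] = patch[3][1] = patch[3][2] = 1
template = [[1, 1], [1, 1]]
resp = correlation_response(patch, template)
```

Running one such filter per part and one for the whole target, then reconciling their peaks, is the local-global idea the abstract describes.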
Attribute editing has shown remarkable progress through the combination of the encoder-decoder structure and the generative adversarial network. However, challenges remain in the quality and attribute transformation of the generated images: the encoder-decoder structure blurs images, and its skip connections weaken the attribute-transfer ability. To address these limitations, we propose a classification adversarial model (Cls-GAN) that balances attribute transfer against photo-realism. Because transferred images are affected by the original attribute through the skip connections, we introduce an upper convolution residual network (Tr-resnet) to selectively extract information from the source image and target label. In particular, we apply an attribute classification adversarial network to learn the defects of attribute-transfer images and thereby guide the generator. Finally, to meet the requirement of multimodal ...
Advances in Multimedia Information Processing – PCM 2018
Papers by Jinhai Xiang