2019 22nd International Conference on Information Fusion (FUSION), 2019
Human tracking-based video analytics has become a popular approach to developing smart surveillance systems, with particular interest in anomaly detection and healthcare systems. Handling inaccurate detections plays a key role in the tracking pipeline. To this end, modern human tracking methods have often relied on setting a threshold on confidence scores, but they lack an in-depth analysis of the relationship between these scores and true human detections, which may lead to poor detection selection. To address this, we first analyze the misalignment between the given confidence scores and true human detections in the MOT16 Challenge benchmark [1]. We then propose a global-to-local enhanced confidence rescoring strategy that exploits the classification power of a mask region-based convolutional neural network (Mask R-CNN) [2] to mitigate the misalignment. Moreover, we devise an improved pruning algorithm, Soft-aggregated non-maximal suppression (Soft-ANMS), to further enha...
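To make the pruning idea concrete, here is a minimal Python sketch of standard Gaussian Soft-NMS, which decays the scores of overlapping boxes instead of discarding them. It is not the authors' Soft-ANMS: the aggregation step is not described in this abstract, so it is omitted. The `(x1, y1, x2, y2)` box format, the function names and the default `sigma` and `score_thresh` values are illustrative assumptions.

```python
import math
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # assumed format: (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def soft_nms(boxes: List[Box], scores: List[float],
             sigma: float = 0.5, score_thresh: float = 0.001) -> List[Tuple[Box, float]]:
    """Gaussian Soft-NMS: decay overlapping scores instead of removing boxes."""
    remaining = list(zip(boxes, scores))
    kept: List[Tuple[Box, float]] = []
    while remaining:
        # Select the highest-scoring remaining detection.
        best_idx = max(range(len(remaining)), key=lambda i: remaining[i][1])
        best_box, best_score = remaining.pop(best_idx)
        kept.append((best_box, best_score))
        # Penalize the others by a Gaussian function of their overlap with it.
        decayed = []
        for box, score in remaining:
            score *= math.exp(-(iou(best_box, box) ** 2) / sigma)
            if score >= score_thresh:
                decayed.append((box, score))
        remaining = decayed
    return kept
```

Compared with hard NMS, a heavily overlapping box survives with a reduced score rather than being deleted, which can help when people occlude each other; `sigma` controls how quickly the penalty grows with overlap.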
2017 Sensor Signal Processing for Defence Conference (SSPD), 2017
In this paper, an enhanced Gaussian mixture probability hypothesis density (GM-PHD) filter using convolutional neural network (CNN) based weight penalization is proposed to track multiple targets in video. Existing GM-PHD filter based tracking methods are not always able to track targets accurately when they are in close proximity, especially with noisy detection responses or in crowded environments. To address this issue, a measurement classification step that combines a confidence score with a gating technique is presented to discard false measurements and initialise new-born targets. High-level human features extracted from a pre-trained CNN are used to penalize ambiguous weights in the weight matrix. In addition, we integrate an improved track management scheme with occlusion handling to form the tracks of confirmed targets and maintain track continuity. Experimental results on two publicly available benchmark video sequences validate the efficacy of our ...
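The measurement classification step can be pictured with a small sketch that combines a detection confidence score with ellipsoidal (Mahalanobis) gating against predicted target positions. The three-way rule, the thresholds `low=0.3` and `high=0.7`, and the gate value 9.21 (the 99% chi-square quantile for 2 degrees of freedom) are hypothetical choices for illustration, not the values used in the paper.

```python
from typing import Sequence, Tuple

Vec2 = Tuple[float, float]
Mat2 = Tuple[Tuple[float, float], Tuple[float, float]]


def mahalanobis_sq(z: Vec2, mean: Vec2, cov: Mat2) -> float:
    """Squared Mahalanobis distance of z from a 2-D Gaussian (mean, cov)."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    inv = ((d / det, -b / det), (-c / det, a / det))  # explicit 2x2 inverse
    dx, dy = z[0] - mean[0], z[1] - mean[1]
    return (dx * (inv[0][0] * dx + inv[0][1] * dy)
            + dy * (inv[1][0] * dx + inv[1][1] * dy))


def classify_measurement(z: Vec2, score: float,
                         predicted: Sequence[Tuple[Vec2, Mat2]],
                         gate: float = 9.21, low: float = 0.3,
                         high: float = 0.7) -> str:
    """Label a detection as 'survival', 'birth' or 'clutter' (hypothetical rule)."""
    in_gate = any(mahalanobis_sq(z, m, S) <= gate for m, S in predicted)
    if in_gate and score >= low:
        return "survival"  # plausibly explains an existing target
    if not in_gate and score >= high:
        return "birth"     # confident detection far from every predicted target
    return "clutter"       # treated as a false measurement and discarded
```

Requiring a higher score for births than for gated detections reflects the intuition that a new target should be confirmed by strong evidence, while a detection near an existing track needs less.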
2016 IEEE International Conference on Digital Signal Processing (DSP), 2016
Recently, sparse representation has been widely used in computer vision and visual tracking applications, including face recognition and object tracking. In this paper, we propose a novel robust multi-target tracking method that applies sparse representation in a particle probability hypothesis density (PHD) filter framework. We employ dictionary learning and principal component analysis (PCA) to train a static appearance model offline with sufficient training data. This pre-trained dictionary contains both colour histogram and histogram of oriented gradients (HOG) features based on foreground target appearances. The tracker combines the pre-trained dictionary and sparse coding to discriminate the tracked target from background clutter. The sparse coefficients obtained by ℓ1-minimization are used to compute likelihood function values, which are then applied in the update step of the proposed particle PHD filter. The proposed particle PHD filter is validated on two vi...
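As an illustration of how ℓ1-minimization can yield a likelihood, the sketch below solves the lasso problem `min_c 0.5 * ||y - D c||^2 + lam * ||c||_1` with iterative soft-thresholding (ISTA) and maps the reconstruction error to a likelihood value. ISTA, the mapping `exp(-err / sigma**2)` and the parameters `lam` and `sigma` are assumptions; the paper may use a different solver and likelihood form.

```python
import math
from typing import List

Matrix = List[List[float]]  # row-major dictionary D of shape (m, k); columns are atoms


def matvec(D: Matrix, c: List[float]) -> List[float]:
    """Compute D c."""
    return [sum(d * x for d, x in zip(row, c)) for row in D]


def rmatvec(D: Matrix, r: List[float]) -> List[float]:
    """Compute D^T r."""
    return [sum(D[i][j] * r[i] for i in range(len(D))) for j in range(len(D[0]))]


def soft_threshold(v: float, t: float) -> float:
    return math.copysign(max(abs(v) - t, 0.0), v)


def ista(D: Matrix, y: List[float], lam: float = 0.05, iters: int = 200) -> List[float]:
    """Solve min_c 0.5*||y - D c||^2 + lam*||c||_1 by iterative soft-thresholding."""
    # The squared Frobenius norm upper-bounds the gradient's Lipschitz constant.
    L = sum(v * v for row in D for v in row)
    c = [0.0] * len(D[0])
    for _ in range(iters):
        residual = [yi - pi for yi, pi in zip(y, matvec(D, c))]
        grad_step = rmatvec(D, residual)
        c = [soft_threshold(ci + gi / L, lam / L) for ci, gi in zip(c, grad_step)]
    return c


def likelihood(D: Matrix, y: List[float], c: List[float], sigma: float = 0.1) -> float:
    """Hypothetical likelihood: exp(-reconstruction_error / sigma^2)."""
    err = sum((yi - pi) ** 2 for yi, pi in zip(y, matvec(D, c)))
    return math.exp(-err / sigma ** 2)
```

A candidate that the target dictionary reconstructs well gets a likelihood close to 1, while background clutter that the atoms cannot explain gets a value near 0, which is what lets the likelihood reweight particles in the update step.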
2020 28th European Signal Processing Conference (EUSIPCO), 2021
Optical coherence tomography (OCT) is a commonly used method of extracting high-resolution retinal information. Moreover, there is an increasing demand for automated retinal layer segmentation, which facilitates retinal disease diagnosis. In this paper, we propose a novel multi-prediction guided attention network (MPG-Net) for automated retinal layer segmentation in OCT images. The proposed method consists of two major steps that strengthen the discriminative power of a U-shaped fully convolutional network (FCN) for reliable automated segmentation. First, a feature refinement module that adaptively re-weights the feature channels is used in the encoder to capture more informative features and discard information in irrelevant regions. Second, we propose a multi-prediction guided attention mechanism that provides pixel-wise semantic prediction guidance to better recover the segmentation mask at each scale. This mechanism, which transforms the deep supervision to sup...
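A common way to adaptively re-weight feature channels is a squeeze-and-excitation (SE) style block: global average pooling, a small bottleneck, and a sigmoid gate per channel. The sketch below assumes this SE form purely for illustration; the abstract does not specify the internals of the feature refinement module, and the weight matrices `w1` and `w2` are placeholders for learned parameters (biases omitted).

```python
import math
from typing import List

FeatureMap = List[List[List[float]]]  # indexed as [channel][row][col]


def se_reweight(x: FeatureMap, w1: List[List[float]], w2: List[List[float]]) -> FeatureMap:
    """SE-style channel re-weighting; w1 has shape (r, C) and w2 has shape (C, r)."""
    # Squeeze: global average pooling gives one descriptor per channel.
    z = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]
    # Excitation: bottleneck layer with ReLU, then a sigmoid gate per channel.
    h = [max(0.0, sum(w * zi for w, zi in zip(row, z))) for row in w1]
    s = [1.0 / (1.0 + math.exp(-sum(w * hi for w, hi in zip(row, h)))) for row in w2]
    # Scale: multiply every activation in channel c by its gate s[c] in (0, 1).
    return [[[v * s[c] for v in row] for row in ch] for c, ch in enumerate(x)]
```

Because each gate lies in (0, 1), channels judged uninformative are attenuated rather than removed, so the encoder can still recover them if later layers need them.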
We present ActionXPose, a novel 2D pose-based algorithm for posture-level Human Action Recognition (HAR). The proposed approach exploits 2D human poses provided by the OpenPose detector from RGB videos. ActionXPose processes pose data and feeds it to a Long Short-Term Memory (LSTM) neural network and to a 1D convolutional neural network, which solve the classification problem. ActionXPose is one of the first algorithms to exploit 2D human poses for HAR. The algorithm runs in real time, is robust to camera movements and to changes in subject proximity, viewpoint and subject appearance, and provides a high degree of generalization. In fact, extensive simulations show that ActionXPose can be successfully trained using different datasets at once. State-of-the-art performance on popular datasets for posture-related HAR problems (i3DPost, KTH) is reported, and the results are compared with those obtained by other methods, including the selected ActionXPose baseline. Moreover, we als...
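Robustness to subject proximity changes is often obtained by normalizing 2D keypoints before they reach the sequence model. The sketch below is a hypothetical preprocessing step, not ActionXPose's documented pipeline: it translates each pose to a chosen root joint and divides by the root-to-reference distance, producing a per-frame feature vector that an LSTM or 1D CNN could consume. The joint indices `root` and `ref` and the zero encoding for missing joints are assumptions.

```python
import math
from typing import List, Optional, Tuple

Keypoint = Optional[Tuple[float, float]]  # None when the detector missed the joint


def normalize_pose(pose: List[Keypoint], root: int, ref: int) -> List[float]:
    """Translate keypoints to the root joint and scale by the root-to-ref distance."""
    if pose[root] is None or pose[ref] is None:
        raise ValueError("root and reference joints must be detected")
    rx, ry = pose[root]
    scale = math.dist(pose[root], pose[ref]) or 1.0  # avoid division by zero
    feats: List[float] = []
    for kp in pose:
        if kp is None:
            feats.extend((0.0, 0.0))  # assumed encoding for a missing joint
        else:
            feats.extend(((kp[0] - rx) / scale, (kp[1] - ry) / scale))
    return feats


def pose_sequence_features(frames: List[List[Keypoint]], root: int, ref: int) -> List[List[float]]:
    """Stack per-frame vectors into a (T, 2*J) sequence for an LSTM or 1D CNN."""
    return [normalize_pose(p, root, ref) for p in frames]
```

With this normalization, the same posture seen closer to the camera (larger) or elsewhere in the frame maps to the same feature vector.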
2019 IEEE International Symposium on Signal Processing and Information Technology (ISSPIT)
Optical coherence tomography (OCT) is the standard method of generating high-resolution retinal images, which inform retinal disease diagnosis and guide management. However, to fully extract and utilize the retinal information in OCT images, automatic OCT segmentation is essential. Although neural networks have achieved great success in automatic segmentation, using only a single neural network model may lead to an ambiguous information problem in which the result contains incorrect classifications. In this paper, we propose a two-stage fully convolutional network (FCN) method to address this shortcoming. In the first stage, the OCT image is segmented by a trained FCN; in the second stage, the segmentation result is refined by another trained model with a decision mask to improve segmentation performance. The two neural network models are therefore trained sequentially. The proposed method is evaluated on the publicly available Duke OCT dataset, with the F1-score as the performance metric. The experimental results confirm the improvements of the proposed method.
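The F1-score used for evaluation can be computed pixel-wise for each retinal layer label as F1 = 2TP / (2TP + FP + FN), which is equivalent to the Dice coefficient. In this minimal sketch, averaging across classes is left to the caller, and returning 1.0 for a class absent from both maps is an assumed convention.

```python
from typing import Dict, List


def f1_per_class(pred: List[List[int]], truth: List[List[int]],
                 classes: List[int]) -> Dict[int, float]:
    """Pixel-wise F1-score (equivalently Dice) for each label in a segmentation map."""
    flat = [(p, t) for prow, trow in zip(pred, truth) for p, t in zip(prow, trow)]
    scores: Dict[int, float] = {}
    for k in classes:
        tp = sum(1 for p, t in flat if p == k and t == k)
        fp = sum(1 for p, t in flat if p == k and t != k)
        fn = sum(1 for p, t in flat if p != k and t == k)
        denom = 2 * tp + fp + fn
        # Assumed convention: a class missing from both maps scores a perfect 1.0.
        scores[k] = 2 * tp / denom if denom else 1.0
    return scores
```

Computing the score per layer, rather than over all pixels at once, keeps thin layers from being swamped by large background regions.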
2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Structured sparse representation has recently been found to achieve better efficiency and robustness in exploiting the target appearance model in tracking systems with both holistic and local information. Therefore, to better discriminate multiple targets from their background simultaneously, we propose a novel video-based multi-target tracking system that combines the particle probability hypothesis density (PHD) filter with discriminative group-structured dictionary learning. The discriminative dictionary with group structure, learned by the hierarchical K-means clustering algorithm, implicitly associates the dictionary atoms with the group labels, while enforcing target candidates from the same group (class) to share the same structured sparsity pattern. Furthermore, we propose a new joint likelihood calculation that relates the discriminative sparse codes to the maximum voting technique to enhance the particle PHD update step. Experimental results on two publicly available benchmark video sequences confirm the improved performance of our proposed method over other state-of-the-art techniques in video-based multi-target tracking.
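One plausible reading of the maximum voting step is that each candidate's sparse code votes for the group whose atoms carry the most coefficient mass. The sketch below implements that interpretation only; the absolute-value voting rule and the vote-share output are assumptions, not the joint likelihood defined in the paper.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def group_votes(codes: List[float], atom_groups: List[int]) -> Dict[int, float]:
    """Accumulate absolute sparse-coefficient mass per dictionary group."""
    votes: Dict[int, float] = defaultdict(float)
    for coef, g in zip(codes, atom_groups):
        votes[g] += abs(coef)
    return dict(votes)


def max_vote(codes: List[float], atom_groups: List[int]) -> Tuple[int, float]:
    """Return (winning group, its share of the total vote mass)."""
    votes = group_votes(codes, atom_groups)
    total = sum(votes.values())
    if total == 0.0:
        raise ValueError("all sparse coefficients are zero")
    winner = max(votes, key=votes.get)
    return winner, votes[winner] / total
```

If candidates from the same class share a sparsity pattern, their coefficient mass concentrates in one group, so a large vote share signals a confident assignment.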
2018 21st International Conference on Information Fusion (FUSION)
The use of multiple data sources (measurements) has recently been shown to improve the accuracy and reliability of a tracking system, as it provides redundancy in different aspects and eliminates interference from individual sources. This paper addresses the multiple human tracking problem with a multi-detector approach. This approach integrates two detectors with different characteristics (full-body and body-parts) to perform robust collaborative fusion based on data-driven Gaussian Mixture Probability Hypothesis Density (GM-PHD) filters. To make the most of the strengths of multiple detectors, we propose a robust fusion center at the track level, which performs Generalized Covariance Intersection (GCI) fusion for survival and birth tracks independently and also eliminates false tracks caused by a cluttered environment. Moreover, an identity reassignment mechanism is developed to address the identity mismatching problem in the target birth process, so as to enhance fusion performance and track consistency. Experimental results on two challenging benchmark video sequences confirm the effectiveness of the proposed approach.
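The basic building block of GCI fusion, for two single Gaussian estimates with a fixed weight omega, combines their information (inverse covariance) matrices as P^-1 = omega * P1^-1 + (1 - omega) * P2^-1. The sketch below shows only this two-Gaussian case in 2-D; extending it to GM-PHD intensities, and choosing omega (fixed at 0.5 here as an assumption), involves further steps that this abstract does not describe.

```python
from typing import Tuple

Vec2 = Tuple[float, float]
Mat2 = Tuple[Tuple[float, float], Tuple[float, float]]


def inv2(m: Mat2) -> Mat2:
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return ((d / det, -b / det), (-c / det, a / det))


def add2(m: Mat2, n: Mat2, wm: float, wn: float) -> Mat2:
    """Weighted sum wm * m + wn * n of two 2x2 matrices."""
    return tuple(tuple(wm * m[i][j] + wn * n[i][j] for j in range(2)) for i in range(2))


def mul2v(m: Mat2, v: Vec2) -> Vec2:
    """Matrix-vector product for 2x2 matrices."""
    return (m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1])


def gci_fuse(x1: Vec2, P1: Mat2, x2: Vec2, P2: Mat2,
             omega: float = 0.5) -> Tuple[Vec2, Mat2]:
    """Fuse two Gaussian estimates by generalized covariance intersection.

    P^-1 = omega * P1^-1 + (1 - omega) * P2^-1
    x    = P (omega * P1^-1 x1 + (1 - omega) * P2^-1 x2)
    """
    I1, I2 = inv2(P1), inv2(P2)
    P = inv2(add2(I1, I2, omega, 1.0 - omega))
    a, b = mul2v(I1, x1), mul2v(I2, x2)
    rhs = (omega * a[0] + (1 - omega) * b[0], omega * a[1] + (1 - omega) * b[1])
    return mul2v(P, rhs), P
```

Unlike a naive Kalman-style combination, this rule stays consistent even when the two detectors' errors are correlated in an unknown way, which is why it suits fusing trackers that observe the same people.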
IEEE Journal of Biomedical and Health Informatics, 2021
Fetal alcohol syndrome (FAS) caused by prenatal alcohol exposure can result in a series of craniofacial anomalies as well as behavioral and neurocognitive problems. Current diagnosis of FAS is typically done by identifying a set of facial characteristics, which are often obtained by manual examination. Anatomical landmark detection, which provides rich geometric information, is important for detecting the presence of FAS-associated facial anomalies. This imaging application is characterized by large variations in data appearance and limited availability of labeled data. Current deep learning-based heatmap regression methods designed for facial landmark detection in natural images assume the availability of large datasets and are therefore not well suited for this application. To address this restriction, we develop a new regularized transfer learning approach that exploits the knowledge of a network trained on large facial recognition datasets. In contrast to standard transfer learning which focu...
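For context, heatmap regression methods train a network to output, for each landmark, a map whose peak marks the landmark position. The sketch below shows the generic target construction (a 2-D Gaussian with an assumed `sigma=2.0`) and argmax decoding; it does not implement the paper's regularized transfer learning, and the (x, y) = (column, row) convention is an assumption.

```python
import math
from typing import List, Tuple

Heatmap = List[List[float]]  # indexed as [row][col]


def gaussian_heatmap(h: int, w: int, center: Tuple[float, float],
                     sigma: float = 2.0) -> Heatmap:
    """Target heatmap: an unnormalized 2-D Gaussian peaked at the landmark (x, y)."""
    cx, cy = center
    return [[math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
             for x in range(w)] for y in range(h)]


def decode_argmax(hm: Heatmap) -> Tuple[int, int]:
    """Recover the landmark as the (x, y) location of the heatmap maximum."""
    best = max((v, x, y) for y, row in enumerate(hm) for x, v in enumerate(row))
    return best[1], best[2]
```

During training the network's predicted maps are regressed toward such targets (for example with a pixel-wise squared error); at test time the argmax of each predicted map gives the landmark location.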
Papers by Zeyu Fu