
    Fang Liu

    The dynamic attention mechanism and global modeling ability give the Transformer strong feature learning capability. In recent years, Transformers have become comparable to CNN-based methods in computer vision. This review investigates the current research progress of Transformers in image and video applications and provides a comprehensive overview of Transformers for visual learning and understanding. First, the attention mechanism, which plays an essential part in the Transformer, is reviewed. Then, the visual Transformer model and the principle of each of its modules are introduced. Third, existing Transformer-based models are investigated, and their performance is compared on visual learning and understanding applications. Three image tasks and two video tasks of computer vision are covered: the former include image classification, object detection, and image segmentation, while the latter comprise object tracking and video classification. Comparing the performance of different models across these tasks on several public benchmark datasets is of particular significance. Finally, ten general problems are summarized, and the development prospects of the visual Transformer are given in this review.
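    Since the attention mechanism is the essential building block reviewed here, the following is a minimal NumPy sketch of standard scaled dot-product attention (the formulation from the original Transformer). It is an illustrative reference only, not code from the reviewed papers.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Standard scaled dot-product attention (Vaswani et al., 2017).
    Q, K, V: arrays of shape (seq_len, d)."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                   # pairwise token similarities
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V                              # attention-weighted sum of values
```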
    The Transformer has shown excellent performance in the remote sensing field thanks to its long-range modeling capabilities. Remote sensing video (RSV) moving object detection and tracking play indispensable roles in military activities as well as urban monitoring. However, Transformers in these fields are still at the exploratory stage. In this survey, we comprehensively summarize the research prospects of Transformers in RSV moving object detection and tracking. The core designs of remote sensing Transformers and advanced Transformers are first analyzed, mainly covering the evolution of attention mechanisms for specific tasks, the fitting ability of input mappings, diverse feature representations, model optimization, etc. The architectural characteristics of RSV detection and tracking are then described from two aspects: moving object detection, covering motion-based traditional background subtraction and appearance-based deep learning models, and object tracking for single and multiple targets. The main research difficulties include the blurred foreground in RSV data, irregular object movement in traditional background subtraction, and severe object occlusion in object tracking. Following that, the potential significance of Transformers is discussed with respect to several thorny problems in RSV. Finally, we summarize ten open challenges for Transformers in RSV, which may serve as a reference for promoting future research.
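    As a concrete example of the motion-based traditional background subtraction mentioned above, here is a minimal sketch of a classical running-average background model. The function name and parameter values are illustrative assumptions, not taken from any surveyed method.

```python
import numpy as np

def running_average_bg_subtraction(frames, alpha=0.05, thresh=30):
    """Classical running-average background model over a list of grayscale frames."""
    bg = frames[0].astype(np.float64)            # initialise background with first frame
    masks = []
    for frame in frames[1:]:
        frame = frame.astype(np.float64)
        diff = np.abs(frame - bg)                # foreground = large deviation from model
        masks.append((diff > thresh).astype(np.uint8))
        bg = (1 - alpha) * bg + alpha * frame    # slowly update the background model
    return masks
```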
    In recent years, the enormous demand for computing resources resulting from massive data and complex network models has become a limitation of deep learning. In large-scale problems with massive samples and ultrahigh feature dimensions, sparsity has gradually drawn much attention from both academia and industry. In this article, the new generation of brain-inspired sparse learning is reviewed comprehensively. First, sparse cognition learning is introduced, from the visual biology mechanism to the modeling of natural images. Second, sparse representation algorithms are summarized to sort out the research progress of sparse learning. Third, the relevant research on sparse feature selection learning is reviewed. Then, sparse deep networks and their applications are summed up. Last but not least, ten public issues and challenges of sparse learning are discussed. By investigating the development process of sparse learning, this article summarizes the advantages, disadvantages, limitations, and future research directions of these algorithms, which can help readers conduct further study. Impact Statement: Sparse representation is an essential branch of machine learning. Sparsity has been considered as a way to address the limitation of computation resources in practical deep learning applications. Unlike previous reviews, our survey comprehensively...
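    For reference, the canonical sparse representation problem underlying much of the surveyed work can be written in its standard l1-regularized form as

\[
\min_{\alpha} \; \| x - D\alpha \|_2^2 + \lambda \| \alpha \|_1 ,
\]

    where x is the input signal, D an overcomplete dictionary, \alpha the sparse code, and \lambda controls the trade-off between reconstruction error and sparsity.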
    Remote sensing object detection (RSOD), one of the most fundamental and challenging tasks in the remote sensing field, has received long-standing attention. In recent years, deep learning techniques have demonstrated robust feature representation capabilities and led to a big leap in the development of RSOD techniques. In this era of rapid technical evolution, this article aims to present a comprehensive review of the recent achievements in deep learning-based RSOD methods. More than 300 papers are covered in this review. We identify five main challenges in RSOD, including multiscale object detection, rotated object detection, weak object detection, tiny object detection, and object detection with limited supervision, and systematically review the corresponding methods developed in a hierarchical division manner. We also review the widely used benchmark datasets and evaluation metrics within the field of RSOD as well as the application scenarios for RSOD. Future research directions are provided for further promoting the research in RSOD.
    The quantum mechanism, which has received widespread attention, is evolving rapidly. The powerful computing power and high parallelism of quantum mechanisms equip the quantum field with broad application scenarios and brand-new vitality. Inspired by nature, intelligent algorithms have always been one of the research hotspots; they form a frontier interdisciplinary subject that integrates biology, mathematics, and other disciplines. Naturally, combining quantum mechanisms with intelligent algorithms injects new vitality into artificial intelligence systems. This paper first lists major breakthroughs in the development of the quantum domain, then summarizes existing quantum algorithms from two aspects: quantum optimization and quantum learning. After that, the related concepts, main contents, and research progress of quantum optimization and quantum learning are introduced, respectively. At last, experiments based on simulated quantum computing are conducted to show that quantum intelligent algorithms are strongly competitive with traditional intelligent algorithms and possess great potential. INDEX TERMS Quantum optimization, quantum learning, quantum evolutionary algorithm (QEA), quantum particle swarm optimization (QPSO), quantum immune clonal algorithm (QICA), quantum neural network (QNN), quantum clustering (QC).
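    As background for the quantum-inspired algorithms listed in the index terms (e.g., QEA), the basic qubit representation and the rotation-gate update commonly used in quantum-inspired evolutionary algorithms are

\[
|\psi\rangle = \alpha|0\rangle + \beta|1\rangle, \qquad |\alpha|^2 + |\beta|^2 = 1, \qquad
U(\Delta\theta) = \begin{pmatrix} \cos\Delta\theta & -\sin\Delta\theta \\ \sin\Delta\theta & \cos\Delta\theta \end{pmatrix},
\]

    where the rotation angle \Delta\theta steers the probability amplitudes of each Q-bit toward the current best solution.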
    Video object detection, a basic task in the computer vision field, is rapidly evolving and widely used. In recent years, deep learning methods have rapidly become widespread in the field of video object detection, achieving excellent results compared with those of traditional methods. However, the presence of duplicate information and abundant spatiotemporal information in video data poses a serious challenge to video object detection. Therefore, in recent years, many scholars have investigated deep learning detection algorithms in the context of video data and have achieved remarkable results. Considering the wide range of applications, a comprehensive review of the research related to video object detection is both a necessary and challenging task. This survey attempts to link and systematize the latest cutting-edge research on video object detection with the goal of classifying and analyzing video detection algorithms based on specific representative models. The differences and connections between video object detection and similar tasks are systematically demonstrated, and the evaluation metrics and video detection performance of nearly 40 models on two data sets are presented. Finally, the various applications and challenges facing video object detection are discussed.
    Feature representation has been widely used and developed in recent years. Multiscale features have led to remarkable breakthroughs in representation learning for many computer vision tasks. This paper aims to provide a comprehensive survey of recent multiscale representation learning achievements in classification tasks. Multiscale representation learning methods can be divided into two broad categories: multiscale geometric analysis and multiscale networks. Eleven kinds of multiscale geometric tools and seven kinds of multiscale networks are introduced. Some fundamental subproblems of these two categories are also described, including key concepts in the representation process, specific representation methods based on multiscale geometric analysis, and multiscale representation design strategies for networks. Then, the correlation between the two categories is illustrated, including their respective characteristics, combination strategies, and the characteristics of optimal representations. Several datasets and evaluation results are included to verify the effectiveness of multiscale representation learning. Finally, conclusions and future work are given, covering four directions: a) choice and fusion; b) self-adaptation; c) structure; and d) generalization and proof. Impact Statement: Multiscale representation learning techniques are fully utilized in existing artificial intelligence tasks. Compared to existing papers, which pay attention to a particular task or algorithm, this paper is the first systematic review of multiscale
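    As one representative multiscale geometric tool among those surveyed, the continuous wavelet transform decomposes a signal over scale a and shift b:

\[
W_f(a, b) = \frac{1}{\sqrt{a}} \int_{-\infty}^{\infty} f(t)\, \psi^{*}\!\left(\frac{t - b}{a}\right) dt ,
\]

    where \psi is the mother wavelet; small scales capture fine detail and large scales capture coarse structure.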
    Recently, the multiscale problem in computer vision has gradually attracted increasing attention. This article focuses on multiscale representation for object detection and recognition, comprehensively introduces the development of multiscale deep learning, and constructs an easy-to-understand but powerful knowledge structure. First, we give the definition of scale, explain the multiscale mechanism of human vision, and then introduce the multiscale problem as it arises in computer vision. Second, advanced multiscale representation methods are introduced, including pyramid representation, scale-space representation, and multiscale geometric representation. Third, the theory of multiscale deep learning is presented, mainly discussing multiscale modeling in convolutional neural networks (CNNs) and Vision Transformers (ViTs). Fourth, we compare the performance of multiple multiscale methods on different tasks, illustrating the effectiveness of different multiscale structural designs. Finally, based on an in-depth understanding of existing methods, we point out several open issues and future directions for multiscale deep learning.
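    For the scale-space representation mentioned above, the standard Gaussian scale space of an image I is

\[
L(x, y, \sigma) = G(x, y, \sigma) * I(x, y), \qquad
G(x, y, \sigma) = \frac{1}{2\pi\sigma^2} e^{-(x^2 + y^2)/(2\sigma^2)},
\]

    so that increasing \sigma progressively suppresses fine-scale image structure.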
    Social relations are ubiquitous and form the basis of the social structure in our daily life. However, existing studies mainly focus on recognizing social relations from still images and movie clips, which differ from real-world scenarios. For example, movie-based datasets define the task as video classification, recognizing only one relation per scene. In this article, we aim to study the problem of social relation recognition in an open environment. To close this gap, we provide the first video dataset collected from real-life scenarios, named Social Relation In the Wild (SRIW), where the number of people can be large and varies, and every pairwise relation needs to be recognized. To overcome the new challenges, we propose a spatio-temporal relation graph convolutional network (STRGCN) architecture...
    A graph structure is a powerful mathematical abstraction that can not only represent information about individuals but also capture the interactions between individuals for reasoning. Geometric modeling and relational inference based on graph data are long-standing topics of interest in the computer vision community. In this article, we provide a systematic review of graph representation learning and its applications in computer vision. First, we sort out the evolution of representation learning on graphs, categorizing methods into non-neural-network and neural-network approaches based on how the nodes are encoded. Specifically, non-neural-network methods such as graph embedding and probabilistic graphical models are introduced, and neural-network methods such as graph recurrent neural networks, graph convolutional networks, and variants of graph neural networks are also presented. Then, we organize the applications of graph representation algorithms in various vision tasks (such as image classification, semantic segmentation, object detection, and tracking) for review and reference, and typical graph construction approaches in computer vision are also summarized. Finally, against the background of biology and brain inspiration, we discuss the existing challenges and future directions of graph representation learning and computer vision.
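    As a reference point for the graph convolutional networks discussed above, the widely used layer-wise propagation rule of Kipf and Welling is

\[
H^{(l+1)} = \sigma\!\left(\tilde{D}^{-1/2}\,\tilde{A}\,\tilde{D}^{-1/2}\,H^{(l)} W^{(l)}\right), \qquad \tilde{A} = A + I,
\]

    where A is the adjacency matrix, \tilde{D} the degree matrix of \tilde{A}, H^{(l)} the node features at layer l, W^{(l)} a learnable weight matrix, and \sigma a nonlinearity.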
    Deep learning (DL) has made breakthroughs in many computer vision tasks, including visual tracking. From the early research on the automatic acquisition of highly abstract feature representations, DL has by now gone deep into all aspects of tracking, to name a few: similarity metrics, data association, and bounding box estimation. Moreover, pure DL-based trackers have achieved state-of-the-art performance after the community's constant research. We believe it is time to comprehensively review the development of DL research in visual tracking. In this article, we overview the critical improvements brought to the field by DL: deep feature representations, network architecture, and four crucial issues in visual tracking (spatiotemporal information integration, target-specific classification, target information update, and bounding box estimation). The scope of this survey of DL-based tracking covers, for the first time, the two primary subtasks: single-object tracking and multiple-object tracking. We also analyze the performance of DL-based approaches and draw meaningful conclusions. Finally, we provide
    Brain-inspired algorithms have become a new trend in next-generation artificial intelligence. Through research on brain science, the intelligence of remote sensing algorithms can be effectively improved. This article summarizes and analyzes the essential properties of brain cognitive learning and recent advances in remote sensing interpretation. First, this article introduces the structural composition and properties of the brain. Then, five representative brain-inspired algorithms are studied, including multiscale geometric analysis, compressed sensing, attention mechanisms, reinforcement learning, and transfer learning. Next, this article summarizes the data types of remote sensing, the development of typical applications of remote sensing interpretation, and the implementations of remote sensing, including datasets, software,
    The foundation model (FM) has garnered significant attention for its remarkable transfer performance in downstream tasks. Typically, it undergoes task-agnostic pretraining on a large dataset and can be efficiently adapted to various downstream applications through fine-tuning. While FMs have been extensively explored in language and other domains, their potential in remote sensing has also begun to attract scholarly interest. However, comprehensive investigations and performance comparisons of these models on remote sensing tasks are currently lacking. In this survey, we provide essential background knowledge by introducing key technologies and recent developments in FMs. Subsequently, we explore essential downstream applications in remote sensing, covering classification, localization, and understanding.
    The progress of brain cognition and learning mechanisms has provided new inspiration for the next generation of artificial intelligence (AI) and a biological basis for the establishment of new models and methods. Brain science can effectively improve the intelligence of existing models and systems. Compared with other reviews, this article provides a comprehensive review of brain-inspired deep learning algorithms for learning, perception, and cognition from microscopic, mesoscopic, macroscopic, and super-macroscopic perspectives. First, this article introduces the brain cognition mechanism. Then, it summarizes the existing studies on brain-inspired learning and modeling from the perspectives of neural structure, cognitive module, learning mechanism, and behavioral characteristics. Next, this article introduces the potential learning directions of brain-inspired learning from four aspects: perception, cognition, understanding, and decision-making. Finally, the top-ten open problems that brain-inspired learning, perception, and cognition currently face are summarized, and prospects for next-generation AI technology are given. This work intends to provide a quick overview of the research on brain-inspired AI algorithms and to motivate future research by illuminating the latest developments in brain science.
    The construction of machine learning models involves many bi-level multiobjective optimization problems (BL-MOPs), where upper-level (UL) candidate solutions must be evaluated via training weights of a model in the lower level (LL). Due to the Pareto optimality of subproblems and the complex dependency across UL solutions and LL weights, a UL solution is feasible if and only if the LL weight is Pareto optimal. It is computationally expensive to determine which LL Pareto weight in the LL Pareto weight set is the most appropriate for each UL solution. This article proposes a bi-level multiobjective learning framework (BLMOL), coupling the above decision-making process with the optimization process of the upper-level MOP (UL-MOP) by introducing LL preference r. Specifically, the UL variable and r are simultaneously searched to minimize multiple UL objectives by evolutionary multiobjective algorithms. The LL weight with respect to r is trained to minimize multiple LL objectives via gradient-based preference multiobjective algorithms. In addition, the preference surrogate model is constructed to replace the expensive evaluation process of the UL-MOP. We
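    In generic form (the notation here is illustrative and not necessarily that of the paper), the bi-level multiobjective problem described above can be written as

\[
\min_{x_u \in \mathcal{X}_u} \; F(x_u, x_l^{*}) = \big(F_1(x_u, x_l^{*}), \ldots, F_M(x_u, x_l^{*})\big)
\quad \text{s.t.} \quad
x_l^{*} \in \operatorname*{arg\,min}_{x_l \in \mathcal{X}_l} \; f(x_u, x_l) = \big(f_1(x_u, x_l), \ldots, f_m(x_u, x_l)\big),
\]

    where x_u denotes the UL solution, x_l the LL model weights, and the LL preference r selects one weight vector from the LL Pareto set for each UL candidate.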
    The advent of the sixth generation of mobile communications (6G) ushers in an era of heightened demand for advanced network intelligence to tackle the challenges of an expanding network landscape and increasing service demands. Deep Learning (DL), as a crucial technique for instilling intelligence into 6G, has demonstrated powerful and promising development. This paper provides a comprehensive overview of the pivotal role of DL in 6G, exploring the myriad opportunities and challenges that arise. Firstly, we present a detailed vision for DL in 6G, emphasizing areas such as adaptive resource allocation, intelligent network management, robust signal processing, ubiquitous edge intelligence, and endogenous security. Secondly, this paper reviews how DL models leverage their unique learning capabilities to solve complex service demands in 6G. The models discussed include Convolutional Neural Networks (CNN), Generative Adversarial Networks (GAN), Graph Neural Networks (GNN), Deep Reinforcement Learning (DRL), Transformer, Federated Learning (FL), and Meta Learning. Additionally, we examine the specific challenges each DL model faces within the 6G context. Moreover, we delve into the rapidly evolving field of Artificial Intelligence Generated Content (AIGC), examining its development and impact within the 6G framework. Finally, this paper culminates in a detailed discussion of ten critical open problems in integrating DL with 6G, setting the stage for future research and development in this field. INDEX TERMS Deep learning, 6G, network intelligence, artificial intelligence generated content (AIGC), open problems.
    During the past decade, deep learning has been one of the essential breakthroughs in artificial intelligence. In particular, it has achieved great success in image processing. Correspondingly, various applications related to image processing are also promoting the rapid development of deep learning in all aspects of network structure, layer design, and training tricks. However, deeper structures make the back-propagation algorithm more difficult, the scale of unlabeled training images is rapidly increasing, and class imbalance severely affects performance. These issues urgently require more novel deep models and new parallel computing systems to interpret image content more effectively and to form suitable analysis mechanisms. In this context, this survey covers four series of deep learning models, namely the CNN series, GAN series, ELM-RVFL series, and other series, to provide a comprehensive understanding of analytical techniques in the image processing field, clarify the most important advances, and shed some light on future studies. Further studying the relationship between deep learning and image processing tasks can not only help us understand the reasons for the success of deep learning but also inspire new deep models and training methods. More importantly, this survey aims to give other researchers a glimpse of state-of-the-art deep learning methods in the field of image processing and to facilitate the application of these deep learning technologies in their research tasks. Besides, we discuss the open issues and promising directions of future research in image processing using the new generation of deep learning.
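    As a reminder of the GAN series referenced above, the original minimax objective of Goodfellow et al. is

\[
\min_G \max_D \; \mathbb{E}_{x \sim p_{\text{data}}}[\log D(x)] + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))],
\]

    where the generator G and the discriminator D are trained adversarially until the generated distribution approaches the data distribution.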
    With the increasing accessibility of remote sensing videos, remote sensing tracking is gradually becoming a hot topic. However, accurate detection and tracking in complex remote sensing scenes remains a challenge. In this article, we propose a collaborative learning tracking network for remote sensing videos, including a consistent receptive field parallel fusion (CRFPF) module, a dual-branch spatial-channel co-attention (DSCA) module, and a geometric constraint retrack (GCRT) strategy. Considering that it is difficult for general forward networks to extract effective features from the small objects in remote sensing scenes, we propose the CRFPF module to establish parallel branches with consistent receptive fields that separately extract features from shallow to deep layers and then fuse the hierarchical features adaptively. Since objects and their background are difficult to distinguish, the proposed DSCA module uses a spatial-channel co-attention mechanism to collaboratively learn the relevant information, which enhances the saliency of the objects and regresses precise bounding boxes. Considering the interference of similar objects, we design the GCRT strategy to judge whether there is a false detection based on the estimated motion trajectory and then recover the correct object by weakening the feature response of the interference. Experimental results and theoretical analysis on multiple datasets demonstrate the feasibility and effectiveness of the proposed method. The code and network are available at https://github.com/Dawn5786/CoCRF-TrackNet.
    Weakly supervised Video Anomaly Detection (VAD) using Multi-Instance Learning (MIL) is usually based on the premise that the anomaly score of an abnormal snippet is higher than that of a normal snippet. At the beginning of training, due to the limited accuracy of the model, it is easy to select the wrong abnormal snippet. To reduce the probability of such selection errors, we first propose a Multi-Sequence Learning (MSL) method and a hinge-based MSL ranking loss that uses a sequence composed of multiple snippets as the optimization unit. We then design a Transformer-based MSL network to learn both video-level anomaly probabilities and snippet-level anomaly scores. In the inference stage, we propose to use the video-level anomaly probability to suppress the fluctuation of the snippet-level anomaly scores. Finally, since VAD needs to predict snippet-level anomaly scores, we propose a self-training strategy that gradually refines the anomaly scores by gradually reducing the length of the selected sequence. Experimental results show that our method achieves significant improvements on ShanghaiTech, UCF-Crime, and XD-Violence.
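    The sketch below gives one possible reading of a hinge-based ranking loss over snippet sequences, assuming the sequence score is the highest mean anomaly score over any contiguous run of k snippets. The function names, the value of k, and the margin are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def top_sequence_score(scores, k):
    """Highest mean anomaly score over any contiguous sequence of k snippets."""
    scores = np.asarray(scores, dtype=np.float64)
    if len(scores) <= k:
        return float(scores.mean())
    window_sums = np.convolve(scores, np.ones(k), mode="valid")
    return float(window_sums.max() / k)

def hinge_msl_ranking_loss(abnormal_scores, normal_scores, k=4, margin=1.0):
    """Hinge ranking loss: the top sequence of an abnormal video should score
    higher than the top sequence of a normal video by at least `margin`."""
    s_abn = top_sequence_score(abnormal_scores, k)
    s_nor = top_sequence_score(normal_scores, k)
    return max(0.0, margin - s_abn + s_nor)
```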
    Self-supervised representation learning is becoming more and more popular due to its superior performance. According to the information entropy theory, the smaller the information entropy of a feature, the more certain it is and the less redundant it is. Based on this, we propose a simple yet effective self-supervised representation learning method via Minimum Entropy (MinEnt). From the perspective of reducing information entropy, our MinEnt takes the output of the projector towards its nearest minimum entropy as the optimization target. The core of our MinEnt consists of three important steps: 1) normalize along the batch dimension to avoid model collapse, 2) compute the nearest minimum entropy to get the target, 3) compute the loss and backpropagate to optimize the network. Our MinEnt can learn efficient representations, even without the need for techniques such as negative sample pairs, predictors, momentum encoders, cross-correlation matrices, etc. Experimental results on four widely used datasets show that our method achieves competitive results in a simple manner.
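    A minimal NumPy sketch of the three steps described above, under the hypothetical assumption that the nearest minimum-entropy target of a softmax distribution is a one-hot vector at its argmax; this is an illustrative reading, not the authors' implementation.

```python
import numpy as np

def minent_loss(projector_output):
    """Illustrative MinEnt-style loss. projector_output: array of shape (batch, dim)."""
    z = projector_output
    # 1) normalise along the batch dimension to avoid model collapse
    z = (z - z.mean(axis=0)) / (z.std(axis=0) + 1e-6)
    # softmax over the feature dimension -> one distribution per sample
    p = np.exp(z - z.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    # 2) nearest minimum-entropy target: a one-hot at the argmax
    target = np.zeros_like(p)
    target[np.arange(p.shape[0]), p.argmax(axis=1)] = 1.0
    # 3) cross-entropy to the target; backpropagation would follow in a real framework
    return float(-(target * np.log(p + 1e-12)).sum(axis=1).mean())
```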