In recent years, the enormous demand for computing resources resulting from massive data and complex network models has become a limiting factor for deep learning. In large-scale problems with massive samples and ultrahigh feature dimensions, sparsity has gradually drawn much attention from academia and industry. In this article, the new generation of brain-inspired sparse learning is reviewed comprehensively. First, sparse cognition learning is introduced, from the visual biology mechanism to modeling for natural images. Second, the sparse representation algorithms are summarized to sort out the research progress of sparse learning. Third, the relevant research on sparse feature selection learning is reviewed. Then, sparse deep networks and their applications are summed up. Last but not least, ten public issues and challenges of sparse learning are discussed. By investigating the development process of sparse learning, this article summarizes the advantages, disadvantages, limitations, and future research directions of the algorithms, which can help readers conduct further study. Impact Statement: Sparse representation is an essential branch of machine learning. Sparsity has been considered as a way to address the limitation of computation resources in practical deep learning applications. Unlike previous reviews, our survey comprehensively...
Remote sensing object detection (RSOD), one of the most fundamental and challenging tasks in the remote sensing field, has received long-standing attention. In recent years, deep learning techniques have demonstrated robust feature representation capabilities and led to a big leap in the development of RSOD techniques. In this era of rapid technical evolution, this article aims to present a comprehensive review of the recent achievements in deep learning-based RSOD methods. More than 300 papers are covered in this review. We identify five main challenges in RSOD, including multiscale object detection, rotated object detection, weak object detection, tiny object detection, and object detection with limited supervision, and systematically review the corresponding methods developed in a hierarchical division manner. We also review the widely used benchmark datasets and evaluation metrics within the field of RSOD as well as the application scenarios for RSOD. Future research directions are provided for further promoting the research in RSOD.
Video object detection, a basic task in the computer vision field, is rapidly evolving and widely used. In recent years, deep learning methods have rapidly become widespread in the field of video object detection, achieving excellent results compared with those of traditional methods. However, the presence of duplicate information and abundant spatiotemporal information in video data poses a serious challenge to video object detection. Therefore, many scholars have investigated deep learning detection algorithms in the context of video data and have achieved remarkable results. Considering the wide range of applications, a comprehensive review of the research related to video object detection is both necessary and challenging. This survey attempts to link and systematize the latest cutting-edge research on video object detection with the goal of classifying and analyzing video detection algorithms based on specific representative models. The differences and connections between video object detection and similar tasks are systematically demonstrated, and the evaluation metrics and video detection performance of nearly 40 models on two data sets are presented. Finally, the various applications of video object detection and the challenges facing it are discussed.
Feature representation has been widely used and developed recently. Multiscale features have led to remarkable breakthroughs for the representation learning process in many computer vision tasks. This paper aims to provide a comprehensive survey of the recent multiscale representation learning achievements in classification tasks. Multiscale representation learning methods can be divided into two broad categories (multiscale geometric analysis and multiscale networks). Eleven kinds of multiscale geometric tools and seven kinds of multiscale networks are introduced. Some corresponding fundamental subproblems of these two broad categories are also described, including some concepts in the representation process, specific representation methods with multiscale geometric analysis, and multiscale representation design strategies for networks. Then, the correlation between these two broad categories is illustrated, including respective characteristics, combination strategies, and characteristics of optimal representation. Some datasets and evaluation results are included to verify the effectiveness of multiscale representation learning. Finally, a conclusion and future work are given, covering four directions [a) Choice and fusion; b) Self-adaption; c) Structure; and d) Generalization and proof]. Impact Statement: Multiscale representation learning techniques are fully utilized in existing artificial intelligence tasks. Compared to the existing papers, which pay attention to a particular task or algorithm, this paper is the first systematic review of multiscale
Recently, the multiscale problem in computer vision has gradually attracted people's attention. This article focuses on multiscale representation for object detection and recognition, comprehensively introduces the development of multiscale deep learning, and constructs an easy-to-understand, but powerful knowledge structure. First, we give the definition of scale, explain the multiscale mechanism of human vision, and then lead to the multiscale problem discussed in computer vision. Second, advanced multiscale representation methods are introduced, including pyramid representation, scale-space representation, and multiscale geometric representation. Third, the theory of multiscale deep learning is presented, which mainly discusses the multiscale modeling in convolutional neural networks (CNNs) and Vision Transformers (ViTs). Fourth, we compare the performance of multiple multiscale methods on different tasks, illustrating the effectiveness of different multiscale structural designs. Finally, based on the in-depth understanding of the existing methods, we point out several open issues and future directions for multiscale deep learning.
Social relations are ubiquitous and form the basis of social structure in our daily life. However, existing studies mainly focus on recognizing social relations from still images and movie clips, which are different from real-world scenarios. For example, movie-based datasets define the task as video classification, recognizing only one relation in the scene. In this article, we aim to study the problem of social relation recognition in an open environment. To close the gap, we provide the first video dataset collected from real-life scenarios, named social relation in the wild (SRIW), where the number of people can be huge and can vary, and the relation of each pair of people needs to be recognized. To overcome new challenges, we propose a spatio-temporal relation graph convolutional network (STRGCN) architecture...
A graph structure is a powerful mathematical abstraction, which can not only represent information about individuals but also capture the interactions between individuals for reasoning. Geometric modeling and relational inference based on graph data is a long-standing topic of interest in the computer vision community. In this article, we provide a systematic review of graph representation learning and its applications in computer vision. First, we sort out the evolution of representation learning on graphs, categorizing them into the nonneural network and neural network methods based on the way the nodes are encoded. Specifically, nonneural network methods, such as graph embedding and probabilistic graphical models, are introduced, and neural network methods, such as graph recurrent neural networks, graph convolutional networks, and variants of graph neural networks, are also presented. Then, we organize the applications of graph representation algorithms in various vision tasks (such as image classification, semantic segmentation, object detection, and tracking) for review and reference, and the typical graph construction approaches in computer vision are also summarized. Finally, on the background of biology and brain inspiration, we discuss the existing challenges and future directions of graph representation learning and computer vision.
Deep learning (DL) has made breakthroughs in many computer vision tasks, including visual tracking. Since the early research on the automatic acquisition of highly abstract feature representations, DL has reached into all aspects of tracking, such as similarity metrics, data association, and bounding box estimation, to name a few. Also, pure DL-based trackers have obtained state-of-the-art performance after the community's constant research. We believe that it is time to comprehensively review the development of DL research in visual tracking. In this article, we overview the critical improvements brought to the field by DL: deep feature representations, network architecture, and four crucial issues in visual tracking (spatiotemporal information integration, target-specific classification, target information update, and bounding box estimation). The scope of the survey of DL-based tracking covers two primary subtasks for the first time, single-object tracking and multiple-object tracking. Also, we analyze the performance of DL-based approaches and give meaningful conclusions. Finally, we provide
The foundation model (FM) has garnered significant attention for its remarkable transfer performance in downstream tasks. Typically, it undergoes task-agnostic pretraining on a large dataset and can be efficiently adapted to various downstream applications through fine-tuning. While FMs have been extensively explored in language and other domains, their potential in remote sensing has also begun to attract scholarly interest. However, comprehensive investigations and performance comparisons of these models on remote sensing tasks are currently lacking. In this survey, we provide essential background knowledge by introducing key technologies and recent developments in FMs. Subsequently, we explore essential downstream applications in remote sensing, covering classification, localization, and understanding.
Progress in understanding brain cognition and learning mechanisms has provided new inspiration for the next generation of artificial intelligence (AI) and a biological basis for the establishment of new models and methods. Brain science can effectively improve the intelligence of existing models and systems. Compared with other reviews, this article provides a comprehensive review of brain-inspired deep learning algorithms for learning, perception, and cognition from microscopic, mesoscopic, macroscopic, and super-macroscopic perspectives. First, this article introduces the brain cognition mechanism. Then, it summarizes the existing studies on brain-inspired learning and modeling from the perspectives of neural structure, cognitive module, learning mechanism, and behavioral characteristics. Next, this article introduces the potential learning directions of brain-inspired learning from four aspects: perception, cognition, understanding, and decision-making. Finally, the top-ten open problems that brain-inspired learning, perception, and cognition currently face are summarized, and the prospects for the next generation of AI technology are discussed. This work intends to provide a quick overview of the research on brain-inspired AI algorithms and to motivate future research by illuminating the latest developments in brain science.
The construction of machine learning models involves many bi-level multiobjective optimization problems (BL-MOPs), where upper-level (UL) candidate solutions must be evaluated via training weights of a model in the lower level (LL). Due to the Pareto optimality of subproblems and the complex dependency across UL solutions and LL weights, a UL solution is feasible if and only if the LL weight is Pareto optimal. It is computationally expensive to determine which LL Pareto weight in the LL Pareto weight set is the most appropriate for each UL solution. This article proposes a bi-level multiobjective learning framework (BLMOL), coupling the above decision-making process with the optimization process of the upper-level MOP (UL-MOP) by introducing LL preference r. Specifically, the UL variable and r are simultaneously searched to minimize multiple UL objectives by evolutionary multiobjective algorithms. The LL weight with respect to r is trained to minimize multiple LL objectives via gradient-based preference multiobjective algorithms. In addition, the preference surrogate model is constructed to replace the expensive evaluation process of the UL-MOP. We
The advent of the sixth generation of mobile communications (6G) ushers in an era of heightened demand for advanced network intelligence to tackle the challenges of an expanding network landscape and increasing service demands. Deep Learning (DL), as a crucial technique for instilling intelligence into 6G, has demonstrated powerful and promising development. This paper provides a comprehensive overview of the pivotal role of DL in 6G, exploring the myriad opportunities and challenges that arise. Firstly, we present a detailed vision for DL in 6G, emphasizing areas such as adaptive resource allocation, intelligent network management, robust signal processing, ubiquitous edge intelligence, and endogenous security. Secondly, this paper reviews how DL models leverage their unique learning capabilities to solve complex service demands in 6G. The models discussed include Convolutional Neural Networks (CNN), Generative Adversarial Networks (GAN), Graph Neural Networks (GNN), Deep Reinforcement Learning (DRL), Transformer, Federated Learning (FL), and Meta Learning. Additionally, we examine the specific challenges each DL model faces within the 6G context. Moreover, we delve into the rapidly evolving field of Artificial Intelligence Generated Content (AIGC), examining its development and impact within the 6G framework. Finally, this paper culminates in a detailed discussion of ten critical open problems in integrating DL with 6G, setting the stage for future research and development in this field. Index Terms: Deep learning, 6G, network intelligence, artificial intelligence generated content (AIGC), open problems.
Research Interests: Deep Learning and 6G
During the past decade, deep learning has been one of the essential breakthroughs made in artificial intelligence. In particular, it has achieved great success in image processing. Correspondingly, various applications related to image processing are also promoting the rapid development of deep learning in all aspects of network structure, layer design, and training tricks. However, deeper structures make the back-propagation algorithm more difficult. At the same time, the scale of unlabeled training images is rapidly increasing, and class imbalance severely affects the performance of deep learning. These issues urgently require more novel deep models and new parallel computing systems to interpret image content more effectively and form a suitable analysis mechanism. In this context, this survey presents four deep learning model series, namely the CNN series, GAN series, ELM-RVFL series, and other series, to give a comprehensive understanding of the analytical techniques of the image processing field, clarify the most important advancements, and shed some light on future studies. Further studying the relationship between deep learning and image processing tasks can not only help us understand the reasons for the success of deep learning but also inspire new deep models and training methods. More importantly, this survey aims to give other researchers a glimpse of the state-of-the-art deep learning methods in the field of image processing and to facilitate the application of these deep learning technologies in their research tasks. In addition, we discuss the open issues and the promising directions of future research in image processing using the new generation of deep learning.
Weakly supervised Video Anomaly Detection (VAD) using Multi-Instance Learning (MIL) is usually based on the assumption that the anomaly score of an abnormal snippet is higher than that of a normal snippet. At the beginning of training, due to the limited accuracy of the model, it is easy to select the wrong abnormal snippet. In order to reduce the probability of selection errors, we first propose a Multi-Sequence Learning (MSL) method and a hinge-based MSL ranking loss that uses a sequence composed of multiple snippets as an optimization unit. We then design a Transformer-based MSL network to learn both video-level anomaly probability and snippet-level anomaly scores. In the inference stage, we propose to use the video-level anomaly probability to suppress the fluctuation of snippet-level anomaly scores. Finally, since VAD needs to predict the snippet-level anomaly scores, we propose a self-training strategy that gradually refines the anomaly scores by gradually reducing the length of the selected sequence. Experimental results show that our method achieves significant improvements on ShanghaiTech, UCF-Crime, and XD-Violence.
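As a rough illustration of the hinge-based ranking idea described in this abstract, the sketch below scores a video by the best-scoring window of `k` consecutive snippets and applies a hinge between an abnormal and a normal video. This is one plausible reading of the abstract, not the authors' implementation: the function names, the use of the window mean, and the default margin of 1.0 are all assumptions made for illustration.

```python
def top_sequence_mean(scores, k):
    """Highest mean anomaly score over any window of k consecutive snippets.

    Assumption: a "sequence" is a contiguous window scored by its mean.
    """
    if not 1 <= k <= len(scores):
        raise ValueError("k must be between 1 and the number of snippets")
    return max(sum(scores[i:i + k]) / k for i in range(len(scores) - k + 1))


def msl_hinge_ranking_loss(abnormal_scores, normal_scores, k, margin=1.0):
    """Hinge ranking loss on the best k-snippet sequence of each video.

    The loss is zero once the best abnormal sequence outscores the best
    normal sequence by at least `margin` (hypothetical default of 1.0).
    """
    gap = top_sequence_mean(abnormal_scores, k) - top_sequence_mean(normal_scores, k)
    return max(0.0, margin - gap)


# Toy snippet-level scores for one abnormal and one normal video.
abnormal = [0.1, 0.2, 0.9, 0.8, 0.1]
normal = [0.1, 0.3, 0.2, 0.1, 0.2]

loss_k2 = msl_hinge_ranking_loss(abnormal, normal, k=2)  # sequence of 2 snippets
loss_k1 = msl_hinge_ranking_loss(abnormal, normal, k=1)  # single snippet, MIL-style
```

With `k=1` the sketch reduces to ranking on the single highest-scoring snippet, which is the MIL selection the abstract contrasts against; shrinking `k` over training rounds mirrors the sequence-length reduction mentioned for the self-training strategy.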