
Remote Sensing Object Detection Meets Deep Learning: A metareview of challenges and advances

Remote sensing object detection (RSOD), one of the most fundamental and challenging tasks in the remote sensing field, has received long-standing attention. In recent years, deep learning techniques have demonstrated robust feature representation capabilities and led to a big leap in the development of RSOD techniques. In this era of rapid technical evolution, this article aims to present a comprehensive review of the recent achievements in deep learning-based RSOD methods. More than 300 papers are covered in this review. We identify five main challenges in RSOD, including multiscale object detection, rotated object detection, weak object detection, tiny object detection, and object detection with limited supervision, and systematically review the corresponding methods developed in a hierarchical division manner. We also review the widely used benchmark datasets and evaluation metrics within the field of RSOD as well as the application scenarios for RSOD. Future research directions are provided for further promoting the research in RSOD.

XIANGRONG ZHANG, TIANYANG ZHANG, GUANCHUN WANG, PENG ZHU, XU TANG, XIUPING JIA, AND LICHENG JIAO

INTRODUCTION
With the rapid advances in Earth observation technology, remote sensing satellites (e.g., Google Earth [1], WorldView-3 [2], and Gaofen-series satellites [3], [4], [5]) have made significant improvements in spatial, temporal, and spectral resolutions, and a massive number of remote sensing images (RSIs) are now accessible. Benefiting from the dramatic increase in available RSIs, human beings have entered an era of remote sensing big data, and the automatic interpretation of RSIs has become an active and challenging topic [6], [7], [8]. RSOD aims to determine whether or not objects of interest exist in a given RSI and to return the category and position of each predicted object. The term "object" in this survey refers to man-made or highly structured objects (such as airplanes, vehicles, and ships) rather than unstructured scene objects (e.g., land, the sky, and grass). As the cornerstone of the automatic interpretation of RSIs, RSOD has received significant attention.

In general, RSIs are taken at an overhead viewpoint with different ground sampling distances (GSDs) and cover widespread regions of Earth's surface. As a result, geospatial objects exhibit more significant diversity in scale, angle, and appearance. Based on the characteristics of geospatial objects in RSIs, we summarize the major challenges of RSOD in the following five aspects:
1) Huge scale variations: On the one hand, there are generally massive scale variations across different categories of objects, as illustrated in Figure 1(b): a vehicle may be as small as a 10-pixel area, while an airplane can be 20 times larger than the vehicle. On the other hand, intracategory objects also show a wide range of scales. Therefore, detection models must handle both large-scale and small-scale objects.
2) Arbitrary orientations: The unique overhead viewpoint leads to geospatial objects often being distributed with arbitrary orientations, as shown in Figure 1(c). This rotated object detection task exacerbates the challenge of RSOD, making it important for the detector to be perceptive of orientation.
3) Weak feature responses: Generally, RSIs have a complex context and massive amounts of background noise. As depicted in Figure 1(a), some vehicles are obscured by shadows, and the surrounding background noise tends to have a similar appearance to vehicles. This intricate interference may overwhelm the objects of interest and deteriorate their feature representation, which results in the objects of interest presenting weak feature responses [9].
4) Tiny objects: As shown in Figure 1(d), tiny objects tend to exhibit extremely small scales and limited appearance information, resulting in poor-quality feature representations. In addition, the prevailing detection paradigms inevitably weaken or even discard the representation of tiny objects [10]. These problems bring new difficulties to existing detection methods.
5) Expensive annotation: The complex characteristics of geospatial objects in terms of scale and angle, as well as the expert knowledge required for fine-grained annotations [11], make accurate box-level annotation of RSIs a time-consuming and labor-intensive task. However, current deep learning-based detectors rely heavily on abundant well-labeled data to reach performance saturation. Therefore, developing RSOD methods that remain effective when sufficient supervised information is lacking is still an open challenge.

To tackle these challenges, numerous RSOD methods have emerged in the past two decades. At the early stage, researchers adopted template matching [12], [13], [14] and prior knowledge [15], [16], [17] for object detection in remote sensing scenes. These early methods relied heavily on handcrafted templates or prior knowledge, leading to unstable results. Later, machine learning approaches [18], [19], [20], [21] became mainstream in RSOD; they view object detection as a classification task. Concretely, the machine learning model first searches a set of object proposals from the input image and extracts the texture, context, and other features of these proposals. Then, it employs an independent classifier to identify the object categories in these proposals. However, the shallow learning-based features of machine learning approaches significantly restrict the representation of objects, especially in more challenging scenarios. Besides, machine learning-based object detection methods cannot be trained in an end-to-end manner, which is no longer suitable for the era of remote sensing big data.

Recently, deep learning techniques [22] have demonstrated powerful feature representation capabilities learned from massive amounts of data, and the state-of-the-art detectors [23], [24], [25], [26] in computer vision achieve an object detection ability that rivals that of humans [27]. Drawing on this progress, various deep learning-based methods have come to dominate RSOD and have led to remarkable breakthroughs in detection performance. Compared to traditional methods, deep neural network architectures can extract high-level semantic features and obtain much more robust feature representations of objects. In addition, efficient end-to-end training and automated feature extraction make deep learning-based object detection methods more suitable for RSOD in the remote sensing big data era.
Along with the prevalence of RSOD, a number of geospatial object detection surveys [9], [28], [29], [30], [31], [32], [33], [34] have been published in recent years. For example, Cheng et al. [29] reviewed the early development of RSOD. Han et al. [9] focused on small and weak object detection in RSIs. In [30], the authors reviewed airplane detection methods. Li et al. [31] conducted a thorough survey on deep learning-based detectors in the remote sensing community according to various improvement strategies. Besides, some works [28], [33], [34] mainly focused on publishing novel benchmark datasets for RSOD and briefly reviewed object detection methods in the field of remote sensing.

FIGURE 1. Typical RSIs. (a) Complex context and massive amounts of background noise lead to weak feature responses of objects. (b) Huge scale variations exist in both inter- and intracategory objects. (c) Objects are distributed with arbitrary orientations. (d) Tiny objects tend to exhibit extremely small scales.

Compared with previous works, this article provides a comprehensive analysis of the major challenges in RSOD based on the characteristics of geospatial objects and systematically categorizes and summarizes deep learning-based remote sensing object detectors according to these challenges. Moreover, more than 300 papers on RSOD are reviewed in this work, leading to a more comprehensive and systematic survey.

Figure 2 provides a taxonomy of the object detection methods in this review. According to the major challenges in RSOD, we divide the current deep learning-based RSOD methods into five main categories: multiscale object detection, rotated object detection, weak object detection, tiny object detection, and object detection with limited supervision. In each category, we further examine subcategories based on the improvement strategies or learning paradigms designed for category-specific challenges. For multiscale object detection, we mainly review the three widely used methods: data augmentation, multiscale feature representation, and high-quality multiscale anchor generation. With regard to rotated object detection, we mainly focus on rotated bounding box representation and rotation-insensitive feature learning. For weak object detection, we divide the methods into two classes: background noise suppression and related context mining. As for tiny object detection, we divide the methods into three streams: discriminative feature extraction, superresolution reconstruction, and improved detection metrics. According to the learning paradigms, we divide object detection with limited supervision into weakly supervised object detection (WSOD), semisupervised object detection (SSOD), and few-shot object detection (FSOD). Notably, there are further detailed divisions in each subcategory, as shown in the rounded rectangles in Figure 2.
This hierarchical division provides a systematic review and summarization of existing methods. It helps researchers understand RSOD more comprehensively and facilitates further progress, which is the main purpose of this review.

FIGURE 2. The structured taxonomy of the deep learning-based RSOD methods in this article. A hierarchical division is adopted to describe each subcategory.

In summary, the main contributions of this article are as follows:
◗ We comprehensively analyze the major challenges in RSOD based on the characteristics of geospatial objects, including huge scale variations, arbitrary orientations, weak feature responses, tiny objects, and expensive annotations.
◗ We systematically summarize the deep learning-based object detectors in the remote sensing community and categorize them in a hierarchical manner according to their motivation.
◗ We present a forward-looking discussion of future research directions for RSOD to motivate its further progress.

MULTISCALE OBJECT DETECTION
Due to the different spatial resolutions among RSIs, huge scale variation is a notoriously challenging problem in RSOD and seriously degrades detection performance. In Figure 3, we present the distribution of object pixel areas for each category in the DOTA v2.0 dataset [33]. Obviously, the scales vary greatly among categories: a small vehicle may have an area of less than 10 pixels, while an airport can exceed a 10^5-pixel area. Worse still, the huge intracategory scale variations further exacerbate the difficulties of multiscale object detection. To tackle the huge scale variation problem, current studies are mainly divided into data augmentation, multiscale feature representation, and multiscale anchor generation. Figure 4 gives a brief summary of multiscale object detection methods.

FIGURE 3. The scale variations for each category in the DOTA v2.0 dataset. Huge scale variations exist in both the inter- and intracategories.
FIGURE 4. A brief summary of multiscale object detection methods.

DATA AUGMENTATION
Data augmentation is a simple yet widely applied approach for increasing dataset diversity. For the scale variation problem in multiscale object detection, image scaling is a straightforward and effective augmentation method. Zhao et al. [35] fed multiscale image pyramids into multiple networks and fused the output features from these networks to generate multiscale feature representations. In [36], Azimi et al. proposed a combined image cascade and feature pyramid network (FPN) to extract object features on various scales. Although image pyramids can effectively increase the detection performance for multiscale objects, they severely increase the inference time and computational complexity. To tackle this problem, Shamsolmoali et al. [37] designed a lightweight image pyramid module (LIPM). The proposed LIPM receives multiple downsampled images to generate multiscale feature maps and fuses the output multiscale feature maps with the corresponding scale feature maps from the backbone. Moreover, some modern data augmentation methods (e.g., Mosaic and Stitcher [38]) also show remarkable effectiveness in multiscale object detection, especially for small objects [39], [40], [41].

MULTISCALE FEATURE REPRESENTATION
Early studies in RSOD usually utilized the last single feature map of the backbone to detect objects, as illustrated in Figure 5(a). However, such single-scale feature map prediction limits the detector's ability to handle objects with a wide range of scales [42], [43], [44]. Consequently, multiscale feature representation methods have been proposed and have become an effective solution to the huge object scale variation problem in RSOD. Current multiscale feature representation methods are mainly divided into three streams: multiscale feature integration, pyramidal feature hierarchy, and FPNs.

FIGURE 5. Single-scale feature representation and six paradigms for multiscale feature representation. (a) Single-scale feature representation. (b) Multiscale feature integration. (c) Pyramidal feature hierarchy. (d) FPNs. (e) Top-down and bottom-up. (f) Cross-scale feature balance.

MULTISCALE FEATURE INTEGRATION
Convolutional neural networks (CNNs) usually adopt a deep hierarchical structure, where features at different levels have different characteristics. The shallow-level features usually contain fine-grained details (e.g., points, edges, and textures of objects) and provide detailed spatial location information, which is more suitable for object localization. In contrast, features from higher-level layers carry stronger semantic information and present discriminative information for object classification. To combine the information from different layers and generate a multiscale representation, some researchers introduced multilayer feature integration methods that integrate features from multiple layers into a single feature map and perform detection on this rebuilt feature map [45], [46], [47], [48], [49], [50], [51], [52]. Figure 5(b) exhibits the structure of multilayer feature integration methods. Zhang et al. [48] designed a hierarchical robust CNN to extract hierarchical spatial semantic information by fusing multiscale convolutional features from three different layers and introduced multiple fully connected layers to enhance the rotation and scaling robustness of the network. Considering the different norms among multilayer features, Lin et al. [49] applied an l2 normalization to each feature before integration to maintain stability in the network training stage. Unlike previous multiscale feature integration at the level of the convolutional layer, Zheng et al. [51] designed HyBlock to build a multiscale feature representation at the intralayer level. HyBlock employs atrous separable convolution with pyramidal receptive fields to learn hyperscale features, alleviating the scale-variation issue in RSOD.

PYRAMIDAL FEATURE HIERARCHY
The key insight behind the pyramidal feature hierarchy is that the features in different layers can encode object information from different scales. For instance, small objects are more likely to appear in shallow layers, while large objects tend to exist in deep layers. Therefore, the pyramidal feature hierarchy employs multiple-layer features for independent prediction to detect objects with a wide scale range, as demonstrated in Figure 5(c). The single-shot multibox detector (SSD) [53] is a typical representative of the pyramidal feature hierarchy, and it has a wide range of extended applications in both natural scenes [54], [55], [56] and remote sensing scenes [57], [58], [59], [60], [61], [62], [63]. To improve the detection performance for small vehicles, Liang et al. [60] added an extra scaling branch to the SSD, consisting of a deconvolution module and an average pooling layer. Referring to the hierarchical regression layers in the SSD, Wang et al. [58] introduced scale-invariant regression layers (SIRLs), where three isolated regression layers are employed to capture the information of full-scale objects. Based on SIRLs, a novel specific scale joint loss is introduced to accelerate network convergence. In [64], Li et al. proposed HSF-Net, which introduces a hierarchical selective filtering layer in both the region proposal network (RPN) and the detection subnetwork. Specifically, the hierarchical selective filtering layer employs three convolutional layers with different kernel sizes (i.e., 1 × 1, 3 × 3, and 5 × 5) to obtain multiple receptive field features, which benefits multiscale ship detection.
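The multi-receptive-field idea behind such hierarchical selective filtering can be sketched in a few lines. The block below is a simplified, hedged stand-in (not the exact layer from [64]); the channel count, layer names, and fusion choice are illustrative. It applies parallel 1 × 1, 3 × 3, and 5 × 5 convolutions to one feature map and fuses the concatenated results.

```python
import torch
import torch.nn as nn

class MultiKernelBlock(nn.Module):
    """Parallel 1x1 / 3x3 / 5x5 convolutions over one feature map.

    Each branch sees a different receptive field; the outputs are
    concatenated and fused back to the input channel count.
    """
    def __init__(self, channels: int = 256):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(channels, channels, kernel_size=k, padding=k // 2)
            for k in (1, 3, 5)
        ])
        self.fuse = nn.Conv2d(3 * channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        multi_scale = torch.cat([branch(x) for branch in self.branches], dim=1)
        return self.fuse(multi_scale)

# Example: a 256-channel feature map from an RPN or detection subnetwork.
features = torch.randn(1, 256, 64, 64)
print(MultiKernelBlock(256)(features).shape)  # torch.Size([1, 256, 64, 64])
```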
FEATURE PYRAMID NETWORKS
Pyramidal feature hierarchy methods use independent multilevel features for detection and ignore the complementary information among features at different levels, resulting in weak semantic information for low-level features. To tackle this problem, Lin et al. [65] proposed the FPN. As explained in Figure 5(d), the FPN introduces a top-down pathway to transfer rich semantic information from high-level features to shallow-level features, leading to rich semantic features at all levels (please refer to the details in [65]). Thanks to the significant improvement the FPN brings to multiscale object detection, the FPN and its extensions [66], [67], [68] play a dominant role in multiscale feature representation.

Considering the extreme aspect ratios of geospatial objects (e.g., bridges, harbors, and airports), Hou et al. [69] proposed an asymmetric FPN (AFPN). The AFPN adopts an asymmetric convolution block to enhance the feature representation of cross-shaped skeletons and improve the performance on large-aspect-ratio objects. Zhang et al. [70] designed a Laplacian FPN to inject high-frequency information into the multiscale pyramidal feature representation, which is useful for accurate object detection but had been ignored by previous work. In [71], Zhang et al. introduced a high-resolution FPN to fully leverage high-resolution feature representations, leading to precise and robust synthetic aperture radar (SAR) ship detection. In addition, some researchers integrated novel feature fusion modules [72], [73], attention mechanisms [74], [75], [76], [77], or dilated convolution layers [78], [79] into the FPN to obtain a more discriminative multiscale feature representation.

The FPN introduces a top-down pathway to transfer high-level semantic information into the shallow layers, while the low-level spatial information is still lost in the top layers after long-distance propagation through the backbone. Drawing on this problem, Fu et al. [80] proposed a feature fusion architecture (FFA) that integrates an auxiliary bottom-up pathway into the FPN structure to transfer the low-level spatial information to the top-layer features via a short path, as depicted in Figure 5(e). The FFA ensures that the detector extracts multiscale feature pyramids with rich semantic and detailed spatial information. Similarly, in [81] and [82], the authors introduced a bidirectional FPN that learns the importance of different level features through learnable parameters and fuses the multilevel features through iterative top-down and bottom-up pathways.

Differing from the preceding sequential enhancement pathways [80], some studies [83], [84], [85], [86], [87], [88], [89], [90], [91], [92], [93], [94] adopt a cross-level feature fusion manner. As shown in Figure 5(f), cross-level feature fusion methods collect the features at all levels to adaptively obtain balanced feature maps. Cheng et al. [83] utilized feature concatenation to achieve cross-scale feature fusion. Considering that features from different levels should make different contributions to the feature fusion, Fu et al. [84] proposed level-based attention to learn the unique contribution of features from each level. Thanks to the powerful global information extraction ability of the transformer structure, some works [88], [89] introduced transformer structures to integrate and refine multilevel features. In [90], Chen et al. presented a cascading attention network in which position supervision is introduced to enhance the semantic information of multilevel features.
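A minimal sketch of the FPN top-down pathway described above, assuming three backbone levels with channel counts (512, 1024, 2048) as in a typical ResNet; the lateral 1 × 1 convolutions, nearest-neighbor upsampling, and 3 × 3 smoothing follow the general recipe of [65], not any specific remote sensing variant.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleFPN(nn.Module):
    """Top-down feature pyramid: high-level semantics flow to shallow levels."""
    def __init__(self, in_channels=(512, 1024, 2048), out_channels: int = 256):
        super().__init__()
        self.lateral = nn.ModuleList(
            [nn.Conv2d(c, out_channels, kernel_size=1) for c in in_channels])
        self.smooth = nn.ModuleList(
            [nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1)
             for _ in in_channels])

    def forward(self, feats):
        # feats: [C3, C4, C5], ordered from high resolution to low resolution.
        laterals = [lat(f) for lat, f in zip(self.lateral, feats)]
        # Propagate semantics top-down by upsampling and element-wise addition.
        for i in range(len(laterals) - 2, -1, -1):
            laterals[i] = laterals[i] + F.interpolate(
                laterals[i + 1], size=laterals[i].shape[-2:], mode="nearest")
        return [sm(lat) for sm, lat in zip(self.smooth, laterals)]

c3 = torch.randn(1, 512, 100, 100)
c4 = torch.randn(1, 1024, 50, 50)
c5 = torch.randn(1, 2048, 25, 25)
for level in SimpleFPN()([c3, c4, c5]):
    print(level.shape)  # 256-channel maps at 100x100, 50x50, and 25x25
```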
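Predefined multiscale anchor generation amounts to enumerating a set of scales and aspect ratios at every feature-map cell. The sketch below illustrates this; the specific scale and ratio values, the stride, and the function name are placeholders rather than settings from any cited method, and in practice they are tuned to the object statistics of the dataset, as in [98].

```python
import itertools
import numpy as np

def generate_anchors(feat_h, feat_w, stride,
                     scales=(32, 64, 128), ratios=(0.5, 1.0, 2.0)):
    """Return (feat_h * feat_w * len(scales) * len(ratios), 4) anchors as (cx, cy, w, h)."""
    anchors = []
    for y, x in itertools.product(range(feat_h), range(feat_w)):
        cx, cy = (x + 0.5) * stride, (y + 0.5) * stride  # cell center in image pixels
        for scale, ratio in itertools.product(scales, ratios):
            w = scale * np.sqrt(ratio)   # keep the anchor area close to scale**2
            h = scale / np.sqrt(ratio)
            anchors.append((cx, cy, w, h))
    return np.asarray(anchors, dtype=np.float32)

anchors = generate_anchors(feat_h=4, feat_w=4, stride=16)
print(anchors.shape)  # (144, 4): 4*4 cells x 3 scales x 3 ratios
```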
ROTATED OBJECT DETECTION
The arbitrary orientation of objects is another major challenge in RSOD. Since the objects in RSIs are acquired from a bird's-eye view, they exhibit the property of arbitrary orientations, so the widely used horizontal bounding box (HBB) representation in generic object detection is insufficient to locate rotated objects accurately. Therefore, numerous researchers have focused on the arbitrary orientation property of geospatial objects, and their work can be summarized into rotated object representation and rotation-invariant feature learning. A brief summary of rotated object detection methods is provided in Figure 6.

FIGURE 6. A brief summary of rotated object detection methods.

ROTATED OBJECT REPRESENTATION
Rotated object representation is essential for RSOD to avoid redundant backgrounds and obtain precise detection results. Recent rotated object representation methods can be mainly summarized into several categories: five-parameter representation [107], [108], [109], [110], [111], [112], [113], [114], [115], [116], eight-parameter representation [117], [118], [119], [120], [121], [122], [123], [124], [125], [126], angle classification representation [106], [127], [128], [129], Gaussian distribution representation [130], [131], [132], [133], and others [134], [135], [136], [137], [138], [139], [140], [141], [142], [143], [144].

FIVE PARAMETERS
The most popular solution is representing objects with a five-parameter method (x, y, w, h, θ), which simply adds an extra rotation angle parameter θ to the HBB [107], [108], [109], [110], [111], [112], [113], [114], [115]. The definition of the angular range plays a crucial role in such methods, and two kinds of definitions have been derived. Some studies [107], [108], [109], [110], [111], [112] define θ as the acute angle to the x-axis and restrict the angular range to 90°, as in Figure 7(a). As the most representative work, Yang et al. [107] followed the five-parameter method to detect rotated objects and designed an intersection-over-union (IOU)-aware loss function to tackle the boundary discontinuity problem of rotation angles. Another group of works [113], [114], [115], [116] defines θ as the angle between the x-axis and the long side, whose range is 180°, as in Figure 7(b). Ding et al. [114] regressed rotation angles with a five-parameter method and transformed the features of horizontal regions into rotated ones to facilitate rotated object detection.

FIGURE 7. A visualization of the five-parameter representation and eight-parameter representation methods for rotated objects [106]. (a) Five-parameter representation with 90° angular range. (b) Five-parameter representation with 180° angular range. (c) Eight-parameter representation.

EIGHT PARAMETERS
Differing from five-parameter methods, eight-parameter methods [117], [118], [119], [120], [121], [122], [123], [124], [125], [126] solve the issue of rotated object representation by directly regressing the four vertices, {(a_x, a_y), (b_x, b_y), (c_x, c_y), (d_x, d_y)}, as described in Figure 7(c). Xia et al. [117] first adopted the eight-parameter method for rotated object representation, which directly supervises the detection model by minimizing the difference between each vertex and the ground truth coordinates during training. However, the sequence order of these vertices is essential for the eight-parameter method to avoid unstable training. As evident in Figure 8, it is intuitive that regressing targets along the red dotted arrow is an easier route, but the actual process follows the red solid arrows, which makes model training difficult. To this end, Qian et al. [119], [121] proposed a modulated loss function that calculates the losses under different sorted orders and selects the minimum case to learn, efficiently improving the detection performance.

FIGURE 8. The boundary discontinuity challenge of the (a) five-parameter method and (b) eight-parameter method [119], [121].

ANGLE CLASSIFICATION
To address the issue presented in Figure 8, many researchers [106], [127], [128], [129] take a detour around the boundary challenge of regression by transforming the angle prediction problem into an angle classification task. Yang et al. [106] proposed the first angle classification method for rotated object detection, which converts the continuous angle into a discrete one and trains the model with novel circular smooth labels. However, the angle classification head [106] introduces additional parameters and degrades the detector's efficiency. To overcome this, Yang et al. [129] improved the work in [106] with a densely coded label that ensures both the accuracy and efficiency of the model.

GAUSSIAN DISTRIBUTION
Although the preceding methods achieve promising progress, they do not consider the misalignment between the actual detection performance and the optimization metric. Most recently, a series of works [130], [131], [132], [133] aim to handle this challenge by representing rotated objects with a Gaussian distribution, as detailed in Figure 9. Specifically, these methods convert rotated objects into a 2D Gaussian distribution $N(\mu, \Sigma)$, as follows:

$$
\mu = (x, y)^{\top},\qquad
\Sigma^{1/2} = R\Lambda R^{\top}
= \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}
\begin{pmatrix} \tfrac{w}{2} & 0 \\ 0 & \tfrac{h}{2} \end{pmatrix}
\begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix}
= \begin{pmatrix}
\tfrac{w}{2}\cos^{2}\theta + \tfrac{h}{2}\sin^{2}\theta & \tfrac{w-h}{2}\cos\theta\sin\theta \\
\tfrac{w-h}{2}\cos\theta\sin\theta & \tfrac{w}{2}\sin^{2}\theta + \tfrac{h}{2}\cos^{2}\theta
\end{pmatrix} \quad (1)
$$

where $R$ represents the rotation matrix and $\Lambda$ represents the diagonal matrix of the eigenvalues. With the Gaussian distribution representation in (1), the IOU between two rotated objects can be simplified as a distance estimation between two distributions. Besides, the Gaussian distribution representation discards the definition of the angular boundary and effectively solves the angular boundary problem. Yang et al. [130] proposed a novel metric with a Gaussian Wasserstein distance for measuring the distance between distributions, which achieves remarkable performance by efficiently approximating the rotation IOU. Based on this, Yang et al. [131] introduced a Kullback-Leibler divergence (KLD) metric to enhance the scale invariance.

FIGURE 9. A visualization of the Gaussian distribution representation methods for rotated objects [130].

OTHERS
Some researchers address rotated object representation with other approaches, such as segmentation-based [134], [135], [136] and keypoint-based [137], [138], [139], [140], [141], [142], [143], [144] methods. The representative segmentation-based method is Mask OBB [134], which deploys a segmentation head on each horizontal proposal to obtain the pixel-level object region and produces the minimum external rectangle as the rotated bounding box. On the other side, Wei et al. [142] adopted a keypoint-based representation for rotated objects, which locates the object center and leverages a pair of middle lines to represent the whole object. In addition, Yang et al. [145] proposed the first rotated object detector supervised by horizontal box annotations, which adopts self-supervised learning over two different views to predict the angles of rotated objects.

ROTATION-INVARIANT FEATURE LEARNING
Rotation-invariant features remain consistent under any rotation transformation. Thus, rotation-invariant feature learning is a crucial research direction for tackling the arbitrary orientation problem in rotated object detection. To this end, many researchers have proposed methods for learning the rotational invariance of objects [146], [147], [148], [149], [150], [151], [152], [153], [154], [155], [156], [157], which significantly improve rotated object detection in RSIs. Cheng et al. [146] proposed the first rotation-invariant object detector to precisely recognize objects by using rotation-insensitive features, which forces the features of objects to be consistent at different rotation angles. Later, Cheng et al. [148], [149] employed rotation-invariant and Fisher discrimination regularizers to encourage the detector to learn both rotation-invariant and discriminative features. In [150] and [151], Wu et al. analyzed object rotation invariance under polar coordinates in the Fourier domain and designed a spatial frequency channel feature extraction module to obtain rotation-invariant features. Considering the misalignment between axis-aligned convolutional features and rotated objects, Han et al. [156] proposed an oriented detection module that adopts a novel alignment convolution operation to learn the orientation information. In [155], Han et al. further devised a rotation-equivariant detector to explicitly encode rotation equivariance and rotation invariance. Besides, some researchers [80], [157] extended the RPN with a series of predefined rotated anchors to cope with the arbitrary orientation characteristics of geospatial objects. We summarize the detection performance of milestone rotated object detection methods in Table 1.

TABLE 1. THE PERFORMANCE OF ROTATED OBJECT DETECTION METHODS ON THE DOTA V1.0 DATASET WITH ROTATED ANNOTATIONS.
MODEL | BACKBONE | METHOD | mAP (%)
SCRDet [107] | R-101-FPN | Five parameters | 72.61
O2Det [142] | H-104 | Keypoint based | 72.8
R3Det [108] | R-101-FPN | Five parameters | 73.79
S2A-Net [156] | R-50-FPN | Rotation-invariant feature | 74.12
RoI Transformer [114] | R-50-FPN | Five parameters | 74.61
Mask OBB [134] | R-50-FPN | Segmentation based | 74.86
Gliding Vertex [120] | R-101-FPN | Four vertices | 75.02
DCL [128] | R-152-FPN | Angle classification | 75.54
ReDet [155] | ReR50-ReFPN | Rotation-invariant feature | 76.25
Oriented R-CNN [124] | R-101-FPN | Four vertices | 76.28
R3Det-KLD [131] | R-50-FPN | Gaussian distribution | 77.36
mAP: mean of the average precision.
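To make the Gaussian representation in (1) concrete, the sketch below converts an oriented box (x, y, w, h, θ) into the corresponding mean vector and covariance matrix; distances such as the Gaussian Wasserstein distance [130] or KLD [131] are then computed between these distributions. This is a direct transcription of (1), with θ given in radians; the function name is illustrative.

```python
import numpy as np

def obb_to_gaussian(x, y, w, h, theta):
    """Convert an oriented box (center x, y, width w, height h, angle theta)
    into a 2D Gaussian N(mu, sigma) following (1)."""
    mu = np.array([x, y], dtype=np.float64)
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])
    half_wh = np.diag([w / 2.0, h / 2.0])       # Lambda: eigenvalues of Sigma^(1/2)
    sigma_sqrt = rot @ half_wh @ rot.T          # Sigma^(1/2) = R Lambda R^T
    return mu, sigma_sqrt @ sigma_sqrt          # Sigma

mu, sigma = obb_to_gaussian(100.0, 50.0, 40.0, 10.0, np.deg2rad(-30.0))
print(mu)      # [100.  50.]
print(sigma)   # 2x2 covariance encoding scale and orientation jointly
```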
WEAK OBJECT DETECTION
Objects of interest in RSIs are typically embedded in complex scenes with intricate object spatial patterns and massive amounts of background noise. The complex context and background noise severely harm the feature representation of objects of interest, resulting in weak feature responses to objects of interest. Thus, many existing works have concentrated on improving the feature representation of objects of interest, which can be grouped into two streams: suppressing background noise and mining related context information. A brief summary of weak object detection methods is given in Figure 10.

FIGURE 10. A brief summary of weak object detection methods.

SUPPRESSING BACKGROUND NOISE
This type of method aims to strengthen the weak response of the object region in the feature map by weakening the response of background regions. It can be mainly divided into two categories: implicit learning and explicit supervision.

IMPLICIT LEARNING
Implicit learning methods employ carefully designed modules in the detector to adaptively learn important features and suppress redundant features during the training phase, thereby reducing background noise interference. In machine learning, dimensionality reduction can effectively learn compact feature representations and suppress irrelevant features. Drawing on this property, Ye et al. [158] proposed a feature filtration module that captures low-dimensional feature maps with consecutive bottleneck layers to filter out background noise interference. Inspired by the selective focus of human visual perception, the attention mechanism has been proposed and heavily researched [159], [160], [161]. The attention mechanism redistributes feature importance during the network learning phase to enhance important features and suppress redundant information. Consequently, the attention mechanism has also been widely introduced in RSOD to tackle the background noise interference problem [57], [162], [163], [164], [165], [166], [167], [168], [169], [170]. In [162], Huang et al. emphasized the importance of patch-patch dependencies for RSOD and designed a novel nonlocal perceptual pyramidal attention (NP-Attention). NP-Attention learns spatial multiscale nonlocal dependencies and channel dependencies to enable the detector to concentrate on the object region rather than the background. Considering the strong scattering interference of the land area in SAR images, Sun et al. [163] presented a ship attention module to highlight the feature representation of ships and reduce false alarms from the land area. Moreover, a series of attention mechanisms devised for RSOD (e.g., spatial shuffle group enhanced attention [165], multiscale spatial and channel-wise attention [166], discrete wavelet multiscale attention [167], and so on) have demonstrated their effectiveness in suppressing background noise.
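As a deliberately minimal illustration of the implicit learning idea, the block below learns a single-channel spatial attention map and rescales the input features with it, so responses in likely background regions can be suppressed during training. It is a generic sketch, not the specific attention design of any method cited above; channel sizes and names are placeholders.

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Learn a per-location weight in [0, 1] and rescale the feature map."""
    def __init__(self, channels: int = 256):
        super().__init__()
        self.mask = nn.Sequential(
            nn.Conv2d(channels, channels // 4, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // 4, 1, kernel_size=1),
            nn.Sigmoid(),                      # attention weights in [0, 1]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * self.mask(x)                # background locations get weights near 0

features = torch.randn(1, 256, 64, 64)
print(SpatialAttention(256)(features).shape)   # torch.Size([1, 256, 64, 64])
```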
EXPLICIT SUPERVISION
Unlike implicit learning methods, explicit supervision approaches employ auxiliary saliency supervision to explicitly guide the detector to highlight the foreground regions and weaken the background. Li et al. [171] employed the region contrast method to obtain a saliency map and constructed a saliency feature pyramid by fusing the multiscale feature maps with the saliency map. In [172], Lei et al. extracted the saliency map via the saliency detection method of [173] and proposed a saliency reconstruction network. The saliency reconstruction network utilizes the saliency map as pixel-level supervision to guide the training of the detector and strengthen saliency regions in the feature maps. The preceding saliency detection methods are typically unsupervised, and the generated saliency map may contain nonobject regions, as exhibited in Figure 11(b), providing inaccurate guidance to the detector. Therefore, later works [107], [134], [174], [175], [176], [177], [178], [179], [180] transformed box-level annotations into object-level saliency guidance information [as shown in Figure 11(c)] to generate more accurate saliency supervision. Yang et al. [107] designed a pixel attention network that employs object-level saliency supervision to enhance object cues and weaken the background information. In [175], Zhang et al. proposed FoRDet to exploit object-level saliency supervision in a more concise way. Concretely, FoRDet leverages the prediction of foreground regions in the coarse stage (supervised under box-level annotation) to enhance the feature representation of the foreground regions in the refined stage.

FIGURE 11. The (a) input image, (b) saliency map generated by the saliency detection method [173], and (c) object-level saliency map.

MINING RELATED CONTEXT INFORMATION
Context information typically refers to the spatial and semantic relations between an object and the surrounding environment or scene. This context information can provide auxiliary feature representations for objects that cannot be clearly distinguished on their own. Thus, mining context information can effectively alleviate the weak feature response problem in RSOD. According to the category of context information, existing methods are mainly classified into local and global context information mining.

LOCAL CONTEXT INFORMATION MINING
Local context information refers to the correlation between an object and its surrounding environment in terms of visual information and spatial distribution [147], [181], [182], [183], [184], [185], [186], [187]. Zhang et al. [181] generated multiple local context regions by scaling the original region proposal into three different sizes and proposed a contextual bidirectional enhancement module to fuse the local context features with the object features. The context-aware CNN [182] employs a context ROI mining layer to extract context information about surrounding objects. The context ROI for an object is first generated by merging a series of filtered proposals around the object and is then fused with the object ROI as the final object feature representation for classification and regression. In [183], Ma et al. exploited gated recurrent units to fuse object features with local context information, leading to a more discriminative feature representation for the object.
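A minimal sketch of the proposal-enlargement style of local context mining discussed above (in the spirit of scaling a proposal to gather its surroundings, as in [181]): the proposal box is enlarged by a fixed factor, both regions are pooled with torchvision's roi_align, and the two features are concatenated. The enlargement factor, pooling size, and function name are illustrative.

```python
import torch
from torchvision.ops import roi_align

def object_and_context_features(feat, boxes, context_scale=1.8, out_size=7):
    """Pool ROI features for each box and for an enlarged 'context' box.

    feat: (1, C, H, W) feature map; boxes: (N, 4) as (x1, y1, x2, y2) in
    feature-map coordinates. Returns (N, 2C, out_size, out_size).
    """
    cx = (boxes[:, 0] + boxes[:, 2]) / 2
    cy = (boxes[:, 1] + boxes[:, 3]) / 2
    w = (boxes[:, 2] - boxes[:, 0]) * context_scale
    h = (boxes[:, 3] - boxes[:, 1]) * context_scale
    context = torch.stack([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2], dim=1)

    obj_feat = roi_align(feat, [boxes], output_size=out_size)
    ctx_feat = roi_align(feat, [context], output_size=out_size)
    return torch.cat([obj_feat, ctx_feat], dim=1)   # fuse object + local context

feat = torch.randn(1, 256, 64, 64)
boxes = torch.tensor([[10.0, 10.0, 30.0, 25.0]])
print(object_and_context_features(feat, boxes).shape)  # torch.Size([1, 512, 7, 7])
```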
Graph convolutional networks have recently shown strong performance in object-object relationship reasoning. Hence, Tian et al. [184], [185] constructed spatial and semantic graphs to model and learn the contextual relationships among objects.

GLOBAL CONTEXT INFORMATION MINING
Global context information exploits the association between an object and the scene [188], [189], [190], [191], [192], [193], [194], [195]; e.g., vehicles are generally located on roads, and ships typically appear at sea. Chen et al. [188] extracted scene context information from the global image feature with an ROI align operation and fused it with the object-level ROI features to strengthen the relationship between the object and the scene. Liu et al. [192] designed a scene auxiliary detection head that exploits scene context information under scene-level supervision. The scene auxiliary detection head embeds the predicted scene vector into the classification branch to fuse object-level features with scene-level context information. In [193], Tao et al. presented a scene context-driven vehicle detection approach. Specifically, a pretrained scene classifier is introduced to classify each image patch into three scene categories. Then, scene-specific vehicle detectors are employed to achieve preliminary detection results, and finally, the detection results are further optimized with the scene contextual information.

Considering the complementarity of local and global context information, Zhang et al. [196] proposed CAD-Net to mine both local and global context information. CAD-Net employs a pyramid local context network to learn object-level local context information and a global context network to extract scene-level global context information. In [103], Teng et al. proposed GLNet to collect context information from global to local, so as to achieve a robust and accurate detector for RSIs. Besides, some studies [197], [198], [199] also introduced atrous spatial pyramid pooling [200] or a receptive field block module [54] to leverage both local and global context information.

TINY OBJECT DETECTION
The typical GSD of RSIs is 1-3 m, which means that even large objects (e.g., airplanes, ships, and storage tanks) can occupy fewer than 16 × 16 pixels. Besides, even in high-resolution RSIs with a GSD of 0.25 m, a vehicle with dimensions of 3 × 1.5 m covers only 72 pixels (12 × 6). This prevalence of tiny objects in RSIs further increases the difficulty of RSOD. Current studies on tiny object detection are mainly grouped into discriminative feature learning, superresolution-based methods, and improved detection metrics. The tiny object detection methods are briefly summarized in Figure 12.
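The pixel-footprint arithmetic above generalizes to any GSD. As a quick check, assuming only that the footprint is the physical size divided by the GSD:

```python
def pixel_footprint(length_m, width_m, gsd_m):
    """Approximate pixel footprint of an object of given physical size at a given GSD."""
    return round(length_m / gsd_m), round(width_m / gsd_m)

# A 3 m x 1.5 m vehicle at 0.25 m GSD covers roughly 12 x 6 = 72 pixels.
print(pixel_footprint(3.0, 1.5, 0.25))   # (12, 6)
# The same vehicle at 1 m GSD shrinks to only a few pixels.
print(pixel_footprint(3.0, 1.5, 1.0))    # (3, 2)
```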
DISCRIMINATIVE FEATURE LEARNING
The extremely small scales (less than 16 × 16 pixels) of tiny objects result in limited appearance information, which poses serious challenges for detectors in learning the features of tiny objects. To tackle this problem, many researchers focus on improving the discriminative feature learning ability for tiny objects [201], [202], [203], [204], [205], [206], [207], [208]. Since tiny objects mainly exist in shallow features and lack high-level semantic information [65], some works [201], [202], [203] introduce top-down structures to fuse high-level semantic information into shallow features and thereby strengthen the semantic information for tiny objects. Considering the limited appearance information of tiny objects, other studies [204], [205], [206], [207], [208] establish a connection between a tiny object and the surrounding contextual information through a self-attention mechanism or dilated convolution to enhance the feature discriminability of tiny objects. Notably, some previously mentioned studies on multiscale feature learning and context information mining also demonstrate remarkable effectiveness in tiny object detection.

FIGURE 12. A brief summary of tiny object detection methods.

SUPERRESOLUTION-BASED METHODS
The extremely small scale is a crucial issue for tiny object detection, so increasing the resolution of images is an intuitive solution for promoting the detection performance of tiny objects. Some methods [209], [210], [211], [212] employ superresolution strategies as a preprocessing step in the detection pipeline to enlarge the resolution of input images. For example, Rabbi et al. [211] emphasized the importance of edge information for tiny object detection and proposed an edge-enhanced superresolution generative adversarial network (GAN) to generate visually pleasing high-resolution RSIs with detailed edge information. Wu et al. [212] developed a point-to-region detection framework for tiny objects. The point-to-region framework first obtains proposal regions with keypoint prediction and then employs a multitask GAN to perform superresolution on the proposal regions and detect tiny objects in them. However, the high-resolution images generated by superresolution bring extra computational complexity to the detection pipeline. Drawing on this problem, [213] and [214] employ the superresolution strategy at the feature level to acquire discriminative feature representations of tiny objects while effectively saving computational resources.
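The simplest form of the image-level strategy above is plain upsampling before detection; learned superresolution models such as the GAN in [211] replace the interpolation step. A hedged sketch, where `detector` and the scale factor are placeholders:

```python
import torch
import torch.nn.functional as F

def upscale_then_detect(image, detector, scale: int = 2):
    """Enlarge the input before detection so tiny objects cover more pixels.

    image: (1, 3, H, W) tensor in [0, 1]. In superresolution-based pipelines the
    bicubic interpolation below is replaced by a learned SR network, and the
    predicted boxes are divided by `scale` to map them back to the original image.
    """
    upscaled = F.interpolate(image, scale_factor=scale, mode="bicubic",
                             align_corners=False)
    return detector(upscaled)

# Example with a dummy "detector" that just reports the input resolution.
image = torch.rand(1, 3, 256, 256)
print(upscale_then_detect(image, lambda x: x.shape, scale=2))  # torch.Size([1, 3, 512, 512])
```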
IMPROVED DETECTION METRICS FOR TINY OBJECTS
Unlike the first two types of methods, recent advanced works [10], [215], [216], [217], [218], [219], [220], [221], [222] assert that the current prevailing detection paradigms are unsuitable for tiny objects and inevitably hinder tiny object detection performance. Pang et al. [215] argued that excessive downsampling operations in modern detectors lead to the loss of tiny objects on the feature map and proposed a zoom-out/zoom-in structure to enlarge the feature map. In [218], Yan et al. adjusted the IOU threshold in the label assignment to increase the number of positive anchors assigned to tiny objects, facilitating the learning of tiny objects. Dong et al. [219] devised Sig-NMS to reduce the suppression of tiny objects by large and medium objects in traditional nonmaximum suppression (NMS). In [10], Xu et al. pointed out that the IOU metric is unsuitable for tiny object detection. As shown in Figure 13, the IOU metric is sensitive to slight location offsets. Besides, IOU-based label assignment suffers from a severe scale imbalance problem, where tiny objects tend to be assigned insufficient positive samples. To solve these problems, Xu et al. [10] designed a normalized Wasserstein distance (NWD) to replace the IOU metric. The NWD models tiny objects as 2D Gaussian distributions and utilizes the NWD between Gaussian distributions to represent the location relationship among tiny objects, as detailed in [10]. Compared with the IOU metric, the NWD metric is smooth to location deviations and has the characteristic of scale balance, as depicted in Figure 13(b). In [222], Xu et al. further proposed the receptive field distance for tiny object detection and achieved state-of-the-art performance.

FIGURE 13. A comparison of the (a) IOU deviation curve and (b) normalized Wasserstein distance (NWD) deviation curve [10]. Please refer to [10] for details.
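A compact sketch of the NWD computation summarized above, following the formulation in [10]: each horizontal box is modeled as a 2D Gaussian with mean at the box center and covariance diag(w²/4, h²/4); the squared 2-Wasserstein distance between two such Gaussians has a closed form, and the normalizing constant (set to 12.8 here purely for illustration) is dataset dependent.

```python
import math

def nwd(box_a, box_b, constant=12.8):
    """Normalized Wasserstein distance between two boxes (cx, cy, w, h), in [0, 1]."""
    (ax, ay, aw, ah), (bx, by, bw, bh) = box_a, box_b
    # Squared 2-Wasserstein distance between N((ax, ay), diag(aw^2/4, ah^2/4))
    # and N((bx, by), diag(bw^2/4, bh^2/4)).
    w2_sq = (ax - bx) ** 2 + (ay - by) ** 2 + ((aw - bw) ** 2 + (ah - bh) ** 2) / 4.0
    return math.exp(-math.sqrt(w2_sq) / constant)

# A 4-pixel offset barely changes the NWD of an 8x8 box, while its IOU would collapse.
print(nwd((50, 50, 8, 8), (54, 50, 8, 8)))   # ~0.73
print(nwd((50, 50, 8, 8), (50, 50, 8, 8)))   # 1.0
```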
OBJECT DETECTION WITH LIMITED SUPERVISION
In recent years, the widely used deep learning-based detectors in RSIs have relied heavily on large-scale datasets with high-quality annotations to achieve state-of-the-art performance. However, collecting volumes of well-labeled data is considerably expensive and time-consuming (e.g., a bounding box annotation costs about 10 s), which leads to data-limited or annotation-limited scenarios in RSOD [11]. This lack of sufficient supervised information seriously degrades detection performance. To tackle this problem, researchers have explored various tasks in RSOD with limited supervision. We summarize the previous research into three main types: WSOD, SSOD, and FSOD. Figure 14 provides a brief summary of object detection methods with limited supervision.

FIGURE 14. A brief summary of object detection methods with limited supervision.

WEAKLY SUPERVISED OBJECT DETECTION
Compared to fully supervised object detection, WSOD involves only weakly supervised information. Formally, WSOD relies on a training dataset $D_{\mathrm{train}} = \{(X_i, y_i)\}_{i=1}^{I}$, where $X_i = \{x_1, \ldots, x_{m_i}\}$ is a collection of training samples, termed the bag; $m_i$ is the total number of training samples in the bag; and $y_i$ is the weakly supervised information (e.g., image-level labels [223] or point-level labels [224]) of $X_i$. Effectively transferring image-level supervision to object-level labels is the key challenge in WSOD [225]. Han et al. [226] introduced a deep Boltzmann machine to learn the high-level features of objects and proposed a weakly supervised learning framework based on Bayesian principles for remote sensing WSOD. Li et al. [227] exploited the mutual information between scene pairs to learn discriminative convolutional weights and employed a multiscale category activation map to locate geospatial objects.

Motivated by the remarkable performance of WSDDN [228], a series of remote sensing WSOD methods [229], [230], [231], [232], [233], [234], [235], [236], [237], [238], [239], [240], [241] have been proposed. As detailed in Figure 15, the paradigm of current WSOD methods usually consists of two steps: first, a multiple-instance learning model is constructed to find the proposals contributing to the image classification task and take them as pseudolabels; then, these pseudolabels are used to train the detector. Yao et al. [229] introduced a dynamic curriculum learning strategy in which the detector progressively improves detection performance through an easy-to-hard training process. Feng et al. [231] designed a progressive contextual instance refinement method that suppresses low-quality object parts and highlights the whole object by leveraging surrounding context information. Wang et al. [233] introduced a spatial and appearance relation graph into WSOD, which propagates high-quality label information to mine more possible objects.

FIGURE 15. The two-step paradigm of recent WSOD methods [229], [230], [231], [232], [233], [234], [235], [236], [237], [238], [239], [240], [241].
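A minimal sketch of the first step in the two-step paradigm of Figure 15: given per-proposal class scores from a multiple-instance learning branch and the image-level labels, the top-scoring proposal of each labeled class is kept as a pseudobox for training the detector. The simple thresholding rule below is a simplification of the instance-selection strategies cited above, and all names are illustrative.

```python
import numpy as np

def select_pseudolabels(proposals, class_scores, image_labels, score_thresh=0.5):
    """proposals: (N, 4); class_scores: (N, C) MIL scores; image_labels: class ids.

    Returns a list of (box, class_id) pseudolabels: for every class present at the
    image level, keep its highest-scoring proposal if the score passes the threshold.
    """
    pseudo = []
    for c in image_labels:
        best = int(np.argmax(class_scores[:, c]))
        if class_scores[best, c] >= score_thresh:
            pseudo.append((proposals[best], c))
    return pseudo

proposals = np.array([[10, 10, 60, 60], [100, 80, 160, 140], [12, 8, 58, 64]], dtype=float)
scores = np.array([[0.10, 0.85], [0.90, 0.05], [0.08, 0.70]])  # two classes
print(select_pseudolabels(proposals, scores, image_labels=[0, 1]))
```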
SEMISUPERVISED OBJECT DETECTION

SSOD typically contains only a small portion (no more than 50%) of well-labeled samples $D_{\text{labeled}} = \{(x_i, y_i)\}_{i=1}^{I_{\text{labeled}}}$, making it difficult to construct a reliable supervised detector, and has a large number of unlabeled samples $D_{\text{unlabeled}} = \{x_j\}_{j=1}^{I_{\text{unlabeled}}}$. SSOD aims to improve detection performance under scarce supervised information by learning the latent information from the large volume of unlabeled samples. Hou et al. [243] proposed SCLANet for semisupervised SAR ship detection. SCLANet employs adversarial learning between labeled and unlabeled samples to exploit unlabeled sample information and adopts consistency learning for unlabeled samples to enhance the robustness of the network. The pseudolabel generation mechanism is also a widely used approach for SSOD [244], [245], [246], [247], [248], and the typical paradigm is presented in Figure 16. First, a pretrained detector learned from scarce labeled samples is used to predict unlabeled samples. Then, the pseudolabels with higher confidence scores are selected as the trusted part, and finally, the model is retrained with the labeled and pseudolabeled samples. Wu et al. [246] proposed self-paced curriculum learning that follows an "easy-to-hard" scheme to select more reliable pseudolabels. Zhong et al. [245] adopted an active learning strategy in which high-scored predictions are manually adjusted by experts to obtain refined pseudolabels. Chen et al. [247] employed teacher-student mutual learning to fully leverage unlabeled samples and iteratively generate higher-quality pseudolabels. In addition, some studies [249], [250], [251], [252], [253] have worked on weak SSOD, in which unlabeled samples are replaced with weakly annotated samples. Du et al. [251], [252] employed a large number of image-level labeled samples to improve SAR vehicle detection performance under scarce box-level labeled samples. Chen et al. [253] adopted a small number of pixel-level labeled samples and a dominant number of box-level labeled samples to boost performance in label-scarce instance segmentation.

FIGURE 16. The pseudolabel generation mechanism in SSOD.
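The three-step pseudolabeling loop in Figure 16 can be written out as a minimal sketch. The `detector.fit` and `detector.predict` interfaces and the confidence threshold are assumptions for illustration, not details of any of the cited methods.

```python
def pseudolabel_ssod(detector, labeled_set, unlabeled_images, conf_threshold=0.9):
    """Minimal sketch of the pseudolabel generation mechanism in Figure 16."""
    # Step 1: pretrain on the scarce labeled samples.
    detector.fit(labeled_set)

    # Step 2: predict the unlabeled samples and keep only confident detections
    # (boxes and labels) as trusted pseudolabels.
    pseudo_set = []
    for image in unlabeled_images:
        detections = [d for d in detector.predict(image) if d.score >= conf_threshold]
        if detections:
            pseudo_set.append((image, detections))

    # Step 3: retrain on labeled + pseudolabeled samples. Methods such as [246]
    # gradually relax the selection ("easy-to-hard"), and [247] replaces this
    # single pass with iterative teacher-student mutual learning.
    detector.fit(labeled_set + pseudo_set)
    return detector
```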
FEW-SHOT OBJECT DETECTION

FSOD refers to detecting novel classes with only a limited number (no more than 30) of samples. Generally, FSOD contains a base-class dataset with abundant samples $D_{\text{base}} = \{(x_i, y_i) \mid y_i \in C_{\text{base}}\}_{i=1}^{I_{\text{base}}}$ and a novel-class dataset with only K-shot samples per class $D_{\text{novel}} = \{(x_j, y_j) \mid y_j \in C_{\text{novel}}\}_{j=1}^{K \times |C_{\text{novel}}|}$. Note that $C_{\text{base}}$ and $C_{\text{novel}}$ are disjoint. As displayed in Figure 17, a typical FSOD paradigm consists of a two-stage training pipeline, where the base training stage establishes prior knowledge with abundant base-class samples, and the few-shot fine-tuning stage leverages the prior knowledge to facilitate the learning of few-shot novel concepts. The research on remote sensing FSOD mainly focuses on metalearning methods [254], [255], [256], [257], [258], [259] and transfer learning methods [260], [261], [262], [263], [264], [265], [266], [267], [268], [269].

FIGURE 17. The two-stage training pipeline of FSOD.

The metalearning-based methods acquire task-level knowledge by simulating a series of few-shot learning tasks and generalize this knowledge to tackle the few-shot learning of novel classes. Li et al. [255] first employed metalearning for remote sensing FSOD and achieved satisfactory detection performance with only one to 10 labeled samples. Later, a series of metalearning-based few-shot detectors were developed in the remote sensing community [254], [255], [256], [257], [258], [259]. For example, Cheng et al. [254] proposed a prototype CNN to generate better foreground proposals and class-aware ROI features for remote sensing FSOD by learning class-specific prototypes. Wang et al. [258] presented a metametric training paradigm to enable a few-shot learner with flexible scalability for fast adaptation to few-shot novel tasks. Transfer learning-based methods aim at fine-tuning common knowledge learned from abundant annotated data to few-shot novel data and typically consist of a base training stage and a few-shot fine-tuning stage. Huang et al. [266] proposed a balanced fine-tuning strategy to alleviate the number imbalance problem between novel-class samples and base-class samples. Zhou et al. [265] introduced proposal-level contrastive learning in the fine-tuning phase to learn more robust feature representations in few-shot scenarios. Compared with the metalearning-based methods, the transfer learning-based methods have a simpler and more memory-efficient training paradigm.
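As an illustration of the two-stage transfer learning pipeline in Figure 17 (not of any specific cited detector), the following sketch assumes a generic detector whose classification head can be reinitialized; which layers are frozen during fine-tuning, and how the balanced few-shot set is built, differ across the methods surveyed above.

```python
def two_stage_fsod(detector, base_set, novel_set, k_shot=10):
    """Sketch of the FSOD transfer learning paradigm: base training, then
    few-shot fine-tuning on K-shot novel samples plus a few base samples."""
    # Stage 1: base training on abundant base-class annotations builds
    # transferable prior knowledge (backbone, proposal generator, shared heads).
    detector.fit(base_set)

    # Stage 2: reinitialize the class-dependent layers for base + novel classes
    # and fine-tune on a small balanced set; the backbone is typically frozen
    # or trained with a reduced learning rate to avoid overfitting the K shots.
    detector.reset_classification_head(num_classes=len(base_set.classes)
                                       + len(novel_set.classes))
    detector.freeze_backbone()
    balanced_set = base_set.sample_per_class(k_shot) + novel_set  # K shots per class
    detector.fit(balanced_set, learning_rate=1e-3)
    return detector
```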
DATASETS AND EVALUATION METRICS

DATASETS INTRODUCTION AND SELECTION

Datasets have played an indispensable role throughout the development of object detection in RSIs. On the one hand, datasets serve as common ground for the performance evaluation and comparison of detectors. On the other hand, datasets push researchers to address increasingly challenging problems in the RSOD field. In the past decade, several datasets with different attributes have been released to facilitate the development of RSOD, as listed in Table 3. In this section, we mainly introduce 10 widely used datasets with specific characteristics:

TABLE 3. COMPARISONS OF WIDELY USED DATASETS IN THE FIELD OF RSOD.

DATASET | SOURCE | ANNOTATION | CATEGORIES | INSTANCES | IMAGES | IMAGE WIDTH | RESOLUTION | YEAR
TAS [270] | Google Earth | HBB | 1 | 1,319 | 30 | 792 | — | 2008
SZTAKI-INRIA [271] | QuickBird, Ikonos, and Google Earth | OBB | 1 | 665 | 9 | ~800 | 0.5–1 m | 2012
NWPU VHR-10 [18] | Google Earth | HBB | 10 | 3,651 | 800 | ~1,000 | 0.3–2 m | 2014
VEDAI [272] | Utah Automated Geographic Reference Center | OBB | 9 | 2,950 | 1,268 | 1,024 | 0.125 m | 2015
DLR 3K [273] | DLR 3K camera system | OBB | 8 | 14,235 | 20 | 5,616 | 0.13 m | 2015
UCAS-AOD [274] | Google Earth | OBB | 2 | 6,029 | 910 | ~1,000 | 0.3–2 m | 2015
COWC [275] | Multiple sources | Point | 1 | 32,716 | 53 | 2,000–19,000 | 0.15 m | 2016
HRSC [276] | Google Earth | OBB | 26 | 2,976 | 1,061 | ~1,100 | 0.4–2 m | 2016
RSOD [43] | Google Earth and Tianditu | HBB | 4 | 6,950 | 976 | ~1,000 | 0.3–3 m | 2017
SSDD [277] | RadarSat-2, TerraSAR-X, and Sentinel-1 | HBB | 1 | 2,456 | 1,160 | 500 | 1–15 m | 2017
LEVIR [278] | Google Earth | HBB | 3 | 11,000 | 22,000 | 800 × 600 | 0.2–1 m | 2018
xView [2] | WorldView-3 | HBB | 60 | 1 million | 1,413 | ~3,000 | 0.3 m | 2018
DOTA v1.0 [117] | Google Earth, Jilin-1, and Gaofen-2 | HBB and OBB | 15 | 188,282 | 2,806 | 800–13,000 | 0.1–1 m | 2018
HRRSD [48] | Google Earth and Baidu Maps | HBB | 13 | 55,740 | 21,761 | 152–10,569 | 0.15–1.2 m | 2019
DIOR [28] | Google Earth | HBB | 20 | 190,288 | 23,463 | 800 | 0.5–30 m | 2019
AIR-SARShip-1.0 [279] | Gaofen-3 | HBB | 1 | 3,000 | 31 | 3,000 | 1 and 3 m | 2019
MAR20 [280] | Google Earth | HBB and OBB | 20 | 22,341 | 3,824 | ~800 | — | 2020
FGSD [281] | Google Earth | OBB | 43 | 5,634 | 2,612 | 930 | 0.12–1.93 m | 2020
DOSR [282] | Google Earth | OBB | 20 | 6,172 | 1,066 | 600–1,300 | 0.5–2.5 m | 2021
AI-TOD [283] | Multiple sources | HBB | 8 | 700,621 | 28,036 | 800 | — | 2021
FAIR1M [34] | Gaofen satellites and Google Earth | OBB | 37 | 1,020,579 | 42,796 | 600–10,000 | 0.3–0.8 m | 2021
DOTA-v2.0 [33] | Google Earth, Jilin-1, Gaofen-2, and airborne images | HBB and OBB | 18 | 1,793,658 | 11,268 | 800–20,000 | 0.1–4.5 m | 2021
SODA-A [284] | Google Earth | OBB | 9 | 800,203 | 2,510 | 4,761 × 2,777* | — | 2022
OBB: oriented bounding box. *Average image width.

1) NWPU VHR-10 [18]: This dataset is a multiclass geospatial object detection dataset. It contains 3,775 HBB annotated instances in 10 categories: airplane, ship, storage tank, baseball diamond, tennis court, basketball court, ground track field, harbor, bridge, and vehicle. There are 800 very high-resolution RSIs, consisting of 715 color images from Google Earth and 85 pansharpened color infrared images from Vaihingen data. The image resolutions range from 0.5 to 2 m.
2) VEDAI [272]: VEDAI is a fine-grained vehicle detection dataset that contains fine-grained vehicle categories such as camping car, car, pickup, tractor, truck, and van. There are 1,210 images and 3,700 instances in the VEDAI dataset, and the size of each image is 1,024 × 1,024. The small area and arbitrary orientation of vehicles are the main challenges in the VEDAI dataset.
3) UCAS-AOD [274]: The UCAS-AOD dataset includes 910 images and 6,029 objects, where 3,210 aircraft are contained in 600 images and 2,819 vehicles are contained in 310 images. All images are acquired from Google Earth, with an image size of approximately 1,000 × 1,000.
4) HRSC [276]: The HRSC dataset is widely used for arbitrarily oriented ship detection and consists of 1,070 images and 2,976 instances with oriented bounding box (OBB) annotation. The images are captured from Google Earth, containing offshore and onshore scenes. The image sizes vary from 300 × 300 to 1,500 × 900, and the image resolutions range from 0.4 to 2 m.
5) SSDD [277]: SSDD is the first open dataset for SAR image ship detection and contains 1,160 SAR images and 2,456 ships. The SAR images in the SSDD dataset are collected from different sensors with resolutions from 1 to 15 m and have different polarizations (horizontal-horizontal, vertical-vertical, vertical-horizontal, and horizontal-vertical). Subsequently, the authors further refined and enriched the SSDD dataset into three different types to satisfy the current research of SAR ship detection [286].
6) xView [2]: The xView dataset is one of the largest publicly available datasets in RSOD, with approximately 1 million labeled objects across 60 fine-grained classes. Compared to other RSOD datasets, the images in the xView dataset are collected from WorldView-3 at a 0.3-m GSD, providing higher-resolution images. Moreover, the xView dataset covers more than 1,400 km² of Earth's surface, which leads to higher diversity.
7) DOTA [117]: DOTA is a large-scale dataset consisting of 188,282 objects annotated with both HBBs and OBBs. All objects are divided into 15 categories: plane, ship, storage tank, baseball diamond, tennis court, swimming pool, ground track field, harbor, bridge, large vehicle, small vehicle, helicopter, roundabout, soccer field, and basketball court. The images in this dataset are collected from Google Earth, Jilin-1 satellites, and the Gaofen-2 satellite, with a spatial resolution of 0.1 to 1 m. Recently, DOTA v2.0 [33] was made publicly available, containing more than 1.7 million objects in 18 categories.
8) DIOR [28]: DIOR is an object detection dataset for optical RSIs. There are 23,463 optical images in this dataset, with a spatial resolution of 0.5 to 30 m. The total number of objects in the dataset is 192,472, and all the objects are labeled with HBBs. The categories of objects are as follows: airplane, airport, baseball field, basketball court, bridge, chimney, dam, expressway service area, expressway toll station, harbor, golf course, ground track field, overpass, ship, stadium, storage tank, tennis court, train station, vehicle, and windmill.
9) FAIR1M [34]: FAIR1M is a more challenging dataset for fine-grained object detection in RSIs, including five categories and 37 subcategories. There are more than 40,000 images and more than 1 million objects annotated by OBBs. The images are acquired from multiple platforms with a resolution of 0.3 to 0.8 m and are spread across different countries and regions. The fine-grained categories, massive numbers of objects, large ranges of sizes and orientations, and diverse scenes make FAIR1M more challenging.
10) SODA-A [284]: SODA-A is a recently released dataset designed for tiny object detection in RSIs. This dataset consists of 2,510 images, with an average image size of 4,761 × 2,777, and 800,203 objects with OBB annotation. All objects are divided into four subsets (i.e., extremely small, relatively small, generally small, and normal) based on their area ranges. There are nine categories in this dataset, including airplane, helicopter, small vehicle, large vehicle, ship, container, storage tank, swimming pool, and windmill.
The preceding review shows that the early published datasets generally have limited samples. For example, NWPU VHR-10 [18] contains only 10 categories and 3,651 instances, and UCAS-AOD [274] consists of two categories with 6,029 instances. In recent years, researchers have not only introduced massive amounts of data and fine-grained objects but also collected data from multiple sensors, various resolutions, and diverse scenes (e.g., DOTA [117], DIOR [28], and FAIR1M [34]) to satisfy practical applications in RSOD. Figure 18 exhibits typical samples of different RSOD datasets. We also provide dataset selection guidelines in Table 4 to help researchers select proper datasets and methods for different challenges and scenarios. Notably, only the image-level annotations of the datasets are available for the weak supervision scenario. As for the few-shot supervision scenario, there are only K-shot box-level annotated samples for each novel class, where K is set to {3, 5, 10, 20, 30}.

FIGURE 18. A visualization of different RSOD datasets. Diverse resolutions, massive instances, multisensor images, and fine-grained categories are typical characteristics of RSOD datasets. (a) NWPU VHR-10. (b) DIOR. (c) SODA-A. (d) xView. (e) DOTA. (f) SSDD. (g) FAIR1M. (h) VEDAI.

TABLE 4. DATASET SELECTION GUIDELINES IN RSOD FOR DIFFERENT CHALLENGES AND SCENARIOS.

SCENARIO | DATASETS | METHODS
Multiscale objects | DOTA, DIOR, and FAIR1M | HyNet [51] and FFA [80]
Rotated objects | DOTA and HRSC | KLD [131] and ReDet [155]
Weak objects | DOTA, DIOR, and FAIR1M | RECNN [172] and CADNet [196]
Tiny objects | SODA-A and AI-TOD | NWD [10] and FSANet [216]
Weak supervision | NWPU VHR-10 and DIOR | RINet [240] and MOL [241]
Few-shot supervision | NWPU VHR-10 and DIOR | P-CNN [254] and G-FSDet [269]
Fine-grained objects | DOSR and FAIR1M | RBFPN [94] and EIRNet [282]
SAR image objects | SSDD and AIR-SARShip | SSPNet [163] and HyperLiNet [285]
Specific objects | HRSC and MAR20 | GRS-Det [135] and COLOR [245]

EVALUATION METRICS

In addition to datasets, evaluation metrics are equally important. Generally, the inference speed and the detection accuracy are the two commonly adopted metrics for evaluating the performance of detectors. Frames per second (FPS) is a standard metric for inference speed evaluation that indicates the number of images that the detector can process per second. Notably, both the image size and the hardware devices can influence the inference speed. Average precision (AP) is the most commonly used metric for detection accuracy. Given a test image $I$, let $\{(b_i, c_i, p_i)\}_{i=1}^{N}$ denote the predicted detections, where $b_i$ is the predicted box, $c_i$ is the predicted label, and $p_i$ is the confidence score. Let $\{(b_j^{gt}, c_j^{gt})\}_{j=1}^{M}$ refer to the ground truth annotations on the test image $I$, where $b_j^{gt}$ is the ground truth box and $c_j^{gt}$ is the ground truth category. A predicted detection $(b_i, c_i, p_i)$ is assigned as a true positive (TP) for ground truth annotation $(b_j^{gt}, c_j^{gt})$ if it meets both of the following criteria:
◗ The confidence score $p_i$ is greater than the confidence threshold $t$, and the predicted label is the same as the ground truth label $c_j^{gt}$.
◗ The IOU between the predicted box $b_i$ and the ground truth box $b_j^{gt}$ is larger than the IOU threshold $\varepsilon$. The IOU is calculated as follows:

$\mathrm{IOU}(b, b^{gt}) = \dfrac{\mathrm{area}(b \cap b^{gt})}{\mathrm{area}(b \cup b^{gt})}$  (2)

where $\mathrm{area}(b_i \cap b_j^{gt})$ and $\mathrm{area}(b_i \cup b_j^{gt})$ stand for the intersection and union areas of the predicted box and ground truth box.
Otherwise, the predicted detection is considered a false positive (FP). It is worth noting that multiple predicted detections may match the same ground truth annotation according to the preceding criteria, but only the predicted detection with the highest confidence score is assigned as a TP, and the rest are FPs [287].
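The matching rule above can be written out directly. The following minimal sketch computes the IOU of (2) for axis-aligned boxes given as (x1, y1, x2, y2) and greedily assigns each ground truth to its highest-confidence matching prediction, which is one common way to implement the criterion; the threshold values are illustrative assumptions.

```python
def iou(box_a, box_b):
    """IOU of two axis-aligned boxes (x1, y1, x2, y2), as in (2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def assign_tp_fp(predictions, ground_truths, t=0.05, eps=0.5):
    """Label predictions (box, label, score) as TP or FP against ground truths
    (box, label); each ground truth can be matched at most once, and only the
    highest-scoring match counts as a TP."""
    matched = set()
    results = []  # (score, is_tp) pairs, later used to draw the PR curve
    for box, label, score in sorted(predictions, key=lambda p: -p[2]):
        if score <= t:
            continue
        # candidate ground truths: same class, not matched yet, IOU above eps
        candidates = [(iou(box, gt_box), j)
                      for j, (gt_box, gt_label) in enumerate(ground_truths)
                      if j not in matched and gt_label == label]
        candidates = [(o, j) for o, j in candidates if o > eps]
        if candidates:
            matched.add(max(candidates)[1])   # best-overlap ground truth
            results.append((score, 1))        # TP
        else:
            results.append((score, 0))        # FP
    return results
```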
Based on TP and FP detections, and considering false negatives (FNs), the precision (P) and recall (R) can be computed as

$P = \dfrac{TP}{TP + FP}$  (3)

$R = \dfrac{TP}{TP + FN}.$  (4)

The precision measures the fraction of TPs among the predicted detections, and the recall measures the fraction of positives that are correctly detected. However, the preceding two evaluation metrics reflect only a single aspect of detection performance. Taking into account both precision and recall, AP provides a comprehensive evaluation of detection performance and is calculated individually for each class. For a given class, the precision-recall curve (PRC) is drawn by taking the maximum precision at each recall level, and the AP summarizes the shape of the PRC [287]. For multiclass object detection, the mean of the AP values for all classes, termed mAP, is adopted to evaluate the overall detection accuracy. Early studies mainly employed a fixed IOU-based AP metric (i.e., AP50) [18], [28], [117], where the IOU threshold $\varepsilon$ is given as 0.5. This low IOU threshold exhibits a high tolerance for bounding box deviations and fails to satisfy high localization accuracy requirements. Later, some works [130], [131], [284] introduced a novel evaluation metric, AP50:95, which averages the AP over 10 IOU thresholds from 0.5 to 0.95, with an interval of 0.05. AP50:95 considers higher IOU thresholds and encourages more accurate localization.

As the cornerstone of evaluation metrics in RSOD, AP has various extensions for different specific tasks. In the few-shot learning scenario, APnovel and APbase are two critical metrics to evaluate the performance of few-shot detectors, where APnovel and APbase represent the detection performance on the novel classes and base classes, respectively. An excellent few-shot detector should achieve satisfactory performance on the novel classes and avoid performance degradation on the base classes [269]. In the incremental detection of remote sensing objects, APold and APinc are employed to evaluate the performance of the old and incremental classes on different incremental tasks. In addition, the harmonic mean (HM) is also a vital evaluation metric for incremental object detection [288], providing a comprehensive performance evaluation of both old and incremental classes, as described by

$HM = \dfrac{2\,AP_{old}\,AP_{inc}}{AP_{old} + AP_{inc}}.$  (5)
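As a minimal illustration of how AP follows from (3) and (4), the sketch below accumulates the (score, is_tp) pairs produced by a matching step such as the one sketched above, builds the PRC with the maximum-precision envelope, and integrates precision over recall; mAP is then the mean over classes (and, for AP50:95, over IOU thresholds). This is a generic all-point interpolation, not the exact protocol of any particular benchmark.

```python
import numpy as np

def average_precision(results, num_gt):
    """AP for one class from (score, is_tp) pairs and the number of ground truths."""
    results = sorted(results, key=lambda r: -r[0])        # sort by confidence
    tp = np.cumsum([r[1] for r in results])
    fp = np.cumsum([1 - r[1] for r in results])
    recall = tp / max(num_gt, 1)                          # Eq. (4)
    precision = tp / np.maximum(tp + fp, 1e-12)           # Eq. (3)
    # Maximum-precision envelope of the PRC
    for i in range(len(precision) - 2, -1, -1):
        precision[i] = max(precision[i], precision[i + 1])
    # Integrate precision over recall (all-point interpolation)
    r = np.concatenate(([0.0], recall))
    p = np.concatenate(([precision[0] if len(precision) else 0.0], precision))
    return float(np.sum((r[1:] - r[:-1]) * p[1:]))

def mean_ap(per_class_results, per_class_num_gt):
    """mAP: mean of the per-class AP values."""
    aps = [average_precision(res, n)
           for res, n in zip(per_class_results, per_class_num_gt)]
    return sum(aps) / len(aps)
```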
APPLICATIONS

Deep learning techniques have injected significant innovations into RSOD, leading to an effective way to automatically identify objects of interest from voluminous RSIs. Therefore, RSOD methods have been applied in a rich diversity of practical scenarios that significantly support the implementation of sustainable development goals (SDGs) and the improvement of society [289], [290], [291], as described in Figure 19.

DISASTER MANAGEMENT

Natural disasters pose a serious threat to the safety of human life and property. A quick and precise understanding of disaster impacts and extents of damage is critical to disaster management. RSOD methods can accurately identify ground objects from a bird's-eye view of a disaster-affected area, providing a novel potential for disaster management [292], [293], [294], [295], [296]. Guan et al. [293] proposed a novel instance segmentation model to accurately detect fire in a complex environment, which can be applied to forest fire disaster response. Ma et al. [295] designed a real-time detection method for collapsed building assessment following earthquakes.

PRECISION AGRICULTURE

With our unprecedented and still expanding population, ensuring agricultural production is fundamental to feeding growing numbers of people. RSOD has the ability to monitor crop growth and estimate food production, promoting further progress for precision agriculture [297], [298], [299], [300], [301], [302]. Pang et al. [298] used RSIs for early season maize detection and achieved an accurate estimation of emergence rates. Chen et al. [302] designed an automatic strawberry flower detection system to monitor the growth cycle of strawberry fields.

SUSTAINABLE CITIES AND COMMUNITIES

Half of the global population now lives in cities, and this population will keep growing in the coming decades. Sustainable cities and communities are the goals of modern city development, in which RSOD can make a significant impact. For instance, building and vehicle detection [303], [304], [305], [306] can help estimate population density distributions and transport traffic statistics, providing suggestions for city development planning. Infrastructure distribution detection [307] can assist in disaster assessment and early warnings in city environments.

CLIMATE ACTION

The ongoing climate change forces humans to face the daunting challenge of the climate crisis. Some researchers [308], [309], [310] employed object detection methods for automatically mapping tundra ice wedge polygons to document and analyze the effects of climate warming on the Arctic region. Besides, RSOD can produce statistics on the number and spatial distribution of solar panels and wind turbines [311], [312], [313], [314], facilitating the mitigation of greenhouse gas emissions.

OCEAN CONSERVATION

The oceans cover nearly three-quarters of Earth's surface, and more than 3 billion people depend on the diverse life of the oceans and coasts. The ocean is gradually deteriorating due to pollution, and RSOD can provide powerful support for ocean conservation [315]. Several works applied detection methods for litter detection along shores [316], floating plastic detection at sea [317], deep-sea debris detection [318], and so on. Another important application is ship detection [135], [136], which can help monitor illegal fishing activities.

WILDLIFE SURVEILLANCE

A global loss of biodiversity is observed at all levels, and object detection in combination with RSIs provides a novel perspective for wildlife conservation [319], [320], [321], [322], [323]. Delplanque et al. [322] adopted a deep learning-based detector for multiple-species detection and the identification of African mammals. Kellenberger et al.
[323] designed a weakly supervised wildlife detection framework that requires only image-level labels to identify wildlife.

FIGURE 19. The widespread applications of RSOD make substantial contributions to implementing SDGs and improving society. (a) Collapsed building detection following earthquakes for disaster assessment. (b) Corn plant detection for precision agriculture. (c) and (d) Building and vehicle detection for sustainable cities and communities. (e) Solar photovoltaic detection for climate change mitigation. (f) Litter detection along the shore for ocean conservation. (g) African mammal detection for wildlife surveillance. (h) Single-tree detection for forest ecosystem protection.

FOREST ECOSYSTEM PROTECTION

The forest ecosystem plays an important role in ecological protection, climate regulation, and carbon cycling. Understanding the condition of trees is essential for forest ecosystem protection [324], [325], [326], [327], [328]. Safonova et al. [326] analyzed the shape, texture, and color of detected trees' crowns to determine their damage stage, providing a more efficient way to assess forest health. Sani-Mohammed et al. [328] utilized an instance segmentation approach to map standing dead trees, which is imperative for forest ecosystem management and protection.

FUTURE DIRECTIONS

Apart from the five RSOD research topics mentioned in this survey, there is still much work to be done in this field. Therefore, we present a forward-looking discussion of future directions to further improve and enhance detectors in remote sensing scenes.

UNIFIED DETECTION FRAMEWORK FOR LARGE-SCALE REMOTE SENSING IMAGES

Benefiting from the development of remote sensing technology, high-resolution large-scale RSIs (e.g., more than 10,000 × 10,000 pixels) can be easily obtained. However, limited by GPU memory, the current mainstream RSOD methods fail to directly perform object detection in large-scale RSIs and instead adopt a sliding window strategy, mainly including sliding window cropping, patch prediction, and results merging. On the one hand, this sliding window framework requires complex data preprocessing and postprocessing compared with a unified detection framework. On the other hand, objects usually occupy a small area of RSIs, and the invalid computation on massive background regions leads to increasing computation time and memory consumption. Some studies [215], [329], [330] proposed a coarse-to-fine detection framework for object detection in large-scale RSIs. This framework first locates ROIs by filtering out meaningless regions and then achieves accurate detection from these filtered regions.
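For reference, the sliding window strategy described above can be sketched as follows. The window size, overlap, and the use of a standard nonmaximum suppression routine (passed in as `nms`) to merge patch predictions are assumptions for illustration, and `detector.predict` is a hypothetical interface rather than the API of any cited method.

```python
def sliding_window_detection(detector, large_image, nms, window=1024, overlap=200):
    """Sketch of sliding window inference on a large-scale RSI: crop overlapping
    patches, predict each patch, shift boxes back to global coordinates, and
    merge duplicates across patch borders with NMS."""
    h, w = large_image.shape[:2]
    step = window - overlap
    all_boxes = []
    for y0 in range(0, max(h - overlap, 1), step):
        for x0 in range(0, max(w - overlap, 1), step):
            patch = large_image[y0:y0 + window, x0:x0 + window]
            for box, label, score in detector.predict(patch):
                x1, y1, x2, y2 = box
                # shift patch-local coordinates to the full-image frame
                all_boxes.append(((x1 + x0, y1 + y0, x2 + x0, y2 + y0), label, score))
    # any standard NMS implementation can be plugged in to remove duplicates
    return nms(all_boxes)
```

The coarse-to-fine frameworks in [215], [329], and [330] avoid much of this cost by first discarding background regions, so that only promising areas are processed at full resolution.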
DETECTION WITH MULTIMODAL REMOTE SENSING IMAGES

Restricted by the sensor imaging mechanism, detectors based on single-modal RSIs often suffer from detection performance limitations that make it difficult to meet the requirements of practical applications [331]. In contrast, multimodal RSIs from different sensors have their own characteristics. For instance, hyperspectral images contain high-spectral-resolution and fine-grained spectral features, SAR images provide abundant texture information, and optical images exhibit high spatial resolution with rich detailed information. The integrated processing of multimodal RSIs can improve the interpretation of scenes and obtain a more objective and comprehensive understanding of geospatial objects [332], [333], [334], providing the possibility to further improve the detection performance of RSOD.

DOMAIN ADAPTATION OBJECT DETECTION IN REMOTE SENSING IMAGES

Due to the diversity of remote sensing satellite sensors, resolutions, and bands, as well as the influence of weather conditions, seasons, and geospatial regions [6], RSIs collected from different satellites are generally drawn from similar but not identical distributions. Such distribution differences (also called the domain gap) severely restrict the generalization performance of the detector. Recent studies on domain adaptation object detection [335], [336], [337], [338] have attempted to tackle the domain gap problem. However, these studies focus only on domain adaptation detectors in single-modal RSIs, while cross-modal domain adaptation object detection (e.g., from optical images to SAR images [339], [340]) is a more challenging and worthwhile topic to investigate.

INCREMENTAL DETECTION OF REMOTE SENSING OBJECTS

The real-world environment is dynamic and open, where the number of categories evolves over time. However, mainstream detectors require both old and new data to retrain the model when meeting new categories, resulting in high computational costs. Recently, incremental learning has been considered the most promising way to solve this problem, as it can learn new knowledge without forgetting old knowledge while using only new data [341]. Incremental learning has been preliminarily explored in the remote sensing community [342], [343], [344], [345]. For example, Chen et al. [342] integrated knowledge distillation into the FPN and detection heads to learn new concepts while maintaining old ones. More thorough research is still needed in incremental RSOD to meet the dynamic learning tasks in practical applications.

SELF-SUPERVISED PRETRAINED MODELS FOR REMOTE SENSING SCENES

Current RSOD methods are always initialized with ImageNet [346] pretrained weights. However, there is an inevitable domain gap between natural and remote sensing scenes, probably limiting the performance of RSOD. Recently, self-supervised pretraining approaches have received extensive attention and shown excellent performance in classification and downstream tasks in natural scenes. Benefiting from rapid advances in remote sensing technology, abundant remote sensing data [347], [348] also provide sufficient data support for self-supervised pretraining. Some researchers [349], [350], [351], [352], [353] have initially demonstrated the effectiveness of remote sensing pretraining on representative downstream tasks. Therefore, exploring self-supervised
Drawing on this demand, some researchers have proposed lightweight detectors through model design [285], [354], [355], network pruning [356], [357], and knowledge distillation [358], [359], [360]. However, these detectors still rely heavily on high-performance GPUs and cannot be deployed on airborne and satelliteborne satellites. Therefore, designing compact and efficient object detection architectures for limited resource scenarios remains challenging. CONCLUSION Object detection has been a fundamental but challenging research topic in the remote sensing community. Thanks to the rapid development of deep learning techniques, RSOD has received considerable attention and made remarkable achievements in the past decade. In this article, we presented a systematic review and summarization of existing deep learning-based methods in RSOD. First, we summarized the five main challenges in RSOD according to the characteristics of geospatial objects and categorized the methods into five streams: multiscale object detection, rotated object detection, weak object detection, tiny object detection, and object detection with limited supervision. Then, we adopted a systematic hierarchical division to review and summarize the methods in each category. Next, we introduced typical benchmark datasets, evaluation metrics, and practical applications in the RSOD field. Finally, considering the limitations of existing RSOD methods, we discussed some promising directions for further research. Given this time of high-speed technical evolution in RSOD, we believe this survey can help researchers to achieve a more comprehensive understanding of the main topics in this field and to find potential directions for future research. ACKNOWLEDGEMENT This work was supported, in part, by the National Natural Science Foundation of China, under Grants 62276197, 62006178, and 62171332, and the Key Research and Development Program of Shaanxi Province, under Grant 2019ZDLGY03-08. Xiangrong Zhang is the corresponding author. AUTHOR INFORMATION Xiangrong Zhang (xrzhang@mail.xidian.edu.cn) received her B.S. and M.S. degrees in computer application technology 30 from the School of Computer Science, Xidian University, Xi’an, China, in 1999 and 2003, respectively, and her Ph.D. degree in pattern recognition and intelligent system from the School of Electronic Engineering, Xidian University, in 2006. From January 2015 to March 2016, she was a Visiting Scientist with the Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA. Currently, she is a professor with the Key Laboratory of Intelligent Perception and Image Understanding of the Ministry of Education, Xidian University, Xi’an 710071, China. Her research interests include pattern recognition, machine learning, and remote sensing image analysis and understanding. She is a Senior Member of IEEE. Tianyang Zhang (tianyangzhang@stu.xidian.edu.cn) received his B.S. degree in intelligent science and technology from Xidian University, Xian, China in 2018. He is currently pursuing his Ph.D. degree from the Key Laboratory of Intelligence Perception and Image Understanding of the Ministry of Education, Xidian University, Xi’an 710071, China. His current research interests include remote sensing object detection and remote sensing image analysis. Guanchun Wang (guanchunwang1206@163.com) received his B.S. degree in intelligent science and technology from Xidian University, Xian, China in 2019. He is currently pursuing his Ph.D. 
degree with the Key Laboratory of Intelligent Perception and Image Understanding of the Ministry of Education, Xidian University, Xi'an 710071, China. His current research interests include object detection and remote sensing image analysis.

Peng Zhu (zhupeng@stu.xidian.edu.cn) received his B.S. degree in intelligent science and technology from Xidian University, Xi'an, China, in 2017. He is currently pursuing his Ph.D. degree with the Key Laboratory of Intelligent Perception and Image Understanding of the Ministry of Education, Xidian University, Xi'an 710071, China. His current research interests include computer vision and remote sensing image analysis.

Xu Tang (tangxu128@gmail.com) received his B.Sc., M.Sc., and Ph.D. degrees in electronic circuits and systems from Xidian University, Xi'an, China, in 2007, 2010, and 2017, respectively. From 2015 to 2016, he was a joint Ph.D. student with Prof. W. J. Emery at the University of Colorado at Boulder, Boulder, CO, USA. He is currently a professor with the Key Laboratory of Intelligent Perception and Image Understanding of the Ministry of Education, Xidian University, Xi'an 710071, China. He is also a Hong Kong Scholar with Hong Kong Baptist University, Hong Kong. His research interests include remote sensing image content-based retrieval and reranking, hyperspectral image processing, remote sensing scene classification, and object detection. He is a Senior Member of IEEE.

Xiuping Jia (xp.jia@ieee.org) received her B.Eng. degree from the Beijing University of Posts and Telecommunications, Beijing, China, in January 1982, and her Ph.D. degree in electrical engineering (via part-time study) from The University of New South Wales, Canberra, ACT, Australia, in 1996. She has a lifelong academic career in higher education for which she has continued passion. She is currently an associate professor with the School of Engineering and Information Technology, The University of New South Wales, Canberra, ACT 2612, Australia. Her research interests include remote sensing, hyperspectral image processing, and spatial data analysis. She has published widely addressing various topics, including data correction, feature reduction, and image classification using machine-learning techniques. She has coauthored the remote sensing textbook Remote Sensing Digital Image Analysis [Springer-Verlag, 3rd edition (1999) and 4th edition (2006)]. She is the author of Field Guide to Hyperspectral/Multispectral Image Processing (SPIE, 2022). These publications are highly cited in the remote sensing and image processing communities, with an H-index of 54 and an i-10-index of 189 (Google Scholar). She received the graduate certificate in higher education from the University of New South Wales in 2005. She is the Editor-in-Chief of IEEE Transactions on Geoscience and Remote Sensing. She is a Fellow of IEEE.

Licheng Jiao (lchjiao@mail.xidian.edu.cn) received his B.S. degree from Shanghai Jiaotong University, Shanghai, China, in 1982 and his M.S. and Ph.D. degrees from Xi'an Jiaotong University, Xi'an, China, in 1984 and 1990, respectively.
Since 1992, he has been a distinguished professor with the School of Electronic Engineering, Xidian University, Xi’an 710071 China, where he is currently the director of the Key Laboratory of Intelligent Perception and Image Understanding of the Ministry of Education of China. He has been a foreign member of the Academia Europaea and the Russian Academy of Natural Sciences. His research interests include machine learning, deep learning, natural computation, remote sensing, image processing, and intelligent information processing. He is the chairman of the Awards and Recognition Committee; the vice board chairperson of the Chinese Association of Artificial Intelligence; a councilor of the Chinese Institute of Electronics; a committee member of the Chinese Committee of Neural Networks; and an expert of the Academic Degrees Committee of the State Council. He is a Fellow of IEEE and the Institution of Engineering and Technology; Chinese Association for Artificial Intelligence; Chinese Institute of Electronics; China Computer Federation; and Chinese Association of Automation. [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] [14] REFERENCES [1] [2] [3] N. Gorelick, M. Hancher, M. Dixon, S. Ilyushchenko, D. Thau, and R. Moore, “Google earth engine: Planetary-scale geospatial analysis for everyone,” Remote Sens. Environ., vol. 202, pp. 18–27, Dec. 2017, doi: 10.1016/j.rse.2017.06.031. D. Lam et al., “xView: Objects in conte x t in overhead imager y,” 2018. [Online]. Available: http://arxiv.org/abs/ 1802.07856 Z. Li, H. Shen, H. Li, G. Xia, P. Gamba, and L. Zhang, “Multi-feature combined cloud and cloud shadow detection in GaoFen-1 [15] [16] wide field of view imagery,” Remote Sens. Environ., vol. 191, pp. 342–358, Mar. 2017, doi: 10.1016/j.rse.2017.01.026. S. Zhang, R. Wu, K. Xu, J. Wang, and W. Sun, “R-CNN-based ship detection from high resolution remote sensing imagery,” Remote Sens., vol. 11, no. 6, Mar. 2019, Art. no. 631, doi: 10.3390/ rs11060631. Y. Wang, C. Wang, H. Zhang, Y. Dong, and S. Wei, “Automatic ship detection based on retinaNet using multi-resolution GaoFen-3 imagery,” Remote Sens., vol. 11, no. 5, Mar. 2019, Art. no. 531, doi: 10.3390/rs11050531. X. X. Zhu et al., “Deep learning in remote sensing: A comprehensive review and list of resources,” IEEE Geosci. Remote Sens. Mag., vol. 5, no. 4, pp. 8–36, Dec. 2017, doi: 10.1109/ MGRS.2017.2762307. L. Zhang, L. Zhang, and B. Du, “Deep learning for remote sensing data: A technical tutorial on the state of the art,” IEEE Geosci. Remote Sens. Mag., vol. 4, no. 2, pp. 22–40, Jun. 2016, doi: 10.1109/MGRS.2016.2540798. L. Zhang and L. Zhang, “Artificial intelligence for remote sensing data analysis: A review of challenges and opportunities,” IEEE Geosci. Remote Sens. Mag., vol. 10, no. 2, pp. 270–294, Jun. 2022, doi: 10.1109/MGRS.2022.3145854. W. Han et al., “Methods for small, weak object detection in optical high-resolution remote sensing images: A survey of advances and challenges,” IEEE Geosci. Remote Sens. Mag., vol. 9, no. 4, pp. 8–34, Dec. 2021, doi: 10.1109/ MGRS.2020.3041450. C. Xu, J. Wang, W. Yang, H. Yu, L. Yu, and G.-S. Xia, “Detecting tiny objects in aerial images: A normalized Wasserstein distance and a new benchmark,” ISPRS J. Photogrammetry Remote Sens., vol. 190, pp. 79–93, Aug. 2022, doi: 10.1016/ j.isprsjprs.2022.06.002. J. Yue et al., “Optical remote sensing image understanding with weak supervision: Concepts, methods, and perspectives,” IEEE Geosci. Remote Sens. Mag., vol. 10, no. 2, pp. 250–269, Jun. 
2022, doi: 10.1109/MGRS.2022.3161377. C. Xu and H. Duan, “Artificial bee colony (ABC) optimized edge potential function (EPF) approach to target recognition for low-altitude aircraft,” Pattern Recognit. Lett., vol. 31, no. 13, pp. 1759–1772, Oct. 2010, doi: 10.1016/j.patrec.2009.11.018. X. Sun, H. Wang, and K. Fu, “Automatic detection of geospatial objects using taxonomic semantics,” IEEE Geosci. Remote Sens. Lett., vol. 7, no. 1, pp. 23–27, Feb. 2010, doi: 10.1109/ LGRS.2009.2027139. Y. Lin, H. He, Z. Yin, and F. Chen, “Rotation-invariant object detection in remote sensing images based on radial-gradient angle,” IEEE Geosci. Remote Sens. Lett., vol. 12, no. 4, pp. 746–750, Apr. 2015, doi: 10.1109/LGRS.2014.2360887. H. Moon, R. Chellappa, and A. Rosenfeld, “Performance analysis of a simple vehicle detection algorithm,” Image Vision Comput., vol. 20, no. 1, pp. 1–13, Jan. 2002, doi: 10.1016/ S0262-8856(01)00059-2. S. Leninisha and K. Vani, “Water flow based geometric active deformable model for road network,” ISPRS J. Photogrammetry Remote Sens., vol. 102, pp. 140–147, Apr. 2015, doi: 10.1016/ j.isprsjprs.2015.01.013. DECEMBER 2023 IEEE GEOSCIENCE AND REMOTE SENSING MAGAZINE Authorized licensed use limited to: XIDIAN UNIVERSITY. Downloaded on July 08,2024 at 11:41:28 UTC from IEEE Xplore. Restrictions apply. 31 [17] [18] [19] [20] [21] [22] [23] [24] [25] [26] [27] [28] [29] [30] 32 D. Chaudhuri and A. Samal, “An automatic bridge detection technique for multispectral images,” IEEE Trans. Geosci. Remote Sens., vol. 46, no. 9, pp. 2720–2727, Sep. 2008, doi: 10.1109/ TGRS.2008.923631. G. Cheng, J. Han, P. Zhou, and L. Guo, “Multi-class geospatial object detection and geographic image classification based on collection of part detectors,” ISPRS J. Photogrammetry Remote Sens., vol. 98, pp. 119–132, Dec. 2014, doi: 10.1016/j.isprsjprs.2014.10.002. L. Zhang, L. Zhang, D. Tao, and X. Huang, “Sparse transfer manifold embedding for hyperspectral target detection,” IEEE Trans. Geosci. Remote Sens., vol. 52, no. 2, pp. 1030–1043, Feb. 2014, doi: 10.1109/TGRS.2013.2246837. J. Han et al., “Efficient, simultaneous detection of multiclass geospatial targets based on visual saliency modeling and discriminative learning of sparse coding,” ISPRS J. Photogrammetry Remote Sens., vol. 89, pp. 37–48, Mar. 2014, doi: 10.1016/j.isprsjprs.2013.12.011. H. Sun, X. Sun, H. Wang, Y. Li, and X. Li, “Automatic target detection in high-resolution remote sensing images using spatial sparse coding bag-of-words model,” IEEE Geosci. Remote Sens. Lett., vol. 9, no. 1, pp. 109–113, Jan. 2012, doi: 10.1109/ LGRS.2011.2161569. Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no. 7553, pp. 436–444, May 2015, doi: 10.1038/ nature14539. S. Ren, K. He, R. B. Girshick, and J. Sun, “Faster R-CNN: Towards real-time object detection with region proposal networks,” in Proc. Annu. Conf. Neural Inf. Process. Syst., 2015, pp. 91–99. J. Redmon and A. Farhadi, “YOLO9000: Better, faster, stronger,” in Proc. IEEE Conf. Comput. Vision Pattern Recognit. (CVPR), 2017, pp. 6517–6525, doi: 10.1109/CVPR.2017.690. T. Lin, P. Goyal, R. B. Girshick, K. He, and P. Dollár, “Focal loss for dense object detection,” in Proc. IEEE Int. Conf. Comput. Vision (ICCV), 2017, pp. 2999–3007, doi: 10.1109/ ICCV.2017.324. Z. Tian, C. Shen, H. Chen, and T. He, “FCOS: Fully convolutional one-stage object detection,” in Proc. IEEE Int. Conf. Comput. Vision (ICCV), 2019, pp. 9626–9635, doi: 10.1109/ ICCV.2019.00972. L. 
Liu et al., “Deep learning for generic object detection: A survey,” Int. J. Comput. Vision, vol. 128, no. 2, pp. 261–318, Feb. 2020, doi: 10.1007/s11263-019-01247-4. K. Li, G. Wan, G. Cheng, L. Meng, and J. Han, “Object detection in optical remote sensing images: A survey and a new benchmark,” ISPRS J. Photogrammetry Remote Sens., vol. 159, pp. 296–307, Jan. 2020, doi: 10.1016/j.isprsjprs.2019.11.023. G. Cheng and J. Han, “A survey on object detection in optical remote sensing images,” ISPRS J. Photogrammetry Remote Sens., vol. 117, pp. 11–28, Jul. 2016, doi: 10.1016/j.isprsjprs. 2016.03.014. U. Alganci, M. Soydas, and E. Sertel, “Comparative research on deep learning approaches for airplane detection from very high-resolution satellite images,” Remote Sens., vol. 12, no. 3, Feb. 2020, Art. no. 458, doi: 10.3390/rs12030458. [31] Z. Li et al., “Deep learning-based object detection techniques for remote sensing images: A survey,” Remote Sens., vol. 14, no. 10, May 2022, Art. no. 2385, doi: 10.3390/rs14102385. [32] J. Kang, S. Tariq, H. Oh, and S. S. Woo, “A survey of deep learning-based object detection methods and datasets for overhead imagery,” IEEE Access, vol. 10, pp. 20,118–20,134, Feb. 2022, doi: 10.1109/ACCESS.2022.3149052. [33] J. Ding et al., “Object detection in aerial images: A large-scale benchmark and challenges,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 44, no. 11, pp. 7778–7796, Nov. 2022, doi: 10.1109/ TPAMI.2021.3117983. [34] X. Sun et al., “FAIR1M: A benchmark dataset for fine-grained object recognition in high-resolution remote sensing imagery,” ISPRS J. Photogrammetry Remote Sens., vol. 184, pp. 116–130, Feb. 2022, doi: 10.1016/j.isprsjprs.2021.12.004. [35] W. Zhao, W. Ma, L. Jiao, P. Chen, S. Yang, and B. Hou, “Multiscale image block-level F-CNN for remote sensing images object detection,” IEEE Access, vol. 7, pp. 43,607–43,621, Mar. 2019, doi: 10.1109/ACCESS.2019.2908016. [36] S. M. Azimi, E. Vig, R. Bahmanyar, M. Körner, and P. Reinartz, “Towards multi-class object detection in unconstrained remote sensing imagery,” in Proc. Asian Conf. Comput. Vision, 2018, vol. 11363, pp. 150–165, doi: 10.1007/978-3-03020893-6_10. [37] P. Shamsolmoali, M. Zareapoor, J. Chanussot, H. Zhou, and J. Yang, “Rotation equivariant feature image pyramid network for object detection in optical remote sensing imagery,” IEEE Trans. Geosci. Remote Sens., vol. 60, pp. 1–14, 2022, doi: 10.1109/TGRS.2021.3112481. [38] Y. Chen et al., “Stitcher: Feedback-driven data provider for object detection,” 2020. [Online]. Available: https://arxiv.org/ abs/2004.12432v1 [39] X. Xu, X. Zhang, and T. Zhang, “Lite-YOLOv5: A lightweight deep learning detector for on-board ship detection in largescene sentinel-1 SAR images,” Remote Sens., vol. 14, no. 4, Feb. 2022, Art. no. 1018, doi: 10.3390/rs14041018. [40] N. Su, Z. Huang, Y. Yan, C. Zhao, and S. Zhou, “Detect larger at once: Large-area remote-sensing image arbitrary-oriented ship detection,” IEEE Geosci. Remote Sens. Lett., vol. 19, pp. 1–5, Jan. 2022, doi: 10.1109/LGRS.2022.3144485. [41] B. Zhao, Y. Wu, X. Guan, L. Gao, and B. Zhang, “An improved aggregated-mosaic method for the sparse object detection of remote sensing imagery,” Remote Sens., vol. 13, no. 13, Jul. 2021, Art. no. 2602, doi: 10.3390/rs13132602. [42] X. Han, Y. Zhong, and L. Zhang, “An efficient and robust integrated geospatial object detection framework for high spatial resolution remote sensing imagery,” Remote Sens., vol. 9, no. 7, Jun. 2017, Art. no. 666, doi: 10.3390/rs9070666. [43] Y. 
Long, Y. Gong, Z. Xiao, and Q. Liu, “Accurate object localization in remote sensing images based on convolutional neural networks,” IEEE Trans. Geosci. Remote Sens., vol. 55, no. 5, pp. 2486–2498, May 2017, doi: 10.1109/TGRS. 2016.2645610. [44] Y. Zhong, X. Han, and L. Zhang, “Multi-class geospatial object detection based on a position-sensitive balancing framework for high spatial resolution remote sensing imagery,” ISPRS J. IEEE GEOSCIENCE AND REMOTE SENSING MAGAZINE DECEMBER 2023 Authorized licensed use limited to: XIDIAN UNIVERSITY. Downloaded on July 08,2024 at 11:41:28 UTC from IEEE Xplore. Restrictions apply. Photogrammetry Remote Sens., vol. 138, pp. 281–294, Apr. 2018, doi: 10.1016/j.isprsjprs.2018.02.014. [45] P. Ding, Y. Zhang, W.-J. Deng, P. Jia, and A. Kuijper, “A light and faster regional convolutional neural network for object detection in optical remote sensing images,” ISPRS J. Photogrammetry Remote Sens., vol. 141, pp. 208–218, Jul. 2018, doi: 10.1016/j.isprsjprs.2018.05.005. [46] W. Liu, L. Ma, and H. Chen, “Arbitrary-oriented ship detection framework in optical remote-sensing images,” IEEE Geosci. Remote Sens. Lett., vol. 15, no. 6, pp. 937–941, Jun. 2018, doi: 10.1109/LGRS.2018.2813094. [47] W. Liu, L. Ma, J. Wang, and H. Chen, “Detection of multiclass objects in optical remote sensing images,” IEEE Geosci. Remote Sens. Lett., vol. 16, no. 5, pp. 791–795, May 2019, doi: 10.1109/ LGRS.2018.2882778. [48] Y. Zhang, Y. Yuan, Y. Feng, and X. Lu, “Hierarchical and robust convolutional neural network for very high-resolution remote sensing object detection,” IEEE Trans. Geosci. Remote Sens., vol. 57, no. 8, pp. 5535–5548, Aug. 2019, doi: 10.1109/ TGRS.2019.2900302. [49] Z. Lin, K. Ji, X. Leng, and G. Kuang, “Squeeze and excitation rank faster R-CNN for ship detection in SAR images,” IEEE Geosci. Remote Sens. Lett., vol. 16, no. 5, pp. 751–755, May 2019, doi: 10.1109/LGRS.2018.2882551. [50] Z. Deng, H. Sun, S. Zhou, J. Zhao, L. Lei, and H. Zou, “Multiscale object detection in remote sensing imagery with convolutional neural networks,” ISPRS J. Photogrammetry Remote Sens., vol. 145, no. Part A, pp. 3–22, Nov. 2018, doi: 10.1016/j. isprsjprs.2018.04.003. [51] Z. Zheng et al., “HyNet: Hyper-scale object detection network framework for multiple spatial resolution remote sensing imagery,” ISPRS J. Photogrammetry Remote Sens., vol. 166, pp. 1–14, Aug. 2020, doi: 10.1016/j.isprsjprs.2020.04.019. [52] Y. Ren, C. Zhu, and S. Xiao, “Deformable faster R-CNN with aggregating multi-layer features for partially occluded object detection in optical remote sensing images,” Remote Sens., vol. 10, no. 9, Sep. 2018, Art. no. 1470, doi: 10.3390/ rs10091470. [53] W. Liu et al., “SSD: Single shot multibox detector,” in Proc. Eur. Conf. Comput. Vision, Cham, Switzerland: Springer, 2016, pp. 21–37. [54] S. Liu, D. Huang, and Y. Wang, “Receptive field block net for accurate and fast object detection,” in Proc. Eur. Conf. Comput. Vision, 2018, pp. 385–400. [55] Z. Shen, Z. Liu, J. Li, Y.-G. Jiang, Y. Chen, and X. Xue, “DSOD: Learning deeply supervised object detectors from scratch,” in Proc. IEEE Int. Conf. Comput. Vision (ICCV), 2017, pp. 1937– 1945, doi: 10.1109/ICCV.2017.212. [56] Z. Zhang, S. Qiao, C. Xie, W. Shen, B. Wang, and A. L. Yuille, “Single-shot object detection with enriched semantics,” in Proc. IEEE/CVF Conf. Comput. Vision Pattern Recognit. (CVPR), 2018, pp. 5813–5821, doi: 10.1109/CVPR.2018.00609. [57] X. Lu, J. Ji, Z. Xing, and Q. 
Miao, “Attention and feature fusion SSD for remote sensing object detection,” IEEE Trans. Instrum. Meas., vol. 70, pp. 1–9, Jan. 2021, doi: 10.1109/ TIM.2021.3052575. [58] G. Wang et al., “FSoD-Net: Full-scale object detection from optical remote sensing imagery,” IEEE Trans. Geosci. Remote Sens., vol. 60, pp. 1–18, 2022, doi: 10.1109/TGRS.2021.3064599. [59] B. Hou, Z. Ren, W. Zhao, Q. Wu, and L. Jiao, “Object detection in high-resolution panchromatic images using deep models and spatial template matching,” IEEE Trans. Geosci. Remote Sens., vol. 58, no. 2, pp. 956–970, Feb. 2020, doi: 10.1109/ TGRS.2019.2942103. [60] X. Liang, J. Zhang, L. Zhuo, Y. Li, and Q. Tian, “Small object detection in unmanned aerial vehicle images using feature fusion and scaling-based single shot detector with spatial context analysis,” IEEE Trans. Circuits Syst. Video Technol., vol. 30, no. 6, pp. 1758–1770, Jun. 2020, doi: 10.1109/TCSVT. 2019.2905881. [61] Z. Wang, L. Du, J. Mao, B. Liu, and D. Yang, “SAR target detection based on SSD with data augmentation and transfer learning,” IEEE Geosci. Remote Sens. Lett., vol. 16, no. 1, pp. 150–154, Jan. 2019, doi: 10.1109/LGRS.2018.2867242. [62] S. Bao, X. Zhong, R. Zhu, X. Zhang, Z. Li, and M. Li, “Single shot anchor refinement network for oriented object detection in optical remote sensing imagery,” IEEE Access, vol. 7, pp. 87,150–87,161, Jun. 2019, doi: 10.1109/ ACCESS.2019.2924643. [63] T. Xu, X. Sun, W. Diao, L. Zhao, K. Fu, and H. Wang, “ASSD: Feature aligned single-shot detection for multiscale objects in aerial imagery,” IEEE Trans. Geosci. Remote Sens., vol. 60, pp. 1–17, 2022, doi: 10.1109/TGRS.2021.3089170. [64] Q. Li, L. Mou, Q. Liu, Y. Wang, and X. X. Zhu, “HSF-Net: Multiscale deep feature embedding for ship detection in optical remote sensing imagery,” IEEE Trans. Geosci. Remote Sens., vol. 56, no. 12, pp. 7147–7161, Dec. 2018, doi: 10.1109/ TGRS.2018.2848901. [65] T.-Y. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, and S. Belongie, “Feature pyramid networks for object detection,” in Proc. IEEE Conf. Comput. Vision Pattern Recognit. (CVPR), 2017, pp. 936–944, doi: 10.1109/CVPR.2017.106. [66] S. Liu, L. Qi, H. Qin, J. Shi, and J. Jia, “Path aggregation network for instance segmentation,” in Proc. IEEE Int. Conf. Comput. Vision Pattern Recognit. (CVPR), Jun. 2018, pp. 8759–8768, doi: 10.1109/CVPR.2018.00913. [67] J. Pang, K. Chen, J. Shi, H. Feng, W. Ouyang, and D. Lin, “Libra R-CNN: Towards balanced learning for object detection,” in Proc. IEEE Int. Conf. Comput. Vision Pattern Recognit. (CVPR), Jun. 2019, pp. 821–830. [68] M. Tan, R. Pang, and Q. V. Le, “EfficientDet: Scalable and efficient object detection,” in Proc. IEEE Conf. Comput. Vision Pattern Recognit. (CVPR), 2020, pp. 10,781–10,790, doi: 10.1109/CVPR42600.2020.01079. [69] L. Hou, K. Lu, and J. Xue, “Refined one-stage oriented object detection method for remote sensing images,” IEEE Trans. Image Process., vol. 31, pp. 1545–1558, Jan. 2022, doi: 10.1109/ TIP.2022.3143690. [70] W. Zhang, L. Jiao, Y. Li, Z. Huang, and H. Wang, “Laplacian feature pyramid network for object detection in VHR optical remote sensing images,” IEEE Trans. Geosci. Remote Sens., vol. 60, pp. 1–14, 2022, doi: 10.1109/TGRS.2021.3072488. DECEMBER 2023 IEEE GEOSCIENCE AND REMOTE SENSING MAGAZINE Authorized licensed use limited to: XIDIAN UNIVERSITY. Downloaded on July 08,2024 at 11:41:28 UTC from IEEE Xplore. Restrictions apply. 33 [71] S. 
Wei et al., “Precise and robust ship detection for high-resolution SAR imagery based on HR-SDNet,” Remote Sens., vol. 12, no. 1, Jan. 2020, Art. no. 167, doi: 10.3390/rs12010167. [72] G. Cheng, M. He, H. Hong, X. Yao, X. Qian, and L. Guo, “Guiding clean features for object detection in remote sensing images,” IEEE Geosci. Remote Sens. Lett., vol. 19, pp. 1–5, 2022, doi: 10.1109/LGRS.2021.3104112. [73] J. Jiao et al., “A densely connected end-to-end neural network for multiscale and multiscene SAR ship detection,” IEEE Access, vol. 6, pp. 20,881–20,892, Apr. 2018, doi: 10.1109/ACCESS.2018.2825376. [74] Q. Guo, H. Wang, and F. Xu, “Scattering enhanced attention pyramid network for aircraft detection in SAR images,” IEEE Trans. Geosci. Remote Sens., vol. 59, no. 9, pp. 7570–7587, Sep. 2021, doi: 10.1109/TGRS.2020.3027762. [75] Y. Li, Q. Huang, X. Pei, L. Jiao, and R. Shang, “RADet: Refine feature pyramid network and multi-layer attention network for arbitrary-oriented object detection of remote sensing images,” Remote Sens., vol. 12, no. 3, Jan. 2020, Art. no. 389, doi: 10.3390/rs12030389. [76] L. Shi, L. Kuang, X. Xu, B. Pan, and Z. Shi, “CANet: Centerness-aware network for object detection in remote sensing images,” IEEE Trans. Geosci. Remote Sens., vol. 60, pp. 1–13, 2022, doi: 10.1109/TGRS.2021.3068970. [77] R. Yang, Z. Pan, X. Jia, L. Zhang, and Y. Deng, “A novel CNNbased detector for ship detection based on rotatable bounding box in SAR images,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 14, pp. 1938–1958, Jan. 2021, doi: 10.1109/ JSTARS.2021.3049851. [78] Y. Zhao, L. Zhao, B. Xiong, and G. Kuang, “Attention receptive pyramid network for ship detection in SAR images,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 13, pp. 2738– 2756, May 2020, doi: 10.1109/JSTARS.2020.2997081. [79] X. Yang, X. Zhang, N. Wang, and X. Gao, “A robust one-stage detector for multiscale ship detection with complex background in massive SAR images,” IEEE Trans. Geosci. Remote Sens., vol. 60, pp. 1–12, 2022, doi: 10.1109/TGRS.2021.3128060. [80] K. Fu, Z. Chang, Y. Zhang, G. Xu, K. Zhang, and X. Sun, “Rotation-aware and multi-scale convolutional neural network for object detection in remote sensing images,” ISPRS J. Photogrammetry Remote Sens., vol. 161, pp. 294–308, Mar. 2020, doi: 10.1016/j.isprsjprs.2020.01.025. [81] W. Huang, G. Li, B. Jin, Q. Chen, J. Yin, and L. Huang, “Scenario context-aware-based bidirectional feature pyramid network for remote sensing target detection,” IEEE Geosci. Remote Sens. Lett., vol. 19, pp. 1–5, 2022, doi: 10.1109/LGRS.2021. 3135935. [82] V. Chalavadi, J. Prudviraj, R. Datla, C. S. Babu, and C. K. Mohan, “mSODANet: A network for multi-scale object detection in aerial images using hierarchical dilated convolutions,” Pattern Recognit., vol. 126, Jun. 2022, Art. no. 108548, doi: 10.1016/j.patcog.2022.108548. [83] G. Cheng, Y. Si, H. Hong, X. Yao, and L. Guo, “Cross-scale feature fusion for object detection in optical remote sensing images,” IEEE Geosci. Remote Sens. Lett., vol. 18, no. 3, pp. 431– 435, Mar. 2021, doi: 10.1109/LGRS.2020.2975541. 34 [84] J. Fu, X. Sun, Z. Wang, and K. Fu, “An anchor-free method based on feature balancing and refinement network for multiscale ship detection in SAR images,” IEEE Trans. Geosci. Remote Sens., vol. 59, no. 2, pp. 1331–1344, Feb. 2021, doi: 10.1109/ TGRS.2020.3005151. [85] Y. Liu, Q. Li, Y. Yuan, Q. Du, and Q. 
Wang, “ABNet: Adaptive balanced network for multiscale object detection in remote sensing imagery,” IEEE Trans. Geosci. Remote Sens., vol. 60, pp. 1–14, 2022, doi: 10.1109/TGRS.2021.3133956. [86] H. Guo, X. Yang, N. Wang, B. Song, and X. Gao, “A rotational libra R-CNN method for ship detection,” IEEE Trans. Geosci. Remote Sens., vol. 58, no. 8, pp. 5772–5781, Aug. 2020, doi: 10.1109/TGRS.2020.2969979. [87] T. Zhang, Y. Zhuang, G. Wang, S. Dong, H. Chen, and L. Li, “Multiscale semantic fusion-guided fractal convolutional object detection network for optical remote sensing imagery,” IEEE Trans. Geosci. Remote Sens., vol. 60, pp. 1–20, 2022, doi: 10.1109/TGRS.2021.3108476. [88] Y. Zheng, P. Sun, Z. Zhou, W. Xu, and Q. Ren, “ADT-Det: Adaptive dynamic refined single-stage transformer detector for arbitrary-oriented object detection in satellite optical imagery,” Remote Sens., vol. 13, no. 13, Jul. 2021, Art. no. 2623, doi: 10.3390/rs13132623. [89] Z. Wei et al., “Learning calibrated-guidance for object detection in aerial images,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 15, pp. 2721–2733, Mar. 2022, doi: 10.1109/ JSTARS.2022.3158903. [90] L. Chen, C. Liu, F. Chang, S. Li, and Z. Nie, “Adaptive multilevel feature fusion and attention-based network for arbitraryoriented object detection in remote sensing imagery,” Neurocomputing, vol. 451, no. 8, pp. 67–80, Apr. 2021, doi: 10.1016/j. neucom.2021.04.011. [91] X. Sun, P. Wang, C. Wang, Y. Liu, and K. Fu, “PBNET: Partbased convolutional neural network for complex composite object detection in remote sensing imagery,” ISPRS J. Photogrammetry Remote Sens., vol. 173, pp. 50–65, Mar. 2021, doi: 10.1016/j.isprsjprs.2020.12.015. [92] T. Zhang et al., “Balance learning for ship detection from synthetic aperture radar remote sensing imagery,” ISPRS J. Photogrammetry Remote Sens., vol. 182, pp. 190–207, Dec. 2021, doi: 10.1016/j.isprsjprs.2021.10.010. [93] T. Zhang, X. Zhang, and X. Ke, “Quad-FPN: A novel quad feature pyramid network for SAR ship detection,” Remote Sens., vol. 13, no. 14, Jul. 2021, Art. no. 2771, doi: 10.3390/rs13142771. [94] J. Song, L. Miao, Q. Ming, Z. Zhou, and Y. Dong, “Finegrained object detection in remote sensing images via adaptive label assignment and refined-balanced feature pyramid network,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 16, pp. 71–82, 2023, doi: 10.1109/JSTARS.2022. 3224558. [95] W. Guo, W. Yang, H. Zhang, and G. Hua, “Geospatial object detection in high resolution satellite images based on multiscale convolutional neural network,” Remote Sens., vol. 10, no. 1, Jan. 2018, Art. no. 131, doi: 10.3390/rs10010131. [96] S. Zhang, G. He, H. Chen, N. Jing, and Q. Wang, “Scale adaptive proposal network for object detection in remote sensing IEEE GEOSCIENCE AND REMOTE SENSING MAGAZINE DECEMBER 2023 Authorized licensed use limited to: XIDIAN UNIVERSITY. Downloaded on July 08,2024 at 11:41:28 UTC from IEEE Xplore. Restrictions apply. [97] [98] [99] [100] [101] [102] [103] [104] [105] [106] [107] [108] [109] images,” IEEE Geosci. Remote Sens. Lett., vol. 16, no. 6, pp. 864– 868, Jun. 2019, doi: 10.1109/LGRS.2018.2888887. C. Li, C. Xu, Z. Cui, D. Wang, T. Zhang, and J. Yang, “Featureattentioned object detection in remote sensing imagery,” in Proc. IEEE Int. Conf. Image Process. Conf. (ICIP), 2019, pp. 3886– 3890, doi: 10.1109/ICIP.2019.8803521. Z. Dong, M. Wang, Y. Wang, Y. Zhu, and Z. 
Zhang, “Object detection in high resolution remote sensing imagery based on convolutional neural networks with suitable object scale features,” IEEE Trans. Geosci. Remote Sens., vol. 58, no. 3, pp. 2104–2114, Mar. 2020, doi: 10.1109/TGRS.2019.2953119. H. Qiu, H. Li, Q. Wu, F. Meng, K. N. Ngan, and H. Shi, “A2RMNET: Adaptively aspect ratio multi-scale network for object detection in remote sensing images,” Remote Sens., vol. 11, no. 13, Jul. 2019, Art. no. 1594, doi: 10.3390/rs11131594. J. Hou, X. Zhu, and X. Yin, “Self-adaptive aspect ratio anchor for oriented object detection in remote sensing images,” Remote Sens., vol. 13, no. 7, Mar. 2021, Art. no. 1318, doi: 10.3390/rs13071318. N. Mo, L. Yan, R. Zhu, and H. Xie, “Class-specific anchor based and context-guided multi-class object detection in high resolution remote sensing imagery with a convolutional neural network,” Remote Sens., vol. 11, no. 3, Jan. 2019, Art. no. 272, doi: 10.3390/rs11030272. Z. Tian, R. Zhan, J. Hu, W. Wang, Z. He, and Z. Zhuang, “Generating anchor boxes based on attention mechanism for object detection in remote sensing images,” Remote Sens., vol. 12, no. 15, Jul. 2020, Art. no. 2416, doi: 10.3390/rs12152416. Z. Teng, Y. Duan, Y. Liu, B. Zhang, and J. Fan, “Global to local: Clip-LSTM-based object detection from remote sensing images,” IEEE Trans. Geosci. Remote Sens., vol. 60, pp. 1–13, 2022, doi: 10.1109/TGRS.2021.3064840. Y. Yu, H. Guan, D. Li, T. Gu, E. Tang, and A. Li, “Orientation guided anchoring for geospatial object detection from remote sensing imagery,” ISPRS J. Photogrammetry Remote Sens., vol. 160, pp. 67–82, Feb. 2020, doi: 10.1016/j.isprsjprs.2019.12.001. J. Wang, K. Chen, S. Yang, C. C. Loy, and D. Lin, “Region proposal by guided anchoring,” in Proc. IEEE Conf. Comput. Vision Pattern Recognit. (CVPR), 2019, pp. 2960–2969, doi: 10.1109/ CVPR.2019.00308. X. Yang and J. Yan, “On the arbitrary-oriented object detection: Classification based approaches revisited,” Int. J. Comput. Vision, vol. 130, no. 5, pp. 1340–1365, Mar. 2022, doi: 10.1007/s11263-022-01593-w. X. Yang et al., “SCRDet: Towards more robust detection for small, cluttered and rotated objects,” in Proc. IEEE Int. Conf. Comput. Vision (ICCV), 2019, pp. 8231–8240, doi: 10.1109/ ICCV.2019.00832. X. Yang, J. Yan, Z. Feng, and T. He, “R3Det: Refined singlestage detector with feature refinement for rotating object,” in Pro. AAAI Conf. Artif. Intell., 2021, vol. 35, no. 4, pp. 3163–3171, doi: 10.1609/aaai.v35i4.16426. X. Yang et al., “Automatic ship detection in remote sensing images from google earth of complex scenes based on multiscale rotation dense feature pyramid networks,” Remote Sens., vol. 10, no. 1, Jan. 2018, Art. no. 132, doi: 10.3390/rs10010132. [110] X. Yang, H. Sun, X. Sun, M. Yan, Z. Guo, and K. Fu, “Position detection and direction prediction for arbitrary-oriented ships via multitask rotation region convolutional neural network,” IEEE Access, vol. 6, pp. 50,839–50,849, Sep. 2018, doi: 10.1109/ ACCESS.2018.2869884. [111] Q. Ming, L. Miao, Z. Zhou, and Y. Dong, “CFC-Net: A critical feature capturing network for arbitrary-oriented object detection in remote-sensing images,” IEEE Trans. Geosci. Remote Sens., vol. 60, pp. 1–14, 2022, doi: 10.1109/TGRS.2021.3095186. [112] Q. Ming, Z. Zhou, L. Miao, H. Zhang, and L. Li, “Dynamic anchor learning for arbitrary-oriented object detection,” in Proc. AAAI Conf. Artif. Intell., 2021, pp. 2355–2363, doi: 10.1609/ aaai.v35i3.16336. [113] Y. Zhu, J. Du, and X. 
Wu, “Adaptive period embedding for representing oriented objects in aerial images,” IEEE Trans. Geosci. Remote Sens., vol. 58, no. 10, pp. 7247–7257, Oct. 2020, doi: 10.1109/TGRS.2020.2981203. [114] J. Ding, N. Xue, Y. Long, G.-S. Xia, and Q. Lu, “Learning RoI transformer for oriented object detection in aerial images,” in Proc. IEEE Conf. Comput. Vision Pattern Recognit. (CVPR), 2019, pp. 2844–2853, doi: 10.1109/CVPR.2019.00296. [115] Q. An, Z. Pan, L. Liu, and H. You, “DRBox-v2: An improved detector with rotatable boxes for target detection in SAR images,” IEEE Trans. Geosci. Remote Sens., vol. 57, no. 11, pp. 8333–8349, Nov. 2019, doi: 10.1109/TGRS.2019.2920534. [116] Q. Li, L. Mou, Q. Xu, Y. Zhang, and X. X. Zhu, “R 3-Net: A deep network for multi-oriented vehicle detection in aerial images and videos,” 2018. [Online]. Available: http://arxiv.org/ abs/1808.05560 [117] G. Xia et al., “DOTA: A large-scale dataset for object detection in aerial images,” in Proc. IEEE Int. Conf. Comput. Vision Pattern Recognit. (CVPR), 2018, pp. 3974–3983, doi: 10.1109/ CVPR.2018.00418. [118] Y. Liu, S. Zhang, L. Jin, L. Xie, Y. Wu, and Z. Wang, “Omnidirectional scene text detection with sequential-free box discretization,” 2019, arXiv:1906.02371. [119] W. Qian, X. Yang, S. Peng, J. Yan, and Y. Guo, “Learning modulated loss for rotated object detection,” in Proc. AAAI Conf. Artif. Intell., 2021, vol. 35, no. 3, pp. 2458–2466, doi: 10.1609/ aaai.v35i3.16347. [120] Y. Xu et al., “Gliding vertex on the horizontal bounding box for multi-oriented object detection,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 43, no. 4, pp. 1452–1459, Apr. 2021, doi: 10.1109/TPAMI.2020.2974745. [121] W. Qian, X. Yang, S. Peng, X. Zhang, and J. Yan, “RSDet++: Point-based modulated loss for more accurate rotated object detection,” IEEE Trans. Circuits Syst. Video Technol., vol. 32, no. 11, pp. 7869–7879, Nov. 2022, doi: 10.1109/TCSVT. 2022.3186070. [122] J. Luo, Y. Hu, and J. Li, “Surround-net: A multi-branch arbitraryoriented detector for remote sensing,” Remote Sens., vol. 14, no. 7, Apr. 2022, Art. no. 1751, doi: 10.3390/rs14071751. [123] Q. Song, F. Yang, L. Yang, C. Liu, M. Hu, and L. Xia, “Learning point-guided localization for detection in remote sensing images,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 14, pp. 1084–1094, 2021, doi: 10.1109/JSTARS.2020.3036685. DECEMBER 2023 IEEE GEOSCIENCE AND REMOTE SENSING MAGAZINE Authorized licensed use limited to: XIDIAN UNIVERSITY. Downloaded on July 08,2024 at 11:41:28 UTC from IEEE Xplore. Restrictions apply. 35 [124] X. Xie, G. Cheng, J. Wang, X. Yao, and J. Han, “Oriented R-CNN for object detection,” in Proc. IEEE Int. Conf. Comput. Vision (ICCV), 2021, pp. 3500–3509, doi: 10.1109/ ICCV48922.2021.00350. [125] Y. Yao et al., “On improving bounding box representations for oriented object detection,” IEEE Trans. Geosci. Remote Sens., vol. 61, pp. 1–11, 2023, doi: 10.1109/TGRS.2022.3231340. [126] Q. Ming, L. Miao, Z. Zhou, X. Yang, and Y. Dong, “Optimization for arbitrary-oriented object detection via representation invariance loss,” IEEE Geosci. Remote Sens. Lett., vol. 19, pp. 1–5, 2022, doi: 10.1109/LGRS.2021.3115110. [127] X. Yang and J. Yan, “Arbitrary-oriented object detection with circular smooth label,” in Proc. Eur. Conf. Comput. Vision, Cham, Switzerland: Springer, 2020, pp. 677–694, doi: 10.1007/978-3-030-58598-3_40. [128] X. Yang, L. Hou, Y. Zhou, W. Wang, and J. Yan, “Dense label encoding for boundary discontinuity free rotation detection,” in Proc. 
IEEE Conf. Comput. Vision Pattern Recognit. (CVPR), 2021, pp. 15,819–15,829, doi: 10.1109/CVPR46437. 2021.01556. [129] J. Wang, F. Li, and H. Bi, “Gaussian focal loss: Learning distribution polarized angle prediction for rotated object detection in aerial images,” IEEE Trans. Geosci. Remote Sens., vol. 60, pp. 1–13, 2022, doi: 10.1109/TGRS.2022.3175520. [130] X. Yang, J. Yan, Q. Ming, W. Wang, X. Zhang, and Q. Tian, “Rethinking rotated object detection with gaussian Wasserstein distance loss,” in Proc. Int. Conf. Machine Learn, 2021, pp. 11,830–11,841. [131] X. Yang et al., “Learning high-precision bounding box for rotated object detection via Kullback-Leibler divergence,” in Proc. Adv. Neural Inf. Process. Syst. 34, 2021, vol. 34, pp. 18,381– 18,394. [132] X. Yang et al., “The KFIOU loss for rotated object detection,” 2022, arXiv:2201.12558. [133] X. Yang et al., “Detecting rotated objects as gaussian distributions and its 3-D generalization,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 45, no. 4, pp. 4335–4354, Apr. 2023, doi: 10.1109/TPAMI.2022.3197152. [134] J. Wang, J. Ding, H. Guo, W. Cheng, T. Pan, and W. Yang, “Mask OBB: A semantic attention-based mask oriented bounding box representation for multi-category object detection in aerial images,” Remote Sens., vol. 11, no. 24, Dec. 2019, Art. no. 2930, doi: 10.3390/rs11242930. [135] X. Zhang, G. Wang, P. Zhu, T. Zhang, C. Li, and L. Jiao, “GRSDet: An anchor-free rotation ship detector based on gaussianmask in remote sensing images,” IEEE Trans. Geosci. Remote Sens., vol. 59, no. 4, pp. 3518–3531, Apr. 2021, doi: 10.1109/ TGRS.2020.3018106. [136] Y. Yang et al., “AR 2Det: An accurate and real-time rotational one-stage ship detector in remote sensing images,” IEEE Trans. Geosci. Remote Sens., vol. 60, pp. 1–14, 2022, doi: 10.1109/ TGRS.2021.3092433. [137] F. Zhang, X. Wang, S. Zhou, Y. Wang, and Y. Hou, “Arbitraryoriented ship detection through center-head point extraction,” IEEE Trans. Geosci. Remote Sens., vol. 60, pp. 1–14, Oct. 2021, doi: 10.1109/TGRS.2021.3120411. 36 [138] J. Yi, P. Wu, B. Liu, Q. Huang, H. Qu, and D. Metaxas, “Oriented object detection in aerial images with box boundary-aware vectors,” in Proc. IEEE Winter Conf. Appl. Comput. Vision (WACV), 2021, pp. 2149–2158, doi: 10.1109/WACV48630.2021.00220. [139] Z. Xiao, L. Qian, W. Shao, X. Tan, and K. Wang, “Axis learning for orientated objects detection in aerial images,” Remote Sens., vol. 12, no. 6, Mar. 2020, Art. no. 908, doi: 10.3390/ rs12060908. [140] X. He, S. Ma, L. He, L. Ru, and C. Wang, “Learning rotated inscribed ellipse for oriented object detection in remote sensing images,” Remote Sens., vol. 13, no. 18, Sep. 2021, Art. no. 3622, doi: 10.3390/rs13183622. [141] K. Fu, Z. Chang, Y. Zhang, and X. Sun, “Point-based estimator for arbitrary-oriented object detection in aerial images,” IEEE Trans. Geosci. Remote Sens., vol. 59, no. 5, pp. 4370–4387, May 2021, doi: 10.1109/TGRS.2020.3020165. [142] H. Wei, Y. Zhang, Z. Chang, H. Li, H. Wang, and X. Sun, “Oriented objects as pairs of middle lines,” ISPRS J. Photogrammetry Remote Sens., vol. 169, pp. 268–279, Nov. 2020, doi: 10.1016/j. isprsjprs.2020.09.022. [143] L. Zhou, H. Wei, H. Li, W. Zhao, Y. Zhang, and Y. Zhang, “Arbitrary-oriented object detection in remote sensing images based on polar coordinates,” IEEE Access, vol. 8, pp. 223,373– 223,384, Nov. 2020, doi: 10.1109/ACCESS.2020.3041025. [144] X. Zheng, W. Zhang, L. Huan, J. Gong, and H. 
Zhang, “AProNet: Detecting objects with precise orientation from aerial images,” ISPRS J. Photogrammetry Remote Sens., vol. 181, pp. 99–112, Nov. 2021, doi: 10.1016/j.isprsjprs.2021.08.023. [145] X. Yang, G. Zhang, W. Li, X. Wang, Y. Zhou, and J. Yan, “H2RBox: Horizontal box annotation is all you need for oriented object detection,” 2022. [Online]. Available: https:// arxiv.org/abs/2210.06742 [146] G. Cheng, P. Zhou, and J. Han, “Learning rotation-invariant convolutional neural networks for object detection in VHR optical remote sensing images,” IEEE Trans. Geosci. Remote Sens., vol. 54, no. 12, pp. 7405–7415, Dec. 2016, doi: 10.1109/ TGRS.2016.2601622. [147] K. Li, G. Cheng, S. Bu, and X. You, “Rotation-insensitive and context-augmented object detection in remote sensing images,” IEEE Trans. Geosci. Remote Sens., vol. 56, no. 4, pp. 2337– 2348, Apr. 2018, doi: 10.1109/TGRS.2017.2778300. [148] G. Cheng, P. Zhou, and J. Han, “RIFD-CNN: Rotation-invariant and fisher discriminative convolutional neural networks for object detection,” in Proc. IEEE Conf. Comput. Vision Pattern Recognit. (CVPR), 2016, pp. 2884–2893, doi: 10.1109/ CVPR.2016.315. [149] G. Cheng, J. Han, P. Zhou, and D. Xu, “Learning rotationinvariant and fisher discriminative convolutional neural networks for object detection,” IEEE Trans. Image Process., vol. 28, no. 1, pp. 265–278, Jan. 2019, doi: 10.1109/TIP.2018. 2867198. [150] X. Wu, D. Hong, J. Tian, J. Chanussot, W. Li, and R. Tao, “ORSIm detector: A novel object detection framework in optical remote sensing imagery using spatial-frequency channel features,” IEEE Trans. Geosci. Remote Sens., vol. 57, no. 7, pp. 5146– 5158, Jul. 2019, doi: 10.1109/TGRS.2019.2897139. IEEE GEOSCIENCE AND REMOTE SENSING MAGAZINE DECEMBER 2023 Authorized licensed use limited to: XIDIAN UNIVERSITY. Downloaded on July 08,2024 at 11:41:28 UTC from IEEE Xplore. Restrictions apply. [151] X. Wu, D. Hong, J. Chanussot, Y. Xu, R. Tao, and Y. Wang, “Fourier-based rotation-invariant feature boosting: An efficient framework for geospatial object detection,” IEEE Geosci. Remote Sens. Lett., vol. 17, no. 2, pp. 302–306, Feb. 2020, doi: 10.1109/LGRS.2019.2919755. [152] G. Wang, X. Wang, B. Fan, and C. Pan, “Feature extraction by rotation-invariant matrix representation for object detection in aerial image,” IEEE Geosci. Remote Sens. Lett., vol. 14, no. 6, pp. 851–855, Jun. 2017, doi: 10.1109/ LGRS.2017.2683495. [153] X. Wu, D. Hong, P. Ghamisi, W. Li, and R. Tao, “MsRi-CCF: Multi-scale and rotation-insensitive convolutional channel features for geospatial object detection,” Remote Sens., vol. 10, no. 12, Dec. 2018, Art. no. 1990, doi: 10.3390/rs10121990. [154] M. Zand, A. Etemad, and M. Greenspan, “Oriented bounding boxes for small and freely rotated objects,” IEEE Trans. Geosci. Remote Sens., vol. 60, pp. 1–15, 2022, doi: 10.1109/ TGRS.2021.3076050. [155] J. Han, J. Ding, N. Xue, and G.-S. Xia, “ReDet: A rotation-equivariant detector for aerial object detection,” in Proc. IEEE Conf. Comput. Vision Pattern Recognit. (CVPR), 2021, pp. 2785–2794, doi: 10.1109/CVPR46437.2021.00281. [156] J. Han, J. Ding, J. Li, and G. Xia, “Align deep features for oriented object detection,” IEEE Trans. Geosci. Remote Sens., vol. 60, pp. 1–11, 2022, doi: 10.1109/TGRS.2021.3062048. [157] X. Yao, H. Shen, X. Feng, G. Cheng, and J. Han, “R 2Ipoints: Pursuing rotation-insensitive point representation for aerial object detection,” IEEE Trans. Geosci. Remote Sens., vol. 60, pp. 1–12, May 2022, doi: 10.1109/TGRS.2022.3173373. [158] X. 
Ye, F. Xiong, J. Lu, J. Zhou, and Y. Qian, “R 3-net: Feature fusion and filtration network for object detection in optical remote sensing images,” Remote Sens., vol. 12, no. 24, Dec. 2020, Art. no. 4027, doi: 10.3390/rs12244027. [159] J. Hu, L. Shen, and G. Sun, “Squeeze-and-excitation networks,” in Proc. IEEE Conf. Comput. Vision Pattern Recognit. (CVPR), 2018, pp. 7132–7141, doi: 10.1109/CVPR.2018.00745. [160] X. Li, W. Wang, X. Hu, and J. Yang, “Selective kernel networks,” in Proc. IEEE Conf. Comput. Vision Pattern Recognit. (CVPR), 2019, pp. 510–519, doi: 10.1109/CVPR.2019.00060. [161] J. Fu et al., “Dual attention network for scene segmentation,” in Proc. IEEE Conf. Comput. Vision Pattern Recognit. (CVPR), Jun. 2019, pp. 3141–3149, doi: 10.1109/CVPR.2019.00326. [162] Z. Huang, W. Li, X. Xia, X. Wu, Z. Cai, and R. Tao, “A novel nonlocal-aware pyramid and multiscale multitask refinement detector for object detection in remote sensing images,” IEEE Trans. Geosci. Remote Sens., vol. 60, pp. 1–20, 2022, doi: 10.1109/TGRS.2021.3059450. [163] Y. Sun, X. Sun, Z. Wang, and K. Fu, “Oriented ship detection based on strong scattering points network in large-scale SAR images,” IEEE Trans. Geosci. Remote Sens., vol. 60, pp. 1–18, 2022, doi: 10.1109/TGRS.2021.3130117. [164] W. Ma et al., “Feature split–merge–enhancement network for remote sensing object detection,” IEEE Trans. Geosci. Remote Sens., vol. 60, pp. 1–17, Jan. 2022, doi: 10.1109/TGRS.2022.3140856. [165] Z. Cui, X. Wang, N. Liu, Z. Cao, and J. Yang, “Ship detection in large-scale SAR images via spatial shuffle-group enhance [166] [167] [168] [169] [170] [171] [172] [173] [174] [175] [176] [177] [178] attention,” IEEE Trans. Geosci. Remote Sens., vol. 59, no. 1, pp. 379–391, Jan. 2021, doi: 10.1109/TGRS.2020.2997200. J. Chen, L. Wan, J. Zhu, G. Xu, and M. Deng, “Multi-scale spatial and channel-wise attention for improving object detection in remote sensing imagery,” IEEE Geosci. Remote Sens. Lett., vol. 17, no. 4, pp. 681–685, Apr. 2020, doi: 10.1109/ LGRS.2019.2930462. J. Bai et al., “Object detection in large-scale remote-sensing images based on time-frequency analysis and feature optimization,” IEEE Trans. Geosci. Remote Sens., vol. 60, pp. 1–16, 2022, doi: 10.1109/TGRS.2021.3119344. J. Hu, X. Zhi, S. Jiang, H. Tang, W. Zhang, and L. Bruzzone, “Supervised multi-scale attention-guided ship detection in optical remote sensing images,” IEEE Trans. Geosci. Remote Sens., vol. 60, pp. 1–14, Sep. 2022, doi: 10.1109/TGRS.2022.3206306. Y. Guo, X. Tong, X. Xu, S. Liu, Y. Feng, and H. Xie, “An anchorfree network with density map and attention mechanism for multiscale object detection in aerial images,” IEEE Geosci. Remote Sens. Lett., vol. 19, pp. 1–5, Sep. 2022, doi: 10.1109/ LGRS.2022.3207178. D. Yu and S. Ji, “A new spatial-oriented object detection framework for remote sensing images,” IEEE Trans. Geosci. Remote Sens., vol. 60, pp. 1–16, 2022, doi: 10.1109/ TGRS.2021.3127232. C. Li et al., “Object detection based on global-local saliency constraint in aerial images,” Remote Sens., vol. 12, no. 9, May 2020, Art. no. 1435, doi: 10.3390/rs12091435. J. Lei, X. Luo, L. Fang, M. Wang, and Y. Gu, “Region-enhanced convolutional neural network for object detection in remote sensing images,” IEEE Trans. Geosci. Remote Sens., vol. 58, no. 8, pp. 5693–5702, Aug. 2020, doi: 10.1109/TGRS. 2020.2968802. Y. Yuan, C. Li, J. Kim, W. Cai, and D. D. Feng, “Reversion correction and regularized random walk ranking for saliency detection,” IEEE Trans. 
Image Process., vol. 27, no. 3, pp. 1311– 1322, Mar. 2018, doi: 10.1109/TIP.2017.2762422. C. Xu, C. Li, Z. Cui, T. Zhang, and J. Yang, “Hierarchical semantic propagation for object detection in remote sensing imagery,” IEEE Trans. Geosci. Remote Sens., vol. 58, no. 6, pp. 4353–4364, Jun. 2020, doi: 10.1109/TGRS.2019.2963243. T. Zhang et al., “Foreground refinement network for rotated object detection in remote sensing images,” IEEE Trans. Geosci. Remote Sens., vol. 60, pp. 1–13, 2022, doi: 10.1109/ TGRS.2021.3109145. J. Wang, W. Yang, H. Li, H. Zhang, and G. Xia, “Learning center probability map for detecting objects in aerial images,” IEEE Trans. Geosci. Remote Sens., vol. 59, no. 5, pp. 4307–4323, May 2021, doi: 10.1109/TGRS.2020.3010051. Z. Fang, J. Ren, H. Sun, S. Marshall, J. Han, and H. Zhao, “SAFDet: A semi-anchor-free detector for effective detection of oriented objects in aerial images,” Remote Sens., vol. 12, no. 19, Oct. 2020, Art. no. 3225, doi: 10.3390/rs12193225. Z. Ren, Y. Tang, Z. He, L. Tian, Y. Yang, and W. Zhang, “Ship detection in high-resolution optical remote sensing images aided by saliency information,” IEEE Trans. Geosci. Remote Sens., vol. 60, pp. 1–16, May 2022, doi: 10.1109/TGRS.2022.3173610. DECEMBER 2023 IEEE GEOSCIENCE AND REMOTE SENSING MAGAZINE Authorized licensed use limited to: XIDIAN UNIVERSITY. Downloaded on July 08,2024 at 11:41:28 UTC from IEEE Xplore. Restrictions apply. 37 [179] H. Qu, L. Shen, W. Guo, and J. Wang, “Ships detection in SAR images based on anchor-free model with mask guidance features,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 15, pp. 666–675, 2022, doi: 10.1109/JSTARS.2021.3137390. [180] S. Liu, L. Zhang, H. Lu, and Y. He, “Center-boundary dual attention for oriented object detection in remote sensing images,” IEEE Trans. Geosci. Remote Sens., vol. 60, pp. 1–14, 2022, doi: 10.1109/TGRS.2021.3069056. [181] J. Zhang, C. Xie, X. Xu, Z. Shi, and B. Pan, “A contextual bidirectional enhancement method for remote sensing image object detection,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 13, pp. 4518–4531, Aug. 2020, doi: 10.1109/ JSTARS.2020.3015049. [182] Y. Gong et al., “Context-aware convolutional neural network for object detection in VHR remote sensing imagery,” IEEE Trans. Geosci. Remote Sens., vol. 58, no. 1, pp. 34–44, Jan. 2020, doi: 10.1109/TGRS.2019.2930246. [183] W. Ma, Q. Guo, Y. Wu, W. Zhao, X. Zhang, and L. Jiao, “A novel multi-model decision fusion network for object detection in remote sensing images,” Remote Sens., vol. 11, no. 7, Mar. 2019, Art. no. 737, doi: 10.3390/rs11070737. [184] S. Tian et al., “Siamese graph embedding network for object detection in remote sensing images,” IEEE Geosci. Remote Sens. Lett., vol. 18, no. 4, pp. 602–606, Apr. 2021, doi: 10.1109/ LGRS.2020.2981420. [185] S. Tian, L. Kang, X. Xing, J. Tian, C. Fan, and Y. Zhang, “A relation-augmented embedded graph attention network for remote sensing object detection,” IEEE Trans. Geosci. Remote Sens., vol. 60, pp. 1–18, 2022, doi: 10.1109/TGRS.2021. 3073269. [186] Y. Wu, K. Zhang, J. Wang, Y. Wang, Q. Wang, and Q. Li, “CDDNet: A context-driven detection network for multiclass object detection,” IEEE Geosci. Remote Sens. Lett., vol. 19, pp. 1–5, 2022, doi: 10.1109/LGRS.2020.3042465. [187] Y. Han, J. Liao, T. Lu, T. Pu, and Z. Peng, “KCPNet: Knowledgedriven context perception networks for ship detection in infrared imagery,” IEEE Trans. Geosci. Remote Sens., vol. 61, pp. 1–19, 2023, doi: 10.1109/TGRS.2022.3233401. 
[188] C. Chen, W. Gong, Y. Chen, and W. Li, “Object detection in remote sensing images based on a scene-contextual feature pyramid network,” Remote Sens., vol. 11, no. 3, Feb. 2019, Art. no. 339, doi: 10.3390/rs11030339. [189] Z. Wu, B. Hou, B. Ren, Z. Ren, S. Wang, and L. Jiao, “A deep detection network based on interaction of instance segmentation and object detection for SAR images,” Remote Sens., vol. 13, no. 13, Jul. 2021, Art. no. 2582, doi: 10.3390/rs13132582. [190] Y. Wu, K. Zhang, J. Wang, Y. Wang, Q. Wang, and X. Li, “GCWNet: A global context-weaving network for object detection in remote sensing images,” IEEE Trans. Geosci. Remote Sens., vol. 60, pp. 1–12, Mar. 2022, doi: 10.1109/TGRS.2022.3155899. [191] G. Shi, J. Zhang, J. Liu, C. Zhang, C. Zhou, and S. Yang, “Global context-augmented objection detection in VHR optical remote sensing images,” IEEE Trans. Geosci. Remote Sens., vol. 59, no. 12, pp. 10,604–10,617, Dec. 2021, doi: 10.1109/ TGRS.2020.3043252. 38 [192] J. Liu, S. Li, C. Zhou, X. Cao, Y. Gao, and B. Wang, “SRAFNet: A scene-relevant anchor-free object detection network in remote sensing images,” IEEE Trans. Geosci. Remote Sens., vol. 60, pp. 1–14, 2022, doi: 10.1109/TGRS.2021. 3124959. [193] C. Tao, L. Mi, Y. Li, J. Qi, Y. Xiao, and J. Zhang, “Scene contextdriven vehicle detection in high-resolution aerial images,” IEEE Trans. Geosci. Remote Sens., vol. 57, no. 10, pp. 7339–7351, Oct. 2019, doi: 10.1109/TGRS.2019.2912985. [194] K. Zhang, Y. Wu, J. Wang, Y. Wang, and Q. Wang, “Semantic context-aware network for multiscale object detection in remote sensing images,” IEEE Geosci. Remote Sens. Lett., vol. 19, pp. 1–5, 2022, doi: 10.1109/LGRS.2021.3067313. [195] M. Wang, Q. Li, Y. Gu, L. Fang, and X. X. Zhu, “SCAF-Net: Scene context attention-based fusion network for vehicle detection in aerial imagery,” IEEE Geosci. Remote Sens. Lett., vol. 19, pp. 1–5, 2022, doi: 10.1109/LGRS.2021.3107281. [196] G. Zhang, S. Lu, and W. Zhang, “CAD-Net: A context-aware detection network for objects in remote sensing imagery,” IEEE Trans. Geosci. Remote Sens., vol. 57, no. 12, pp. 10,015– 10,024, Dec. 2019, doi: 10.1109/TGRS.2019.2930982. [197] E. Liu, Y. Zheng, B. Pan, X. Xu, and Z. Shi, “DCL-Net: Augmenting the capability of classification and localization for remote sensing object detection,” IEEE Trans. Geosci. Remote Sens., vol. 59, no. 9, pp. 7933–7944, Sep. 2021, doi: 10.1109/ TGRS.2020.3048384. [198] Y. Feng, W. Diao, X. Sun, M. Yan, and X. Gao, “Towards automated ship detection and category recognition from highresolution aerial images,” Remote Sens., vol. 11, no. 16, Aug. 2019, Art. no. 1901, doi: 10.3390/rs11161901. [199] P. Wang, X. Sun, W. Diao, and K. Fu, “FMSSD: Featuremerged single-shot detection for multiscale objects in largescale remote sensing imagery,” IEEE Trans. Geosci. Remote Sens., vol. 58, no. 5, pp. 3377–3390, May 2020, doi: 10.1109/ TGRS.2019.2954328. [200] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille, “DeepLab: Semantic image segmentation with deep convolutional nets, Atrous convolution, and fully connected CRFs,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 40, no. 4, pp. 834–848, Apr. 2018, doi: 10.1109/ TPAMI.2017.2699184. [201] Y. Bai, R. Li, S. Gou, C. Zhang, Y. Chen, and Z. Zheng, “Crossconnected bidirectional pyramid network for infrared smalldim target detection,” IEEE Geosci. Remote Sens. Lett., vol. 19, pp. 1–5, Jan. 2022, doi: 10.1109/LGRS.2022.3145577. [202] Y. Li, Q. Huang, X. Pei, Y. Chen, L. Jiao, and R. 
Shang, “Cross-layer attention network for small object detection in remote sensing imagery,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 14, pp. 2148–2161, 2021, doi: 10.1109/ JSTARS.2020.3046482. [203] H. Gong et al., “Swin-transformer-enabled YOLOv5 with attention mechanism for small object detection on satellite images,” Remote Sens., vol. 14, no. 12, Jun. 2022, Art. no. 2861, doi: 10.3390/rs14122861. [204] J. Qu, C. Su, Z. Zhang, and A. Razi, “Dilated convolution and feature fusion SSD network for small object detection in IEEE GEOSCIENCE AND REMOTE SENSING MAGAZINE DECEMBER 2023 Authorized licensed use limited to: XIDIAN UNIVERSITY. Downloaded on July 08,2024 at 11:41:28 UTC from IEEE Xplore. Restrictions apply. [205] [206] [207] [208] [209] [210] [211] [212] [213] [214] [215] [216] [217] remote sensing images,” IEEE Access, vol. 8, pp. 82,832–82,843, Apr. 2020, doi: 10.1109/ACCESS.2020.2991439. T. Ma, Z. Yang, J. Wang, S. Sun, X. Ren, and U. Ahmad, “Infrared small target detection network with generate label and feature mapping,” IEEE Geosci. Remote Sens. Lett., vol. 19, pp. 1–5, Jan. 2022, doi: 10.1109/LGRS.2022.3140432. W. Han, A. Kuerban, Y. Yang, Z. Huang, B. Liu, and J. Gao, “Multi-vision network for accurate and real-time small object detection in optical remote sensing images,” IEEE Geosci. Remote Sens. Lett., vol. 19, pp. 1–5, 2022, doi: 10.1109/ LGRS.2020.3044422. Q. Hou, Z. Wang, F. Tan, Y. Zhao, H. Zheng, and W. Zhang, “RISTDnet: Robust infrared small target detection network,” IEEE Geosci. Remote Sens. Lett., vol. 19, pp. 1–5, 2022, doi: 10.1109/LGRS.2021.3050828. X. Lu, Y. Zhang, Y. Yuan, and Y. Feng, “Gated and axis-concentrated localization network for remote sensing object detection,” IEEE Trans. Geosci. Remote Sens., vol. 58, no. 1, pp. 179–192, Jan. 2020, doi: 10.1109/TGRS.2019.2935177. L. Courtrai, M. Pham, and S. Lefèvre, “Small object detection in remote sensing images based on super-resolution with auxiliary generative adversarial networks,” Remote Sens., vol. 12, no. 19, Sep. 2020, Art. no. 3152, doi: 10.3390/rs12193152. S. M. A. Bashir and Y. Wang, “Small object detection in remote sensing images with residual feature aggregationbased super-resolution and object detector network,” Remote Sens., vol. 13, no. 9, May 2021, Art. no. 1854, doi: 10.3390/rs13091854. J. Rabbi, N. Ray, M. Schubert, S. Chowdhury, and D. Chao, “Small-object detection in remote sensing images with endto-end edge-enhanced GAN and object detector network,” Remote Sens., vol. 12, no. 9, May 2020, Art. no. 1432, doi: 10.3390/rs12091432. J. Wu and S. Xu, “From point to region: Accurate and efficient hierarchical small object detection in low-resolution remote sensing images,” Remote Sens., vol. 13, no. 13, Jul. 2021, Art. no. 2620, doi: 10.3390/rs13132620. J. Li, Z. Zhang, Y. Tian, Y. Xu, Y. Wen, and S. Wang, “Targetguided feature super-resolution for vehicle detection in remote sensing images,” IEEE Geosci. Remote Sens. Lett., vol. 19, pp. 1–5, 2022, doi: 10.1109/LGRS.2021.3112172. J. Chen, K. Chen, H. Chen, Z. Zou, and Z. Shi, “A degraded reconstruction enhancement-based method for tiny ship detection in remote sensing images with a new large-scale dataset,” IEEE Trans. Geosci. Remote Sens., vol. 60, pp. 1–14, Jun. 2022, doi: 10.1109/TGRS.2022.3180894. J. Pang, C. Li, J. Shi, Z. Xu, and H. Feng, “R 2-CNN: Fast tiny object detection in large-scale remote sensing images,” IEEE Trans. Geosci. Remote Sens., vol. 57, no. 8, pp. 5512–5524, Aug. 2019, doi: 10.1109/TGRS.2019.2899955. 
J. Wu, Z. Pan, B. Lei, and Y. Hu, “FSANet: Feature-and-spatialaligned network for tiny object detection in remote sensing images,” IEEE Trans. Geosci. Remote Sens., vol. 60, pp. 1–17, Sep. 2022, doi: 10.1109/TGRS.2022.3205052. M. Pham, L. Courtrai, C. Friguet, S. Lefèvre, and A. Baussard, “YOLO-Fine: One-stage detector of small objects under vari- [218] [219] [220] [221] [222] [223] [224] [225] [226] [227] [228] [229] ous backgrounds in remote sensing images,” Remote Sens., vol. 12, no. 15, Aug. 2020, Art. no. 2501, doi: 10.3390/rs12152501. J. Yan, H. Wang, M. Yan, W. Diao, X. Sun, and H. Li, “IoU-adaptive deformable R-CNN: Make full use of IOU for multi-class object detection in remote sensing imagery,” Remote Sens., vol. 11, no. 3, Feb. 2019, Art. no. 286, doi: 10.3390/rs11030286. R. Dong, D. Xu, J. Zhao, L. Jiao, and J. An, “Sig-NMS-based faster R-CNN combining transfer learning for small target detection in VHR optical remote sensing imagery,” IEEE Trans. Geosci. Remote Sens., vol. 57, no. 11, pp. 8534–8545, Nov. 2019, doi: 10.1109/TGRS.2019.2921396. Z. Shu, X. Hu, and J. Sun, “Center-point-guided proposal generation for detection of small and dense buildings in aerial imagery,” IEEE Geosci. Remote Sens. Lett., vol. 15, no. 7, pp. 1100–1104, Jul. 2018, doi: 10.1109/LGRS.2018.2822760. C. Xu, J. Wang, W. Yang, and L. Yu, “Dot distance for tiny object detection in aerial images,” in Proc. IEEE Int. Conf. Comput. Vision Pattern Recognit. Workshops, 2021, pp. 1192–1201, doi: 10.1109/CVPRW53098.2021.00130. C. Xu, J. Wang, W. Yang, H. Yu, L. Yu, and G. Xia, “RFLA: Gaussian receptive field based label assignment for tiny object detection,” in Proc. 17th Eur. Conf., 2022, pp. 526–543, doi: 10.1007/978-3-031-20077-9_31. F. Zhang, B. Du, L. Zhang, and M. Xu, “Weakly supervised learning based on coupled convolutional neural networks for aircraft detection,” IEEE Trans. Geosci. Remote Sens., vol. 54, no. 9, pp. 5553–5563, Sep. 2016, doi: 10.1109/TGRS. 2016.2569141. Y. Li, B. He, F. Melgani, and T. Long, “Point-based weakly supervised learning for object detection in high spatial resolution remote sensing images,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 14, pp. 5361–5371, Apr. 2021, doi: 10.1109/JSTARS.2021.3076072. D. Zhang, J. Han, G. Cheng, Z. Liu, S. Bu, and L. Guo, “Weakly supervised learning for target detection in remote sensing images,” IEEE Geosci. Remote Sens. Lett., vol. 12, no. 4, pp. 701– 705, Apr. 2015, doi: 10.1109/LGRS.2014.2358994. J. Han, D. Zhang, G. Cheng, L. Guo, and J. Ren, “Object detection in optical remote sensing images based on weakly supervised learning and high-level feature learning,” IEEE Trans. Geosci. Remote Sens., vol. 53, no. 6, pp. 3325–3337, Jun. 2015, doi: 10.1109/TGRS.2014.2374218. Y. Li, Y. Zhang, X. Huang, and A. L. Yuille, “Deep networks under scene-level supervision for multi-class geospatial object detection from remote sensing images,” ISPRS J. Photogrammetry Remote Sens., vol. 146, pp. 182–196, Dec. 2018, doi: 10.1016/j.isprsjprs.2018.09.014. H. Bilen and A. Vedaldi, “Weakly supervised deep detection networks,” in Proc. IEEE Conf. Comput. Vision Pattern Recognit. (CVPR), 2016, pp. 2846–2854, doi: 10.1109/CVPR. 2016.311. X. Yao, X. Feng, J. Han, G. Cheng, and L. Guo, “Automatic weakly supervised object detection from high spatial resolution remote sensing images via dynamic curriculum learning,” IEEE Trans. Geosci. Remote Sens., vol. 59, no. 1, pp. 675– 685, Jan. 2021, doi: 10.1109/TGRS.2020.2991407. 
DECEMBER 2023 IEEE GEOSCIENCE AND REMOTE SENSING MAGAZINE Authorized licensed use limited to: XIDIAN UNIVERSITY. Downloaded on July 08,2024 at 11:41:28 UTC from IEEE Xplore. Restrictions apply. 39 [230] H. Wang et al., “Dynamic pseudo-label generation for weakly supervised object detection in remote sensing images,” Remote Sens., vol. 13, no. 8, Apr. 2021, Art. no. 1461, doi: 10.3390/ rs13081461. [231] X. Feng, J. Han, X. Yao, and G. Cheng, “Progressive contextual instance refinement for weakly supervised object detection in remote sensing images,” IEEE Trans. Geosci. Remote Sens., vol. 58, no. 11, pp. 8002–8012, Nov. 2020, doi: 10.1109/ TGRS.2020.2985989. [232] P. Shamsolmoali, J. Chanussot, M. Zareapoor, H. Zhou, and J. Yang, “Multipatch feature pyramid network for weakly supervised object detection in optical remote sensing images,” IEEE Trans. Geosci. Remote Sens., vol. 60, pp. 1–13, 2022, doi: 10.1109/TGRS.2021.3106442. [233] B. Wang, Y. Zhao, and X. Li, “Multiple instance graph learning for weakly supervised remote sensing object detection,” IEEE Trans. Geosci. Remote Sens., vol. 60, pp. 1–12, 2022, doi: 10.1109/TGRS.2021.3123231. [234] X. Feng, J. Han, X. Yao, and G. Cheng, “TCANet: Triple context-aware network for weakly supervised object detection in remote sensing images,” IEEE Trans. Geosci. Remote Sens., vol. 59, no. 8, pp. 6946–6955, Aug. 2021, doi: 10.1109/ TGRS.2020.3030990. [235] X. Feng, X. Yao, G. Cheng, J. Han, and J. Han, “SAENet: Selfsupervised adversarial and equivariant network for weakly supervised object detection in remote sensing images,” IEEE Trans. Geosci. Remote Sens., vol. 60, pp. 1–11, 2022, doi: 10.1109/TGRS.2021.3105575. [236] X. Qian et al., “Incorporating the completeness and difficulty of proposals into weakly supervised object detection in remote sensing images,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 15, pp. 1902–1911, Feb. 2022, doi: 10.1109/ JSTARS.2022.3150843. [237] W. Qian, Z. Yan, Z. Zhu, and W. Yin, “Weakly supervised part-based method for combined object detection in remote sensing imagery,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 15, pp. 5024–5036, Jun. 2022, doi: 10.1109/ JSTARS.2022.3179026. [238] S. Chen, D. Shao, X. Shu, C. Zhang, and J. Wang, “FCC-Net: A full-coverage collaborative network for weakly supervised remote sensing object detection,” Electronics, vol. 9, no. 9, Aug. 2020, Art. no. 1356, doi: 10.3390/electronics9091356. [239] G. Cheng, X. Xie, W. Chen, X. Feng, X. Yao, and J. Han, “Selfguided proposal generation for weakly supervised object detection,” IEEE Trans. Geosci. Remote Sens., vol. 60, pp. 1–11, Jun. 2022, doi: 10.1109/TGRS.2022.3181466. [240] X. Feng, X. Yao, G. Cheng, and J. Han, “Weakly supervised rotation-invariant aerial object detection network,” in Proc. IEEE Int. Conf. Comput. Vision Pattern Recognit. (CVPR), 2022, pp. 14,126–14,135, doi: 10.1109/CVPR52688. 2022.01375. [241] G. Wang, X. Zhang, Z. Peng, X. Jia, X. Tang, and L. Jiao, “MOL: Towards accurate weakly supervised remote sensing object detection via multi-view noisy learning,” ISPRS J. Photogrammetry Remote Sens., vol. 196, pp. 457–470, Feb. 2023, doi: 10.1016/j. isprsjprs.2023.01.011. 40 [242] T. Deselaers, B. Alexe, and V. Ferrari, “Weakly supervised localization and learning with generic knowledge,” Int. J. Comput. Vision, vol. 100, no. 3, pp. 275–293, May 2012, doi: 10.1007/s11263-012-0538-3. [243] B. 
Hou et al., “A neural network based on consistency learning and adversarial learning for semisupervised synthetic aperture radar ship detection,” IEEE Trans. Geosci. Remote Sens., vol. 60, pp. 1–16, Jan. 2022, doi: 10.1109/TGRS.2022.3142017. [244] Z. Song, J. Yang, D. Zhang, S. Wang, and Z. Li, “Semi-supervised dim and small infrared ship detection network based on Haar wavelet,” IEEE Access, vol. 9, pp. 29,686–29,695, Feb. 2021, doi: 10.1109/ACCESS.2021.3058526. [245] Y. Zhong, Z. Zheng, A. Ma, X. Lu, and L. Zhang, “COLOR: Cycling, offline learning, and online representation framework for airport and airplane detection using GF-2 satellite images,” IEEE Trans. Geosci. Remote Sens., vol. 58, no. 12, pp. 8438–8449, Dec. 2020, doi: 10.1109/TGRS.2020.2987907. [246] Y. Wu, W. Zhao, R. Zhang, and F. Jiang, “AMR-Net: Arbitrary-oriented ship detection using attention module, multi-scale feature fusion and rotation pseudo-label,” IEEE Access, vol. 9, pp. 68,208–68,222, Apr. 2021, doi: 10.1109/ACCESS. 2021.3075857. [247] S. Chen, R. Zhan, W. Wang, and J. Zhang, “Domain adaptation for semi-supervised ship detection in SAR images,” IEEE Geosci. Remote Sens. Lett., vol. 19, pp. 1–5, May 2022, doi: 10.1109/LGRS.2022.3171789. [248] Z. Zhang, Z. Feng, and S. Yang, “Semi-supervised object detection framework with object first mixup for remote sensing images,” in Proc. Int. Geosci. Remote Sens. Symp., 2021, pp. 2596–2599, doi: 10.1109/IGARSS47720.2021.9554202. [249] B. Xue and N. Tong, “DIOD: Fast and efficient weakly semisupervised deep complex ISAR object detection,” IEEE Trans. Cybern., vol. 49, no. 11, pp. 3991–4003, Nov. 2019, doi: 10.1109/TCYB.2018.2856821. [250] L. Liao, L. Du, and Y. Guo, “Semi-supervised SAR target detection based on an improved faster R-CNN,” Remote Sens., vol. 14, no. 1, 2022, Art. no. 143, doi: 10.3390/rs14010143. [251] Y. Du, L. Du, Y. Guo, and Y. Shi, “Semisupervised SAR ship detection network via scene characteristic learning,” IEEE Trans. Geosci. Remote Sens., vol. 61, pp. 1–17, Jan. 2023, doi: 10.1109/ TGRS.2023.3235859. [252] D. Wei, Y. Du, L. Du, and L. Li, “Target detection network for SAR images based on semi-supervised learning and attention mechanism,” Remote Sens., vol. 13, no. 14, Jul. 2021, Art. no. 2686, doi: 10.3390/rs13142686. [253] L. Chen, Y. Fu, S. You, and H. Liu, “Efficient hybrid supervision for instance segmentation in aerial images,” Remote Sens., vol.≈13, no. 2, Jan. 2021, Art. no. 252, doi: 10.3390/rs13020252. [254] G. Cheng et al., “Prototype-CNN for few-shot object detection in remote sensing images,” IEEE Trans. Geosci. Remote Sens., vol. 60, pp. 1–10, 2022, doi: 10.1109/TGRS.2021.3078507. [255] X. Li, J. Deng, and Y. Fang, “Few-shot object detection on remote sensing images,” IEEE Trans. Geosci. Remote Sens., vol. 60, pp. 1–14, 2022, doi: 10.1109/TGRS.2021.3051383. [256] L. Li, X. Yao, G. Cheng, M. Xu, J. Han, and J. Han, “Solo-to-collaborative dual-attention network for one-shot object detection IEEE GEOSCIENCE AND REMOTE SENSING MAGAZINE DECEMBER 2023 Authorized licensed use limited to: XIDIAN UNIVERSITY. Downloaded on July 08,2024 at 11:41:28 UTC from IEEE Xplore. Restrictions apply. [257] [258] [259] [260] [261] [262] [263] [264] [265] [266] [267] [268] [269] in remote sensing images,” IEEE Trans. Geosci. Remote Sens., vol. 60, pp. 1–11, 2022, doi: 10.1109/TGRS.2021.3091003. H. Zhang, X. Zhang, G. Meng, C. Guo, and Z. 
Jiang, “Fewshot multi-class ship detection in remote sensing images using attention feature map and multi-relation detector,” Remote Sens., vol. 14, no. 12, Jun. 2022, Art. no. 2790, doi: 10.3390/ rs14122790. B. Wang, Z. Wang, X. Sun, H. Wang, and K. Fu, “DMMLNet: Deep metametric learning for few-shot geographic object segmentation in remote sensing imagery,” IEEE Trans. Geosci. Remote Sens., vol. 60, pp. 1–18, 2022, doi: 10.1109/ TGRS.2021.3116672. J. Li et al., “MM-RCNN: Toward few-shot object detection in remote sensing images with meta memory,” IEEE Trans. Geosci. Remote Sens., vol. 60, pp. 1–14, Dec. 2022, doi: 10.1109/ TGRS.2022.3228612. Z. Zhao, P. Tang, L. Zhao, and Z. Zhang, “Few-shot object detection of remote sensing images via two-stage fine-tuning,” IEEE Geosci. Remote Sens. Lett., vol. 19, pp. 1–5, 2022, doi: 10.1109/LGRS.2021.3116858. Y. Zhou, H. Hu, J. Zhao, H. Zhu, R. Yao, and W. Du, “Fewshot object detection via context-aware aggregation for remote sensing images,” IEEE Geosci. Remote Sens. Lett., vol. 19, pp. 1–5, 2022, doi: 10.1109/LGRS.2022.3171257. Y. Wang, C. Xu, C. Liu, and Z. Li, “Context information refinement for few-shot object detection in remote sensing images,” Remote Sens., vol. 14, no. 14, Jul. 2022, Art. no. 3255, doi: 10.3390/rs14143255. Z. Zhou, S. Li, W. Guo, and Y. Gu, “Few-shot aircraft detection in satellite videos based on feature scale selection pyramid and proposal contrastive learning,” Remote Sens., vol. 14, no. 18, Sep. 2022, Art. no. 4581, doi: 10.3390/rs14184581. S. Chen, J. Zhang, R. Zhan, R. Zhu, and W. Wang, “Few shot object detection for SAR images via feature enhancement and dynamic relationship modeling,” Remote Sens., vol. 14, no. 15, Jul. 2022, Art. no. 3669, doi: 10.3390/rs14153669. S. Liu, Y. You, H. Su, G. Meng, W. Yang, and F. Liu, “Few-shot object detection in remote sensing image interpretation: Opportunities and challenges,” Remote Sens., vol. 14, no. 18, Sep. 2022, Art. no. 4435, doi: 10.3390/rs14184435. X. Huang, B. He, M. Tong, D. Wang, and C. He, “Few-shot object detection on remote sensing images via shared attention module and balanced fine-tuning strategy,” Remote Sens., vol. 13, no. 19, Sep. 2021, Art. no. 3816, doi: 10.3390/ rs13193816. Z. Xiao, J. Qi, W. Xue, and P. Zhong, “Few-shot object detection with self-adaptive attention network for remote sensing images,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 14, pp. 4854–4865, May 2021, doi: 10.1109/ JSTARS.2021.3078177. S. Wolf, J. Meier, L. Sommer, and J. Beyerer, “Double head predictor based few-shot object detection for aerial imagery,” in Proc. IEEE Int. Conf. Comput. Vision Workshops, 2021, pp. 721– 731, doi: 10.1109/ICCVW54120.2021.00086. T. Zhang, X. Zhang, P. Zhu, X. Jia, X. Tang, and L. Jiao, “Generalized few-shot object detection in remote sensing images,” [270] [271] [272] [273] [274] [275] [276] [277] [278] [279] [280] [281] [282] [283] ISPRS J. Photogrammetry Remote Sens., vol. 195, pp. 353–364, Jan. 2023, doi: 10.1016/j.isprsjprs.2022.12.004. G. Heitz and D. Koller, “Learning spatial context: Using stuff to find things,” in Proc. Eur. Conf. Comput. Vision, Berlin, Heidelberg: Springer, 2008, vol. 5302, pp. 30–43. C. Benedek, X. Descombes, and J. Zerubia, “Building development monitoring in multitemporal remotely sensed image pairs with stochastic birth-death dynamics,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 34, no. 1, pp. 33–50, Jan. 2012, doi: 10.1109/TPAMI.2011.94. S. Razakarivony and F. 
Jurie, “Vehicle detection in aerial imagery: A small target detection benchmark,” J. Vis. Commun. Image Representation, vol. 34, pp. 187–203, Jan. 2016, doi: 10.1016/j.jvcir.2015.11.002. K. Liu and G. Máttyus, “Fast multiclass vehicle detection on aerial images,” IEEE Geosci. Remote Sens. Lett., vol. 12, no. 9, pp. 1938–1942, Sep. 2015, doi: 10.1109/LGRS.2015.2439517. H. Zhu, X. Chen, W. Dai, K. Fu, Q. Ye, and J. Jiao, “Orientation robust object detection in aerial images using deep convolutional neural network,” in Proc. IEEE Int. Conf. Image Process., 2015, pp. 3735–3739, doi: 10.1109/ICIP.2015.7351502. T. N. Mundhenk, G. Konjevod, W. A. Sakla, and K. Boakye, “A large contextual dataset for classification, detection and counting of cars with deep learning,” in Proc. Eur. Conf. Comput. Vision, B. Leibe, J. Matas, N. Sebe, and M. Welling, Eds. Cham, Switzerland: Springer, 2016, vol. 9907, pp. 785–800. Z. Liu, H. Wang, L. Weng, and Y. Yang, “Ship rotated bounding box space for ship extraction from high-resolution optical satellite images with complex backgrounds,” IEEE Geosci. Remote Sens. Lett., vol. 13, no. 8, pp. 1074–1078, Aug. 2016, doi: 10.1109/LGRS.2016.2565705. J. Li, C. Qu, and J. Shao, “Ship detection in SAR images based on an improved faster R-CNN,” in Proc. SAR Big Data Era, Models, Methods Appl. (BIGSARDATA), 2017, pp. 1–6, doi: 10.1109/ BIGSARDATA.2017.8124934. Z. Zou and Z. Shi, “Random access memories: A new paradigm for target detection in high resolution aerial remote sensing images,” IEEE Trans. Image Process., vol. 27, no. 3, pp. 1100–1111, Mar. 2018, doi: 10.1109/TIP.2017.2773199. X. Sun, Z. Wang, Y. Sun, W. Diao, Y. Zhang, and K. Fu, “Airsarship-1.0: High-resolution SAR ship detection dataset,” J. Radars, vol. 8, no. 6, pp. 852–863, Dec. 2019, doi: 10.12000/ JR19097. W. Yu et al., “Mar20: A benchmark for military aircraft recognition in remote sensing images,” in Proc. Nat. Remote Sens. Bull., 2022, pp. 1–11, doi: 10.11834/jrs.20222139. K. Chen, M. Wu, J. Liu, and C. Zhang, “FGSD: A dataset for fine-grained ship detection in high resolution satellite images,” 2020. [Online]. Available: https://arxiv.org/ abs/2003.06832 Y. Han, X. Yang, T. Pu, and Z. Peng, “Fine-grained recognition for oriented ship against complex scenes in optical remote sensing images,” IEEE Trans. Geosci. Remote Sens., vol. 60, pp. 1–18, 2022, doi: 10.1109/TGRS.2021.3123666. J. Wang, W. Yang, H. Guo, R. Zhang, and G. Xia, “Tiny object detection in aerial images,” in Proc. IEEE Int. Conf. Pattern DECEMBER 2023 IEEE GEOSCIENCE AND REMOTE SENSING MAGAZINE Authorized licensed use limited to: XIDIAN UNIVERSITY. Downloaded on July 08,2024 at 11:41:28 UTC from IEEE Xplore. Restrictions apply. 41 [284] [285] [286] [287] [288] [289] [290] [291] [292] [293] [294] [295] [296] 42 Recognit., 2020, pp. 3791–3798, doi: 10.1109/ICPR48806. 2021.9413340. G. Cheng, X. Yuan, X. Yao, K. Yan, Q. Zeng, and J. Han, “Towards large-scale small object detection: Survey and benchmarks,” 2022, arXiv:2207.14096. T. Zhang, X. Zhang, J. Shi, and S. Wei, “HyperLi-Net: A hyperlight deep learning network for high-accurate and high-speed ship detection from synthetic aperture radar imagery,” ISPRS J. Photogrammetry Remote Sens., vol. 167, pp. 123–153, Sep. 2020, doi: 10.1016/j.isprsjprs.2020.05.016. T. Zhang et al., “SAR ship detection dataset (SSDD): Official release and comprehensive data analysis,” Remote Sens., vol. 13, no. 18, Sep. 2021, Art. no. 3690, doi: 10.3390/rs13183690. M. Everingham, L. V. Gool, C. K. I. Williams, J. M. 
Winn, and A. Zisserman, “The pascal visual object classes (VOC) challenge,” Int. J. Comput. Vision, vol. 88, no. 2, pp. 303–338, Jun. 2010, doi: 10.1007/s11263-009-0275-4. A. G. Menezes, G. de Moura, C. Alves, and A. C. P. L. F. de Carvalho, “Continual object detection: A review of definitions, strategies, and challenges,” Neural Netw., vol. 161, pp. 476–493, Apr. 2023, doi: 10.1016/j.neunet.2023.01.041. C. Persello et al., “Deep learning and earth observation to support the sustainable development goals: Current approaches, open challenges, and future opportunities,” IEEE Geosci. Remote Sens. Mag., vol. 10, no. 2, pp. 172–200, Jun. 2022, doi: 10.1109/MGRS.2021.3136100. T. Hoeser, F. Bachofer, and C. Kuenzer, “Object detection and image segmentation with deep learning on earth observation data: A review—part II: Applications,” Remote Sen., vol. 12, no. 18, Sep. 2020, Art. no. 3053, doi: 10.3390/ rs12183053. L. Ma, Y. Liu, X. Zhang, Y. Ye, G. Yin, and B. A. Johnson, “Deep learning in remote sensing applications: A meta-analysis and review,” ISPRS J. Photogrammetry Remote Sens., vol. 152, pp. 166–177, Jun. 2019, doi: 10.1016/j.isprsjprs.2019.04.015. P. Barmpoutis, P. Papaioannou, K. Dimitropoulos, and N. Grammalidis, “A review on early forest fire detection systems using optical remote sensing,” Sensors, vol. 20, no. 22, Nov. 2020, Art. no. 6442, doi: 10.3390/s20226442. Z. Guan, X. Miao, Y. Mu, Q. Sun, Q. Ye, and D. Gao, “Forest fire segmentation from aerial imagery data using an improved instance segmentation model,” Remote Sens., vol. 14, no. 13, Jul. 2022, Art. no. 3159, doi: 10.3390/rs14133159. Z. Zheng, Y. Zhong, J. Wang, A. Ma, and L. Zhang, “Building damage assessment for rapid disaster response with a deep object-based semantic change detection framework: From natural disasters to man-made disasters,” Remote Sens. Environ., vol. 265, Nov. 2021, Art. no. 112636, doi: 10.1016/j.rse.2021.112636. H. Ma, Y. Liu, Y. Ren, and J. Yu, “Detection of collapsed buildings in post-earthquake remote sensing images based on the improved YOLOv3,” Remote Sens., vol. 12, no. 1, 2020, Art. no. 44, doi: 10.3390/rs12010044. Y. Pi, N. D. Nath, and A. H. Behzadan, “Convolutional neural networks for object detection in aerial imagery for disaster response and recovery,” Adv. Eng. Informat., vol. 43, Jan. 2020, Art. no. 101009, doi: 10.1016/j.aei.2019.101009. [297] M. Weiss, F. Jacob, and G. Duveiller, “Remote sensing for agricultural applications: A meta-review,” Remote Sens. Environ., vol. 236, Jan. 2020, Art. no. 111402, doi: 10.1016/j.rse. 2019.111402. [298] Y. Pang et al., “Improved crop row detection with deep neural network for early-season maize stand count in UAV imagery,” Comput. Electron. Agriculture, vol. 178, Nov. 2020, Art. no. 105766, doi: 10.1016/j.compag.2020.105766. [299] C. Mota-Delfin, G. de Jesús López-Canteñs, I. L. L. Cruz, E. Romantchik-Kriuchkova, and J. C. Olguín-Rojas, “Detection and counting of corn plants in the presence of weeds with convolutional neural networks,” Remote Sens., vol. 14, no. 19, Sep. 2022, Art. no. 4892, doi: 10.3390/rs14194892. [300] L. P. Osco et al., “A CNN approach to simultaneously count plants and detect plantation-rows from UAV imagery,” ISPRS J. Photogrammetry Remote Sens., vol. 174, pp. 1–17, Apr. 2021, doi: 10.1016/j.isprsjprs.2021.01.024. [301] M. M. Anuar, A. A. Halin, T. Perumal, and B. Kalantar, “Aerial imagery paddy seedlings inspection using deep learning,” Remote Sens., vol. 14, no. 2, Jan. 2022, Art. no. 274, doi: 10.3390/ rs14020274. [302] Y. 
Chen et al., “Strawberry yield prediction based on a deep neural network using high-resolution aerial orthoimages,” Remote Sens., vol. 11, no. 13, Jul. 2019, Art. no. 1584, doi: 10.3390/rs11131584. [303] W. Zhao, C. Persello, and A. Stein, “Building outline delineation: From aerial images to polygons with an improved end-to-end learning framework,” ISPRS J. Photogrammetry Remote Sens., vol. 175, pp. 119–131, May 2021, doi: 10.1016/j. isprsjprs.2021.02.014. [304] Z. Li, J. D. Wegner, and A. Lucchi, “Topological map extraction from overhead images,” in Proc. IEEE Int. Conf. Comput. Vision (ICCV), 2019, pp. 1715–1724, doi: 10.1109/ICCV.2019. 00180. [305] L. Mou and X. X. Zhu, “Vehicle instance segmentation from aerial image and video using a multitask learning residual fully convolutional network,” IEEE Trans. Geosci. Remote Sens., vol. 56, no. 11, pp. 6699–6711, Nov. 2018, doi: 10.1109/ TGRS.2018.2841808. [306] J. Zhang, X. Zhang, Z. Huang, X. Cheng, J. Feng, and L. Jiao, “Bidirectional multiple object tracking based on trajectory criteria in satellite videos,” IEEE Trans. Geosci. Remote Sens., vol. 61, pp. 1–14, Jan. 2023, doi: 10.1109/TGRS.2023.3235883. [307] H. Kim and Y. Ham, “Participatory sensing-based geospatial localization of distant objects for disaster preparedness in urban built environments,” Automat. Construction, vol. 107, Nov. 2019, Art. no. 102960, doi: 10.1016/j.autcon.2019.102960. [308] M. A. E. Bhuiyan, C. Witharana, and A. K. Liljedahl, “Use of very high spatial resolution commercial satellite imagery and deep learning to automatically map ice-wedge polygons across tundra vegetation types,” J. Imag., vol. 6, no. 12, Dec. 2020, Art. no. 137, doi: 10.3390/jimaging6120137. [309] W. Zhang et al., “Transferability of the deep learning mask R-CNN model for automated mapping of ice-wedge polygons in high-resolution satellite and UAV images,” Remote IEEE GEOSCIENCE AND REMOTE SENSING MAGAZINE DECEMBER 2023 Authorized licensed use limited to: XIDIAN UNIVERSITY. Downloaded on July 08,2024 at 11:41:28 UTC from IEEE Xplore. Restrictions apply. [310] [311] [312] [313] [314] [315] [316] [317] [318] [319] [320] [321] [322] Sens., vol. 12, no. 7, Mar. 2020, Art. no. 1085, doi: 10.3390/ rs12071085. C. Witharana et al., “An object-based approach for mapping tundra ice-wedge polygon troughs from very high spatial resolution optical satellite imagery,” Remote Sens., vol. 13, no. 4, Feb. 2021, Art. no. 558, doi: 10.3390/rs13040558. J. Yu, Z. Wang, A. Majumdar, and R. Rajagopal, “DeepSolar: A machine learning framework to efficiently construct a solar deployment database in the United States,” Joule, vol. 2, no. 12, pp. 2605–2617, Dec. 2018, doi: 10.1016/j.joule. 2018.11.021. J. M. Malof, K. Bradbury, L. M. Collins, and R. G. Newell, “Automatic detection of solar photovoltaic arrays in high resolution aerial imagery,” Appl. Energy, vol. 183, pp. 229–240, Dec. 2016, doi: 10.1016/j.apenergy.2016.08.191. W. Zhang, G. Wang, J. Qi, G. Wang, and T. Zhang, “Research on the extraction of wind turbine all over the China based on domestic satellite remote sensing data,” in Proc. Int. Geosci. Remote Sens. Symp., 2021, pp. 4167–4170, doi: 10.1109/ IGARSS47720.2021.9553559. W. Hu et al., “Wind turbine detection with synthetic overhead imagery,” in Proc. Int. Geosci. Remote Sens. Symp., 2021, pp. 4908–4911, doi: 10.1109/IGARSS47720.2021.9554306. T. Jia et al., “Deep learning for detecting macroplastic litter in water bodies: A review,” Water Res., vol. 231, Mar. 2023, Art. no. 
119632, doi: 10.1016/j.watres.2023.119632. C. Martin, Q. Zhang, D. Zhai, X. Zhang, and C. M. Duarte, “Enabling a large-scale assessment of litter along Saudi Arabian red sea shores by combining drones and machine learning,” Environmental Pollut., vol. 277, May 2021, Art. no. 116730, doi: 10.1016/j.envpol.2021.116730. K. Themistocleous, C. Papoutsa, S. C. Michaelides, and D. G. Hadjimitsis, “Investigating detection of floating plastic litter from space using sentinel-2 imagery,” Remote Sens., vol. 12, no. 16, Aug. 2020, Art. no. 2648, doi: 10.3390/rs12162648. B. Xue et al., “An efficient deep-sea debris detection method using deep neural networks,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 14, pp. 12,348–12,360, Nov. 2021, doi: 10.1109/JSTARS.2021.3130238. J. Peng et al., “Wild animal survey using UAS imagery and deep learning: Modified faster R-CNN for kiang detection in Tibetan plateau,” ISPRS J. Photogrammetry Remote Sens., vol. 169, pp. 364–376, Nov. 2020, doi: 10.1016/j.isprsjprs. 2020.08.026. N. Rey, M. Volpi, S. Joost, and D. Tuia, “Detecting animals in African savanna with UAVs and the crowds,” Remote Sens. Environ., vol. 200, pp. 341–351, Oct. 2017, doi: 10.1016/ j.rse.2017.08.026. B. Kellenberger, D. Marcos, and D. Tuia, “Detecting mammals in UAV images: Best practices to address a substantially imbalanced dataset with deep learning,” Remote Sens. Environ., vol. 216, pp. 139–153, Oct. 2018, doi: 10.1016/j.rse.2018. 06.028. A. Delplanque, S. Foucher, P. Lejeune, J. Linchant, and J. Théau, “Multispecies detection and identification of African mammals in aerial imagery using convolutional neural net- [323] [324] [325] [326] [327] [328] [329] [330] [331] [332] [333] [334] works,” Remote Sens. Ecology Conservation, vol. 8, no. 2, pp. 166–179, 2022, doi: 10.1002/rse2.234. D. Wang, Q. Shao, and H. Yue, “Surveying wild animals from satellites, manned aircraft and unmanned aerial systems (UASS): A review,” Remote Sen., vol. 11, no. 11, Jun. 2019, Art. no. 1308, doi: 10.3390/rs11111308. T. Kattenborn, J. Leitloff, F. Schiefer, and S. Hinz, “Review on convolutional neural networks (CNN) in vegetation remote sensing,” ISPRS J. Photogrammetry Remote Sens., vol. 173, pp. 24–49, Mar. 2021, doi: 10.1016/j.isprsjprs.2020.12.010. T. Dong, Y. Shen, J. Zhang, Y. Ye, and J. Fan, “Progressive cascaded convolutional neural networks for single tree detection with google earth imagery,” Remote Sens., vol. 11, no. 15, Jul. 2019, Art. no. 1786, doi: 10.3390/rs11151786. A. Safonova, S. Tabik, D. Alcaraz-Segura, A. Rubtsov, Y. Maglinets, and F. Herrera, “Detection of fir trees (Abies sibirica) damaged by the bark beetle in unmanned aerial vehicle images with deep learning,” Remote Sens., vol. 11, no. 6, Mar. 2019, Art. no. 643, doi: 10.3390/rs11060643. Z. Hao et al., “Automated tree-crown and height detection in a young forest plantation using mask region-based convolutional neural network (Mask R-CNN),” ISPRS J. Photogrammetry Remote Sens., vol. 178, pp. 112–123, Aug. 2021, doi: 10.1016/j. isprsjprs.2021.06.003. A. Sani-Mohammed, W. Yao, and M. Heurich, “Instance segmentation of standing dead trees in dense forest from aerial imagery using deep learning,” ISPRS Open J. Photogrammetry Remote Sens., vol. 6, Dec. 2022, Art. no. 100024, doi: 10.1016/j. ophoto.2022.100024. A. V. Etten, “You only look twice: Rapid multi-scale object detection in satellite imagery,” 2018. [Online]. Available: http:// arxiv.org/abs/1805.09512 Q. Lin, J. Zhao, G. Fu, and Z. 
[331] D. Hong et al., “More diverse means better: Multimodal deep learning meets remote-sensing imagery classification,” IEEE Trans. Geosci. Remote Sens., vol. 59, no. 5, pp. 4340–4354, May 2021, doi: 10.1109/TGRS.2020.3016820.
[332] D. Hong, N. Yokoya, G.-S. Xia, J. Chanussot, and X. X. Zhu, “X-ModalNet: A semi-supervised deep cross-modal network for classification of remote sensing data,” ISPRS J. Photogrammetry Remote Sens., vol. 167, pp. 12–23, Sep. 2020, doi: 10.1016/j.isprsjprs.2020.06.014.
[333] M. Segal-Rozenhaimer, A. Li, K. Das, and V. Chirayath, “Cloud detection algorithm for multi-modal satellite imagery using convolutional neural-networks (CNN),” Remote Sens. Environ., vol. 237, Feb. 2020, Art. no. 111446, doi: 10.1016/j.rse.2019.111446.
[334] Y. Shendryk, Y. Rist, C. Ticehurst, and P. Thorburn, “Deep learning for multi-modal classification of cloud, shadow and land cover scenes in PlanetScope and Sentinel-2 imagery,” ISPRS J. Photogrammetry Remote Sens., vol. 157, pp. 124–136, Nov. 2019, doi: 10.1016/j.isprsjprs.2019.08.018.
[335] Y. Shi, L. Du, and Y. Guo, “Unsupervised domain adaptation for SAR target detection,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 14, pp. 6372–6385, Jun. 2021, doi: 10.1109/JSTARS.2021.3089238.
[336] Y. Zhu, X. Sun, W. Diao, H. Li, and K. Fu, “RFA-Net: Reconstructed feature alignment network for domain adaptation object detection in remote sensing imagery,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 15, pp. 5689–5703, Jul. 2022, doi: 10.1109/JSTARS.2022.3190699.
[337] T. Xu, X. Sun, W. Diao, L. Zhao, K. Fu, and H. Wang, “FADA: Feature aligned domain adaptive object detection in remote sensing imagery,” IEEE Trans. Geosci. Remote Sens., vol. 60, pp. 1–16, Jan. 2022, doi: 10.1109/TGRS.2022.3147224.
[338] Y. Koga, H. Miyazaki, and R. Shibasaki, “A method for vehicle detection in high-resolution satellite images that uses a region-based object detector and unsupervised domain adaptation,” Remote Sens., vol. 12, no. 3, Feb. 2020, Art. no. 575, doi: 10.3390/rs12030575.
[339] Y. Shi, L. Du, Y. Guo, and Y. Du, “Unsupervised domain adaptation based on progressive transfer for ship detection: From optical to SAR images,” IEEE Trans. Geosci. Remote Sens., vol. 60, pp. 1–17, Jun. 2022, doi: 10.1109/TGRS.2022.3185298.
[340] P. Zhang et al., “SEFEPNet: Scale expansion and feature enhancement pyramid network for SAR aircraft detection with small sample dataset,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 15, pp. 3365–3375, Apr. 2022, doi: 10.1109/JSTARS.2022.3169339.
[341] S. Dang, Z. Cao, Z. Cui, Y. Pi, and N. Liu, “Open set incremental learning for automatic target recognition,” IEEE Trans. Geosci. Remote Sens., vol. 57, no. 7, pp. 4445–4456, Jul. 2019, doi: 10.1109/TGRS.2019.2891266.
[342] J. Chen, S. Wang, L. Chen, H. Cai, and Y. Qian, “Incremental detection of remote sensing objects with feature pyramid and knowledge distillation,” IEEE Trans. Geosci. Remote Sens., vol. 60, pp. 1–13, 2022, doi: 10.1109/TGRS.2020.3042554.
[343] X. Chen et al., “An online continual object detector on VHR remote sensing images with class imbalance,” Eng. Appl. Artif. Intell., vol. 117, Part A, Jan. 2023, Art. no. 105549, doi: 10.1016/j.engappai.2022.105549.
[344] J. Li et al., “Class-incremental learning network for small objects enhancing of semantic segmentation in aerial imagery,” IEEE Trans. Geosci. Remote Sens., vol. 60, pp. 1–20, 2022, doi: 10.1109/TGRS.2021.3124303.
[345] W. Liu, X. Nie, B. Zhang, and X. Sun, “Incremental learning with open-set recognition for remote sensing image scene classification,” IEEE Trans. Geosci. Remote Sens., vol. 60, pp. 1–16, May 2022, doi: 10.1109/TGRS.2022.3173995.
[346] J. Deng, W. Dong, R. Socher, L. Li, K. Li, and L. Fei-Fei, “ImageNet: A large-scale hierarchical image database,” in Proc. IEEE Int. Conf. Comput. Vision Pattern Recognit. (CVPR), 2009, pp. 248–255, doi: 10.1109/CVPR.2009.5206848.
[347] Y. Long et al., “On creating benchmark dataset for aerial image interpretation: Reviews, guidances, and million-AID,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 14, pp. 4205–4230, Apr. 2021, doi: 10.1109/JSTARS.2021.3070368.
[348] G. A. Christie, N. Fendley, J. Wilson, and R. Mukherjee, “Functional map of the world,” in Proc. IEEE Int. Conf. Comput. Vision Pattern Recognit. (CVPR), 2018, pp. 6172–6180, doi: 10.1109/CVPR.2018.00646.
[349] D. Wang, J. Zhang, B. Du, G.-S. Xia, and D. Tao, “An empirical study of remote sensing pretraining,” IEEE Trans. Geosci. Remote Sens., vol. 61, pp. 1–20, 2023, doi: 10.1109/TGRS.2022.3176603.
[350] W. Li, K. Chen, H. Chen, and Z. Shi, “Geographical knowledge-driven representation learning for remote sensing images,” IEEE Trans. Geosci. Remote Sens., vol. 60, pp. 1–16, 2022, doi: 10.1109/TGRS.2021.3115569.
[351] X. Sun et al., “RingMo: A remote sensing foundation model with masked image modeling,” IEEE Trans. Geosci. Remote Sens., vol. 61, pp. 1–22, 2023, doi: 10.1109/TGRS.2022.3194732.
[352] A. Fuller, K. Millard, and J. R. Green, “SatViT: Pretraining transformers for earth observation,” IEEE Geosci. Remote Sens. Lett., vol. 19, pp. 1–5, Aug. 2022, doi: 10.1109/LGRS.2022.3201489.
[353] D. Wang et al., “Advancing plain vision transformer toward remote sensing foundation model,” IEEE Trans. Geosci. Remote Sens., vol. 61, pp. 1–15, 2023, doi: 10.1109/TGRS.2022.3222818.
[354] T. Zhang and X. Zhang, “ShipDeNet-20: An only 20 convolution layers and <1-MB lightweight SAR ship detector,” IEEE Geosci. Remote Sens. Lett., vol. 18, no. 7, pp. 1234–1238, Jul. 2021, doi: 10.1109/LGRS.2020.2993899.
[355] T. Zhang, X. Zhang, J. Shi, and S. Wei, “Depthwise separable convolution neural network for high-speed SAR ship detection,” Remote Sens., vol. 11, no. 21, Oct. 2019, Art. no. 2483, doi: 10.3390/rs11212483.
[356] Z. Wang, L. Du, and Y. Li, “Boosting lightweight CNNs through network pruning and knowledge distillation for SAR target recognition,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 14, pp. 8386–8397, Aug. 2021, doi: 10.1109/JSTARS.2021.3104267.
[357] S. Chen, R. Zhan, W. Wang, and J. Zhang, “Learning slimming SAR ship object detector through network pruning and knowledge distillation,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 14, pp. 1267–1282, 2021, doi: 10.1109/JSTARS.2020.3041783.
[358] Y. Zhang, Z. Yan, X. Sun, W. Diao, K. Fu, and L. Wang, “Learning efficient and accurate detectors with dynamic knowledge distillation in remote sensing imagery,” IEEE Trans. Geosci. Remote Sens., vol. 60, pp. 1–19, 2022, doi: 10.1109/TGRS.2021.3130443.
[359] Y. Yang et al., “Adaptive knowledge distillation for lightweight remote sensing object detectors optimizing,” IEEE Trans. Geosci. Remote Sens., vol. 60, pp. 1–15, May 2022, doi: 10.1109/TGRS.2022.3175213.
[360] C. Li, G. Cheng, G. Wang, P. Zhou, and J. Han, “Instance-aware distillation for efficient object detection in remote sensing images,” IEEE Trans. Geosci. Remote Sens., vol. 61, pp. 1–11, Jan. 2023, doi: 10.1109/TGRS.2023.3238801.