Remote Sensing Object
Detection Meets Deep Learning
A metareview of challenges and advances
XIANGRONG ZHANG , TIANYANG ZHANG , GUANCHUN WANG ,
PENG ZHU , XU TANG , XIUPING JIA , AND LICHENG JIAO
Remote sensing object detection (RSOD), one of the
most fundamental and challenging tasks in the remote
sensing field, has received long-standing attention. In recent years, deep learning techniques have demonstrated
robust feature representation capabilities and led to a big
leap in the development of RSOD techniques. In this era
of rapid technical evolution, this article aims to present a
comprehensive review of the recent achievements in deep
learning-based RSOD methods. More than 300 papers are
covered in this review. We identify five main challenges in
RSOD, including multiscale object detection, rotated object detection, weak object detection, tiny object detection,
and object detection with limited supervision, and systematically review the corresponding methods developed in a
hierarchical division manner. We also review the widely
used benchmark datasets and evaluation metrics within
the field of RSOD as well as the application scenarios for
RSOD. Future research directions are provided for further
promoting the research in RSOD.
INTRODUCTION
With the rapid advances in Earth observation technology,
remote sensing satellites (e.g., Google Earth [1], WorldView-3 [2], and Gaofen-series satellites [3], [4], [5]) have
made significant improvements in spatial, temporal, and
spectral resolutions, and a massive number of remote sensing images (RSIs) are now accessible. Benefiting from the
dramatic increase in available RSIs, human beings have entered an era of remote sensing big data, and the automatic
interpretation of RSIs has become an active and challenging topic [6], [7], [8].
RSOD aims to determine whether or not objects of
interest exist in a given RSI and to return the category and
position of each predicted object. The term “object” in this
survey refers to man-made or highly structured objects
(such as airplanes, vehicles, and ships) rather than unstructured scene objects (e.g., land, the sky, and grass). As the
cornerstone of the automatic interpretation of RSIs, RSOD
has received significant attention.
In general, RSIs are taken at an overhead viewpoint
with different ground sampling distances (GSDs) and cover
widespread regions of Earth’s surface. As a result, geospatial objects exhibit more significant diversity in scale, angle,
and appearance. Based on the characteristics of geospatial
objects in RSIs, we summarize the major challenges of
RSOD in the following five aspects:
1) Huge scale variations: On the one hand, there are generally massive scale variations across different categories
of objects, as illustrated in Figure 1(b): a vehicle may
be as small as a 10-pixel area, while an airplane can be
20 times larger than the vehicle. On the other hand,
intracategory objects also show a wide range of scales.
Therefore, detection models must handle both large-scale and small-scale objects.
2) Arbitrary orientations: The unique overhead viewpoint
leads to geospatial objects often being distributed with
arbitrary orientations, as shown in Figure 1(c). This rotated object detection task exacerbates the challenge of
RSOD, making it important for the detector to be perceptive of orientation.
3) Weak feature responses: Generally, RSIs have a complex
context and massive amounts of background noise. As
depicted in Figure 1(a), some vehicles are obscured by
shadows, and the surrounding background noises tend
to have a similar appearance to vehicles. This intricate
interference may overwhelm the objects of interest and
deteriorate their feature representation, which results in
the objects of interest being presented as weak feature
responses [9].
4) Tiny objects: As shown in Figure 1(d), tiny objects tend to
exhibit extremely small scales and limited appearance information, resulting in a poor-quality feature representation. In addition, the prevailing detection paradigms inevitably weaken or even discard the representation of tiny
objects [10]. These problems in tiny object detection bring
new difficulties to existing detection methods.
5) Expensive annotation: The complex characteristics of geospatial objects in terms of scale and angle, as well as
the expert knowledge required for fine-grained annotations [11], make accurate box-level annotations of RSIs
a time-consuming and labor-intensive task. However,
the current deep learning-based detectors rely heavily
on abundant well-labeled data to reach performance
saturation. Therefore, developing RSOD methods that remain effective when sufficient supervised information is lacking is still a challenge.
To tackle these challenges, numerous RSOD methods
have emerged in the past two decades. At the early stage,
researchers adopted template matching [12], [13], [14] and
prior knowledge [15], [16], [17] for object detection in remote sensing scenes. These early methods relied more on
handcrafted templates or prior knowledge, leading to unstable results. Later, machine learning approaches [18], [19],
[20], [21] became mainstream in RSOD, and they view object detection as a classification task. Concretely, the machine learning model first searches a set of object proposals
from the input image and extracts the texture, context, and
other features of these object proposals. Then, it employs
an independent classifier to identify the object categories
in these object proposals. However, shallow learning-based
features from the machine learning approaches significantly restrict the representations of objects, especially in more
challenging scenarios. Besides, the machine learning-based
object detection methods cannot be trained in an end-to-end manner, which is no longer applicable in the era of remote sensing big data.
Recently, deep learning techniques [22] have demonstrated powerful feature representation capabilities from
massive amounts of data, and the state-of-the-art detectors
[23], [24], [25], [26] in computer vision achieve an object
detection ability that rivals that of humans [27]. Drawing
on the advanced progress of deep learning techniques, various deep learning-based methods have dominated RSOD
and led to remarkable breakthroughs in detection performance. Compared to the traditional methods, deep neural network architectures can extract high-level semantic
features and obtain much more robust feature representations of objects. In addition, efficient end-to-end training
and automated feature extraction make the deep learning-based object detection methods more suitable for RSOD in
the remote sensing big data era.
Along with the prevalence of RSOD, a number of geospatial object detection surveys [9], [28], [29], [30], [31],
[32], [33], [34] have been published in recent years. For example, Cheng et al. [29] reviewed the early development
of RSOD. Han et al. [9] focused on small and weak object
detection in RSIs. In [30], the authors reviewed airplane
detection methods. Li et al. [31] conducted a thorough
survey on deep learning-based detectors in the remote
sensing community according to various improvement
strategies. Besides, some work [28], [33], [34] mainly focused
on publishing novel benchmark datasets for RSOD and
FIGURE 1. Typical RSIs. (a) Complex context and massive amounts of background noise lead to weak feature responses of objects. (b) Huge scale variations exist in both inter- and intracategory objects. (c) Objects are distributed with arbitrary orientations. (d) Tiny objects tend to exhibit extremely small scales.
briefly reviewed object detection methods in the field of
remote sensing. Compared with previous works, this article
provides a comprehensive analysis of the major challenges
in RSOD based on the characteristics of geospatial objects
and systematically categorizes and summarizes the deep
learning-based remote sensing object detectors according
to these challenges. Moreover, more than 300 papers on
RSOD are reviewed in this work, leading to a more comprehensive and systematic survey.
Figure 2 provides a taxonomy of the object detection
methods in this review. According to the major challenges
in RSOD, we divide the current deep learning-based RSOD
methods into five main categories: multiscale object detection, rotated object detection, weak object detection, tiny
object detection, and object detection with limited supervision. In each category, we further examine subcategories
based on improvement strategies or learning paradigms
designed for category-specific challenges. For multiscale
object detection, we mainly review the three widely used
methods: data augmentation, multiscale feature representation, and high-quality multiscale anchor generation.
With regard to rotated object detection, we mainly focus on
rotated bounding box representation and rotation-insensitive feature learning. For weak object detection, we divide it
into two classes: background noise suppression and related
context mining. As for tiny object detection, we divide it
into three streams: discriminative feature extraction, superresolution reconstruction, and improved detection metrics.
According to the learning paradigms, we divide object detection with limited supervision into weakly supervised
object detection (WSOD), semisupervised object detection
(SSOD), and few-shot object detection (FSOD). Notably,
there are still detailed divisions in each subcategory, as
shown in the rounded rectangles in Figure 2. This hierarchical division provides a systematic review and summarization of existing methods. It helps researchers understand
FIGURE 2. The structured taxonomy of the deep learning-based RSOD methods in this article. A hierarchical division is adopted to describe each subcategory.
RSOD more comprehensively and facilitates further progress, which is the main purpose of this review.
In summary, the main contributions of this article are
as follows:
◗ We comprehensively analyze the major challenges in
RSOD based on the characteristics of geospatial objects, including huge scale variations, arbitrary orientations, weak feature responses, tiny objects, and expensive annotations.
◗ We systematically summarize the deep learning-based object detectors in the remote sensing community and categorize them in a hierarchical manner according to their motivation.
◗ We present a forward-looking discussion of future research directions to motivate further progress in RSOD.

FIGURE 3. The scale variations for each category in the DOTA v2.0 dataset. Huge scale variations exist in both the inter- and intracategories.
MULTISCALE OBJECT DETECTION
Due to the different spatial resolutions among RSIs, huge
scale variation is a notoriously challenging problem in
RSOD and seriously degrades the detection performance.
As depicted in Figure 3, we present the distribution of object pixel areas for each category in the DOTA v2.0 dataset
[33]. Obviously, the scales vary greatly among categories,
in which a small vehicle may have an area of less than 10 pixels, while an airport can exceed a 10⁵-pixel area. Worse still,
the huge intracategory scale variations further exacerbate
the difficulties of multiscale object detection. To tackle the
huge scale variation problem, current studies are mainly
divided into data augmentation, multiscale feature representation, and multiscale anchor generation. Figure 4 gives
a brief summary of multiscale object detection methods.
FIGURE 4. A brief summary of multiscale object detection methods.
FIGURE 5. Single-scale feature representation and six paradigms for multiscale feature representation. (a) Single-scale feature representation. (b) Multiscale feature integration. (c) Pyramidal feature hierarchy. (d) FPNs. (e) Top-down and bottom-up. (f) Cross-scale feature balance.
DATA AUGMENTATION
Data augmentation is a simple yet widely applied approach for increasing dataset diversity. As for the scale variation problem in multiscale object detection, image scaling is a straightforward and effective augmentation method. Zhao et al. [35] fed multiscale image pyramids into multiple networks and fused the output features from these networks to generate multiscale feature representations. In [36], Azimi et al. proposed a combined image cascade and feature pyramid network (FPN) to extract object features on various scales. Although image pyramids can effectively increase the detection performance for multiscale objects, the inference time and computational complexity are severely increased. To tackle this problem, Shamsolmoali et al. [37] designed a lightweight image pyramid module (LIPM). The proposed LIPM receives multiple downsampled images to generate multiscale feature maps and fuses the output multiscale feature maps with the corresponding scale feature maps from the backbone. Moreover, some modern data augmentation methods (e.g., Mosaic and Stitcher [38]) also show remarkable effectiveness in multiscale object detection, especially for small objects [39], [40], [41].

MULTISCALE FEATURE REPRESENTATION
Early studies in RSOD usually utilized the last single feature map of the backbone to detect objects, as illustrated in Figure 5(a). However, such single-scale feature map prediction limits the detector's ability to handle objects with a wide range of scales [42], [43], [44]. Consequently, multiscale feature representation methods have been proposed and have become an effective solution to the huge object scale variation problem in RSOD. The current multiscale feature representation methods are mainly divided into three streams: multiscale feature integration, pyramidal feature hierarchy, and FPNs.

MULTISCALE FEATURE INTEGRATION
Convolutional neural networks (CNNs) usually adopt a deep hierarchical structure, where features from different levels have different characteristics. Shallow-level features usually contain fine-grained details (e.g., points, edges, and textures of objects) and provide precise spatial location information, which is more suitable for object localization. In contrast, features from higher-level layers carry stronger semantic information and present discriminative information for object classification. To combine the information from different layers and generate a multiscale representation, some researchers introduced multilayer feature integration methods that integrate features from multiple layers into a single feature map and perform the detection on this rebuilt feature map [45], [46], [47], [48], [49], [50], [51], [52]. Figure 5(b) exhibits the structure of multilayer feature integration methods.
Zhang et al. [48] designed a hierarchical robust CNN to extract hierarchical spatial semantic information by fusing multiscale convolutional features from three different layers and introduced multiple fully connected layers to enhance the rotation and scaling robustness of the network. Considering the different norms among multilayer features, Lin et al. [49] applied an l2 normalization to each feature before integration to maintain stability during network training. Unlike previous multiscale feature integration at the level of the convolutional layer, Zheng et al. [51] designed HyBlock to build a multiscale feature representation at the intralayer level. HyBlock employs atrous separable convolution with pyramidal receptive fields to learn hyperscale features, alleviating the scale-variation issue in RSOD.
PYRAMIDAL FEATURE HIERARCHY
The key insight behind the pyramidal feature hierarchy is
that the features in different layers can encode object information from different scales. For instance, small objects are
more likely to appear in shallow layers, while large objects
tend to exist in deep layers. Therefore, the pyramidal feature hierarchy employs multiple-layer features for independent prediction to detect objects with a wide scale range, as
demonstrated in Figure 5(c). The single-shot multibox detector (SSD) [53] is a typical representative of the pyramidal
feature hierarchy, which has a wide range of extended applications in both natural scenes [54], [55], [56] and remote
sensing scenes [57], [58], [59], [60], [61], [62], [63].
To improve the detection performance for small vehicles, Liang et al. [60] added an extra scaling branch to the
SSD, consisting of a deconvolution module and an average
pooling layer. Referring to hierarchical regression layers
in the SSD, Wang et al. [58] introduced scale-invariant regression layers (SIRLs), where three isolated regression layers are employed to capture the information of full-scale
objects. Based on SIRLs, a novel specific scale joint loss
is introduced to accelerate network convergence. In [64],
Li et al. proposed the HSF-Net that introduces the hierarchical selective filtering layer in both the region proposal
network (RPN) and detection subnetwork. Specifically, the
hierarchical selective filtering layer employs three convolutional layers with different kernel sizes (e.g., 1 × 1, 3 × 3, and 5 × 5) to obtain multiple receptive field features, which
benefits multiscale ship detection.
FEATURE PYRAMID NETWORKS
Pyramidal feature hierarchy methods use independent
multilevel features for detection and ignore the complementary information among features at different levels,
resulting in weak semantic information for low-level features. To tackle this problem, Lin et al. [65] proposed the
FPN. As explained in Figure 5(d), the FPN introduces a
top-down pathway to transfer rich semantic information
from high-level features to shallow-level features, leading
to rich semantic features at all levels (please refer to the
details in [65]). Thanks to the significant improvement of
the FPN for multiscale object detection, the FPN and its extensions [66], [67], [68] play a dominant role in multiscale
feature representation.
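The top-down pathway itself is compact; a minimal PyTorch sketch in the spirit of the FPN [65] is shown below, with channel sizes used only as placeholders.

```python
import torch.nn as nn
import torch.nn.functional as F

class SimpleFPN(nn.Module):
    """Top-down pathway with lateral connections, following the FPN idea in [65]."""
    def __init__(self, in_channels=(256, 512, 1024, 2048), out_channels=256):
        super().__init__()
        self.lateral = nn.ModuleList(nn.Conv2d(c, out_channels, 1) for c in in_channels)
        self.smooth = nn.ModuleList(nn.Conv2d(out_channels, out_channels, 3, padding=1)
                                    for _ in in_channels)

    def forward(self, feats):
        # feats: backbone maps C2..C5, ordered from high to low resolution.
        laterals = [l(f) for l, f in zip(self.lateral, feats)]
        # Propagate semantics from the coarsest level down to the finest one.
        for i in range(len(laterals) - 2, -1, -1):
            laterals[i] = laterals[i] + F.interpolate(
                laterals[i + 1], size=laterals[i].shape[-2:], mode="nearest")
        # 3x3 convolutions reduce the aliasing introduced by upsampling.
        return [s(p) for s, p in zip(self.smooth, laterals)]  # P2..P5
```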
Considering the extreme aspect ratios of geospatial objects (e.g., bridges, harbors, and airports), Hou et al. [69]
proposed an asymmetric FPN (AFPN). The AFPN adopts
the asymmetric convolution block to enhance the feature
representation regarding the cross-shaped skeleton and
improve the performance of large-aspect-ratio objects.
Zhang et al. [70] designed a Laplacian FPN to inject high-frequency information into the multiscale pyramidal feature representation, which is useful for accurate object
detection but has been ignored by previous work. In [71],
Zhang et al. introduced the high-resolution FPN to fully
leverage high-resolution feature representations, leading to
precise and robust synthetic aperture radar (SAR) ship detection. In addition, some researchers integrated the novel
feature fusion module [72], [73], attention machine [74],
[75], [76], [77], or dilation convolution layer [78], [79] into
the FPN to further obtain a more discriminative multiscale
feature representation.
The FPN introduces a top-down pathway to transfer
high-level semantic information into the shallow layers, while the low-level spatial information is still lost in
the top layers after the long-distance propagation in the
backbone. Drawing on this problem, Fu et al. [80] proposed a feature fusion architecture (FFA) that integrates
an auxiliary bottom-up pathway into the FPN structure to
transfer the low-level spatial information to the top-layer
features via a short path, as depicted in Figure 5(e). The
FFA ensures that the detector extracts multiscale feature
pyramids with rich semantic and detailed spatial information. Similarly, in [81] and [82], the authors introduced
a bidirectional FPN that learns the importance of different level features through learnable parameters and fuses
the multilevel features through iterative top-down and
bottom-up pathways.
Differing from the preceding sequential enhancement
pathway [80], some studies [83], [84], [85], [86], [87],
[88], [89], [90], [91], [92], [93], [94] adopt a cross-level feature fusion manner. As shown in Figure 5(f), the cross-level feature fusion methods fully collect the features at
all levels to adaptively obtain balanced feature maps.
Cheng et al. [83] utilized feature concatenation to achieve
cross-scale feature fusion. Considering that features from
different levels should have different contributions to the
feature fusion, Fu et al. [84] proposed level-based attention to learn the unique contribution of features from
each level. Thanks to the powerful global information extraction ability of the transformer structure, some work
[88], [89] introduced transformer structures to integrate
and refine multilevel features. In [90], Chen et al. presented a cascading attention network in which position supervision is introduced to enhance the semantic information
of multilevel features.
MULTISCALE ANCHOR GENERATION
Apart from data augmentation and multiscale feature representation methods, multiscale anchor generation can also
tackle the huge object scale variation problem in RSOD.
Due to the difference in the scale range of objects in natural and remote sensing scenes, some studies [95], [96], [97],
[98], [99], [100], [101], [102], [103], [104] modify the anchor
settings in common object detection to better cover the
scales of geospatial objects.
Guo et al. [95] injected extra anchors with more scales
and aspect ratios into the detector for multiscale object
detection. Dong et al. [98] designed more suitable anchor
scales based on statistics of the object scales in the training
set. Qiu et al. [99] extended the original square region of
interest (ROI) features into vertical, square, and horizontal
ROI features and fused these ROI features to represent objects with different aspect ratios in a more flexible way. The
preceding methods follow a fixed anchor setting, while current studies [100], [101], [102], [103], [104] have attempted
to dynamically learn the anchor during the training phase.
Considering the aspect ratio variations among different categories, Hou et al. [100] devised a novel self-adaptive aspect ratio anchor (SARA) to adaptively learn an appropriate
aspect ratio for each category. SARA embeds the learnable
category-wise aspect ratio values into the regression branch
to adaptively update the aspect ratio of each category with
the gradient of the location regression loss. Inspired by the
guided anchoring RPN [105], some researchers [102], [103],
[104] introduced a lightweight subnetwork into the detector to adaptively learn the location and shape information
of anchors.
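A minimal NumPy sketch of predefined multiscale anchor generation is given below; the base size, scales, and aspect ratios are illustrative values rather than the settings of any particular detector discussed above.

```python
import numpy as np

def generate_anchors(base_size=16, scales=(8, 16, 32), ratios=(0.5, 1.0, 2.0)):
    """Return (len(scales) * len(ratios), 4) anchors as (x1, y1, x2, y2) centered at the origin."""
    anchors = []
    for scale in scales:
        for ratio in ratios:
            area = (base_size * scale) ** 2
            w = np.sqrt(area / ratio)  # width shrinks as the ratio h/w grows
            h = w * ratio
            anchors.append([-w / 2, -h / 2, w / 2, h / 2])
    return np.array(anchors)

def shift_anchors(anchors, feat_h, feat_w, stride=16):
    """Tile the anchor set over every cell of a stride-16 feature map."""
    xs = (np.arange(feat_w) + 0.5) * stride
    ys = (np.arange(feat_h) + 0.5) * stride
    shift_x, shift_y = np.meshgrid(xs, ys)
    shifts = np.stack([shift_x, shift_y, shift_x, shift_y], axis=-1).reshape(-1, 1, 4)
    return (anchors[None, :, :] + shifts).reshape(-1, 4)

print(generate_anchors().shape)                          # (9, 4)
print(shift_anchors(generate_anchors(), 4, 4).shape)     # (144, 4)
```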
ROTATED OBJECT DETECTION
The arbitrary orientation of objects is another major challenge in RSOD. Since the objects in RSIs are acquired from
a bird’s-eye view, they exhibit the property of arbitrary
orientations, so the widely used horizontal bounding box
(HBB) representation in generic object detection is insufficient to locate rotated objects accurately. Therefore, numerous researchers have focused on the arbitrary orientation
property of geospatial objects, which can be summarized
into rotated object representation and rotation-invariant
feature learning. A brief summary of rotated object detection methods is provided in Figure 6.
ROTATED OBJECT REPRESENTATION
Rotated object representation is essential for RSOD to avoid
redundant backgrounds and obtain precise detection results. Recent rotated object representation methods can be
mainly summarized into several categories: five-parameter
representation [107], [108], [109], [110], [111], [112], [113],
[114], [115], [116], eight-parameter representation [117],
[118], [119], [120], [121], [122], [123], [124], [125], [126],
angle classification representation [106], [127], [128], [129], Gaussian distribution representation [130], [131], [132], [133], and others [134], [135], [136], [137], [138], [139], [140], [141], [142], [143], [144].
FIGURE 6. A brief summary of rotated object detection methods.
FIGURE 7. A visualization of the five-parameter representation and eight-parameter representation methods for rotated objects [106]. (a) Five-parameter representation with 90° angular range. (b) Five-parameter representation with 180° angular range. (c) Eight-parameter representation.

FIVE PARAMETERS
The most popular solution is representing objects with a five-parameter method (x, y, w, h, θ), which simply adds an extra rotation angle parameter θ to the HBB [107], [108], [109], [110], [111], [112], [113], [114], [115]. The definition of the angular range plays a crucial role in such methods, where two kinds of definitions are derived. Some studies [107], [108], [109], [110], [111], [112] define θ as the acute angle to the x-axis and restrict the angular range to 90°, as in Figure 7(a). As the most representative work, Yang et al. [107] followed the five-parameter method to detect rotated objects and designed an intersection-over-union (IOU)-aware loss function to tackle the boundary discontinuity problem of rotation angles. Another group of works [113], [114], [115], [116] refers to θ as the angle between the x-axis and the long side, whose range is 180°, as in Figure 7(b). Ding et al. [114] regressed rotation angles with the five-parameter method and transformed the features of horizontal regions into rotated ones to facilitate rotated object detection.
EIGHT PARAMETERS
Differing from five-parameter methods, eight-parameter methods [117], [118], [119], [120], [121], [122], [123], [124], [125], [126] solve the issue of rotated object representation by directly regressing four vertices, $\{(a_x, a_y), (b_x, b_y), (c_x, c_y), (d_x, d_y)\}$, as described in Figure 7(c). Xia et al. [117] first adopted the eight-parameter method for rotated object representation, which directly supervises the detection model by minimizing the difference between each vertex and the ground truth coordinates during training. However, the sequence order of these vertices is essential for the eight-parameter method to avoid unstable training. As evident in Figure 8, it is intuitive that regressing targets along the red dotted arrow is the easier route, but the actual process follows the red solid arrows, which causes difficulty in model training. To this end, Qian et al. [119], [121] proposed a modulated loss function that calculates the losses under different sorted orders and selects the minimum case to learn, efficiently improving the detection performance.

FIGURE 8. The boundary discontinuity challenge of the (a) five-parameter method and (b) eight-parameter method [119], [121].
ANGLE CLASSIFICATION
To address the issue present in Figure 8, many researchers [106], [127], [128], [129] take a detour around the boundary challenge of regression by transforming the angle prediction problem into an angle classification task. Yang et al. [106] proposed the first angle classification method for rotated object detection, which converts the continuous angle into a discrete one and trains the model with novel circular smooth labels. However, the angle classification head [106] introduces additional parameters and degrades the detector's efficiency. To overcome this, Yang et al. [129] improved the work in [106] with a densely coded label that ensures both the accuracy and efficiency of the model.
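A minimal sketch of the circular smooth label idea from [106] is shown below; the number of angle bins and the Gaussian window radius are illustrative hyperparameters, not the exact settings of the original method.

```python
import numpy as np

def circular_smooth_label(theta_deg, num_bins=180, radius=6.0):
    """Encode an angle (assumed in [-90, 90)) as a circularly smoothed classification target."""
    bin_idx = int(round(theta_deg + 90.0)) % num_bins        # map the angle to a bin index
    offsets = np.arange(num_bins)
    # Circular distance between every bin and the target bin, so bins on either
    # side of the angular boundary are treated as close rather than maximally far.
    dist = np.minimum(np.abs(offsets - bin_idx), num_bins - np.abs(offsets - bin_idx))
    label = np.exp(-(dist ** 2) / (2.0 * radius ** 2))       # Gaussian window
    label[dist > 3 * radius] = 0.0                           # truncate far-away bins
    return label

smooth = circular_smooth_label(-88.0)
print(smooth.argmax(), smooth[[0, 1, 178, 179]])  # bins across the boundary stay close
```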
GAUSSIAN DISTRIBUTION
Although the preceding methods achieve promising progress, they do not consider the misalignment between the actual detection performance and optimization metric. Most
recently, a series of works [130], [131], [132], [133] aim to
handle this challenge by representing rotated objects with a
Gaussian distribution, as detailed in Figure 9. Specifically,
these methods convert rotated objects into a 2D Gaussian distribution $\mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$, as follows:

$$
\begin{aligned}
\boldsymbol{\mu} &= (x, y)^{\top}, \\
\boldsymbol{\Sigma}^{1/2} &= \mathbf{R}\boldsymbol{\Lambda}\mathbf{R}^{\top}
= \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}
\begin{pmatrix} \tfrac{w}{2} & 0 \\ 0 & \tfrac{h}{2} \end{pmatrix}
\begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix} \\
&= \begin{pmatrix} \tfrac{w}{2}\cos^{2}\theta + \tfrac{h}{2}\sin^{2}\theta & \tfrac{w-h}{2}\cos\theta\sin\theta \\ \tfrac{w-h}{2}\cos\theta\sin\theta & \tfrac{w}{2}\sin^{2}\theta + \tfrac{h}{2}\cos^{2}\theta \end{pmatrix}
\end{aligned} \tag{1}
$$

where $\mathbf{R}$ represents the rotation matrix and $\boldsymbol{\Lambda}$ represents the diagonal matrix of the eigenvalues. With the Gaussian distribution representation in (1), the IOU between two rotated objects can be simplified as a distance estimation between two distributions. Besides, the Gaussian distribution representation discards the definition of the angular boundary and effectively solves the angular boundary problem. Yang et al. [130] proposed a novel metric with a Gaussian Wasserstein distance for measuring the distance between distributions, which achieves remarkable performance by efficiently approximating the rotation IOU. Based on this, Yang et al. [131] introduced a Kullback-Leibler divergence (KLD) metric to enhance the scale invariance.

FIGURE 9. A visualization of the Gaussian distribution representation methods for rotated objects [130].
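The conversion in (1) and a distribution distance between two rotated boxes can be written directly from these definitions. The NumPy sketch below uses the generic closed-form 2D Gaussian Wasserstein distance purely for illustration; it is not the exact loss formulation of [130] or [131].

```python
import numpy as np
from scipy.linalg import sqrtm

def obb_to_gaussian(x, y, w, h, theta):
    """Convert a rotated box (angle in radians) into (mu, Sigma) following (1)."""
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])
    half_sigma = rot @ np.diag([w / 2.0, h / 2.0]) @ rot.T   # Sigma^{1/2} = R Lambda R^T
    return np.array([x, y]), half_sigma @ half_sigma

def wasserstein_distance(box1, box2):
    """Squared 2D Gaussian Wasserstein distance between two rotated boxes."""
    mu1, s1 = obb_to_gaussian(*box1)
    mu2, s2 = obb_to_gaussian(*box2)
    cross = sqrtm(sqrtm(s2) @ s1 @ sqrtm(s2)).real
    return np.sum((mu1 - mu2) ** 2) + np.trace(s1 + s2 - 2.0 * cross)

# Two boxes that differ only slightly in angle give a small, smooth distance.
print(wasserstein_distance((0, 0, 40, 10, 0.0), (0, 0, 40, 10, 0.1)))
```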
OTHERS
Some researchers solve the rotated object representation problem with other approaches, such as segmentation-based [134], [135], [136] and keypoint-based [137], [138], [139], [140], [141], [142], [143], [144] methods. The representative segmentation-based method is Mask OBB [134], which deploys the segmentation method on each horizontal proposal to obtain the pixel-level object region and produces the minimum external rectangle as a rotated bounding box. On the other side, Wei et al. [142] adopted a keypoint-based representation for rotated objects, which locates the object center and leverages a pair of middle lines to represent the whole object. In addition, Yang et al. [145] proposed the first rotated object detector supervised by horizontal box annotations, which adopts self-supervised learning over two different views to predict the angles of rotated objects.
ROTATION-INVARIANT FEATURE LEARNING
Rotation-invariant features remain consistent under any rotation transformation. Thus, rotation-invariant feature learning is a crucial research field for tackling the arbitrary orientation problem in rotated object detection. To this end, many researchers have proposed a series of methods for learning the rotational invariance of objects [146], [147], [148], [149], [150], [151], [152], [153], [154], [155], [156], [157], which significantly improve rotated object detection in RSIs.
Cheng et al. [146] proposed the first rotation-invariant object detector to precisely recognize objects by using rotation-insensitive features, which forces the features of objects to be consistent at different rotation angles. Later, Cheng et al. [148], [149] employed rotation-invariant and Fisher discrimination regularizers to encourage the detector to learn both rotation-invariant and discriminative features. In [150] and [151], Wu et al. analyzed object rotation invariance under polar coordinates in the Fourier domain and designed a spatial frequency channel feature extraction module to obtain rotation-invariant features. Considering the misalignment between axis-aligned convolutional features and rotated objects, Han et al. [156] proposed an oriented detection module that adopts a novel alignment convolution operation to learn the orientation information. In [155], Han et al. further devised a rotation-equivariant detector to explicitly encode rotation equivariance and rotation invariance. Besides, some researchers [80], [157] extended the RPN with a series of predefined rotated anchors to cope with the arbitrary orientation characteristics of geospatial objects.
We summarize the detection performance of milestone rotated object detection methods in Table 1.

WEAK OBJECT DETECTION
Objects of interest in RSIs are typically embedded in complex scenes with intricate object spatial patterns and massive amounts of background noise. The complex context and background noise severely harm the feature representation of objects of interest, resulting in weak feature responses for the objects of interest. Thus, many existing works have concentrated on improving the feature representation of objects of interest, which can be grouped into two streams: suppressing background noise and mining related context information. A brief summary of weak object detection methods is given in Figure 10.
SUPPRESSING BACKGROUND NOISE
This type of method aims to strengthen the weak response
of the object region in the feature map by weakening the response of background regions. It can be mainly divided into
two categories: implicit learning and explicit supervision.
TABLE 1. THE PERFORMANCE OF ROTATED OBJECT DETECTION METHODS ON THE DOTA V1.0 DATASET WITH ROTATED ANNOTATIONS.

MODEL | BACKBONE | METHOD | mAP (%)
SCRDet [107] | R-101-FPN | Five parameters | 72.61
O2Det [142] | H-104 | Keypoint based | 72.8
R3Det [108] | R-101-FPN | Five parameters | 73.79
S2ANet [156] | R-50-FPN | Rotation-invariant feature | 74.12
RoI Transformer [114] | R-50-FPN | Five parameters | 74.61
Mask OBB [134] | R-50-FPN | Segmentation based | 74.86
Gliding Vertex [120] | R-101-FPN | Four vertices | 75.02
DCL [128] | R-152-FPN | Angle classification | 75.54
ReDet [155] | ReR50-ReFPN | Rotation-invariant feature | 76.25
Oriented R-CNN [124] | R-101-FPN | Four vertices | 76.28
R3Det-KLD [131] | R-50-FPN | Gaussian distribution | 77.36

mAP: mean of the average precision.
IMPLICIT LEARNING
Implicit learning methods employ carefully designed modules in the detector to adaptively learn important features
and suppress redundant features during the training phase,
thereby reducing background noise interference.
In machine learning, dimensionality reduction can
effectively learn compact feature representation and suppress irrelevant features. Drawing on the preceding property, Ye et al. [158] proposed a feature filtration module that
captures low-dimensional feature maps by consecutive
bottleneck layers to filter background noise interference.
Inspired by the selective focus of human visual perception,
the attention mechanism has been proposed and heavily
researched [159], [160], [161]. The attention mechanism
redistributes the feature importance during the network
learning phase to enhance important features and suppress redundant information. Consequently, the attention
mechanism has also been widely introduced in RSOD to
tackle the background noise interference problem [57],
[162], [163], [164], [165], [166], [167], [168], [169], [170].
In [162], Huang et al. emphasized the importance of
patch-patch dependencies for RSOD and designed a novel
nonlocal perceptual pyramidal attention (NP-Attention).
NP-Attention learns spatial multiscale nonlocal dependencies and channel dependencies to enable the detector
to concentrate on the object region rather than the background. Considering the strong scattering interference of the land area in SAR images, Sun et al. [163] presented a ship attention module to highlight the feature representation of ships and reduce false alarms from the land area. Moreover, a series of attention mechanisms devised for RSOD (e.g., spatial shuffle group enhanced attention [165], multiscale spatial and channel-wise attention [166], discrete wavelet multiscale attention [167], and so on) have demonstrated their effectiveness in suppressing background noise.

FIGURE 10. A brief summary of weak object detection methods.
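As a concrete example of the implicit-learning idea, a squeeze-and-excitation-style channel attention block (a generic design in the spirit of the attention mechanisms cited above, not the specific module of any RSOD paper) learns to reweight channels so that background-dominated responses can be suppressed.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Reweight channels so informative object responses are amplified."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        # Squeeze: global average pooling summarizes each channel.
        weights = self.fc(x.mean(dim=(2, 3)))      # (N, C)
        # Excite: rescale every channel of the feature map.
        return x * weights[:, :, None, None]

feat = torch.randn(2, 256, 32, 32)
print(ChannelAttention(256)(feat).shape)           # torch.Size([2, 256, 32, 32])
```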
EXPLICIT SUPERVISION
Unlike implicit learning methods, the explicit supervision approach employs auxiliary saliency supervision information to explicitly guide the detector to highlight the foreground regions and weaken the background.
Li et al. [171] employed the region contrast method to obtain the saliency map and construct the saliency feature pyramid by fusing the multiscale feature maps with the saliency map. In [172], Lei et al. extracted the saliency map via the saliency detection method [173] and proposed a saliency reconstruction network. The saliency reconstruction network utilizes the saliency map as pixel-level supervision to guide the training of the detector to strengthen saliency regions in feature maps. The preceding saliency detection methods are typically unsupervised, and the generated saliency map may contain nonobject regions, as exhibited in Figure 11(b), providing inaccurate guidance to the detector. Therefore, later works [107], [134], [174], [175], [176], [177], [178], [179], [180] transformed box-level annotation into object-level saliency guidance information [as shown in Figure 11(c)] to generate more accurate saliency supervision. Yang et al. [107] designed a pixel attention network that employs object-level saliency supervision to enhance object cues and weaken the background information. In [175], Zhang et al. proposed FoRDet to exploit object-level saliency supervision in a more concise way. Concretely, the proposed FoRDet leverages the prediction of foreground regions in the coarse stage (supervised under box-level annotation) to enhance the feature representation of the foreground regions in the refined stage.

MINING RELATED CONTEXT INFORMATION
Context information typically refers to the spatial and semantic relations between an object and the surrounding environment or scene. This context information can provide auxiliary feature representations for objects that cannot be clearly distinguished on their own. Thus, mining context information can effectively address the weak feature response problem in RSOD. According to the category of context information, existing methods are mainly classified into local and global context information mining.
LOCAL CONTEXT INFORMATION MINING
Local context information refers to the correlation between an object and its surrounding environment in visual information and spatial distribution [147], [181], [182],
[183], [184], [185], [186], [187]. Zhang et al. [181] generated multiple local context regions by scaling the original
region proposal into three different sizes and proposed a
contextual bidirectional enhancement module to fuse the
local context features with object features. The context-aware CNN [182] employed a context ROI mining layer
to extract context information about surrounding objects.
The context ROI for an object is first generated by merging a series of filtered proposals around the object and
then fused with the object ROI as the final object feature
representation for classification and regression. In [183],
Ma et al. exploited gated recurrent units to fuse object features with local context information, leading to a more
discriminative feature representation for the object. Graph
convolutional networks have recently shown better performance in object-object relationship reasoning. Hence,
Tian et al. [184], [185] constructed spatial and semantic
graphs to model and learn the contextual relationships
among objects.
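A rough PyTorch sketch of this local context mining strategy is shown below: an object ROI is pooled together with progressively enlarged context ROIs, and the results are concatenated. The scaling factors and concatenation-based fusion are illustrative assumptions rather than the exact designs of [181] or [182].

```python
import torch
from torchvision.ops import roi_align

def rois_with_context(features, boxes, scales=(1.0, 1.5, 2.0), output_size=7, stride=16):
    """Pool an object ROI together with enlarged context ROIs and concatenate them.

    features: (1, C, H, W) feature map; boxes: (N, 4) boxes in image coordinates.
    """
    pooled = []
    cx = (boxes[:, 0] + boxes[:, 2]) / 2
    cy = (boxes[:, 1] + boxes[:, 3]) / 2
    w = boxes[:, 2] - boxes[:, 0]
    h = boxes[:, 3] - boxes[:, 1]
    for s in scales:
        scaled = torch.stack([cx - s * w / 2, cy - s * h / 2,
                              cx + s * w / 2, cy + s * h / 2], dim=1)
        pooled.append(roi_align(features, [scaled], output_size,
                                spatial_scale=1.0 / stride, aligned=True))
    # (N, C * len(scales), output_size, output_size): object plus local context features.
    return torch.cat(pooled, dim=1)

feats = torch.randn(1, 256, 64, 64)
boxes = torch.tensor([[100.0, 120.0, 180.0, 200.0]])
print(rois_with_context(feats, boxes).shape)  # torch.Size([1, 768, 7, 7])
```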
FIGURE 11. The (a) input image, (b) saliency map generated by the saliency detection method [173], and (c) object-level saliency map.

GLOBAL CONTEXT INFORMATION MINING
Global context information exploits the association between an object and the scene [188], [189], [190], [191],
[192], [193], [194], [195]; e.g., vehicles are generally located
on roads and ships typically appear at sea. Chen et al. [188]
extracted the scene context information from the global
image feature with the ROI align operation and fused it
with the object-level ROI features to strengthen the relationship between the object and the scene. Liu et al. [192]
designed a scene auxiliary detection head that exploits the
scene context information under scene-level supervision.
The scene auxiliary detection head embeds the predicted
scene vector into the classification branch to fuse object-level features with scene-level context information. In
[193], Tao et al. presented a scene context-driven vehicle
detection approach. Specifically, a pretrained scene classifier is introduced to classify each image patch into three
scene categories. Then, scene-specific vehicle detectors are
employed to achieve preliminary detection results, and finally, the detection results are further optimized with the
scene contextual information.
Considering the complementarity of local and global
context information, Zhang et al. [196] proposed CAD-Net
to mine both local and global context information. CAD-Net employed a pyramid local context network to learn object-level local context information and designed a global
context network to extract scene-level global context information. In [103], Teng et al. proposed GLNet to collect context information, from global to local, so as to achieve a
robust and accurate detector for RSIs. Besides, some studies
[197], [198], [199] also introduced atrous spatial pyramid
pooling [200] or a receptive field block module [54] to leverage both local and global context information.
TINY OBJECT DETECTION
The typical GSD for RSIs is 1–3 m, which means that even
large objects (e.g., airplanes, ships, and storage tanks) can
occupy fewer than 16 × 16 pixels. Besides, even in high-resolution RSIs with a GSD of 0.25 m, a vehicle with a dimension of 3 × 1.5 m² covers only 72 pixels (12 × 6). This
prevalence of tiny objects in RSIs further increases the difficulty of RSOD. Current studies on tiny object detection
are mainly grouped into discriminative feature learning,
superresolution-based methods, and improved detection
metrics. The tiny object detection methods are briefly summarized in Figure 12.
DISCRIMINATIVE FEATURE LEARNING
The extremely small scales (less than 16 × 16 pixels) of tiny
objects result in limited appearance information, which poses serious challenges for detectors to learn the features of tiny
objects. To tackle the problem, many researchers focus on
improving the discriminative feature learning ability for tiny
objects [201], [202], [203], [204], [205], [206], [207], [208].
FIGURE 12. A brief summary of tiny object detection methods.

Since tiny objects mainly exist in shallow features and lack high-level semantic information [65], some literature [201], [202], [203] introduces top-down structures to fuse
high-level semantic information into shallow features
to strengthen the semantic information for tiny objects.
Considering the limited appearance information of tiny
objects, some studies [204], [205], [206], [207], [208] establish a connection between a tiny object and the surrounding contextual information through a self-attention
mechanism or dilated convolution to enhance the feature
discriminability of tiny objects. Notably, some previously
mentioned studies on multiscale feature learning and context information mining also demonstrate remarkable effectiveness in tiny object detection.
SUPERRESOLUTION-BASED METHOD
The extremely small scale is a crucial issue for tiny object detection, so increasing the resolution of images is an
intuitive solution to promote the detection performance
of tiny objects. Some methods [209], [210], [211], [212]
employ superresolution strategies as a preprocessing step
in the detection pipeline to enlarge the resolution of input
images. For example, Rabbi et al. [211] emphasized the importance of edge information for tiny object detection and
proposed an edge-enhanced superresolution generative
adversarial network (GAN) to generate visually pleasing
high-resolution RSIs with detailed edge information. Wu
et al. [212] developed a point-to-region detection framework for tiny objects. The point-to-region framework first
obtains the proposal regions with keypoint prediction
and then employs a multitask GAN to perform superresolution on the proposal regions and detect tiny objects
in these proposal regions. However, the high-resolution
image generated by superresolution brings extra computational complexity to the detection pipeline. Drawing
on this problem, [213] and [214] employ the superresolution strategy at the feature level to acquire discriminative
feature representation of tiny objects and effectively save
computational resources.
IMPROVED DETECTION METRICS FOR TINY OBJECTS
Unlike the first two types of methods, recent advanced
works [10], [215], [216], [217], [218], [219], [220], [221],
[222] assert that the current prevailing detection paradigms
are unsuitable for tiny object detection and inevitably hinder tiny object detection performance. Pang et al. [215] argued that excessive downsampling operations in modern
detectors lead to the loss of tiny objects on the feature map
and proposed a zoom-out/zoom-in structure to enlarge the
feature map. In [218], Yan et al. adjusted the IOU threshold
in the label assignment to increase the positive assigned
anchors for tiny objects, facilitating the learning of tiny
objects. Dong et al. [219] devised Sig-NMS to reduce the
suppression of tiny objects by large and medium objects in
traditional nonmaximum suppression (NMS).
In [10], Xu et al. pointed out that the IOU metric is unsuitable for tiny object detection. As shown in Figure 13, the
IOU metric is sensitive to slight location offsets. Besides,
IOU-based label assignment suffers from a severe scale imbalance problem, where tiny objects tend to be assigned
with insufficient positive samples. To solve these problems,
Xu et al. [10] designed a normalized Wasserstein distance
(NWD) to replace the IOU metric. The NWD models tiny
objects as 2D Gaussian distributions and utilizes the NWD
between Gaussian distributions to represent the location
relationship among tiny objects, as detailed in [10]. Compared with the IOU metric, the proposed NWD metric is
smooth to location deviations and has the characteristic of
scale balance, as depicted in Figure 13(b). In [222], Xu et al.
further proposed the receptive field distance for tiny object
detection and achieved state-of-the-art performance.
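Under the NWD formulation, a horizontal box (cx, cy, w, h) is modeled as a 2D Gaussian with mean (cx, cy) and covariance diag((w/2)², (h/2)²), and the Wasserstein distance between two such Gaussians has a simple closed form. The sketch below follows [10] in spirit, with the normalization constant treated as a dataset-dependent hyperparameter.

```python
import numpy as np

def nwd(box1, box2, constant=12.8):
    """Normalized Wasserstein distance between two horizontal boxes (cx, cy, w, h)."""
    cx1, cy1, w1, h1 = box1
    cx2, cy2, w2, h2 = box2
    # Closed-form squared Wasserstein distance for axis-aligned Gaussians.
    w2_sq = (cx1 - cx2) ** 2 + (cy1 - cy2) ** 2 \
            + ((w1 - w2) / 2.0) ** 2 + ((h1 - h2) / 2.0) ** 2
    return np.exp(-np.sqrt(w2_sq) / constant)

# A 4-pixel shift barely changes the NWD of a tiny box, unlike its IOU.
print(nwd((50, 50, 8, 8), (50, 50, 8, 8)))   # 1.0
print(nwd((50, 50, 8, 8), (54, 50, 8, 8)))   # ~0.73
```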
OBJECT DETECTION WITH LIMITED SUPERVISION
In recent years, the widely used deep learning-based detectors in RSIs have heavily relied on large-scale datasets
with high-quality annotations to achieve state-of-the-art
performance. However, collecting volumes of well-labeled data is considerably expensive and time-consuming (e.g., a bounding box annotation would cost about 10 s), which leads to a data-limited or annotation-limited scenario in RSOD [11]. This lack of sufficient supervised information seriously degrades detection performance. To tackle this problem, researchers have explored various tasks in RSOD with limited supervision. We summarize the previous research into three main types: WSOD, SSOD, and FSOD. Figure 14 provides a brief summary of object detection methods with limited supervision.

FIGURE 13. A comparison of the (a) IOU deviation curve and (b) normalized Wasserstein distance (NWD) deviation curve [10]. Please refer to [10] for details.

FIGURE 14. A brief summary of object detection methods with limited supervision.
bag; and y i is the weakly supervised information (e.g., image-level labels [223] or point-level labels [224]) of X i . Effectively transferring image-level supervision to object-level
labels is the key challenge in WSOD [225].
Han et al. [226] introduced a deep Boltzmann machine to
learn the high-level features of objects and proposed a weakly
supervised learning framework based on Bayesian principles
for remote sensing WSOD. Li et al. [227] exploited the mutual information between scene pairs to learn discriminative
convolutional weights and employed a multiscale category
activation map to locate geospatial objects.
Motivated by the remarkable performance of WSDDN
WEAKLY SUPERVISED OBJECT DETECTION
[228], a series of remote sensing WSOD methods [229],
Compared to fully supervised object detection, WSOD con[230], [231], [232], [233], [234], [235], [236], [237], [238],
tains only weakly supervised information. Formally, WSOD
[239], [240], [241] are proposed. As detailed in Figure 15, the
consists of a training data set D train = "^ X i, y ih,Ii = 1, where
paradigm of the current WSOD methods usually consists
X i = " x 1, ..., x mi , is a collection of training samples, termed
of two steps, first constructing a multiple-instance learning
the bag; m i is the total number of training samples in the
model to find contributing proposals to the image classification task as pseudolabels and then
utilizing them to train the detector.
Yao et al. [229] introduced a dynamic
MultipleClassification
curriculum learning strategy, where
Pseudolabels
Instance
the detector progressively improves
Loss
Learning
detection performance through
Instance
Image-Level
Selection
an easy-to-hard training process.
Annotations
Feng et al. [231] designed a progresMultiple Instance Learning Stage
Pseudolabels
sive contextual instance refinement
Object Detector Training Stage
method that suppresses low-quality
Training
Images
object parts and highlights the whole
object by leveraging surrounding
Prediction
Detector
context information. Wang et al.
[233] introduced a spatial and appearance relation graph into WSOD,
FIGURE 15. The two-step paradigm of recent WSOD methods [229], [230], [231], [232], [233],
which propagates high-quality label
[234], [235], [236], [237], [238], [239], [240], [241].
information to mine more possible
objects. In [240], Feng et al. argued that existing remote
sensing WSOD methods ignored the arbitrary orientations
of geospatial objects, resulting in rotation-sensitive object
detectors. To address this problem, Feng et al. [240] proposed RINet, which brings rotation-invariant yet diverse
feature learning to WSOD by employing rotation-invariant
learning and multiple-instance mining.
We summarize the performance of milestone WSOD
methods in Table 2, where the correct localization metric
[242] is adopted to evaluate the localization performance.
TABLE 2. THE PERFORMANCE OF WSOD METHODS ON THE NWPU VHR-10 V2 AND DIOR DATASETS.
MODEL | NWPU VHR-10 CORLOC (%) | NWPU VHR-10 MAP (%) | DIOR CORLOC (%) | DIOR MAP (%)
WSDDN [228] | 35.24 | 35.12 | 32.44 | 13.26
DCL [229] | 69.65 | 52.11 | 42.23 | 20.19
DPLG [230] | 61.5 | 53.6 | — | —
PCIR [231] | 71.87 | 54.97 | 46.12 | 24.92
MIGL [233] | 70.16 | 55.95 | 46.8 | 25.11
TCANet [234] | 72.76 | 58.82 | 48.41 | 25.82
SAENet [235] | 73.46 | 60.72 | 49.42 | 27.1
OS-DES [236] | 73.68 | 61.49 | 49.92 | 27.52
SPG+MELM [239] | 73.41 | 62.8 | 48.3 | 25.77
RINet [240] | — | 70.4 | 52.8 | 28.3
MOL [241] | 75.96 | 75.46 | 50.66 | 29.21
CorLoc: correct localization metric.
SEMISUPERVISED OBJECT DETECTION
SSOD typically contains only a small portion (no more than 50%) of well-labeled samples $D_{labeled} = \{(x_i, y_i)\}_{i=1}^{I_{labeled}}$, making it difficult to construct a reliable supervised detector, and has a large number of unlabeled samples $D_{unlabeled} = \{x_j\}_{j=1}^{I_{unlabeled}}$. SSOD aims to improve detection performance under scarce supervised information by learning the latent information from voluminous unlabeled samples.
Hou et al. [243] proposed SCLANet for semisupervised SAR ship detection. SCLANet employs adversarial learning between labeled and unlabeled samples to exploit unlabeled sample information and adopts consistency learning for unlabeled samples to enhance the robustness of the network. The pseudolabel generation mechanism is also a widely used approach for SSOD [244], [245], [246], [247], [248], and the typical paradigm is presented in Figure 16. First, a pretrained detector learned from scarce labeled samples is used to predict unlabeled samples. Then, the pseudolabels with higher confidence scores are selected as the trusted part, and finally, the model is retrained with the labeled and pseudolabeled samples. Wu et al. [246] proposed self-paced curriculum learning that follows an “easy-to-hard” scheme to select more reliable pseudolabels. Zhong et al. [245] adopted an active learning strategy in which high-scored predictions are manually adjusted by experts to obtain refined pseudolabels. Chen et al. [247] employed teacher-student mutual learning to fully leverage unlabeled samples and iteratively generate higher-quality pseudolabels.
FIGURE 16. The pseudolabel generation mechanism in SSOD: a detector pretrained on labeled samples (step 1) predicts pseudolabels for unlabeled samples (step 2), and the model is then retrained on both (step 3).
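As a concrete illustration of this pretrain-predict-retrain loop, the sketch below keeps only high-confidence predictions on unlabeled images as trusted pseudolabels; the prediction format and the confidence threshold are illustrative assumptions, not settings prescribed by the cited methods.

```python
import numpy as np

def select_pseudolabels(predictions, score_thresh=0.9):
    """Keep only high-confidence predictions on unlabeled images as trusted pseudolabels.

    predictions: one dict per unlabeled image with 'boxes' (N, 4),
                 'labels' (N,), and 'scores' (N,) NumPy arrays.
    """
    pseudolabels = []
    for pred in predictions:
        keep = pred["scores"] >= score_thresh
        pseudolabels.append({
            "boxes": pred["boxes"][keep],
            "labels": pred["labels"][keep],
            "scores": pred["scores"][keep],
        })
    return pseudolabels

# Toy usage: two unlabeled images with random detector outputs.
rng = np.random.default_rng(0)
preds = [{"boxes": rng.uniform(0, 512, (5, 4)),
          "labels": rng.integers(0, 3, 5),
          "scores": rng.uniform(0, 1, 5)} for _ in range(2)]
trusted = select_pseudolabels(preds, score_thresh=0.7)
print([len(t["boxes"]) for t in trusted])
# Step 3 then retrains the detector on the labeled set plus these pseudolabeled
# images, and the predict-select-retrain loop can be iterated.
```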
In addition, some studies [249], [250], [251], [252], [253] have worked on weak SSOD, in which unlabeled samples are replaced with weakly annotated samples. Du et al. [251], [252] employed a large number of image-level labeled samples to improve SAR vehicle detection performance under scarce box-level labeled samples. Chen et al. [253] adopted a small number of pixel-level labeled samples and a dominant number of box-level labeled samples to boost performance in label-scarce instance segmentation.
FIGURE 17. The two-stage training pipeline of FSOD: a base detector is first trained on the base-class set, and its prior knowledge then assists a base-and-novel detector trained on the novel-class set in the few-shot fine-tuning stage.
FEW-SHOT OBJECT DETECTION
FSOD refers to detecting novel classes with only a limited number (no more than 30) of samples. Generally, FSOD contains a base-class dataset with abundant samples $D_{base} = \{(x_i, y_i), y_i \in C_{base}\}_{i=1}^{I_{base}}$ and a novel-class dataset with only K-shot samples $D_{novel} = \{(x_j, y_j), y_j \in C_{novel}\}_{j=1}^{K \times C_{novel}}$. Note that $C_{base}$ and $C_{novel}$ are disjoint. As displayed in Figure 17, a typical FSOD paradigm consists of a two-stage training pipeline, where the base training stage establishes
prior knowledge with abundant base-class samples, and the
few-shot fine-tuning stage leverages the prior knowledge to
facilitate the learning of few-shot novel concepts. The research on remote sensing FSOD mainly focuses on metalearning methods [254], [255], [256], [257], [258], [259] and
transfer learning methods [260], [261], [262], [263], [264],
[265], [266], [267], [268], [269].
The metalearning-based methods acquire task-level
knowledge by simulating a series of few-shot learning
tasks and generalize this knowledge to tackle the few-shot
learning of novel classes. Li et al. [255] first employed metalearning for remote sensing FSOD and achieved satisfactory detection performance with only one to 10 labeled
samples. Later, a series of metalearning-based few-shot detectors were developed in the remote sensing community
[254], [255], [256], [257], [258], [259]. For example, Cheng
et al. [254] proposed a prototype CNN to generate better
foreground proposals and class-aware ROI features for remote sensing FSOD by learning class-specific prototypes.
Wang et al. [258] presented a metametric training paradigm
to enable a few-shot learner with flexible scalability for fast
adaptation to few-shot novel tasks.
Transfer learning-based methods aim at fine-tuning
common knowledge learned from abundant annotated
data to few-shot novel data and typically consist of a base
training stage and a few-shot fine-tuning stage. Huang et al.
[266] proposed a balanced fine-tuning strategy to alleviate
the number imbalance problem between novel-class samples and base-class samples. Zhou et al. [265] introduced
proposal-level contrast learning in the fine-tuning phase to
learn more robust feature representations in few-shot scenarios. Compared with the metalearning-based methods,
the transfer learning-based method has a simpler and more
memory-efficient training paradigm.
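To make the transfer learning recipe concrete, the sketch below freezes the backbone of a toy detector after base training and fine-tunes only its classification head on the K-shot novel data; the module names, learning rate, and the simplification of detection to classification are illustrative assumptions rather than the settings of the cited methods.

```python
import torch
from torch import nn

class TinyDetector(nn.Module):
    """Stand-in detector: a feature backbone plus a classification head."""
    def __init__(self, num_classes: int):
        super().__init__()
        self.backbone = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                                      nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.cls_head = nn.Linear(16, num_classes)

    def forward(self, images):
        return self.cls_head(self.backbone(images))

def few_shot_finetune(detector: nn.Module, novel_loader, epochs: int = 10):
    """Few-shot fine-tuning stage: freeze the backbone, train only the head."""
    for p in detector.backbone.parameters():
        p.requires_grad = False
    optim = torch.optim.SGD(detector.cls_head.parameters(), lr=1e-3, momentum=0.9)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for images, targets in novel_loader:
            optim.zero_grad()
            loss = loss_fn(detector(images), targets)
            loss.backward()
            optim.step()
    return detector

# Toy usage: 5 classes (base + novel), one K-shot batch of 4 image crops.
detector = TinyDetector(num_classes=5)          # assume base training has been done
novel_loader = [(torch.randn(4, 3, 32, 32), torch.randint(0, 5, (4,)))]
few_shot_finetune(detector, novel_loader, epochs=2)
```

Freezing the backbone keeps the common knowledge learned on the base classes intact, which is why this style of fine-tuning stays memory efficient compared with episodic metalearning.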
DATASETS AND EVALUATION METRICS
DATASETS INTRODUCTION AND SELECTION
Datasets have played an indispensable role throughout
the development of object detection in RSIs. On the one
hand, datasets serve as common ground for the performance evaluation and comparison of detectors. On the
other hand, datasets push researchers to address increasingly challenging problems in the RSOD field. In the past
TABLE 3. COMPARISONS OF WIDELY USED DATASETS IN THE FIELD OF RSOD.
DATASET | SOURCE | ANNOTATION | CATEGORIES | INSTANCES | IMAGES | IMAGE WIDTH | RESOLUTION | YEAR
TAS [270] | Google Earth | HBB | 1 | 1,319 | 30 | 792 | — | 2008
SZTAKI-INRIA [271] | QuickBird, Ikonos, and Google Earth | OBB | 1 | 665 | 9 | ~800 | 0.5–1 m | 2012
NWPU VHR-10 [18] | Google Earth | HBB | 10 | 3,651 | 800 | ~1,000 | 0.3–2 m | 2014
VEDAI [272] | Utah Automated Geographic Reference Center | OBB | 9 | 2,950 | 1,268 | 1,024 | 0.125 m | 2015
DLR 3K [273] | DLR 3K camera system | OBB | 8 | 14,235 | 20 | 5,616 | 0.13 m | 2015
UCAS-AOD [274] | Google Earth | OBB | 2 | 6,029 | 910 | ~1,000 | 0.3–2 m | 2015
COWC [275] | Multiple sources | Point | 1 | 32,716 | 53 | 2,000–19,000 | 0.15 m | 2016
HRSC [276] | Google Earth | OBB | 26 | 2,976 | 1,061 | ~1,100 | 0.4–2 m | 2016
RSOD [43] | Google Earth and Tianditu | HBB | 4 | 6,950 | 976 | ~1,000 | 0.3–3 m | 2017
SSDD [277] | RadarSat-2, TerraSAR-X, and Sentinel-1 | HBB | 1 | 2,456 | 1,160 | 500 | 1–15 m | 2017
LEVIR [278] | Google Earth | HBB | 3 | 11,000 | 22,000 | 800 × 600 | 0.2–1 m | 2018
xView [2] | WorldView-3 | HBB | 60 | 1 million | 1,413 | ~3,000 | 0.3 m | 2018
DOTA v1.0 [117] | Google Earth, Jilin-1, and Gaofen-2 | HBB and OBB | 15 | 188,282 | 2,806 | 800–13,000 | 0.1–1 m | 2018
HRRSD [48] | Google Earth and Baidu Maps | HBB | 13 | 55,740 | 21,761 | 152–10,569 | 0.15–1.2 m | 2019
DIOR [28] | Google Earth | HBB | 20 | 190,288 | 23,463 | 800 | 0.5–30 m | 2019
AIR-SARShip-1.0 [279] | Gaofen-3 | HBB | 1 | 3,000 | 31 | 3,000 | 1 and 3 m | 2019
MAR20 [280] | Google Earth | HBB and OBB | 20 | 22,341 | 3,824 | ~800 | — | 2020
FGSD [281] | Google Earth | OBB | 43 | 5,634 | 2,612 | 930 | 0.12–1.93 m | 2020
DOSR [282] | Google Earth | OBB | 20 | 6,172 | 1,066 | 600–1,300 | 0.5–2.5 m | 2021
AI-TOD [283] | Multiple sources | HBB | 8 | 700,621 | 28,036 | 800 | — | 2021
FAIR1M [34] | Gaofen satellites and Google Earth | OBB | 37 | 1,020,579 | 42,796 | 600–10,000 | 0.3–0.8 m | 2021
DOTA-v2.0 [33] | Google Earth, Jilin-1, Gaofen-2, and airborne images | HBB and OBB | 18 | 1,793,658 | 11,268 | 800–20,000 | 0.1–4.5 m | 2021
SODA-A [284] | Google Earth | OBB | 9 | 800,203 | 2,510 | 4,761 × 2,777* | — | 2022
OBB: oriented bounding box.
*Average image width.
decade, several datasets with different attributes have been
released to facilitate the development of RSOD, as listed
in Table 3. In this section, we mainly introduce 10 widely
used datasets with specific characteristics:
1) NWPU VHR-10 [18]: This dataset is a multiclass geospatial object detection dataset. It contains 3,775 HBB annotated instances in 10 categories: airplane, ship, storage
tank, baseball diamond, tennis court, basketball court,
ground track field, harbor, bridge, and vehicle. There are
800 very high-resolution RSIs, consisting of 715 color images from Google Earth and 85 pansharpened color infrared images from Vaihingen data. The image resolutions
range from 0.5 to 2 m.
2) VEDAI [272]: VEDAI is a fine-grained vehicle detection dataset whose vehicle categories include camping car, car, pickup, tractor, truck, and van. There are 1,210 images and 3,700 instances in the VEDAI dataset, and the size of each image is 1,024 × 1,024 pixels. The small area and arbitrary orientation of vehicles are the main challenges in the VEDAI dataset.
3) UCAS-AOD [274]: The UCAS-AOD dataset includes 910
images and 6,029 objects, where 3,210 aircraft are contained in 600 images and 2,819 vehicles are contained in
310 images. All images are acquired from Google Earth,
with an image size of approximately 1,000 × 1,000 pixels.
4) HRSC [276]: The HRSC dataset is widely used for arbitrary orientation ship detection and consists of 1,070
images and 2,976 instances with oriented bounding
box (OBB) annotation. The images are captured from
Google Earth, containing offshore and onshore scenes.
The image sizes vary from 300 × 300 to 1,500 × 900 pixels, and the image resolutions range from 0.4 to 2 m.
5) SSDD [277]: SSDD is the first open dataset for SAR image ship detection and contains 1,160 SAR images
and 2,456 ships. The SAR images in the SSDD dataset are collected from different sensors with resolutions from 1 to 15 m and have different polarizations
(horizontal-horizontal, vertical-vertical, vertical-horizontal, and horizontal-vertical). Subsequently, the authors further refined and enriched the SSDD dataset into three different types to support current research on SAR ship detection [286].
6) xView [2]: The xView dataset is one of the largest publicly available datasets in RSOD, with approximately 1 million labeled objects across 60 fine-grained
classes. Compared to other RSOD datasets, the images in the xView dataset are collected from WorldView-3 at a 0.3-m GSD, providing higher-resolution
images. Moreover, the xView dataset covers more
than 1,400 km² of Earth’s surface, which leads to
higher diversity.
7) DOTA [117]: DOTA is a large-scale dataset consisting of
188,282 objects annotated with both HBBs and OBBs.
All objects are divided into 15 categories: plane, ship,
storage tank, baseball diamond, tennis court, swimming pool, ground track field, harbor, bridge, large
vehicle, small vehicle, helicopter, roundabout, soccer
field, and basketball court. The images in this dataset
are collected from Google Earth, Jilin-1 satellites, and
the Gaofen-2 satellite, with a spatial resolution of 0.1
to 1 m. Recently, DOTA v2.0 [33] was made publicly
available, containing more than 1.7 million objects in
18 categories.
8) DIOR [28]: DIOR is an object detection dataset for
optical RSIs. There are 23,463 optical images in this
dataset, with a spatial resolution of 0.5 to 30 m. The
total number of objects in the dataset is 192,472, and
all the objects are labeled with HBBs. The categories
of objects are as follows: airplane, airport, baseball
field, basketball court, bridge, chimney, dam, expressway service area, expressway toll station, harbor,
golf course, ground track field, overpass, ship, stadium, storage tank, tennis court, train station, vehicle,
and windmill.
9) FAIR1M [34]: FAIR1M is a more challenging dataset for
fine-grained object detection in RSIs, including five
categories and 37 subcategories. There are more than
40,000 images and more than 1 million objects annotated by OBBs. The images are acquired from multiple
platforms with a resolution of 0.3 to 0.8 m and are
spread across different countries and regions. The fine-grained categories, massive numbers of objects, large
ranges of sizes and orientations, and diverse scenes
make FAIR1M more challenging.
10) SODA-A [284]: SODA-A is a recently released dataset
designed for tiny object detection in RSIs. This dataset
consists of 2,510 images, with an average image size
of 4,761 × 2,777 pixels, and 800,203 objects with OBB annotation. All objects are divided into four subsets (i.e.,
extremely small, relatively small, generally small, and
normal) based on their area ranges. There are nine categories in this dataset, including airplane, helicopter,
small vehicle, large vehicle, ship, container, storage
tank, swimming pool, and windmill.
The preceding review shows that the early published
datasets generally have limited samples. For example,
NWPU VHR-10 [18] contains only 10 categories and 3,651
instances, and UCAS-AOD [274] consists of two categories
with 6,029 instances. In recent years, researchers have not
only introduced massive amounts of data and fine-grained
objects but also collected data from multiple sensors, various resolutions, and diverse scenes (e.g., DOTA [117], DIOR
[28], and FAIR1M [34]) to satisfy practical applications
in RSOD. Figure 18 exhibits typical samples of different
RSOD datasets.
We also provide dataset selection guidelines in Table 4
to help researchers select proper datasets and methods for
different challenges and scenarios. Notably, only the image-level annotations of the datasets are available for the
weak supervision scenario. As for the few-shot supervision
scenario, there are only K-shot box-level annotated samples
for each novel class, where K is set to {3, 5, 10, 20, 30}.
EVALUATION METRICS
In addition to datasets, evaluation metrics are equally important. Generally, the inference speed and the detection
accuracy are the two commonly adopted metrics for evaluating the performance of detectors.
Frames per second (FPS) is a standard metric for inference speed evaluation that indicates the number of images
that the detector can detect per second. Notably, both the
image size and hardware devices can influence the inference speed.
FIGURE 18. A visualization of different RSOD datasets. Diverse resolutions, massive instances, multisensor images, and fine-grained categories are typical characteristics of RSOD datasets. (a) NWPU VHR-10. (b) DIOR. (c) SODA-A. (d) xView. (e) DOTA. (f) SSDD. (g) FAIR1M. (h) VEDAI.
Average precision (AP) is the most commonly used metric for detection accuracy. Given a test image $I$, let $\{(b_i, c_i, p_i)\}_{i=1}^{N}$ denote the prediction detections, where $b_i$ is the predicted box, $c_i$ is the predicted label, and $p_i$ is the confidence score. Let $\{(b_j^{gt}, c_j^{gt})\}_{j=1}^{M}$ refer to the ground truth annotations on the test image $I$, where $b_j^{gt}$ is the ground truth box and $c_j^{gt}$ is the ground truth category. A prediction detection $(b_i, c_i, p_i)$ is assigned as a true positive (TP) for ground truth annotation $(b_j^{gt}, c_j^{gt})$ if it meets both of the following criteria:
◗ The confidence score $p_i$ is greater than the confidence threshold $t$, and the predicted label $c_i$ is the same as the ground truth label $c_j^{gt}$.
◗ The IOU between the predicted box $b_i$ and the ground truth box $b_j^{gt}$ is larger than the IOU threshold $\varepsilon$. The IOU is calculated as follows:

$$\mathrm{IOU}(b_i, b_j^{gt}) = \frac{\mathrm{area}(b_i \cap b_j^{gt})}{\mathrm{area}(b_i \cup b_j^{gt})} \tag{2}$$

where $\mathrm{area}(b_i \cap b_j^{gt})$ and $\mathrm{area}(b_i \cup b_j^{gt})$ stand for the intersection and union area of the predicted box and ground truth box, respectively.
Otherwise, the prediction detection is considered a false
positive (FP). It is worth noting that multiple prediction
detections may match the same ground truth annotation
according to the preceding criteria, but only the prediction
detection with the highest confidence score is assigned as a
TP, and the rest are FPs [287].
Based on TP and FP detections, and considering false negatives (FNs), the precision (P) and recall (R) can be computed as

$$P = \frac{TP}{TP + FP} \tag{3}$$

$$R = \frac{TP}{TP + FN}. \tag{4}$$
The precision measures the fraction of TPs of the prediction detections, and the recall measures the fraction of
positives that are correctly detected. However, the preceding two evaluation metrics reflect only a single aspect of
detection performance.
Taking into account both precision and recall, AP provides a comprehensive evaluation of detection performance and is calculated individually for each class. For a given class, the precision-recall curve (PRC) is drawn by taking the maximum precision at each recall level, and the AP summarizes the shape of the PRC [287]. For multiclass object detection, the mean of the AP values for all classes, termed mAP, is adopted to evaluate the overall detection accuracy.
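As a worked example of these definitions, the following NumPy sketch computes the IOU of (2), greedily matches predictions to ground truths in descending confidence order so that only the highest-scored match counts as a TP, and integrates the maximum precision at each recall level to obtain a single-class AP. It assumes axis-aligned HBBs and one class, so it is a simplified illustration rather than the exact protocol of any benchmark.

```python
import numpy as np

def iou(box_a, box_b):
    """IOU of two axis-aligned boxes given as (x1, y1, x2, y2), as in (2)."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def average_precision(pred_boxes, pred_scores, gt_boxes, iou_thresh=0.5):
    """Single-class AP: sort by confidence and greedily match unused ground truths."""
    order = np.argsort(pred_scores)[::-1]
    matched = np.zeros(len(gt_boxes), dtype=bool)
    tp = np.zeros(len(order))
    fp = np.zeros(len(order))
    for rank, i in enumerate(order):
        ious = [iou(pred_boxes[i], g) for g in gt_boxes]
        j = int(np.argmax(ious)) if ious else -1
        if j >= 0 and ious[j] >= iou_thresh and not matched[j]:
            tp[rank], matched[j] = 1, True     # highest-scored match is the TP
        else:
            fp[rank] = 1                       # duplicates and misses are FPs
    recall = np.cumsum(tp) / max(len(gt_boxes), 1)
    precision = np.cumsum(tp) / (np.cumsum(tp) + np.cumsum(fp))
    # Maximum precision at each recall level, then integrate the PRC.
    ap, prev_r = 0.0, 0.0
    for r in np.unique(recall):
        ap += (r - prev_r) * precision[recall >= r].max()
        prev_r = r
    return ap

# Toy usage: three predictions against two ground truth boxes.
preds = np.array([[10, 10, 50, 50], [12, 12, 48, 52], [100, 100, 140, 150]])
scores = np.array([0.9, 0.75, 0.6])
gts = np.array([[11, 11, 49, 51], [100, 102, 138, 148]])
print(round(average_precision(preds, scores, gts), 3))
```

Averaging this per-class AP over all classes yields the mAP used throughout this survey.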
Early studies mainly employed a fixed IOU-based AP metric (i.e., AP50) [18], [28], [117], where the IOU threshold $\varepsilon$ is given as 0.5. This low IOU threshold exhibits a high tolerance for bounding box deviations and fails to satisfy high localization accuracy requirements. Later, some works [130], [131], [284] introduced a novel evaluation metric, AP50:95, which averages the AP over 10 IOU thresholds from 0.5 to 0.95, with an interval of 0.05. AP50:95 considers higher IOU thresholds and encourages more accurate localization.
As the cornerstone of evaluation metrics in RSOD, AP
has various extensions for different specific tasks. In the few-shot learning scenario, APnovel and APbase are two critical
metrics to evaluate the performance of few-shot detectors,
where APnovel and APbase represent the detection performance on the novel class and base class, respectively. An
excellent few-shot detector should achieve satisfactory performance in the novel class and avoid performance degradation in the base class [269]. In the incremental detection
of remote sensing objects, APold and APinc are employed to
evaluate the performance of the old and incremental classes on different incremental tasks. In addition, the harmonic
mean (HM) is also a vital evaluation metric for incremental
object detection [288], providing a comprehensive performance evaluation of both old and incremental classes, as
described by
$$\mathrm{HM} = \frac{2\,\mathrm{AP}_{old}\,\mathrm{AP}_{inc}}{\mathrm{AP}_{old} + \mathrm{AP}_{inc}}. \tag{5}$$
APPLICATIONS
Deep learning techniques have injected significant innovations into RSOD, leading to an effective way to automatically identify objects of interest from voluminous RSIs.
Therefore, RSOD methods have been applied in a rich diversity of practical scenarios that significantly support the implementation of sustainable development goals (SDGs) and the improvement of society [289], [290], [291], as described in Figure 19.
DISASTER MANAGEMENT
Natural disasters pose a serious threat to the safety of human life and property. A quick and precise understanding
of disaster impacts and extents of damage is critical to disaster management. RSOD methods can accurately identify
ground objects from a bird’s-eye view of a disaster-affected
area, providing a novel potential for disaster management
[292], [293], [294], [295], [296]. Guan et al. [293] proposed
a novel instance segmentation model to accurately detect
fire in a complex environment, which can be applied to forest fire disaster response. Ma et al. [295] designed a real-time detection method for collapsed building assessment
following earthquakes.
TABLE 4. DATASET SELECTION GUIDELINES IN RSOD FOR DIFFERENT CHALLENGES AND SCENARIOS.
SCENARIO | DATASETS | METHODS
Multiscale objects | DOTA, DIOR, and FAIR1M | HyNet [51] and FFA [80]
Rotated objects | DOTA and HRSC | KLD [131] and ReDet [155]
Weak objects | DOTA, DIOR, and FAIR1M | RECNN [172] and CADNet [196]
Tiny objects | SODA-A and AI-TOD | NWD [10] and FSANet [216]
Weak supervision | NWPU VHR-10 and DIOR | RINet [240] and MOL [241]
Few-shot supervision | NWPU VHR-10 and DIOR | P-CNN [254] and G-FSDet [269]
Fine-grained objects | DOSR and FAIR1M | RBFPN [94] and EIRNet [282]
SAR image objects | SSDD and AIR-SARShip | SSPNet [163] and HyperLiNet [285]
Specific objects | HRSC and MAR20 | GRS-Det [135] and COLOR [245]
PRECISION AGRICULTURE
With an unprecedented and still expanding global population, ensuring agricultural production is a fundamental challenge in feeding growing numbers of people. RSOD
has the ability to monitor crop growth and estimate
food production, promoting further progress for precision agriculture [297], [298], [299], [300], [301], [302].
Pang et al. [298] used RSIs for early season maize detection and achieved an accurate estimation of emergence
rates. Chen et al. [302] designed an automatic strawberry
flower detection system to monitor the growth cycle of
strawberry fields.
SUSTAINABLE CITIES AND COMMUNITIES
Half of the global population now lives in cities, and this
population will keep growing in the coming decades. Sustainable cities and communities are the goals of modern
city development, in which RSOD can make a significant
impact. For instance, building and vehicle detection [303],
[304], [305], [306] can help estimate population density
distributions and transport traffic statistics, providing
suggestions for city development planning. Infrastructure
distribution detection [307] can assist in disaster assessment and early warnings in city environments.
CLIMATE ACTION
Ongoing climate change forces humanity to confront the
daunting challenge of the climate crisis. Some researchers
[308], [309], [310] employed object detection methods for
automatically mapping tundra ice wedge polygons to document and analyze the effects of climate warming on the
Arctic region. Besides, RSOD can produce statistics on the
number and spatial distribution of solar panels and wind
turbines [311], [312], [313], [314], facilitating the mitigation
of greenhouse gas emissions.
OCEAN CONSERVATION
The oceans cover nearly three-quarters of Earth’s surface,
and more than 3 billion people depend on the diverse life
of the oceans and coasts. The ocean is gradually deteriorating due to pollution, and RSOD can provide powerful
support for ocean conservation [315]. Several works applied
detection methods for litter detection along shores [316],
floating plastic detection at sea [317], deep-sea debris detection [318], and so on. Another important application is
ship detection [135], [136], which can help monitor illegal
fishing activities.
WILDLIFE SURVEILLANCE
A global loss of biodiversity is observed at all levels, and
object detection in combination with RSIs provides a
novel perspective for wildlife conservation [319], [320],
[321], [322], [323]. Delplanque et al. [322] adopted a deep
learning-based detector for multiple-species detection and
the identification of African mammals. Kellenberger et al.
[323] designed a weakly supervised wildlife detection
framework that requires only image-level labels to identify wildlife.
FIGURE 19. The widespread applications of RSOD make substantial contributions to implementing SDGs and improving society. (a) Collapsed building detection following earthquakes for disaster assessment. (b) Corn plant detection for precision agriculture. (c) and (d) Building and vehicle detection for sustainable cities and communities. (e) Solar photovoltaic detection for climate change mitigation. (f) Litter detection along the shore for ocean conservation. (g) African mammal detection for wildlife surveillance. (h) Single-tree detection for forest ecosystem protection.
FOREST ECOSYSTEM PROTECTION
The forest ecosystem plays an important role in ecological protection, climate regulation, and carbon cycling.
Understanding the condition of trees is essential for forest ecosystem protection [324], [325], [326], [327], [328].
Safonova et al. [326] analyzed the shape, texture, and
color of detected trees’ crowns to determine their damage stage, providing a more efficient way to assess forest health. Sani-Mohammed et al. [328] utilized an
instance segmentation approach to map standing dead
trees, which is imperative for forest ecosystem management and protection.
FUTURE DIRECTIONS
Apart from the five RSOD research topics mentioned in
this survey, there is still much work to be done in this field.
Therefore, we present a forward-looking discussion of future directions to further improve and enhance the detectors in remote sensing scenes.
UNIFIED DETECTION FRAMEWORK FOR
LARGE-SCALE REMOTE SENSING IMAGES
Benefiting from the development of remote sensing technology, high-resolution large-scale RSIs (e.g., more than
10,000 × 10,000 pixels) can be easily obtained. However, limited by GPU memory, the current mainstream RSOD methods fail to directly perform object detection in large-scale RSIs but adopt a sliding window strategy, mainly including sliding window cropping, patch prediction, and
results merging. On the one hand, this sliding window
framework requires complex data preprocessing and postprocessing, compared with the unified detection framework. On the other hand, objects usually occupy a small
area of RSIs, and the invalid calculation of massive backgrounds leads to increasing computation time and memory
consumption. Some studies [215], [329], [330] proposed a
coarse-to-fine detection framework for object detection in
large-scale RSIs. This framework first locates ROIs by filtering out meaningless regions and then achieves accurate detection from these filtered regions.
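To illustrate the sliding window strategy mentioned above (cropping, patch prediction, and results merging), the sketch below tiles a large image with overlapping windows, shifts each patch's detections back to global coordinates, and merges them with a simple NMS; the window size, overlap, and detector interface are illustrative assumptions rather than a prescribed pipeline.

```python
import numpy as np

def sliding_windows(height, width, win=1024, overlap=200):
    """Top-left corners of overlapping windows covering a (height, width) image."""
    step = win - overlap
    ys = list(range(0, max(height - win, 0) + 1, step))
    xs = list(range(0, max(width - win, 0) + 1, step))
    if ys[-1] + win < height:
        ys.append(height - win)
    if xs[-1] + win < width:
        xs.append(width - win)
    return [(y, x) for y in ys for x in xs]

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy nonmaximum suppression; boxes are (N, 4) in (x1, y1, x2, y2)."""
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size:
        i = order[0]
        keep.append(int(i))
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = (boxes[order[1:], 2] - boxes[order[1:], 0]) * (boxes[order[1:], 3] - boxes[order[1:], 1])
        overlaps = inter / (area_i + areas - inter)
        order = order[1:][overlaps < iou_thresh]
    return keep

def detect_large_image(image, detect_patch, win=1024, overlap=200):
    """detect_patch(patch) -> (boxes, scores) in patch coordinates (an assumed interface)."""
    all_boxes, all_scores = [], []
    for y, x in sliding_windows(image.shape[0], image.shape[1], win, overlap):
        boxes, scores = detect_patch(image[y:y + win, x:x + win])
        if len(boxes):
            all_boxes.append(boxes + np.array([x, y, x, y]))  # back to global coordinates
            all_scores.append(scores)
    if not all_boxes:
        return np.empty((0, 4)), np.empty(0)
    boxes, scores = np.concatenate(all_boxes), np.concatenate(all_scores)
    keep = nms(boxes, scores)
    return boxes[keep], scores[keep]

# Toy usage with a dummy detector that returns one fixed box per patch.
def dummy_detector(patch):
    return np.array([[10.0, 10.0, 60.0, 60.0]]), np.array([0.8])

img = np.zeros((3000, 2500, 3), dtype=np.uint8)
boxes, scores = detect_large_image(img, dummy_detector)
print(len(boxes))
```

The coarse-to-fine frameworks cited above replace the exhaustive tiling with a cheap first stage that discards background windows before the detector is applied.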
DETECTION WITH MULTIMODAL
REMOTE SENSING IMAGES
Restricted by the sensor imaging mechanism, detectors
based on single-modal RSIs often suffer from performance limitations, making it difficult to meet the requirements of practical
applications [331]. In contrast, multimodal RSIs from different sensors have their own characteristics. For instance,
hyperspectral images contain high-spectral-resolution
and fine-grained spectral features, SAR images provide
abundant texture information, and optical images exhibit
high spatial resolution with rich detailed information. The
integrated processing of multimodal RSIs can improve the
interpretation of scenes and obtain a more objective and
comprehensive understanding of geospatial objects [332],
[333], [334], providing the possibility to further improve
the detection performance of RSOD.
DOMAIN ADAPTATION OBJECT DETECTION
IN REMOTE SENSING IMAGES
Due to the diversity of remote sensing satellite sensors,
resolutions, and bands, as well as the influence of weather
conditions, seasons, and geospatial regions [6], RSIs collected from different satellites are generally drawn from
similar but not identical distributions. Such distribution
differences (also called the domain gap) severely restrict the
generalization performance of the detector. Recent studies
on domain adaptation object detection [335], [336], [337],
[338] have proposed to tackle the domain gap problem.
However, these studies focus only on domain adaptation
detectors in single-modal RSIs, while cross-modal domain
adaptation object detection (e.g., from optical images to
SAR images [339], [340]) is a more challenging and worthwhile topic to investigate.
INCREMENTAL DETECTION OF
REMOTE SENSING OBJECTS
The real-world environment is dynamic and open, where
the number of categories evolves over time. However,
mainstream detectors require both old and new data to
retrain the model when meeting new categories, resulting
in high computational costs. Recently, incremental learning has been considered the most promising way to solve
this problem, as it can learn new knowledge without forgetting old knowledge while using only new data [341]. Incremental learning has been preliminarily explored in the
remote sensing community [342], [343], [344], [345]. For
example, Chen et al. [342] integrated knowledge distillation into FPN and detection heads to learn new concepts
while maintaining old ones. More thorough research is still
needed in incremental RSOD to meet the dynamic learning
task in practical applications.
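As a minimal sketch of the knowledge distillation idea behind such incremental detectors, the snippet below penalizes the divergence between the class logits of a frozen old (teacher) detector and the new (student) detector; the temperature and loss form are illustrative assumptions rather than the exact design of [342].

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between softened teacher and student class distributions."""
    t = temperature
    log_p_student = F.log_softmax(student_logits / t, dim=-1)
    p_teacher = F.softmax(teacher_logits / t, dim=-1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (t * t)

# Toy usage: 8 ROIs and 10 old classes; the full objective would add the
# detection loss on the new classes, weighted against this distillation term.
student = torch.randn(8, 10, requires_grad=True)
teacher = torch.randn(8, 10)
loss = distillation_loss(student, teacher)
loss.backward()
print(float(loss))
```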
SELF-SUPERVISED PRETRAINED MODELS
FOR REMOTE SENSING SCENES
Current RSOD methods are always initialized with ImageNet [346] pretrained weights. However, there is an inevitable domain gap between natural and remote sensing scenes, probably limiting the performance of RSOD.
Recently, self-supervised pretraining approaches have
received extensive attention and shown excellent performance in classification and downstream tasks in natural scenes. Benefiting from rapid advances in remote sensing technology, abundant remote sensing data
[347], [348] also provide sufficient data support for selfsupervised pretraining. Some researchers [349], [350],
[351], [352], [353] have initially demonstrated the effectiveness of remote sensing pretraining on representative
downstream tasks. Therefore, exploring self-supervised
pretraining models based on multisource remote sensing
data deserves further research.
COMPACT AND EFFICIENT OBJECT
DETECTION ARCHITECTURES
Most existing airborne and satellite-borne platforms require
sending back remote sensing data for interpretation, leading to additional resource overheads. Thus, it is essential
to investigate compact and efficient detectors for airborne
and satellite-borne platforms to reduce resource consumption in data transmission. Drawing on this demand, some
researchers have proposed lightweight detectors through
model design [285], [354], [355], network pruning [356],
[357], and knowledge distillation [358], [359], [360]. However, these detectors still rely heavily on high-performance
GPUs and cannot be deployed on airborne and satellite-borne platforms. Therefore, designing compact and efficient
object detection architectures for limited resource scenarios
remains challenging.
CONCLUSION
Object detection has been a fundamental but challenging
research topic in the remote sensing community. Thanks to
the rapid development of deep learning techniques, RSOD
has received considerable attention and made remarkable
achievements in the past decade. In this article, we presented a systematic review and summarization of existing deep
learning-based methods in RSOD. First, we summarized
the five main challenges in RSOD according to the characteristics of geospatial objects and categorized the methods
into five streams: multiscale object detection, rotated object detection, weak object detection, tiny object detection,
and object detection with limited supervision. Then, we adopted a systematic hierarchical division to review and summarize the methods in each category. Next, we introduced
typical benchmark datasets, evaluation metrics, and practical applications in the RSOD field. Finally, considering the
limitations of existing RSOD methods, we discussed some
promising directions for further research.
Given this time of high-speed technical evolution in
RSOD, we believe this survey can help researchers to
achieve a more comprehensive understanding of the main
topics in this field and to find potential directions for future research.
ACKNOWLEDGEMENT
This work was supported, in part, by the National Natural Science Foundation of China, under Grants 62276197,
62006178, and 62171332, and the Key Research and Development Program of Shaanxi Province, under Grant
2019ZDLGY03-08. Xiangrong Zhang is the corresponding author.
AUTHOR INFORMATION
Xiangrong Zhang (xrzhang@mail.xidian.edu.cn) received her
B.S. and M.S. degrees in computer application technology
from the School of Computer Science, Xidian University,
Xi’an, China, in 1999 and 2003, respectively, and her Ph.D.
degree in pattern recognition and intelligent system from
the School of Electronic Engineering, Xidian University, in
2006. From January 2015 to March 2016, she was a Visiting
Scientist with the Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology,
Cambridge, MA, USA. Currently, she is a professor with
the Key Laboratory of Intelligent Perception and Image
Understanding of the Ministry of Education, Xidian University, Xi’an 710071, China. Her research interests include
pattern recognition, machine learning, and remote sensing
image analysis and understanding. She is a Senior Member
of IEEE.
Tianyang Zhang (tianyangzhang@stu.xidian.edu.cn)
received his B.S. degree in intelligent science and technology from Xidian University, Xi’an, China, in 2018. He is currently pursuing his Ph.D. degree from the Key Laboratory of
Intelligence Perception and Image Understanding of the
Ministry of Education, Xidian University, Xi’an 710071,
China. His current research interests include remote sensing object detection and remote sensing image analysis.
Guanchun Wang (guanchunwang1206@163.com) received his B.S. degree in intelligent science and technology
from Xidian University, Xi’an, China, in 2019. He is currently pursuing his Ph.D. degree from the Key Laboratory of
Intelligence Perception and Image Understanding of the
Ministry of Education, Xidian University, Xi’an 710071,
China. His current research interests include object detection and remote sensing image analysis.
Peng Zhu (zhupeng@stu.xidian.edu.cn) received his
B.S. degree in intelligent science and technology from
Xidian University, Xi’an, China, in 2017. He is currently pursuing his Ph.D. degree from the Key Laboratory of Intelligence Perception and Image Understanding of the Ministry
of Education, Xidian University, Xi’an 710071, China. His
current research interests include computer vision and remote sensing image analysis.
Xu Tang (tangxu128@gmail.com) received his B.Sc.,
M.Sc., and Ph.D. degrees in electronic circuits and systems
from Xidian University, Xi’an, China, in 2007, 2010, and
2017, respectively. From 2015 to 2016, he was a joint Ph.D. student with Prof. W. J. Emery at the University of
Colorado at Boulder, Boulder, CO, USA. He is currently a
professor with the Key Laboratory of Intelligent Perception
and Image Understanding of the Ministry of Education,
Xidian University, Xi’an 710071, China. He is also a Hong
Kong Scholar with the Hong Kong Baptist University, Hong
Kong. His research interests include remote sensing image
content-based retrieval and reranking, hyperspectral image
processing, remote sensing scene classification, and object
detection. He is a Senior Member of IEEE.
Xiuping Jia (xp.jia@ieee.org) received her B.Eng. degree
from the Beijing University of Posts and Telecommunications, Beijing, China, in January 1982, and his Ph.D. degree
in electrical engineering (via the part-time study) from The
University of New South Wales, Canberra, ACT, Australia, in 1996. She has had a lifelong academic career in higher education, for which she has a continued passion. She is currently an associate professor with the School of Engineering and Information Technology, The University of New
South Wales, Canberra ACT 2612, Australia. Her research
interests include remote sensing, hyperspectral image processing, and spatial data analysis. She has published widely
addressing various topics, including data correction, feature
reduction, and image classification using machine-learning
techniques. She has coauthored the remote sensing textbook Remote Sensing Digital Image Analysis [Springer-Verlag,
3rd edition (1999) and 4th edition (2006)]. She is the author of Field Guide to Hyperspectral/Multispectral Image Processing (SPIE, 2022). These publications are highly cited in the
remote sensing and image processing communities with an
H-index of 54 and an i-10-index of 189 (Google Scholar).
She received the graduate certificate in higher education
from the University of New South Wales in 2005. She is the
Editor-in-Chief of IEEE Transactions on Geoscience and Remote Sensing. She is a Fellow of IEEE.
Licheng Jiao (lchjiao@mail.xidian.edu.cn) received
his B.S. degree from Shanghai Jiaotong University, Shanghai, China, in 1982 and his M.S. and Ph.D. degrees from
Xi’an Jiaotong University, Xi’an, China, in 1984 and 1990,
respectively. Since 1992, he has been a distinguished professor with the School of Electronic Engineering, Xidian
University, Xi’an 710071 China, where he is currently the
director of the Key Laboratory of Intelligent Perception
and Image Understanding of the Ministry of Education of
China. He has been a foreign member of the Academia
Europaea and the Russian Academy of Natural Sciences.
His research interests include machine learning, deep
learning, natural computation, remote sensing, image
processing, and intelligent information processing. He is
the chairman of the Awards and Recognition Committee;
the vice board chairperson of the Chinese Association of
Artificial Intelligence; a councilor of the Chinese Institute
of Electronics; a committee member of the Chinese Committee of Neural Networks; and an expert of the Academic
Degrees Committee of the State Council. He is a Fellow of
IEEE and the Institution of Engineering and Technology;
Chinese Association for Artificial Intelligence; Chinese
Institute of Electronics; China Computer Federation; and
Chinese Association of Automation.
REFERENCES
[1] N. Gorelick, M. Hancher, M. Dixon, S. Ilyushchenko, D. Thau, and R. Moore, “Google earth engine: Planetary-scale geospatial analysis for everyone,” Remote Sens. Environ., vol. 202, pp. 18–27, Dec. 2017, doi: 10.1016/j.rse.2017.06.031.
[2] D. Lam et al., “xView: Objects in context in overhead imagery,” 2018. [Online]. Available: http://arxiv.org/abs/1802.07856
[3] Z. Li, H. Shen, H. Li, G. Xia, P. Gamba, and L. Zhang, “Multi-feature combined cloud and cloud shadow detection in GaoFen-1 wide field of view imagery,” Remote Sens. Environ., vol. 191, pp. 342–358, Mar. 2017, doi: 10.1016/j.rse.2017.01.026.
[4] S. Zhang, R. Wu, K. Xu, J. Wang, and W. Sun, “R-CNN-based ship detection from high resolution remote sensing imagery,” Remote Sens., vol. 11, no. 6, Mar. 2019, Art. no. 631, doi: 10.3390/rs11060631.
[5] Y. Wang, C. Wang, H. Zhang, Y. Dong, and S. Wei, “Automatic ship detection based on RetinaNet using multi-resolution GaoFen-3 imagery,” Remote Sens., vol. 11, no. 5, Mar. 2019, Art. no. 531, doi: 10.3390/rs11050531.
[6] X. X. Zhu et al., “Deep learning in remote sensing: A comprehensive review and list of resources,” IEEE Geosci. Remote Sens. Mag., vol. 5, no. 4, pp. 8–36, Dec. 2017, doi: 10.1109/MGRS.2017.2762307.
[7] L. Zhang, L. Zhang, and B. Du, “Deep learning for remote sensing data: A technical tutorial on the state of the art,” IEEE Geosci. Remote Sens. Mag., vol. 4, no. 2, pp. 22–40, Jun. 2016, doi: 10.1109/MGRS.2016.2540798.
[8] L. Zhang and L. Zhang, “Artificial intelligence for remote sensing data analysis: A review of challenges and opportunities,” IEEE Geosci. Remote Sens. Mag., vol. 10, no. 2, pp. 270–294, Jun. 2022, doi: 10.1109/MGRS.2022.3145854.
[9] W. Han et al., “Methods for small, weak object detection in optical high-resolution remote sensing images: A survey of advances and challenges,” IEEE Geosci. Remote Sens. Mag., vol. 9, no. 4, pp. 8–34, Dec. 2021, doi: 10.1109/MGRS.2020.3041450.
[10] C. Xu, J. Wang, W. Yang, H. Yu, L. Yu, and G.-S. Xia, “Detecting tiny objects in aerial images: A normalized Wasserstein distance and a new benchmark,” ISPRS J. Photogrammetry Remote Sens., vol. 190, pp. 79–93, Aug. 2022, doi: 10.1016/j.isprsjprs.2022.06.002.
[11] J. Yue et al., “Optical remote sensing image understanding with weak supervision: Concepts, methods, and perspectives,” IEEE Geosci. Remote Sens. Mag., vol. 10, no. 2, pp. 250–269, Jun. 2022, doi: 10.1109/MGRS.2022.3161377.
[12] C. Xu and H. Duan, “Artificial bee colony (ABC) optimized edge potential function (EPF) approach to target recognition for low-altitude aircraft,” Pattern Recognit. Lett., vol. 31, no. 13, pp. 1759–1772, Oct. 2010, doi: 10.1016/j.patrec.2009.11.018.
[13] X. Sun, H. Wang, and K. Fu, “Automatic detection of geospatial objects using taxonomic semantics,” IEEE Geosci. Remote Sens. Lett., vol. 7, no. 1, pp. 23–27, Feb. 2010, doi: 10.1109/LGRS.2009.2027139.
[14] Y. Lin, H. He, Z. Yin, and F. Chen, “Rotation-invariant object detection in remote sensing images based on radial-gradient angle,” IEEE Geosci. Remote Sens. Lett., vol. 12, no. 4, pp. 746–750, Apr. 2015, doi: 10.1109/LGRS.2014.2360887.
[15] H. Moon, R. Chellappa, and A. Rosenfeld, “Performance analysis of a simple vehicle detection algorithm,” Image Vision Comput., vol. 20, no. 1, pp. 1–13, Jan. 2002, doi: 10.1016/S0262-8856(01)00059-2.
[16] S. Leninisha and K. Vani, “Water flow based geometric active deformable model for road network,” ISPRS J. Photogrammetry Remote Sens., vol. 102, pp. 140–147, Apr. 2015, doi: 10.1016/j.isprsjprs.2015.01.013.
[17] D. Chaudhuri and A. Samal, “An automatic bridge detection technique for multispectral images,” IEEE Trans. Geosci. Remote Sens., vol. 46, no. 9, pp. 2720–2727, Sep. 2008, doi: 10.1109/TGRS.2008.923631.
[18] G. Cheng, J. Han, P. Zhou, and L. Guo, “Multi-class geospatial object detection and geographic image classification based on collection of part detectors,” ISPRS J. Photogrammetry Remote Sens., vol. 98, pp. 119–132, Dec. 2014, doi: 10.1016/j.isprsjprs.2014.10.002.
[19] L. Zhang, L. Zhang, D. Tao, and X. Huang, “Sparse transfer manifold embedding for hyperspectral target detection,” IEEE Trans. Geosci. Remote Sens., vol. 52, no. 2, pp. 1030–1043, Feb. 2014, doi: 10.1109/TGRS.2013.2246837.
[20] J. Han et al., “Efficient, simultaneous detection of multiclass geospatial targets based on visual saliency modeling and discriminative learning of sparse coding,” ISPRS J. Photogrammetry Remote Sens., vol. 89, pp. 37–48, Mar. 2014, doi: 10.1016/j.isprsjprs.2013.12.011.
[21] H. Sun, X. Sun, H. Wang, Y. Li, and X. Li, “Automatic target detection in high-resolution remote sensing images using spatial sparse coding bag-of-words model,” IEEE Geosci. Remote Sens. Lett., vol. 9, no. 1, pp. 109–113, Jan. 2012, doi: 10.1109/LGRS.2011.2161569.
[22] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no. 7553, pp. 436–444, May 2015, doi: 10.1038/nature14539.
[23] S. Ren, K. He, R. B. Girshick, and J. Sun, “Faster R-CNN: Towards real-time object detection with region proposal networks,” in Proc. Annu. Conf. Neural Inf. Process. Syst., 2015, pp. 91–99.
[24] J. Redmon and A. Farhadi, “YOLO9000: Better, faster, stronger,” in Proc. IEEE Conf. Comput. Vision Pattern Recognit. (CVPR), 2017, pp. 6517–6525, doi: 10.1109/CVPR.2017.690.
[25] T. Lin, P. Goyal, R. B. Girshick, K. He, and P. Dollár, “Focal loss for dense object detection,” in Proc. IEEE Int. Conf. Comput. Vision (ICCV), 2017, pp. 2999–3007, doi: 10.1109/ICCV.2017.324.
[26] Z. Tian, C. Shen, H. Chen, and T. He, “FCOS: Fully convolutional one-stage object detection,” in Proc. IEEE Int. Conf. Comput. Vision (ICCV), 2019, pp. 9626–9635, doi: 10.1109/ICCV.2019.00972.
[27] L. Liu et al., “Deep learning for generic object detection: A survey,” Int. J. Comput. Vision, vol. 128, no. 2, pp. 261–318, Feb. 2020, doi: 10.1007/s11263-019-01247-4.
[28] K. Li, G. Wan, G. Cheng, L. Meng, and J. Han, “Object detection in optical remote sensing images: A survey and a new benchmark,” ISPRS J. Photogrammetry Remote Sens., vol. 159, pp. 296–307, Jan. 2020, doi: 10.1016/j.isprsjprs.2019.11.023.
[29] G. Cheng and J. Han, “A survey on object detection in optical remote sensing images,” ISPRS J. Photogrammetry Remote Sens., vol. 117, pp. 11–28, Jul. 2016, doi: 10.1016/j.isprsjprs.2016.03.014.
[30] U. Alganci, M. Soydas, and E. Sertel, “Comparative research on deep learning approaches for airplane detection from very high-resolution satellite images,” Remote Sens., vol. 12, no. 3, Feb. 2020, Art. no. 458, doi: 10.3390/rs12030458.
[31]
Z. Li et al., “Deep learning-based object detection techniques
for remote sensing images: A survey,” Remote Sens., vol. 14, no.
10, May 2022, Art. no. 2385, doi: 10.3390/rs14102385.
[32] J. Kang, S. Tariq, H. Oh, and S. S. Woo, “A survey of deep learning-based object detection methods and datasets for overhead
imagery,” IEEE Access, vol. 10, pp. 20,118–20,134, Feb. 2022,
doi: 10.1109/ACCESS.2022.3149052.
[33] J. Ding et al., “Object detection in aerial images: A large-scale
benchmark and challenges,” IEEE Trans. Pattern Anal. Mach.
Intell., vol. 44, no. 11, pp. 7778–7796, Nov. 2022, doi: 10.1109/
TPAMI.2021.3117983.
[34] X. Sun et al., “FAIR1M: A benchmark dataset for fine-grained
object recognition in high-resolution remote sensing imagery,”
ISPRS J. Photogrammetry Remote Sens., vol. 184, pp. 116–130,
Feb. 2022, doi: 10.1016/j.isprsjprs.2021.12.004.
[35] W. Zhao, W. Ma, L. Jiao, P. Chen, S. Yang, and B. Hou, “Multiscale image block-level F-CNN for remote sensing images
object detection,” IEEE Access, vol. 7, pp. 43,607–43,621, Mar.
2019, doi: 10.1109/ACCESS.2019.2908016.
[36] S. M. Azimi, E. Vig, R. Bahmanyar, M. Körner, and P. Reinartz, “Towards multi-class object detection in unconstrained
remote sensing imagery,” in Proc. Asian Conf. Comput. Vision, 2018, vol. 11363, pp. 150–165, doi: 10.1007/978-3-030-20893-6_10.
[37] P. Shamsolmoali, M. Zareapoor, J. Chanussot, H. Zhou, and
J. Yang, “Rotation equivariant feature image pyramid network for object detection in optical remote sensing imagery,”
IEEE Trans. Geosci. Remote Sens., vol. 60, pp. 1–14, 2022, doi:
10.1109/TGRS.2021.3112481.
[38] Y. Chen et al., “Stitcher: Feedback-driven data provider for
object detection,” 2020. [Online]. Available: https://arxiv.org/
abs/2004.12432v1
[39] X. Xu, X. Zhang, and T. Zhang, “Lite-YOLOv5: A lightweight
deep learning detector for on-board ship detection in largescene sentinel-1 SAR images,” Remote Sens., vol. 14, no. 4, Feb.
2022, Art. no. 1018, doi: 10.3390/rs14041018.
[40] N. Su, Z. Huang, Y. Yan, C. Zhao, and S. Zhou, “Detect larger at
once: Large-area remote-sensing image arbitrary-oriented ship
detection,” IEEE Geosci. Remote Sens. Lett., vol. 19, pp. 1–5, Jan.
2022, doi: 10.1109/LGRS.2022.3144485.
[41] B. Zhao, Y. Wu, X. Guan, L. Gao, and B. Zhang, “An improved
aggregated-mosaic method for the sparse object detection of
remote sensing imagery,” Remote Sens., vol. 13, no. 13, Jul.
2021, Art. no. 2602, doi: 10.3390/rs13132602.
[42] X. Han, Y. Zhong, and L. Zhang, “An efficient and robust integrated geospatial object detection framework for high spatial
resolution remote sensing imagery,” Remote Sens., vol. 9, no. 7,
Jun. 2017, Art. no. 666, doi: 10.3390/rs9070666.
[43] Y. Long, Y. Gong, Z. Xiao, and Q. Liu, “Accurate object localization in remote sensing images based on convolutional neural networks,” IEEE Trans. Geosci. Remote Sens.,
vol. 55, no. 5, pp. 2486–2498, May 2017, doi: 10.1109/TGRS.
2016.2645610.
[44] Y. Zhong, X. Han, and L. Zhang, “Multi-class geospatial object
detection based on a position-sensitive balancing framework
for high spatial resolution remote sensing imagery,” ISPRS J.
Photogrammetry Remote Sens., vol. 138, pp. 281–294, Apr. 2018,
doi: 10.1016/j.isprsjprs.2018.02.014.
[45] P. Ding, Y. Zhang, W.-J. Deng, P. Jia, and A. Kuijper, “A light
and faster regional convolutional neural network for object
detection in optical remote sensing images,” ISPRS J. Photogrammetry Remote Sens., vol. 141, pp. 208–218, Jul. 2018, doi:
10.1016/j.isprsjprs.2018.05.005.
[46] W. Liu, L. Ma, and H. Chen, “Arbitrary-oriented ship detection
framework in optical remote-sensing images,” IEEE Geosci.
Remote Sens. Lett., vol. 15, no. 6, pp. 937–941, Jun. 2018, doi:
10.1109/LGRS.2018.2813094.
[47] W. Liu, L. Ma, J. Wang, and H. Chen, “Detection of multiclass
objects in optical remote sensing images,” IEEE Geosci. Remote
Sens. Lett., vol. 16, no. 5, pp. 791–795, May 2019, doi: 10.1109/
LGRS.2018.2882778.
[48] Y. Zhang, Y. Yuan, Y. Feng, and X. Lu, “Hierarchical and robust convolutional neural network for very high-resolution
remote sensing object detection,” IEEE Trans. Geosci. Remote
Sens., vol. 57, no. 8, pp. 5535–5548, Aug. 2019, doi: 10.1109/
TGRS.2019.2900302.
[49] Z. Lin, K. Ji, X. Leng, and G. Kuang, “Squeeze and excitation
rank faster R-CNN for ship detection in SAR images,” IEEE
Geosci. Remote Sens. Lett., vol. 16, no. 5, pp. 751–755, May 2019,
doi: 10.1109/LGRS.2018.2882551.
[50] Z. Deng, H. Sun, S. Zhou, J. Zhao, L. Lei, and H. Zou, “Multiscale object detection in remote sensing imagery with convolutional neural networks,” ISPRS J. Photogrammetry Remote
Sens., vol. 145, no. Part A, pp. 3–22, Nov. 2018, doi: 10.1016/j.
isprsjprs.2018.04.003.
[51] Z. Zheng et al., “HyNet: Hyper-scale object detection network
framework for multiple spatial resolution remote sensing imagery,” ISPRS J. Photogrammetry Remote Sens., vol. 166, pp. 1–14,
Aug. 2020, doi: 10.1016/j.isprsjprs.2020.04.019.
[52] Y. Ren, C. Zhu, and S. Xiao, “Deformable faster R-CNN
with aggregating multi-layer features for partially occluded
object detection in optical remote sensing images,” Remote
Sens., vol. 10, no. 9, Sep. 2018, Art. no. 1470, doi: 10.3390/
rs10091470.
[53] W. Liu et al., “SSD: Single shot multibox detector,” in Proc.
Eur. Conf. Comput. Vision, Cham, Switzerland: Springer, 2016,
pp. 21–37.
[54] S. Liu, D. Huang, and Y. Wang, “Receptive field block net for
accurate and fast object detection,” in Proc. Eur. Conf. Comput.
Vision, 2018, pp. 385–400.
[55] Z. Shen, Z. Liu, J. Li, Y.-G. Jiang, Y. Chen, and X. Xue, “DSOD:
Learning deeply supervised object detectors from scratch,” in
Proc. IEEE Int. Conf. Comput. Vision (ICCV), 2017, pp. 1937–
1945, doi: 10.1109/ICCV.2017.212.
[56] Z. Zhang, S. Qiao, C. Xie, W. Shen, B. Wang, and A. L. Yuille,
“Single-shot object detection with enriched semantics,” in
Proc. IEEE/CVF Conf. Comput. Vision Pattern Recognit. (CVPR),
2018, pp. 5813–5821, doi: 10.1109/CVPR.2018.00609.
[57] X. Lu, J. Ji, Z. Xing, and Q. Miao, “Attention and feature fusion SSD for remote sensing object detection,” IEEE Trans.
Instrum. Meas., vol. 70, pp. 1–9, Jan. 2021, doi: 10.1109/
TIM.2021.3052575.
[58]
G. Wang et al., “FSoD-Net: Full-scale object detection from optical remote sensing imagery,” IEEE Trans. Geosci. Remote Sens.,
vol. 60, pp. 1–18, 2022, doi: 10.1109/TGRS.2021.3064599.
[59] B. Hou, Z. Ren, W. Zhao, Q. Wu, and L. Jiao, “Object detection in high-resolution panchromatic images using deep models and spatial template matching,” IEEE Trans. Geosci. Remote
Sens., vol. 58, no. 2, pp. 956–970, Feb. 2020, doi: 10.1109/
TGRS.2019.2942103.
[60] X. Liang, J. Zhang, L. Zhuo, Y. Li, and Q. Tian, “Small object detection in unmanned aerial vehicle images using feature fusion and scaling-based single shot detector with spatial context analysis,” IEEE Trans. Circuits Syst. Video Technol.,
vol. 30, no. 6, pp. 1758–1770, Jun. 2020, doi: 10.1109/TCSVT.
2019.2905881.
[61] Z. Wang, L. Du, J. Mao, B. Liu, and D. Yang, “SAR target detection based on SSD with data augmentation and transfer learning,” IEEE Geosci. Remote Sens. Lett., vol. 16, no. 1, pp. 150–154,
Jan. 2019, doi: 10.1109/LGRS.2018.2867242.
[62] S. Bao, X. Zhong, R. Zhu, X. Zhang, Z. Li, and M. Li,
“Single shot anchor refinement network for oriented object detection in optical remote sensing imagery,” IEEE
Access, vol. 7, pp. 87,150–87,161, Jun. 2019, doi: 10.1109/
ACCESS.2019.2924643.
[63] T. Xu, X. Sun, W. Diao, L. Zhao, K. Fu, and H. Wang, “ASSD:
Feature aligned single-shot detection for multiscale objects
in aerial imagery,” IEEE Trans. Geosci. Remote Sens., vol. 60,
pp. 1–17, 2022, doi: 10.1109/TGRS.2021.3089170.
[64] Q. Li, L. Mou, Q. Liu, Y. Wang, and X. X. Zhu, “HSF-Net:
Multiscale deep feature embedding for ship detection in
optical remote sensing imagery,” IEEE Trans. Geosci. Remote
Sens., vol. 56, no. 12, pp. 7147–7161, Dec. 2018, doi: 10.1109/
TGRS.2018.2848901.
[65] T.-Y. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, and S.
Belongie, “Feature pyramid networks for object detection,” in
Proc. IEEE Conf. Comput. Vision Pattern Recognit. (CVPR), 2017,
pp. 936–944, doi: 10.1109/CVPR.2017.106.
[66] S. Liu, L. Qi, H. Qin, J. Shi, and J. Jia, “Path aggregation network for instance segmentation,” in Proc. IEEE Int. Conf. Comput. Vision Pattern Recognit. (CVPR), Jun. 2018, pp. 8759–8768,
doi: 10.1109/CVPR.2018.00913.
[67] J. Pang, K. Chen, J. Shi, H. Feng, W. Ouyang, and D. Lin, “Libra
R-CNN: Towards balanced learning for object detection,” in
Proc. IEEE Int. Conf. Comput. Vision Pattern Recognit. (CVPR),
Jun. 2019, pp. 821–830.
[68] M. Tan, R. Pang, and Q. V. Le, “EfficientDet: Scalable and efficient object detection,” in Proc. IEEE Conf. Comput. Vision
Pattern Recognit. (CVPR), 2020, pp. 10,781–10,790, doi:
10.1109/CVPR42600.2020.01079.
[69] L. Hou, K. Lu, and J. Xue, “Refined one-stage oriented object
detection method for remote sensing images,” IEEE Trans. Image Process., vol. 31, pp. 1545–1558, Jan. 2022, doi: 10.1109/
TIP.2022.3143690.
[70] W. Zhang, L. Jiao, Y. Li, Z. Huang, and H. Wang, “Laplacian
feature pyramid network for object detection in VHR optical remote sensing images,” IEEE Trans. Geosci. Remote Sens.,
vol. 60, pp. 1–14, 2022, doi: 10.1109/TGRS.2021.3072488.
[71]
S. Wei et al., “Precise and robust ship detection for high-resolution SAR imagery based on HR-SDNet,” Remote Sens., vol.
12, no. 1, Jan. 2020, Art. no. 167, doi: 10.3390/rs12010167.
[72] G. Cheng, M. He, H. Hong, X. Yao, X. Qian, and L. Guo, “Guiding clean features for object detection in remote sensing images,” IEEE Geosci. Remote Sens. Lett., vol. 19, pp. 1–5, 2022,
doi: 10.1109/LGRS.2021.3104112.
[73] J. Jiao et al., “A densely connected end-to-end neural network
for multiscale and multiscene SAR ship detection,” IEEE Access, vol. 6, pp. 20,881–20,892, Apr. 2018, doi: 10.1109/ACCESS.2018.2825376.
[74] Q. Guo, H. Wang, and F. Xu, “Scattering enhanced attention
pyramid network for aircraft detection in SAR images,” IEEE
Trans. Geosci. Remote Sens., vol. 59, no. 9, pp. 7570–7587, Sep.
2021, doi: 10.1109/TGRS.2020.3027762.
[75] Y. Li, Q. Huang, X. Pei, L. Jiao, and R. Shang, “RADet: Refine
feature pyramid network and multi-layer attention network
for arbitrary-oriented object detection of remote sensing images,” Remote Sens., vol. 12, no. 3, Jan. 2020, Art. no. 389, doi:
10.3390/rs12030389.
[76] L. Shi, L. Kuang, X. Xu, B. Pan, and Z. Shi, “CANet: Centerness-aware network for object detection in remote sensing images,” IEEE Trans. Geosci. Remote Sens., vol. 60, pp. 1–13, 2022,
doi: 10.1109/TGRS.2021.3068970.
[77] R. Yang, Z. Pan, X. Jia, L. Zhang, and Y. Deng, “A novel CNN-based detector for ship detection based on rotatable bounding box in SAR images,” IEEE J. Sel. Topics Appl. Earth Observ.
Remote Sens., vol. 14, pp. 1938–1958, Jan. 2021, doi: 10.1109/
JSTARS.2021.3049851.
[78] Y. Zhao, L. Zhao, B. Xiong, and G. Kuang, “Attention receptive
pyramid network for ship detection in SAR images,” IEEE J.
Sel. Topics Appl. Earth Observ. Remote Sens., vol. 13, pp. 2738–
2756, May 2020, doi: 10.1109/JSTARS.2020.2997081.
[79] X. Yang, X. Zhang, N. Wang, and X. Gao, “A robust one-stage detector for multiscale ship detection with complex background
in massive SAR images,” IEEE Trans. Geosci. Remote Sens.,
vol. 60, pp. 1–12, 2022, doi: 10.1109/TGRS.2021.3128060.
[80] K. Fu, Z. Chang, Y. Zhang, G. Xu, K. Zhang, and X. Sun, “Rotation-aware and multi-scale convolutional neural network
for object detection in remote sensing images,” ISPRS J. Photogrammetry Remote Sens., vol. 161, pp. 294–308, Mar. 2020, doi:
10.1016/j.isprsjprs.2020.01.025.
[81] W. Huang, G. Li, B. Jin, Q. Chen, J. Yin, and L. Huang, “Scenario context-aware-based bidirectional feature pyramid network for remote sensing target detection,” IEEE Geosci. Remote
Sens. Lett., vol. 19, pp. 1–5, 2022, doi: 10.1109/LGRS.2021.
3135935.
[82] V. Chalavadi, J. Prudviraj, R. Datla, C. S. Babu, and C. K. Mohan, “mSODANet: A network for multi-scale object detection
in aerial images using hierarchical dilated convolutions,”
Pattern Recognit., vol. 126, Jun. 2022, Art. no. 108548, doi:
10.1016/j.patcog.2022.108548.
[83] G. Cheng, Y. Si, H. Hong, X. Yao, and L. Guo, “Cross-scale
feature fusion for object detection in optical remote sensing
images,” IEEE Geosci. Remote Sens. Lett., vol. 18, no. 3, pp. 431–
435, Mar. 2021, doi: 10.1109/LGRS.2020.2975541.
[84] J. Fu, X. Sun, Z. Wang, and K. Fu, “An anchor-free method
based on feature balancing and refinement network for multiscale ship detection in SAR images,” IEEE Trans. Geosci. Remote
Sens., vol. 59, no. 2, pp. 1331–1344, Feb. 2021, doi: 10.1109/
TGRS.2020.3005151.
[85] Y. Liu, Q. Li, Y. Yuan, Q. Du, and Q. Wang, “ABNet: Adaptive
balanced network for multiscale object detection in remote
sensing imagery,” IEEE Trans. Geosci. Remote Sens., vol. 60,
pp. 1–14, 2022, doi: 10.1109/TGRS.2021.3133956.
[86] H. Guo, X. Yang, N. Wang, B. Song, and X. Gao, “A rotational
libra R-CNN method for ship detection,” IEEE Trans. Geosci.
Remote Sens., vol. 58, no. 8, pp. 5772–5781, Aug. 2020, doi:
10.1109/TGRS.2020.2969979.
[87] T. Zhang, Y. Zhuang, G. Wang, S. Dong, H. Chen, and L. Li,
“Multiscale semantic fusion-guided fractal convolutional object detection network for optical remote sensing imagery,”
IEEE Trans. Geosci. Remote Sens., vol. 60, pp. 1–20, 2022, doi:
10.1109/TGRS.2021.3108476.
[88] Y. Zheng, P. Sun, Z. Zhou, W. Xu, and Q. Ren, “ADT-Det:
Adaptive dynamic refined single-stage transformer detector
for arbitrary-oriented object detection in satellite optical imagery,” Remote Sens., vol. 13, no. 13, Jul. 2021, Art. no. 2623,
doi: 10.3390/rs13132623.
[89] Z. Wei et al., “Learning calibrated-guidance for object detection in aerial images,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 15, pp. 2721–2733, Mar. 2022, doi: 10.1109/
JSTARS.2022.3158903.
[90] L. Chen, C. Liu, F. Chang, S. Li, and Z. Nie, “Adaptive multi-level feature fusion and attention-based network for arbitrary-oriented object detection in remote sensing imagery,” Neurocomputing, vol. 451, no. 8, pp. 67–80, Apr. 2021, doi: 10.1016/j.
neucom.2021.04.011.
[91] X. Sun, P. Wang, C. Wang, Y. Liu, and K. Fu, “PBNet: Part-based convolutional neural network for complex composite
object detection in remote sensing imagery,” ISPRS J. Photogrammetry Remote Sens., vol. 173, pp. 50–65, Mar. 2021, doi:
10.1016/j.isprsjprs.2020.12.015.
[92] T. Zhang et al., “Balance learning for ship detection from synthetic aperture radar remote sensing imagery,” ISPRS J. Photogrammetry Remote Sens., vol. 182, pp. 190–207, Dec. 2021, doi:
10.1016/j.isprsjprs.2021.10.010.
[93] T. Zhang, X. Zhang, and X. Ke, “Quad-FPN: A novel quad feature
pyramid network for SAR ship detection,” Remote Sens., vol. 13,
no. 14, Jul. 2021, Art. no. 2771, doi: 10.3390/rs13142771.
[94] J. Song, L. Miao, Q. Ming, Z. Zhou, and Y. Dong, “Fine-grained object detection in remote sensing images via adaptive label assignment and refined-balanced feature pyramid network,” IEEE J. Sel. Topics Appl. Earth Observ. Remote
Sens., vol. 16, pp. 71–82, 2023, doi: 10.1109/JSTARS.2022.
3224558.
[95] W. Guo, W. Yang, H. Zhang, and G. Hua, “Geospatial object
detection in high resolution satellite images based on multiscale convolutional neural network,” Remote Sens., vol. 10, no.
1, Jan. 2018, Art. no. 131, doi: 10.3390/rs10010131.
[96] S. Zhang, G. He, H. Chen, N. Jing, and Q. Wang, “Scale adaptive proposal network for object detection in remote sensing images,” IEEE Geosci. Remote Sens. Lett., vol. 16, no. 6, pp. 864–868, Jun. 2019, doi: 10.1109/LGRS.2018.2888887.
[97] C. Li, C. Xu, Z. Cui, D. Wang, T. Zhang, and J. Yang, “Feature-attentioned object detection in remote sensing imagery,” in
Proc. IEEE Int. Conf. Image Process. Conf. (ICIP), 2019, pp. 3886–
3890, doi: 10.1109/ICIP.2019.8803521.
[98] Z. Dong, M. Wang, Y. Wang, Y. Zhu, and Z. Zhang, “Object
detection in high resolution remote sensing imagery based
on convolutional neural networks with suitable object scale
features,” IEEE Trans. Geosci. Remote Sens., vol. 58, no. 3, pp.
2104–2114, Mar. 2020, doi: 10.1109/TGRS.2019.2953119.
[99] H. Qiu, H. Li, Q. Wu, F. Meng, K. N. Ngan, and H. Shi, “A2RMNET: Adaptively aspect ratio multi-scale network for object
detection in remote sensing images,” Remote Sens., vol. 11, no.
13, Jul. 2019, Art. no. 1594, doi: 10.3390/rs11131594.
[100] J. Hou, X. Zhu, and X. Yin, “Self-adaptive aspect ratio anchor for oriented object detection in remote sensing images,”
Remote Sens., vol. 13, no. 7, Mar. 2021, Art. no. 1318, doi:
10.3390/rs13071318.
[101] N. Mo, L. Yan, R. Zhu, and H. Xie, “Class-specific anchor
based and context-guided multi-class object detection in high
resolution remote sensing imagery with a convolutional neural network,” Remote Sens., vol. 11, no. 3, Jan. 2019, Art. no.
272, doi: 10.3390/rs11030272.
[102] Z. Tian, R. Zhan, J. Hu, W. Wang, Z. He, and Z. Zhuang, “Generating anchor boxes based on attention mechanism for object
detection in remote sensing images,” Remote Sens., vol. 12, no.
15, Jul. 2020, Art. no. 2416, doi: 10.3390/rs12152416.
[103] Z. Teng, Y. Duan, Y. Liu, B. Zhang, and J. Fan, “Global to local:
Clip-LSTM-based object detection from remote sensing images,” IEEE Trans. Geosci. Remote Sens., vol. 60, pp. 1–13, 2022,
doi: 10.1109/TGRS.2021.3064840.
[104] Y. Yu, H. Guan, D. Li, T. Gu, E. Tang, and A. Li, “Orientation
guided anchoring for geospatial object detection from remote
sensing imagery,” ISPRS J. Photogrammetry Remote Sens., vol.
160, pp. 67–82, Feb. 2020, doi: 10.1016/j.isprsjprs.2019.12.001.
[105] J. Wang, K. Chen, S. Yang, C. C. Loy, and D. Lin, “Region proposal by guided anchoring,” in Proc. IEEE Conf. Comput. Vision
Pattern Recognit. (CVPR), 2019, pp. 2960–2969, doi: 10.1109/
CVPR.2019.00308.
[106] X. Yang and J. Yan, “On the arbitrary-oriented object detection: Classification based approaches revisited,” Int. J. Comput. Vision, vol. 130, no. 5, pp. 1340–1365, Mar. 2022, doi:
10.1007/s11263-022-01593-w.
[107] X. Yang et al., “SCRDet: Towards more robust detection for
small, cluttered and rotated objects,” in Proc. IEEE Int. Conf.
Comput. Vision (ICCV), 2019, pp. 8231–8240, doi: 10.1109/
ICCV.2019.00832.
[108] X. Yang, J. Yan, Z. Feng, and T. He, “R3Det: Refined single-stage detector with feature refinement for rotating object,” in Proc. AAAI Conf. Artif. Intell., 2021, vol. 35, no. 4, pp. 3163–3171,
doi: 10.1609/aaai.v35i4.16426.
[109] X. Yang et al., “Automatic ship detection in remote sensing
images from google earth of complex scenes based on multiscale rotation dense feature pyramid networks,” Remote Sens.,
vol. 10, no. 1, Jan. 2018, Art. no. 132, doi: 10.3390/rs10010132.
[110] X. Yang, H. Sun, X. Sun, M. Yan, Z. Guo, and K. Fu, “Position
detection and direction prediction for arbitrary-oriented ships
via multitask rotation region convolutional neural network,”
IEEE Access, vol. 6, pp. 50,839–50,849, Sep. 2018, doi: 10.1109/
ACCESS.2018.2869884.
[111] Q. Ming, L. Miao, Z. Zhou, and Y. Dong, “CFC-Net: A critical
feature capturing network for arbitrary-oriented object detection in remote-sensing images,” IEEE Trans. Geosci. Remote Sens.,
vol. 60, pp. 1–14, 2022, doi: 10.1109/TGRS.2021.3095186.
[112] Q. Ming, Z. Zhou, L. Miao, H. Zhang, and L. Li, “Dynamic anchor learning for arbitrary-oriented object detection,” in Proc.
AAAI Conf. Artif. Intell., 2021, pp. 2355–2363, doi: 10.1609/
aaai.v35i3.16336.
[113] Y. Zhu, J. Du, and X. Wu, “Adaptive period embedding for representing oriented objects in aerial images,” IEEE Trans. Geosci.
Remote Sens., vol. 58, no. 10, pp. 7247–7257, Oct. 2020, doi:
10.1109/TGRS.2020.2981203.
[114] J. Ding, N. Xue, Y. Long, G.-S. Xia, and Q. Lu, “Learning RoI
transformer for oriented object detection in aerial images,” in
Proc. IEEE Conf. Comput. Vision Pattern Recognit. (CVPR), 2019,
pp. 2844–2853, doi: 10.1109/CVPR.2019.00296.
[115] Q. An, Z. Pan, L. Liu, and H. You, “DRBox-v2: An improved detector with rotatable boxes for target detection in SAR images,”
IEEE Trans. Geosci. Remote Sens., vol. 57, no. 11, pp. 8333–8349,
Nov. 2019, doi: 10.1109/TGRS.2019.2920534.
[116] Q. Li, L. Mou, Q. Xu, Y. Zhang, and X. X. Zhu, “R3-Net: A deep
network for multi-oriented vehicle detection in aerial images and videos,” 2018. [Online]. Available: http://arxiv.org/
abs/1808.05560
[117] G. Xia et al., “DOTA: A large-scale dataset for object detection in aerial images,” in Proc. IEEE Int. Conf. Comput. Vision
Pattern Recognit. (CVPR), 2018, pp. 3974–3983, doi: 10.1109/
CVPR.2018.00418.
[118] Y. Liu, S. Zhang, L. Jin, L. Xie, Y. Wu, and Z. Wang, “Omnidirectional scene text detection with sequential-free box discretization,” 2019, arXiv:1906.02371.
[119] W. Qian, X. Yang, S. Peng, J. Yan, and Y. Guo, “Learning modulated loss for rotated object detection,” in Proc. AAAI Conf.
Artif. Intell., 2021, vol. 35, no. 3, pp. 2458–2466, doi: 10.1609/
aaai.v35i3.16347.
[120] Y. Xu et al., “Gliding vertex on the horizontal bounding box
for multi-oriented object detection,” IEEE Trans. Pattern Anal.
Mach. Intell., vol. 43, no. 4, pp. 1452–1459, Apr. 2021, doi:
10.1109/TPAMI.2020.2974745.
[121] W. Qian, X. Yang, S. Peng, X. Zhang, and J. Yan, “RSDet++:
Point-based modulated loss for more accurate rotated object detection,” IEEE Trans. Circuits Syst. Video Technol., vol.
32, no. 11, pp. 7869–7879, Nov. 2022, doi: 10.1109/TCSVT.
2022.3186070.
[122] J. Luo, Y. Hu, and J. Li, “Surround-net: A multi-branch arbitrary-oriented detector for remote sensing,” Remote Sens., vol. 14,
no. 7, Apr. 2022, Art. no. 1751, doi: 10.3390/rs14071751.
[123] Q. Song, F. Yang, L. Yang, C. Liu, M. Hu, and L. Xia, “Learning
point-guided localization for detection in remote sensing images,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 14,
pp. 1084–1094, 2021, doi: 10.1109/JSTARS.2020.3036685.
[124] X. Xie, G. Cheng, J. Wang, X. Yao, and J. Han, “Oriented
R-CNN for object detection,” in Proc. IEEE Int. Conf. Comput. Vision (ICCV), 2021, pp. 3500–3509, doi: 10.1109/
ICCV48922.2021.00350.
[125] Y. Yao et al., “On improving bounding box representations for
oriented object detection,” IEEE Trans. Geosci. Remote Sens., vol.
61, pp. 1–11, 2023, doi: 10.1109/TGRS.2022.3231340.
[126] Q. Ming, L. Miao, Z. Zhou, X. Yang, and Y. Dong, “Optimization for arbitrary-oriented object detection via representation
invariance loss,” IEEE Geosci. Remote Sens. Lett., vol. 19, pp.
1–5, 2022, doi: 10.1109/LGRS.2021.3115110.
[127] X. Yang and J. Yan, “Arbitrary-oriented object detection
with circular smooth label,” in Proc. Eur. Conf. Comput. Vision, Cham, Switzerland: Springer, 2020, pp. 677–694, doi:
10.1007/978-3-030-58598-3_40.
[128] X. Yang, L. Hou, Y. Zhou, W. Wang, and J. Yan, “Dense label encoding for boundary discontinuity free rotation detection,” in Proc. IEEE Conf. Comput. Vision Pattern Recognit.
(CVPR), 2021, pp. 15,819–15,829, doi: 10.1109/CVPR46437.
2021.01556.
[129] J. Wang, F. Li, and H. Bi, “Gaussian focal loss: Learning distribution polarized angle prediction for rotated object detection in aerial images,” IEEE Trans. Geosci. Remote Sens., vol. 60,
pp. 1–13, 2022, doi: 10.1109/TGRS.2022.3175520.
[130] X. Yang, J. Yan, Q. Ming, W. Wang, X. Zhang, and Q. Tian,
“Rethinking rotated object detection with gaussian Wasserstein distance loss,” in Proc. Int. Conf. Machine Learn, 2021,
pp. 11,830–11,841.
[131] X. Yang et al., “Learning high-precision bounding box for
rotated object detection via Kullback-Leibler divergence,” in
Proc. Adv. Neural Inf. Process. Syst. 34, 2021, vol. 34, pp. 18,381–
18,394.
[132] X. Yang et al., “The KFIOU loss for rotated object detection,”
2022, arXiv:2201.12558.
[133] X. Yang et al., “Detecting rotated objects as gaussian distributions and its 3-D generalization,” IEEE Trans. Pattern Anal.
Mach. Intell., vol. 45, no. 4, pp. 4335–4354, Apr. 2023, doi:
10.1109/TPAMI.2022.3197152.
[134] J. Wang, J. Ding, H. Guo, W. Cheng, T. Pan, and W. Yang,
“Mask OBB: A semantic attention-based mask oriented
bounding box representation for multi-category object detection in aerial images,” Remote Sens., vol. 11, no. 24, Dec. 2019,
Art. no. 2930, doi: 10.3390/rs11242930.
[135] X. Zhang, G. Wang, P. Zhu, T. Zhang, C. Li, and L. Jiao, “GRS-Det: An anchor-free rotation ship detector based on gaussian-mask in remote sensing images,” IEEE Trans. Geosci. Remote
Sens., vol. 59, no. 4, pp. 3518–3531, Apr. 2021, doi: 10.1109/
TGRS.2020.3018106.
[136] Y. Yang et al., “AR2Det: An accurate and real-time rotational
one-stage ship detector in remote sensing images,” IEEE Trans.
Geosci. Remote Sens., vol. 60, pp. 1–14, 2022, doi: 10.1109/
TGRS.2021.3092433.
[137] F. Zhang, X. Wang, S. Zhou, Y. Wang, and Y. Hou, “Arbitrary-oriented ship detection through center-head point extraction,” IEEE Trans. Geosci. Remote Sens., vol. 60, pp. 1–14, Oct.
2021, doi: 10.1109/TGRS.2021.3120411.
[138] J. Yi, P. Wu, B. Liu, Q. Huang, H. Qu, and D. Metaxas, “Oriented object detection in aerial images with box boundary-aware
vectors,” in Proc. IEEE Winter Conf. Appl. Comput. Vision (WACV),
2021, pp. 2149–2158, doi: 10.1109/WACV48630.2021.00220.
[139] Z. Xiao, L. Qian, W. Shao, X. Tan, and K. Wang, “Axis learning for orientated objects detection in aerial images,” Remote
Sens., vol. 12, no. 6, Mar. 2020, Art. no. 908, doi: 10.3390/
rs12060908.
[140] X. He, S. Ma, L. He, L. Ru, and C. Wang, “Learning rotated inscribed ellipse for oriented object detection in remote sensing
images,” Remote Sens., vol. 13, no. 18, Sep. 2021, Art. no. 3622,
doi: 10.3390/rs13183622.
[141] K. Fu, Z. Chang, Y. Zhang, and X. Sun, “Point-based estimator
for arbitrary-oriented object detection in aerial images,” IEEE
Trans. Geosci. Remote Sens., vol. 59, no. 5, pp. 4370–4387, May
2021, doi: 10.1109/TGRS.2020.3020165.
[142] H. Wei, Y. Zhang, Z. Chang, H. Li, H. Wang, and X. Sun, “Oriented objects as pairs of middle lines,” ISPRS J. Photogrammetry
Remote Sens., vol. 169, pp. 268–279, Nov. 2020, doi: 10.1016/j.
isprsjprs.2020.09.022.
[143] L. Zhou, H. Wei, H. Li, W. Zhao, Y. Zhang, and Y. Zhang,
“Arbitrary-oriented object detection in remote sensing images
based on polar coordinates,” IEEE Access, vol. 8, pp. 223,373–
223,384, Nov. 2020, doi: 10.1109/ACCESS.2020.3041025.
[144] X. Zheng, W. Zhang, L. Huan, J. Gong, and H. Zhang, “AProNet:
Detecting objects with precise orientation from aerial images,” ISPRS J. Photogrammetry Remote Sens., vol. 181, pp. 99–112,
Nov. 2021, doi: 10.1016/j.isprsjprs.2021.08.023.
[145] X. Yang, G. Zhang, W. Li, X. Wang, Y. Zhou, and J. Yan,
“H2RBox: Horizontal box annotation is all you need for oriented object detection,” 2022. [Online]. Available: https://
arxiv.org/abs/2210.06742
[146] G. Cheng, P. Zhou, and J. Han, “Learning rotation-invariant
convolutional neural networks for object detection in VHR
optical remote sensing images,” IEEE Trans. Geosci. Remote
Sens., vol. 54, no. 12, pp. 7405–7415, Dec. 2016, doi: 10.1109/
TGRS.2016.2601622.
[147] K. Li, G. Cheng, S. Bu, and X. You, “Rotation-insensitive and
context-augmented object detection in remote sensing images,” IEEE Trans. Geosci. Remote Sens., vol. 56, no. 4, pp. 2337–
2348, Apr. 2018, doi: 10.1109/TGRS.2017.2778300.
[148] G. Cheng, P. Zhou, and J. Han, “RIFD-CNN: Rotation-invariant and fisher discriminative convolutional neural networks
for object detection,” in Proc. IEEE Conf. Comput. Vision
Pattern Recognit. (CVPR), 2016, pp. 2884–2893, doi: 10.1109/
CVPR.2016.315.
[149] G. Cheng, J. Han, P. Zhou, and D. Xu, “Learning rotation-invariant and fisher discriminative convolutional neural
networks for object detection,” IEEE Trans. Image Process.,
vol. 28, no. 1, pp. 265–278, Jan. 2019, doi: 10.1109/TIP.2018.
2867198.
[150] X. Wu, D. Hong, J. Tian, J. Chanussot, W. Li, and R. Tao, “ORSIm detector: A novel object detection framework in optical
remote sensing imagery using spatial-frequency channel features,” IEEE Trans. Geosci. Remote Sens., vol. 57, no. 7, pp. 5146–
5158, Jul. 2019, doi: 10.1109/TGRS.2019.2897139.
[151] X. Wu, D. Hong, J. Chanussot, Y. Xu, R. Tao, and Y. Wang,
“Fourier-based rotation-invariant feature boosting: An efficient framework for geospatial object detection,” IEEE Geosci.
Remote Sens. Lett., vol. 17, no. 2, pp. 302–306, Feb. 2020, doi:
10.1109/LGRS.2019.2919755.
[152] G. Wang, X. Wang, B. Fan, and C. Pan, “Feature extraction by rotation-invariant matrix representation for object detection in aerial image,” IEEE Geosci. Remote Sens.
Lett., vol. 14, no. 6, pp. 851–855, Jun. 2017, doi: 10.1109/
LGRS.2017.2683495.
[153] X. Wu, D. Hong, P. Ghamisi, W. Li, and R. Tao, “MsRi-CCF:
Multi-scale and rotation-insensitive convolutional channel
features for geospatial object detection,” Remote Sens., vol. 10,
no. 12, Dec. 2018, Art. no. 1990, doi: 10.3390/rs10121990.
[154] M. Zand, A. Etemad, and M. Greenspan, “Oriented bounding boxes for small and freely rotated objects,” IEEE Trans.
Geosci. Remote Sens., vol. 60, pp. 1–15, 2022, doi: 10.1109/
TGRS.2021.3076050.
[155] J. Han, J. Ding, N. Xue, and G.-S. Xia, “ReDet: A rotation-equivariant detector for aerial object detection,” in Proc. IEEE Conf.
Comput. Vision Pattern Recognit. (CVPR), 2021, pp. 2785–2794,
doi: 10.1109/CVPR46437.2021.00281.
[156] J. Han, J. Ding, J. Li, and G. Xia, “Align deep features for oriented object detection,” IEEE Trans. Geosci. Remote Sens., vol. 60,
pp. 1–11, 2022, doi: 10.1109/TGRS.2021.3062048.
[157] X. Yao, H. Shen, X. Feng, G. Cheng, and J. Han, “R2Ipoints:
Pursuing rotation-insensitive point representation for aerial
object detection,” IEEE Trans. Geosci. Remote Sens., vol. 60, pp.
1–12, May 2022, doi: 10.1109/TGRS.2022.3173373.
[158] X. Ye, F. Xiong, J. Lu, J. Zhou, and Y. Qian, “R3-net: Feature
fusion and filtration network for object detection in optical remote sensing images,” Remote Sens., vol. 12, no. 24, Dec. 2020,
Art. no. 4027, doi: 10.3390/rs12244027.
[159] J. Hu, L. Shen, and G. Sun, “Squeeze-and-excitation networks,” in Proc. IEEE Conf. Comput. Vision Pattern Recognit.
(CVPR), 2018, pp. 7132–7141, doi: 10.1109/CVPR.2018.00745.
[160] X. Li, W. Wang, X. Hu, and J. Yang, “Selective kernel networks,”
in Proc. IEEE Conf. Comput. Vision Pattern Recognit. (CVPR),
2019, pp. 510–519, doi: 10.1109/CVPR.2019.00060.
[161] J. Fu et al., “Dual attention network for scene segmentation,”
in Proc. IEEE Conf. Comput. Vision Pattern Recognit. (CVPR), Jun.
2019, pp. 3141–3149, doi: 10.1109/CVPR.2019.00326.
[162] Z. Huang, W. Li, X. Xia, X. Wu, Z. Cai, and R. Tao, “A novel
nonlocal-aware pyramid and multiscale multitask refinement detector for object detection in remote sensing images,”
IEEE Trans. Geosci. Remote Sens., vol. 60, pp. 1–20, 2022, doi:
10.1109/TGRS.2021.3059450.
[163] Y. Sun, X. Sun, Z. Wang, and K. Fu, “Oriented ship detection
based on strong scattering points network in large-scale SAR
images,” IEEE Trans. Geosci. Remote Sens., vol. 60, pp. 1–18,
2022, doi: 10.1109/TGRS.2021.3130117.
[164] W. Ma et al., “Feature split–merge–enhancement network for remote sensing object detection,” IEEE Trans. Geosci. Remote Sens.,
vol. 60, pp. 1–17, Jan. 2022, doi: 10.1109/TGRS.2022.3140856.
[165] Z. Cui, X. Wang, N. Liu, Z. Cao, and J. Yang, “Ship detection in large-scale SAR images via spatial shuffle-group enhance attention,” IEEE Trans. Geosci. Remote Sens., vol. 59, no. 1, pp. 379–391, Jan. 2021, doi: 10.1109/TGRS.2020.2997200.
[166] J. Chen, L. Wan, J. Zhu, G. Xu, and M. Deng, “Multi-scale
spatial and channel-wise attention for improving object detection in remote sensing imagery,” IEEE Geosci. Remote Sens.
Lett., vol. 17, no. 4, pp. 681–685, Apr. 2020, doi: 10.1109/
LGRS.2019.2930462.
[167] J. Bai et al., “Object detection in large-scale remote-sensing
images based on time-frequency analysis and feature optimization,” IEEE Trans. Geosci. Remote Sens., vol. 60, pp. 1–16,
2022, doi: 10.1109/TGRS.2021.3119344.
[168] J. Hu, X. Zhi, S. Jiang, H. Tang, W. Zhang, and L. Bruzzone, “Supervised multi-scale attention-guided ship detection in optical
remote sensing images,” IEEE Trans. Geosci. Remote Sens., vol.
60, pp. 1–14, Sep. 2022, doi: 10.1109/TGRS.2022.3206306.
[169] Y. Guo, X. Tong, X. Xu, S. Liu, Y. Feng, and H. Xie, “An anchor-free network with density map and attention mechanism for
multiscale object detection in aerial images,” IEEE Geosci.
Remote Sens. Lett., vol. 19, pp. 1–5, Sep. 2022, doi: 10.1109/
LGRS.2022.3207178.
[170] D. Yu and S. Ji, “A new spatial-oriented object detection framework for remote sensing images,” IEEE Trans.
Geosci. Remote Sens., vol. 60, pp. 1–16, 2022, doi: 10.1109/
TGRS.2021.3127232.
[171] C. Li et al., “Object detection based on global-local saliency
constraint in aerial images,” Remote Sens., vol. 12, no. 9, May
2020, Art. no. 1435, doi: 10.3390/rs12091435.
[172] J. Lei, X. Luo, L. Fang, M. Wang, and Y. Gu, “Region-enhanced
convolutional neural network for object detection in remote sensing images,” IEEE Trans. Geosci. Remote Sens., vol.
58, no. 8, pp. 5693–5702, Aug. 2020, doi: 10.1109/TGRS.
2020.2968802.
[173] Y. Yuan, C. Li, J. Kim, W. Cai, and D. D. Feng, “Reversion correction and regularized random walk ranking for saliency
detection,” IEEE Trans. Image Process., vol. 27, no. 3, pp. 1311–
1322, Mar. 2018, doi: 10.1109/TIP.2017.2762422.
[174] C. Xu, C. Li, Z. Cui, T. Zhang, and J. Yang, “Hierarchical semantic propagation for object detection in remote sensing
imagery,” IEEE Trans. Geosci. Remote Sens., vol. 58, no. 6, pp.
4353–4364, Jun. 2020, doi: 10.1109/TGRS.2019.2963243.
[175] T. Zhang et al., “Foreground refinement network for rotated object detection in remote sensing images,” IEEE Trans.
Geosci. Remote Sens., vol. 60, pp. 1–13, 2022, doi: 10.1109/
TGRS.2021.3109145.
[176] J. Wang, W. Yang, H. Li, H. Zhang, and G. Xia, “Learning center probability map for detecting objects in aerial images,”
IEEE Trans. Geosci. Remote Sens., vol. 59, no. 5, pp. 4307–4323,
May 2021, doi: 10.1109/TGRS.2020.3010051.
[177] Z. Fang, J. Ren, H. Sun, S. Marshall, J. Han, and H. Zhao, “SAFDet: A semi-anchor-free detector for effective detection of oriented objects in aerial images,” Remote Sens., vol. 12, no. 19,
Oct. 2020, Art. no. 3225, doi: 10.3390/rs12193225.
[178] Z. Ren, Y. Tang, Z. He, L. Tian, Y. Yang, and W. Zhang, “Ship detection in high-resolution optical remote sensing images aided
by saliency information,” IEEE Trans. Geosci. Remote Sens., vol.
60, pp. 1–16, May 2022, doi: 10.1109/TGRS.2022.3173610.
[179] H. Qu, L. Shen, W. Guo, and J. Wang, “Ships detection in SAR
images based on anchor-free model with mask guidance features,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol.
15, pp. 666–675, 2022, doi: 10.1109/JSTARS.2021.3137390.
[180] S. Liu, L. Zhang, H. Lu, and Y. He, “Center-boundary dual
attention for oriented object detection in remote sensing images,” IEEE Trans. Geosci. Remote Sens., vol. 60, pp. 1–14, 2022,
doi: 10.1109/TGRS.2021.3069056.
[181] J. Zhang, C. Xie, X. Xu, Z. Shi, and B. Pan, “A contextual bidirectional enhancement method for remote sensing image
object detection,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 13, pp. 4518–4531, Aug. 2020, doi: 10.1109/
JSTARS.2020.3015049.
[182] Y. Gong et al., “Context-aware convolutional neural network
for object detection in VHR remote sensing imagery,” IEEE
Trans. Geosci. Remote Sens., vol. 58, no. 1, pp. 34–44, Jan. 2020,
doi: 10.1109/TGRS.2019.2930246.
[183] W. Ma, Q. Guo, Y. Wu, W. Zhao, X. Zhang, and L. Jiao, “A novel
multi-model decision fusion network for object detection in
remote sensing images,” Remote Sens., vol. 11, no. 7, Mar. 2019,
Art. no. 737, doi: 10.3390/rs11070737.
[184] S. Tian et al., “Siamese graph embedding network for object
detection in remote sensing images,” IEEE Geosci. Remote Sens.
Lett., vol. 18, no. 4, pp. 602–606, Apr. 2021, doi: 10.1109/
LGRS.2020.2981420.
[185] S. Tian, L. Kang, X. Xing, J. Tian, C. Fan, and Y. Zhang, “A
relation-augmented embedded graph attention network
for remote sensing object detection,” IEEE Trans. Geosci. Remote Sens., vol. 60, pp. 1–18, 2022, doi: 10.1109/TGRS.2021.
3073269.
[186] Y. Wu, K. Zhang, J. Wang, Y. Wang, Q. Wang, and Q. Li, “CDD-Net: A context-driven detection network for multiclass object
detection,” IEEE Geosci. Remote Sens. Lett., vol. 19, pp. 1–5,
2022, doi: 10.1109/LGRS.2020.3042465.
[187] Y. Han, J. Liao, T. Lu, T. Pu, and Z. Peng, “KCPNet: Knowledge-driven context perception networks for ship detection in infrared imagery,” IEEE Trans. Geosci. Remote Sens., vol. 61, pp. 1–19,
2023, doi: 10.1109/TGRS.2022.3233401.
[188] C. Chen, W. Gong, Y. Chen, and W. Li, “Object detection in
remote sensing images based on a scene-contextual feature
pyramid network,” Remote Sens., vol. 11, no. 3, Feb. 2019, Art.
no. 339, doi: 10.3390/rs11030339.
[189] Z. Wu, B. Hou, B. Ren, Z. Ren, S. Wang, and L. Jiao, “A
deep detection network based on interaction of instance
segmentation and object detection for SAR images,” Remote Sens., vol. 13, no. 13, Jul. 2021, Art. no. 2582, doi:
10.3390/rs13132582.
[190] Y. Wu, K. Zhang, J. Wang, Y. Wang, Q. Wang, and X. Li, “GCWNet: A global context-weaving network for object detection
in remote sensing images,” IEEE Trans. Geosci. Remote Sens., vol.
60, pp. 1–12, Mar. 2022, doi: 10.1109/TGRS.2022.3155899.
[191] G. Shi, J. Zhang, J. Liu, C. Zhang, C. Zhou, and S. Yang,
“Global context-augmented object detection in VHR optical remote sensing images,” IEEE Trans. Geosci. Remote Sens.,
vol. 59, no. 12, pp. 10,604–10,617, Dec. 2021, doi: 10.1109/
TGRS.2020.3043252.
[192] J. Liu, S. Li, C. Zhou, X. Cao, Y. Gao, and B. Wang, “SRAF-Net: A scene-relevant anchor-free object detection network in remote sensing images,” IEEE Trans. Geosci. Remote
Sens., vol. 60, pp. 1–14, 2022, doi: 10.1109/TGRS.2021.
3124959.
[193] C. Tao, L. Mi, Y. Li, J. Qi, Y. Xiao, and J. Zhang, “Scene context-driven vehicle detection in high-resolution aerial images,”
IEEE Trans. Geosci. Remote Sens., vol. 57, no. 10, pp. 7339–7351,
Oct. 2019, doi: 10.1109/TGRS.2019.2912985.
[194] K. Zhang, Y. Wu, J. Wang, Y. Wang, and Q. Wang, “Semantic
context-aware network for multiscale object detection in remote sensing images,” IEEE Geosci. Remote Sens. Lett., vol. 19,
pp. 1–5, 2022, doi: 10.1109/LGRS.2021.3067313.
[195] M. Wang, Q. Li, Y. Gu, L. Fang, and X. X. Zhu, “SCAF-Net:
Scene context attention-based fusion network for vehicle detection in aerial imagery,” IEEE Geosci. Remote Sens. Lett., vol.
19, pp. 1–5, 2022, doi: 10.1109/LGRS.2021.3107281.
[196] G. Zhang, S. Lu, and W. Zhang, “CAD-Net: A context-aware
detection network for objects in remote sensing imagery,”
IEEE Trans. Geosci. Remote Sens., vol. 57, no. 12, pp. 10,015–
10,024, Dec. 2019, doi: 10.1109/TGRS.2019.2930982.
[197] E. Liu, Y. Zheng, B. Pan, X. Xu, and Z. Shi, “DCL-Net: Augmenting the capability of classification and localization for
remote sensing object detection,” IEEE Trans. Geosci. Remote
Sens., vol. 59, no. 9, pp. 7933–7944, Sep. 2021, doi: 10.1109/
TGRS.2020.3048384.
[198] Y. Feng, W. Diao, X. Sun, M. Yan, and X. Gao, “Towards automated ship detection and category recognition from high-resolution aerial images,” Remote Sens., vol. 11, no. 16, Aug.
2019, Art. no. 1901, doi: 10.3390/rs11161901.
[199] P. Wang, X. Sun, W. Diao, and K. Fu, “FMSSD: Feature-merged single-shot detection for multiscale objects in large-scale remote sensing imagery,” IEEE Trans. Geosci. Remote
Sens., vol. 58, no. 5, pp. 3377–3390, May 2020, doi: 10.1109/
TGRS.2019.2954328.
[200] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and
A. L. Yuille, “DeepLab: Semantic image segmentation
with deep convolutional nets, Atrous convolution, and
fully connected CRFs,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 40, no. 4, pp. 834–848, Apr. 2018, doi: 10.1109/
TPAMI.2017.2699184.
[201] Y. Bai, R. Li, S. Gou, C. Zhang, Y. Chen, and Z. Zheng, “Cross-connected bidirectional pyramid network for infrared small-dim target detection,” IEEE Geosci. Remote Sens. Lett., vol. 19,
pp. 1–5, Jan. 2022, doi: 10.1109/LGRS.2022.3145577.
[202] Y. Li, Q. Huang, X. Pei, Y. Chen, L. Jiao, and R. Shang,
“Cross-layer attention network for small object detection in
remote sensing imagery,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 14, pp. 2148–2161, 2021, doi: 10.1109/
JSTARS.2020.3046482.
[203] H. Gong et al., “Swin-transformer-enabled YOLOv5 with attention mechanism for small object detection on satellite images,” Remote Sens., vol. 14, no. 12, Jun. 2022, Art. no. 2861,
doi: 10.3390/rs14122861.
[204] J. Qu, C. Su, Z. Zhang, and A. Razi, “Dilated convolution
and feature fusion SSD network for small object detection in
remote sensing images,” IEEE Access, vol. 8, pp. 82,832–82,843,
Apr. 2020, doi: 10.1109/ACCESS.2020.2991439.
[205] T. Ma, Z. Yang, J. Wang, S. Sun, X. Ren, and U. Ahmad, “Infrared small target detection network with generate label and
feature mapping,” IEEE Geosci. Remote Sens. Lett., vol. 19, pp.
1–5, Jan. 2022, doi: 10.1109/LGRS.2022.3140432.
[206] W. Han, A. Kuerban, Y. Yang, Z. Huang, B. Liu, and J. Gao,
“Multi-vision network for accurate and real-time small object detection in optical remote sensing images,” IEEE Geosci. Remote Sens. Lett., vol. 19, pp. 1–5, 2022, doi: 10.1109/
LGRS.2020.3044422.
[207] Q. Hou, Z. Wang, F. Tan, Y. Zhao, H. Zheng, and W. Zhang,
“RISTDnet: Robust infrared small target detection network,”
IEEE Geosci. Remote Sens. Lett., vol. 19, pp. 1–5, 2022, doi:
10.1109/LGRS.2021.3050828.
[208] X. Lu, Y. Zhang, Y. Yuan, and Y. Feng, “Gated and axis-concentrated localization network for remote sensing object detection,” IEEE Trans. Geosci. Remote Sens., vol. 58, no. 1, pp.
179–192, Jan. 2020, doi: 10.1109/TGRS.2019.2935177.
[209] L. Courtrai, M. Pham, and S. Lefèvre, “Small object detection
in remote sensing images based on super-resolution with auxiliary generative adversarial networks,” Remote Sens., vol. 12,
no. 19, Sep. 2020, Art. no. 3152, doi: 10.3390/rs12193152.
[210] S. M. A. Bashir and Y. Wang, “Small object detection in
remote sensing images with residual feature aggregationbased super-resolution and object detector network,” Remote Sens., vol. 13, no. 9, May 2021, Art. no. 1854, doi:
10.3390/rs13091854.
[211] J. Rabbi, N. Ray, M. Schubert, S. Chowdhury, and D. Chao,
“Small-object detection in remote sensing images with endto-end edge-enhanced GAN and object detector network,”
Remote Sens., vol. 12, no. 9, May 2020, Art. no. 1432, doi:
10.3390/rs12091432.
[212] J. Wu and S. Xu, “From point to region: Accurate and efficient
hierarchical small object detection in low-resolution remote
sensing images,” Remote Sens., vol. 13, no. 13, Jul. 2021, Art.
no. 2620, doi: 10.3390/rs13132620.
[213] J. Li, Z. Zhang, Y. Tian, Y. Xu, Y. Wen, and S. Wang, “Target-guided feature super-resolution for vehicle detection in remote
sensing images,” IEEE Geosci. Remote Sens. Lett., vol. 19, pp.
1–5, 2022, doi: 10.1109/LGRS.2021.3112172.
[214] J. Chen, K. Chen, H. Chen, Z. Zou, and Z. Shi, “A degraded reconstruction enhancement-based method for tiny ship detection in remote sensing images with a new large-scale dataset,”
IEEE Trans. Geosci. Remote Sens., vol. 60, pp. 1–14, Jun. 2022,
doi: 10.1109/TGRS.2022.3180894.
[215] J. Pang, C. Li, J. Shi, Z. Xu, and H. Feng, “R2-CNN: Fast tiny
object detection in large-scale remote sensing images,” IEEE
Trans. Geosci. Remote Sens., vol. 57, no. 8, pp. 5512–5524, Aug.
2019, doi: 10.1109/TGRS.2019.2899955.
J. Wu, Z. Pan, B. Lei, and Y. Hu, “FSANet: Feature-and-spatialaligned network for tiny object detection in remote sensing
images,” IEEE Trans. Geosci. Remote Sens., vol. 60, pp. 1–17, Sep.
2022, doi: 10.1109/TGRS.2022.3205052.
[217] M. Pham, L. Courtrai, C. Friguet, S. Lefèvre, and A. Baussard, “YOLO-Fine: One-stage detector of small objects under various backgrounds in remote sensing images,” Remote Sens., vol. 12, no. 15, Aug. 2020, Art. no. 2501, doi: 10.3390/rs12152501.
[218] J. Yan, H. Wang, M. Yan, W. Diao, X. Sun, and H. Li, “IoU-adaptive deformable R-CNN: Make full use of IOU for multi-class
object detection in remote sensing imagery,” Remote Sens., vol.
11, no. 3, Feb. 2019, Art. no. 286, doi: 10.3390/rs11030286.
[219] R. Dong, D. Xu, J. Zhao, L. Jiao, and J. An, “Sig-NMS-based
faster R-CNN combining transfer learning for small target detection in VHR optical remote sensing imagery,” IEEE Trans.
Geosci. Remote Sens., vol. 57, no. 11, pp. 8534–8545, Nov. 2019,
doi: 10.1109/TGRS.2019.2921396.
[220] Z. Shu, X. Hu, and J. Sun, “Center-point-guided proposal generation for detection of small and dense buildings in aerial
imagery,” IEEE Geosci. Remote Sens. Lett., vol. 15, no. 7, pp.
1100–1104, Jul. 2018, doi: 10.1109/LGRS.2018.2822760.
[221] C. Xu, J. Wang, W. Yang, and L. Yu, “Dot distance for tiny object detection in aerial images,” in Proc. IEEE Int. Conf. Comput.
Vision Pattern Recognit. Workshops, 2021, pp. 1192–1201, doi:
10.1109/CVPRW53098.2021.00130.
[222] C. Xu, J. Wang, W. Yang, H. Yu, L. Yu, and G. Xia, “RFLA:
Gaussian receptive field based label assignment for tiny object detection,” in Proc. 17th Eur. Conf., 2022, pp. 526–543, doi:
10.1007/978-3-031-20077-9_31.
[223] F. Zhang, B. Du, L. Zhang, and M. Xu, “Weakly supervised
learning based on coupled convolutional neural networks
for aircraft detection,” IEEE Trans. Geosci. Remote Sens.,
vol. 54, no. 9, pp. 5553–5563, Sep. 2016, doi: 10.1109/TGRS.
2016.2569141.
[224] Y. Li, B. He, F. Melgani, and T. Long, “Point-based weakly
supervised learning for object detection in high spatial resolution remote sensing images,” IEEE J. Sel. Topics Appl. Earth
Observ. Remote Sens., vol. 14, pp. 5361–5371, Apr. 2021, doi:
10.1109/JSTARS.2021.3076072.
[225] D. Zhang, J. Han, G. Cheng, Z. Liu, S. Bu, and L. Guo, “Weakly
supervised learning for target detection in remote sensing images,” IEEE Geosci. Remote Sens. Lett., vol. 12, no. 4, pp. 701–
705, Apr. 2015, doi: 10.1109/LGRS.2014.2358994.
[226] J. Han, D. Zhang, G. Cheng, L. Guo, and J. Ren, “Object detection in optical remote sensing images based on weakly supervised learning and high-level feature learning,” IEEE Trans.
Geosci. Remote Sens., vol. 53, no. 6, pp. 3325–3337, Jun. 2015,
doi: 10.1109/TGRS.2014.2374218.
[227] Y. Li, Y. Zhang, X. Huang, and A. L. Yuille, “Deep networks
under scene-level supervision for multi-class geospatial object
detection from remote sensing images,” ISPRS J. Photogrammetry Remote Sens., vol. 146, pp. 182–196, Dec. 2018, doi:
10.1016/j.isprsjprs.2018.09.014.
[228] H. Bilen and A. Vedaldi, “Weakly supervised deep detection networks,” in Proc. IEEE Conf. Comput. Vision Pattern
Recognit. (CVPR), 2016, pp. 2846–2854, doi: 10.1109/CVPR.
2016.311.
[229] X. Yao, X. Feng, J. Han, G. Cheng, and L. Guo, “Automatic
weakly supervised object detection from high spatial resolution remote sensing images via dynamic curriculum learning,” IEEE Trans. Geosci. Remote Sens., vol. 59, no. 1, pp. 675–
685, Jan. 2021, doi: 10.1109/TGRS.2020.2991407.
[230] H. Wang et al., “Dynamic pseudo-label generation for weakly
supervised object detection in remote sensing images,” Remote
Sens., vol. 13, no. 8, Apr. 2021, Art. no. 1461, doi: 10.3390/
rs13081461.
[231] X. Feng, J. Han, X. Yao, and G. Cheng, “Progressive contextual instance refinement for weakly supervised object detection in remote sensing images,” IEEE Trans. Geosci. Remote
Sens., vol. 58, no. 11, pp. 8002–8012, Nov. 2020, doi: 10.1109/
TGRS.2020.2985989.
[232] P. Shamsolmoali, J. Chanussot, M. Zareapoor, H. Zhou, and
J. Yang, “Multipatch feature pyramid network for weakly supervised object detection in optical remote sensing images,”
IEEE Trans. Geosci. Remote Sens., vol. 60, pp. 1–13, 2022, doi:
10.1109/TGRS.2021.3106442.
[233] B. Wang, Y. Zhao, and X. Li, “Multiple instance graph learning for weakly supervised remote sensing object detection,”
IEEE Trans. Geosci. Remote Sens., vol. 60, pp. 1–12, 2022, doi:
10.1109/TGRS.2021.3123231.
[234] X. Feng, J. Han, X. Yao, and G. Cheng, “TCANet: Triple
context-aware network for weakly supervised object detection in remote sensing images,” IEEE Trans. Geosci. Remote
Sens., vol. 59, no. 8, pp. 6946–6955, Aug. 2021, doi: 10.1109/
TGRS.2020.3030990.
[235] X. Feng, X. Yao, G. Cheng, J. Han, and J. Han, “SAENet: Self-supervised adversarial and equivariant network for weakly supervised object detection in remote sensing images,”
IEEE Trans. Geosci. Remote Sens., vol. 60, pp. 1–11, 2022, doi:
10.1109/TGRS.2021.3105575.
[236] X. Qian et al., “Incorporating the completeness and difficulty of proposals into weakly supervised object detection in
remote sensing images,” IEEE J. Sel. Topics Appl. Earth Observ.
Remote Sens., vol. 15, pp. 1902–1911, Feb. 2022, doi: 10.1109/
JSTARS.2022.3150843.
[237] W. Qian, Z. Yan, Z. Zhu, and W. Yin, “Weakly supervised
part-based method for combined object detection in remote
sensing imagery,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 15, pp. 5024–5036, Jun. 2022, doi: 10.1109/
JSTARS.2022.3179026.
[238] S. Chen, D. Shao, X. Shu, C. Zhang, and J. Wang, “FCC-Net:
A full-coverage collaborative network for weakly supervised
remote sensing object detection,” Electronics, vol. 9, no. 9, Aug.
2020, Art. no. 1356, doi: 10.3390/electronics9091356.
[239] G. Cheng, X. Xie, W. Chen, X. Feng, X. Yao, and J. Han, “Self-guided proposal generation for weakly supervised object detection,” IEEE Trans. Geosci. Remote Sens., vol. 60, pp. 1–11, Jun.
2022, doi: 10.1109/TGRS.2022.3181466.
[240] X. Feng, X. Yao, G. Cheng, and J. Han, “Weakly supervised
rotation-invariant aerial object detection network,” in
Proc. IEEE Int. Conf. Comput. Vision Pattern Recognit.
(CVPR), 2022, pp. 14,126–14,135, doi: 10.1109/CVPR52688.
2022.01375.
[241] G. Wang, X. Zhang, Z. Peng, X. Jia, X. Tang, and L. Jiao, “MOL:
Towards accurate weakly supervised remote sensing object detection via multi-view noisy learning,” ISPRS J. Photogrammetry
Remote Sens., vol. 196, pp. 457–470, Feb. 2023, doi: 10.1016/j.
isprsjprs.2023.01.011.
[242] T. Deselaers, B. Alexe, and V. Ferrari, “Weakly supervised
localization and learning with generic knowledge,” Int. J.
Comput. Vision, vol. 100, no. 3, pp. 275–293, May 2012, doi:
10.1007/s11263-012-0538-3.
[243] B. Hou et al., “A neural network based on consistency learning
and adversarial learning for semisupervised synthetic aperture radar ship detection,” IEEE Trans. Geosci. Remote Sens., vol.
60, pp. 1–16, Jan. 2022, doi: 10.1109/TGRS.2022.3142017.
[244] Z. Song, J. Yang, D. Zhang, S. Wang, and Z. Li, “Semi-supervised dim and small infrared ship detection network based
on Haar wavelet,” IEEE Access, vol. 9, pp. 29,686–29,695, Feb.
2021, doi: 10.1109/ACCESS.2021.3058526.
[245] Y. Zhong, Z. Zheng, A. Ma, X. Lu, and L. Zhang, “COLOR:
Cycling, offline learning, and online representation framework for airport and airplane detection using GF-2 satellite
images,” IEEE Trans. Geosci. Remote Sens., vol. 58, no. 12, pp.
8438–8449, Dec. 2020, doi: 10.1109/TGRS.2020.2987907.
[246] Y. Wu, W. Zhao, R. Zhang, and F. Jiang, “AMR-Net: Arbitrary-oriented ship detection using attention module, multi-scale
feature fusion and rotation pseudo-label,” IEEE Access,
vol. 9, pp. 68,208–68,222, Apr. 2021, doi: 10.1109/ACCESS.
2021.3075857.
[247] S. Chen, R. Zhan, W. Wang, and J. Zhang, “Domain adaptation for semi-supervised ship detection in SAR images,” IEEE
Geosci. Remote Sens. Lett., vol. 19, pp. 1–5, May 2022, doi:
10.1109/LGRS.2022.3171789.
[248] Z. Zhang, Z. Feng, and S. Yang, “Semi-supervised object detection framework with object first mixup for remote sensing images,” in Proc. Int. Geosci. Remote Sens. Symp., 2021, pp.
2596–2599, doi: 10.1109/IGARSS47720.2021.9554202.
[249] B. Xue and N. Tong, “DIOD: Fast and efficient weakly semisupervised deep complex ISAR object detection,” IEEE Trans.
Cybern., vol. 49, no. 11, pp. 3991–4003, Nov. 2019, doi:
10.1109/TCYB.2018.2856821.
[250] L. Liao, L. Du, and Y. Guo, “Semi-supervised SAR target detection based on an improved faster R-CNN,” Remote Sens.,
vol. 14, no. 1, 2022, Art. no. 143, doi: 10.3390/rs14010143.
[251] Y. Du, L. Du, Y. Guo, and Y. Shi, “Semisupervised SAR ship detection network via scene characteristic learning,” IEEE Trans.
Geosci. Remote Sens., vol. 61, pp. 1–17, Jan. 2023, doi: 10.1109/
TGRS.2023.3235859.
[252] D. Wei, Y. Du, L. Du, and L. Li, “Target detection network for
SAR images based on semi-supervised learning and attention
mechanism,” Remote Sens., vol. 13, no. 14, Jul. 2021, Art. no.
2686, doi: 10.3390/rs13142686.
[253] L. Chen, Y. Fu, S. You, and H. Liu, “Efficient hybrid supervision for instance segmentation in aerial images,” Remote Sens.,
vol. 13, no. 2, Jan. 2021, Art. no. 252, doi: 10.3390/rs13020252.
[254] G. Cheng et al., “Prototype-CNN for few-shot object detection
in remote sensing images,” IEEE Trans. Geosci. Remote Sens.,
vol. 60, pp. 1–10, 2022, doi: 10.1109/TGRS.2021.3078507.
[255] X. Li, J. Deng, and Y. Fang, “Few-shot object detection on remote sensing images,” IEEE Trans. Geosci. Remote Sens., vol. 60,
pp. 1–14, 2022, doi: 10.1109/TGRS.2021.3051383.
[256] L. Li, X. Yao, G. Cheng, M. Xu, J. Han, and J. Han, “Solo-to-collaborative dual-attention network for one-shot object detection
in remote sensing images,” IEEE Trans. Geosci. Remote Sens.,
vol. 60, pp. 1–11, 2022, doi: 10.1109/TGRS.2021.3091003.
[257] H. Zhang, X. Zhang, G. Meng, C. Guo, and Z. Jiang, “Few-shot multi-class ship detection in remote sensing images using attention feature map and multi-relation detector,” Remote
Sens., vol. 14, no. 12, Jun. 2022, Art. no. 2790, doi: 10.3390/
rs14122790.
[258] B. Wang, Z. Wang, X. Sun, H. Wang, and K. Fu, “DMML-Net: Deep metametric learning for few-shot geographic object segmentation in remote sensing imagery,” IEEE Trans.
Geosci. Remote Sens., vol. 60, pp. 1–18, 2022, doi: 10.1109/
TGRS.2021.3116672.
[259] J. Li et al., “MM-RCNN: Toward few-shot object detection in
remote sensing images with meta memory,” IEEE Trans. Geosci. Remote Sens., vol. 60, pp. 1–14, Dec. 2022, doi: 10.1109/
TGRS.2022.3228612.
[260] Z. Zhao, P. Tang, L. Zhao, and Z. Zhang, “Few-shot object detection of remote sensing images via two-stage fine-tuning,”
IEEE Geosci. Remote Sens. Lett., vol. 19, pp. 1–5, 2022, doi:
10.1109/LGRS.2021.3116858.
[261] Y. Zhou, H. Hu, J. Zhao, H. Zhu, R. Yao, and W. Du, “Few-shot object detection via context-aware aggregation for remote
sensing images,” IEEE Geosci. Remote Sens. Lett., vol. 19, pp.
1–5, 2022, doi: 10.1109/LGRS.2022.3171257.
[262] Y. Wang, C. Xu, C. Liu, and Z. Li, “Context information refinement for few-shot object detection in remote sensing images,” Remote Sens., vol. 14, no. 14, Jul. 2022, Art. no. 3255, doi:
10.3390/rs14143255.
[263] Z. Zhou, S. Li, W. Guo, and Y. Gu, “Few-shot aircraft detection in satellite videos based on feature scale selection pyramid and proposal contrastive learning,” Remote Sens., vol. 14,
no. 18, Sep. 2022, Art. no. 4581, doi: 10.3390/rs14184581.
[264] S. Chen, J. Zhang, R. Zhan, R. Zhu, and W. Wang, “Few-shot
object detection for SAR images via feature enhancement and
dynamic relationship modeling,” Remote Sens., vol. 14, no. 15,
Jul. 2022, Art. no. 3669, doi: 10.3390/rs14153669.
[265] S. Liu, Y. You, H. Su, G. Meng, W. Yang, and F. Liu, “Few-shot
object detection in remote sensing image interpretation: Opportunities and challenges,” Remote Sens., vol. 14, no. 18, Sep.
2022, Art. no. 4435, doi: 10.3390/rs14184435.
[266] X. Huang, B. He, M. Tong, D. Wang, and C. He, “Few-shot
object detection on remote sensing images via shared attention module and balanced fine-tuning strategy,” Remote
Sens., vol. 13, no. 19, Sep. 2021, Art. no. 3816, doi: 10.3390/
rs13193816.
[267] Z. Xiao, J. Qi, W. Xue, and P. Zhong, “Few-shot object detection with self-adaptive attention network for remote
sensing images,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 14, pp. 4854–4865, May 2021, doi: 10.1109/
JSTARS.2021.3078177.
[268] S. Wolf, J. Meier, L. Sommer, and J. Beyerer, “Double head predictor based few-shot object detection for aerial imagery,” in
Proc. IEEE Int. Conf. Comput. Vision Workshops, 2021, pp. 721–
731, doi: 10.1109/ICCVW54120.2021.00086.
[269] T. Zhang, X. Zhang, P. Zhu, X. Jia, X. Tang, and L. Jiao, “Generalized few-shot object detection in remote sensing images,”
ISPRS J. Photogrammetry Remote Sens., vol. 195, pp. 353–364,
Jan. 2023, doi: 10.1016/j.isprsjprs.2022.12.004.
[270] G. Heitz and D. Koller, “Learning spatial context: Using stuff
to find things,” in Proc. Eur. Conf. Comput. Vision, Berlin, Heidelberg: Springer, 2008, vol. 5302, pp. 30–43.
[271] C. Benedek, X. Descombes, and J. Zerubia, “Building development monitoring in multitemporal remotely sensed image
pairs with stochastic birth-death dynamics,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 34, no. 1, pp. 33–50, Jan. 2012, doi:
10.1109/TPAMI.2011.94.
[272] S. Razakarivony and F. Jurie, “Vehicle detection in aerial imagery: A small target detection benchmark,” J. Vis. Commun.
Image Representation, vol. 34, pp. 187–203, Jan. 2016, doi:
10.1016/j.jvcir.2015.11.002.
[273] K. Liu and G. Máttyus, “Fast multiclass vehicle detection on
aerial images,” IEEE Geosci. Remote Sens. Lett., vol. 12, no. 9, pp.
1938–1942, Sep. 2015, doi: 10.1109/LGRS.2015.2439517.
[274] H. Zhu, X. Chen, W. Dai, K. Fu, Q. Ye, and J. Jiao, “Orientation
robust object detection in aerial images using deep convolutional neural network,” in Proc. IEEE Int. Conf. Image Process.,
2015, pp. 3735–3739, doi: 10.1109/ICIP.2015.7351502.
[275] T. N. Mundhenk, G. Konjevod, W. A. Sakla, and K. Boakye,
“A large contextual dataset for classification, detection and
counting of cars with deep learning,” in Proc. Eur. Conf. Comput. Vision, B. Leibe, J. Matas, N. Sebe, and M. Welling, Eds.
Cham, Switzerland: Springer, 2016, vol. 9907, pp. 785–800.
[276] Z. Liu, H. Wang, L. Weng, and Y. Yang, “Ship rotated bounding box space for ship extraction from high-resolution optical
satellite images with complex backgrounds,” IEEE Geosci. Remote Sens. Lett., vol. 13, no. 8, pp. 1074–1078, Aug. 2016, doi:
10.1109/LGRS.2016.2565705.
[277] J. Li, C. Qu, and J. Shao, “Ship detection in SAR images based
on an improved faster R-CNN,” in Proc. SAR Big Data Era, Models, Methods Appl. (BIGSARDATA), 2017, pp. 1–6, doi: 10.1109/
BIGSARDATA.2017.8124934.
[278] Z. Zou and Z. Shi, “Random access memories: A new paradigm for target detection in high resolution aerial remote
sensing images,” IEEE Trans. Image Process., vol. 27, no. 3, pp.
1100–1111, Mar. 2018, doi: 10.1109/TIP.2017.2773199.
[279] X. Sun, Z. Wang, Y. Sun, W. Diao, Y. Zhang, and K. Fu, “AIR-SARShip-1.0: High-resolution SAR ship detection dataset,” J.
Radars, vol. 8, no. 6, pp. 852–863, Dec. 2019, doi: 10.12000/
JR19097.
[280] W. Yu et al., “MAR20: A benchmark for military aircraft recognition in remote sensing images,” in Proc. Nat. Remote Sens.
Bull., 2022, pp. 1–11, doi: 10.11834/jrs.20222139.
[281] K. Chen, M. Wu, J. Liu, and C. Zhang, “FGSD: A dataset
for fine-grained ship detection in high resolution satellite images,” 2020. [Online]. Available: https://arxiv.org/
abs/2003.06832
[282] Y. Han, X. Yang, T. Pu, and Z. Peng, “Fine-grained recognition for oriented ship against complex scenes in optical remote
sensing images,” IEEE Trans. Geosci. Remote Sens., vol. 60, pp.
1–18, 2022, doi: 10.1109/TGRS.2021.3123666.
[283] J. Wang, W. Yang, H. Guo, R. Zhang, and G. Xia, “Tiny object
detection in aerial images,” in Proc. IEEE Int. Conf. Pattern
Recognit., 2020, pp. 3791–3798, doi: 10.1109/ICPR48806.
2021.9413340.
[284] G. Cheng, X. Yuan, X. Yao, K. Yan, Q. Zeng, and J. Han, “Towards large-scale small object detection: Survey and benchmarks,” 2022, arXiv:2207.14096.
[285] T. Zhang, X. Zhang, J. Shi, and S. Wei, “HyperLi-Net: A hyper-light deep learning network for high-accurate and high-speed
ship detection from synthetic aperture radar imagery,” ISPRS J.
Photogrammetry Remote Sens., vol. 167, pp. 123–153, Sep. 2020,
doi: 10.1016/j.isprsjprs.2020.05.016.
[286] T. Zhang et al., “SAR ship detection dataset (SSDD): Official
release and comprehensive data analysis,” Remote Sens., vol.
13, no. 18, Sep. 2021, Art. no. 3690, doi: 10.3390/rs13183690.
[287] M. Everingham, L. V. Gool, C. K. I. Williams, J. M. Winn, and
A. Zisserman, “The pascal visual object classes (VOC) challenge,” Int. J. Comput. Vision, vol. 88, no. 2, pp. 303–338, Jun.
2010, doi: 10.1007/s11263-009-0275-4.
[288] A. G. Menezes, G. de Moura, C. Alves, and A. C. P. L. F. de
Carvalho, “Continual object detection: A review of definitions, strategies, and challenges,” Neural Netw., vol. 161, pp.
476–493, Apr. 2023, doi: 10.1016/j.neunet.2023.01.041.
[289] C. Persello et al., “Deep learning and earth observation to support the sustainable development goals: Current approaches,
open challenges, and future opportunities,” IEEE Geosci. Remote Sens. Mag., vol. 10, no. 2, pp. 172–200, Jun. 2022, doi:
10.1109/MGRS.2021.3136100.
[290] T. Hoeser, F. Bachofer, and C. Kuenzer, “Object detection
and image segmentation with deep learning on earth observation data: A review—part II: Applications,” Remote
Sen., vol. 12, no. 18, Sep. 2020, Art. no. 3053, doi: 10.3390/
rs12183053.
[291] L. Ma, Y. Liu, X. Zhang, Y. Ye, G. Yin, and B. A. Johnson, “Deep
learning in remote sensing applications: A meta-analysis and
review,” ISPRS J. Photogrammetry Remote Sens., vol. 152, pp.
166–177, Jun. 2019, doi: 10.1016/j.isprsjprs.2019.04.015.
[292] P. Barmpoutis, P. Papaioannou, K. Dimitropoulos, and N.
Grammalidis, “A review on early forest fire detection systems
using optical remote sensing,” Sensors, vol. 20, no. 22, Nov.
2020, Art. no. 6442, doi: 10.3390/s20226442.
[293] Z. Guan, X. Miao, Y. Mu, Q. Sun, Q. Ye, and D. Gao, “Forest
fire segmentation from aerial imagery data using an improved
instance segmentation model,” Remote Sens., vol. 14, no. 13,
Jul. 2022, Art. no. 3159, doi: 10.3390/rs14133159.
[294] Z. Zheng, Y. Zhong, J. Wang, A. Ma, and L. Zhang, “Building
damage assessment for rapid disaster response with a deep object-based semantic change detection framework: From natural
disasters to man-made disasters,” Remote Sens. Environ., vol. 265,
Nov. 2021, Art. no. 112636, doi: 10.1016/j.rse.2021.112636.
[295] H. Ma, Y. Liu, Y. Ren, and J. Yu, “Detection of collapsed buildings in post-earthquake remote sensing images based on the
improved YOLOv3,” Remote Sens., vol. 12, no. 1, 2020, Art.
no. 44, doi: 10.3390/rs12010044.
[296] Y. Pi, N. D. Nath, and A. H. Behzadan, “Convolutional neural networks for object detection in aerial imagery for disaster
response and recovery,” Adv. Eng. Informat., vol. 43, Jan. 2020,
Art. no. 101009, doi: 10.1016/j.aei.2019.101009.
[297] M. Weiss, F. Jacob, and G. Duveiller, “Remote sensing for
agricultural applications: A meta-review,” Remote Sens. Environ., vol. 236, Jan. 2020, Art. no. 111402, doi: 10.1016/j.rse.
2019.111402.
[298] Y. Pang et al., “Improved crop row detection with deep neural network for early-season maize stand count in UAV imagery,” Comput. Electron. Agriculture, vol. 178, Nov. 2020, Art. no.
105766, doi: 10.1016/j.compag.2020.105766.
[299] C. Mota-Delfin, G. de Jesús López-Canteñs, I. L. L. Cruz, E.
Romantchik-Kriuchkova, and J. C. Olguín-Rojas, “Detection
and counting of corn plants in the presence of weeds with convolutional neural networks,” Remote Sens., vol. 14, no. 19, Sep.
2022, Art. no. 4892, doi: 10.3390/rs14194892.
[300] L. P. Osco et al., “A CNN approach to simultaneously count
plants and detect plantation-rows from UAV imagery,” ISPRS J.
Photogrammetry Remote Sens., vol. 174, pp. 1–17, Apr. 2021, doi:
10.1016/j.isprsjprs.2021.01.024.
[301] M. M. Anuar, A. A. Halin, T. Perumal, and B. Kalantar, “Aerial
imagery paddy seedlings inspection using deep learning,” Remote Sens., vol. 14, no. 2, Jan. 2022, Art. no. 274, doi: 10.3390/
rs14020274.
[302] Y. Chen et al., “Strawberry yield prediction based on a deep
neural network using high-resolution aerial orthoimages,”
Remote Sens., vol. 11, no. 13, Jul. 2019, Art. no. 1584, doi:
10.3390/rs11131584.
[303] W. Zhao, C. Persello, and A. Stein, “Building outline delineation: From aerial images to polygons with an improved
end-to-end learning framework,” ISPRS J. Photogrammetry
Remote Sens., vol. 175, pp. 119–131, May 2021, doi: 10.1016/j.
isprsjprs.2021.02.014.
[304] Z. Li, J. D. Wegner, and A. Lucchi, “Topological map extraction from overhead images,” in Proc. IEEE Int. Conf. Comput.
Vision (ICCV), 2019, pp. 1715–1724, doi: 10.1109/ICCV.2019.
00180.
[305] L. Mou and X. X. Zhu, “Vehicle instance segmentation from
aerial image and video using a multitask learning residual
fully convolutional network,” IEEE Trans. Geosci. Remote Sens.,
vol. 56, no. 11, pp. 6699–6711, Nov. 2018, doi: 10.1109/
TGRS.2018.2841808.
[306] J. Zhang, X. Zhang, Z. Huang, X. Cheng, J. Feng, and L. Jiao,
“Bidirectional multiple object tracking based on trajectory
criteria in satellite videos,” IEEE Trans. Geosci. Remote Sens.,
vol. 61, pp. 1–14, Jan. 2023, doi: 10.1109/TGRS.2023.3235883.
[307] H. Kim and Y. Ham, “Participatory sensing-based geospatial localization of distant objects for disaster preparedness in urban built environments,” Automat. Construction,
vol. 107, Nov. 2019, Art. no. 102960, doi: 10.1016/j.autcon.2019.102960.
[308] M. A. E. Bhuiyan, C. Witharana, and A. K. Liljedahl, “Use of
very high spatial resolution commercial satellite imagery and
deep learning to automatically map ice-wedge polygons across
tundra vegetation types,” J. Imag., vol. 6, no. 12, Dec. 2020,
Art. no. 137, doi: 10.3390/jimaging6120137.
[309] W. Zhang et al., “Transferability of the deep learning mask
R-CNN model for automated mapping of ice-wedge polygons in high-resolution satellite and UAV images,” Remote
Sens., vol. 12, no. 7, Mar. 2020, Art. no. 1085, doi: 10.3390/rs12071085.
[310] C. Witharana et al., “An object-based approach for mapping tundra ice-wedge polygon troughs from very high spatial resolution optical satellite imagery,” Remote Sens., vol. 13, no. 4, Feb. 2021, Art. no. 558, doi: 10.3390/rs13040558.
[311] J. Yu, Z. Wang, A. Majumdar, and R. Rajagopal, “DeepSolar: A machine learning framework to efficiently construct a solar deployment database in the United States,” Joule, vol. 2, no. 12, pp. 2605–2617, Dec. 2018, doi: 10.1016/j.joule.2018.11.021.
[312] J. M. Malof, K. Bradbury, L. M. Collins, and R. G. Newell, “Automatic detection of solar photovoltaic arrays in high resolution aerial imagery,” Appl. Energy, vol. 183, pp. 229–240, Dec. 2016, doi: 10.1016/j.apenergy.2016.08.191.
[313] W. Zhang, G. Wang, J. Qi, G. Wang, and T. Zhang, “Research on the extraction of wind turbine all over the China based on domestic satellite remote sensing data,” in Proc. Int. Geosci. Remote Sens. Symp., 2021, pp. 4167–4170, doi: 10.1109/IGARSS47720.2021.9553559.
[314] W. Hu et al., “Wind turbine detection with synthetic overhead imagery,” in Proc. Int. Geosci. Remote Sens. Symp., 2021, pp. 4908–4911, doi: 10.1109/IGARSS47720.2021.9554306.
[315] T. Jia et al., “Deep learning for detecting macroplastic litter in water bodies: A review,” Water Res., vol. 231, Mar. 2023, Art. no. 119632, doi: 10.1016/j.watres.2023.119632.
[316] C. Martin, Q. Zhang, D. Zhai, X. Zhang, and C. M. Duarte, “Enabling a large-scale assessment of litter along Saudi Arabian Red Sea shores by combining drones and machine learning,” Environmental Pollut., vol. 277, May 2021, Art. no. 116730, doi: 10.1016/j.envpol.2021.116730.
[317] K. Themistocleous, C. Papoutsa, S. C. Michaelides, and D. G. Hadjimitsis, “Investigating detection of floating plastic litter from space using Sentinel-2 imagery,” Remote Sens., vol. 12, no. 16, Aug. 2020, Art. no. 2648, doi: 10.3390/rs12162648.
[318] B. Xue et al., “An efficient deep-sea debris detection method using deep neural networks,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 14, pp. 12,348–12,360, Nov. 2021, doi: 10.1109/JSTARS.2021.3130238.
[319] J. Peng et al., “Wild animal survey using UAS imagery and deep learning: Modified faster R-CNN for kiang detection in Tibetan plateau,” ISPRS J. Photogrammetry Remote Sens., vol. 169, pp. 364–376, Nov. 2020, doi: 10.1016/j.isprsjprs.2020.08.026.
[320] N. Rey, M. Volpi, S. Joost, and D. Tuia, “Detecting animals in African savanna with UAVs and the crowds,” Remote Sens. Environ., vol. 200, pp. 341–351, Oct. 2017, doi: 10.1016/j.rse.2017.08.026.
[321] B. Kellenberger, D. Marcos, and D. Tuia, “Detecting mammals in UAV images: Best practices to address a substantially imbalanced dataset with deep learning,” Remote Sens. Environ., vol. 216, pp. 139–153, Oct. 2018, doi: 10.1016/j.rse.2018.06.028.
[322] A. Delplanque, S. Foucher, P. Lejeune, J. Linchant, and J. Théau, “Multispecies detection and identification of African mammals in aerial imagery using convolutional neural networks,” Remote Sens. Ecology Conservation, vol. 8, no. 2, pp. 166–179, 2022, doi: 10.1002/rse2.234.
[323] D. Wang, Q. Shao, and H. Yue, “Surveying wild animals from satellites, manned aircraft and unmanned aerial systems (UASS): A review,” Remote Sens., vol. 11, no. 11, Jun. 2019, Art. no. 1308, doi: 10.3390/rs11111308.
[324] T. Kattenborn, J. Leitloff, F. Schiefer, and S. Hinz, “Review on convolutional neural networks (CNN) in vegetation remote sensing,” ISPRS J. Photogrammetry Remote Sens., vol. 173, pp. 24–49, Mar. 2021, doi: 10.1016/j.isprsjprs.2020.12.010.
[325] T. Dong, Y. Shen, J. Zhang, Y. Ye, and J. Fan, “Progressive cascaded convolutional neural networks for single tree detection with Google Earth imagery,” Remote Sens., vol. 11, no. 15, Jul. 2019, Art. no. 1786, doi: 10.3390/rs11151786.
[326] A. Safonova, S. Tabik, D. Alcaraz-Segura, A. Rubtsov, Y. Maglinets, and F. Herrera, “Detection of fir trees (Abies sibirica) damaged by the bark beetle in unmanned aerial vehicle images with deep learning,” Remote Sens., vol. 11, no. 6, Mar. 2019, Art. no. 643, doi: 10.3390/rs11060643.
[327] Z. Hao et al., “Automated tree-crown and height detection in a young forest plantation using mask region-based convolutional neural network (Mask R-CNN),” ISPRS J. Photogrammetry Remote Sens., vol. 178, pp. 112–123, Aug. 2021, doi: 10.1016/j.isprsjprs.2021.06.003.
[328] A. Sani-Mohammed, W. Yao, and M. Heurich, “Instance segmentation of standing dead trees in dense forest from aerial imagery using deep learning,” ISPRS Open J. Photogrammetry Remote Sens., vol. 6, Dec. 2022, Art. no. 100024, doi: 10.1016/j.ophoto.2022.100024.
[329] A. V. Etten, “You only look twice: Rapid multi-scale object detection in satellite imagery,” 2018. [Online]. Available: http://arxiv.org/abs/1805.09512
[330] Q. Lin, J. Zhao, G. Fu, and Z. Yuan, “CRPN-SFNet: A high-performance object detector on large-scale remote sensing images,” IEEE Trans. Neural Netw. Learn. Syst., vol. 33, no. 1, pp. 416–429, Jan. 2022, doi: 10.1109/TNNLS.2020.3027924.
[331] D. Hong et al., “More diverse means better: Multimodal deep learning meets remote-sensing imagery classification,” IEEE Trans. Geosci. Remote Sens., vol. 59, no. 5, pp. 4340–4354, May 2021, doi: 10.1109/TGRS.2020.3016820.
[332] D. Hong, N. Yokoya, G.-S. Xia, J. Chanussot, and X. X. Zhu, “X-ModalNet: A semi-supervised deep cross-modal network for classification of remote sensing data,” ISPRS J. Photogrammetry Remote Sens., vol. 167, pp. 12–23, Sep. 2020, doi: 10.1016/j.isprsjprs.2020.06.014.
[333] M. Segal-Rozenhaimer, A. Li, K. Das, and V. Chirayath, “Cloud detection algorithm for multi-modal satellite imagery using convolutional neural-networks (CNN),” Remote Sens. Environ., vol. 237, Feb. 2020, Art. no. 111446, doi: 10.1016/j.rse.2019.111446.
[334] Y. Shendryk, Y. Rist, C. Ticehurst, and P. Thorburn, “Deep learning for multi-modal classification of cloud, shadow and land cover scenes in PlanetScope and Sentinel-2 imagery,” ISPRS J. Photogrammetry Remote Sens., vol. 157, pp. 124–136, Nov. 2019, doi: 10.1016/j.isprsjprs.2019.08.018.
[335] Y. Shi, L. Du, and Y. Guo, “Unsupervised domain adaptation
for SAR target detection,” IEEE J. Sel. Topics Appl. Earth Observ.
Remote Sens., vol. 14, pp. 6372–6385, Jun. 2021, doi: 10.1109/
JSTARS.2021.3089238.
[336] Y. Zhu, X. Sun, W. Diao, H. Li, and K. Fu, “RFA-Net: Reconstructed feature alignment network for domain adaptation
object detection in remote sensing imagery,” IEEE J. Sel. Topics
Appl. Earth Observ. Remote Sens., vol. 15, pp. 5689–5703, Jul.
2022, doi: 10.1109/JSTARS.2022.3190699.
[337] T. Xu, X. Sun, W. Diao, L. Zhao, K. Fu, and H. Wang, “FADA:
Feature aligned domain adaptive object detection in remote
sensing imagery,” IEEE Trans. Geosci. Remote Sens., vol. 60, pp.
1–16, Jan. 2022, doi: 10.1109/TGRS.2022.3147224.
[338] Y. Koga, H. Miyazaki, and R. Shibasaki, “A method for vehicle detection in high-resolution satellite images that uses a
region-based object detector and unsupervised domain adaptation,” Remote Sens., vol. 12, no. 3, Feb. 2020, Art. no. 575,
doi: 10.3390/rs12030575.
[339] Y. Shi, L. Du, Y. Guo, and Y. Du, “Unsupervised domain adaptation based on progressive transfer for ship detection:
From optical to SAR images,” IEEE Trans. Geosci. Remote
Sens., vol. 60, pp. 1–17, Jun. 2022, doi: 10.1109/TGRS.2022.
3185298.
[340] P. Zhang et al., “SEFEPNet: Scale expansion and feature enhancement pyramid network for SAR aircraft detection with
small sample dataset,” IEEE J. Sel. Topics Appl. Earth Observ.
Remote Sens., vol. 15, pp. 3365–3375, Apr. 2022, doi: 10.1109/
JSTARS.2022.3169339.
[341] S. Dang, Z. Cao, Z. Cui, Y. Pi, and N. Liu, “Open set incremental learning for automatic target recognition,” IEEE Trans.
Geosci. Remote Sens., vol. 57, no. 7, pp. 4445–4456, Jul. 2019,
doi: 10.1109/TGRS.2019.2891266.
[342] J. Chen, S. Wang, L. Chen, H. Cai, and Y. Qian, “Incremental detection of remote sensing objects with feature
pyramid and knowledge distillation,” IEEE Trans. Geosci.
Remote Sens., vol. 60, pp. 1–13, 2022, doi: 10.1109/TGRS.
2020.3042554.
[343] X. Chen et al., “An online continual object detector on VHR remote sensing images with class imbalance,” Eng. Appl. Artif. Intell., vol. 117, pt. A, Jan. 2023, Art. no. 105549, doi: 10.1016/j.engappai.2022.105549.
[344] J. Li et al., “Class-incremental learning network for small objects enhancing of semantic segmentation in aerial imagery,”
IEEE Trans. Geosci. Remote Sens., vol. 60, pp. 1–20, 2022, doi:
10.1109/TGRS.2021.3124303.
[345] W. Liu, X. Nie, B. Zhang, and X. Sun, “Incremental learning
with open-set recognition for remote sensing image scene classification,” IEEE Trans. Geosci. Remote Sens., vol. 60, pp. 1–16,
May 2022, doi: 10.1109/TGRS.2022.3173995.
[346] J. Deng, W. Dong, R. Socher, L. Li, K. Li, and L. Fei-Fei, “ImageNet: A large-scale hierarchical image database,” in Proc. IEEE
Int. Conf. Comput. Vision Pattern Recognit. (CVPR), 2009, pp.
248–255, doi: 10.1109/CVPR.2009.5206848.
[347] Y. Long et al., “On creating benchmark dataset for aerial image interpretation: Reviews, guidances, and million-AID,” IEEE J. Sel. Topics Appl. Earth Observ. Remote
Sens., vol. 14, pp. 4205–4230, Apr. 2021, doi: 10.1109/JSTARS.2021.3070368.
[348] G. A. Christie, N. Fendley, J. Wilson, and R. Mukherjee, “Functional map of the world,” in Proc. IEEE Int. Conf. Comput. Vision Pattern Recognit. (CVPR), 2018, pp. 6172–6180, doi: 10.1109/CVPR.2018.00646.
[349] D. Wang, J. Zhang, B. Du, G.-S. Xia, and D. Tao, “An empirical study of remote sensing pretraining,” IEEE Trans. Geosci. Remote Sens., vol. 61, pp. 1–20, 2023, doi: 10.1109/TGRS.2022.3176603.
[350] W. Li, K. Chen, H. Chen, and Z. Shi, “Geographical knowledge-driven representation learning for remote sensing images,” IEEE Trans. Geosci. Remote Sens., vol. 60, pp. 1–16, 2022, doi: 10.1109/TGRS.2021.3115569.
[351] X. Sun et al., “RingMo: A remote sensing foundation model with masked image modeling,” IEEE Trans. Geosci. Remote Sens., vol. 61, pp. 1–22, 2023, doi: 10.1109/TGRS.2022.3194732.
[352] A. Fuller, K. Millard, and J. R. Green, “SatViT: Pretraining transformers for earth observation,” IEEE Geosci. Remote Sens. Lett., vol. 19, pp. 1–5, Aug. 2022, doi: 10.1109/LGRS.2022.3201489.
[353] D. Wang et al., “Advancing plain vision transformer toward remote sensing foundation model,” IEEE Trans. Geosci. Remote Sens., vol. 61, pp. 1–15, 2023, doi: 10.1109/TGRS.2022.3222818.
[354] T. Zhang and X. Zhang, “ShipDeNet-20: An only 20 convolution layers and <1-MB lightweight SAR ship detector,” IEEE Geosci. Remote Sens. Lett., vol. 18, no. 7, pp. 1234–1238, Jul. 2021, doi: 10.1109/LGRS.2020.2993899.
[355] T. Zhang, X. Zhang, J. Shi, and S. Wei, “Depthwise separable convolution neural network for high-speed SAR ship detection,” Remote Sens., vol. 11, no. 21, Oct. 2019, Art. no. 2483, doi: 10.3390/rs11212483.
[356] Z. Wang, L. Du, and Y. Li, “Boosting lightweight CNNs through network pruning and knowledge distillation for SAR target recognition,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 14, pp. 8386–8397, Aug. 2021, doi: 10.1109/JSTARS.2021.3104267.
[357] S. Chen, R. Zhan, W. Wang, and J. Zhang, “Learning slimming SAR ship object detector through network pruning and knowledge distillation,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 14, pp. 1267–1282, 2021, doi: 10.1109/JSTARS.2020.3041783.
[358] Y. Zhang, Z. Yan, X. Sun, W. Diao, K. Fu, and L. Wang, “Learning efficient and accurate detectors with dynamic knowledge distillation in remote sensing imagery,” IEEE Trans. Geosci. Remote Sens., vol. 60, pp. 1–19, 2022, doi: 10.1109/TGRS.2021.3130443.
[359] Y. Yang et al., “Adaptive knowledge distillation for lightweight remote sensing object detectors optimizing,” IEEE Trans. Geosci. Remote Sens., vol. 60, pp. 1–15, May 2022, doi: 10.1109/TGRS.2022.3175213.
[360] C. Li, G. Cheng, G. Wang, P. Zhou, and J. Han, “Instance-aware distillation for efficient object detection in remote sensing images,” IEEE Trans. Geosci. Remote Sens., vol. 61, pp. 1–11, Jan. 2023, doi: 10.1109/TGRS.2023.3238801.