Multi-Oriented Object Detection in High-Resolution Remote Sensing Imagery Based on Convolutional Neural Networks with Adaptive Object Orientation Features

Dong, Zhipeng; Wang, Mi; Wang, Yanli; Liu, Yanxiong; Feng, Yikai; Xu, Wenxue

doi:10.3390/rs14040950


by Zhipeng Dong ¹, Mi Wang ²,³,*, Yanli Wang ⁴, Yanxiong Liu ¹,⁵, Yikai Feng ¹,⁵ and Wenxue Xu

¹ The First Institute of Oceanography, Ministry of Natural Resources, Qingdao 266061, China

² The Key Laboratory of Aerospace Information Security and Trusted Computing, Ministry of Education, School of Cyber Science and Engineering, Wuhan University, Wuhan 430072, China

³ The State Key Laboratory of Information Engineering in Surveying, Mapping, and Remote Sensing, Wuhan University, Wuhan 430079, China

⁴ The College of Geodesy and Geomatics, Shandong University of Science and Technology, Qingdao 266590, China

⁵ The Key Laboratory of Ocean Geomatics, Ministry of Natural Resources, Qingdao 266590, China

* Author to whom correspondence should be addressed.

Remote Sens. 2022, 14(4), 950; https://doi.org/10.3390/rs14040950

Submission received: 21 January 2022 / Revised: 6 February 2022 / Accepted: 14 February 2022 / Published: 16 February 2022

(This article belongs to the Section Remote Sensing Image Processing)


Abstract

In high-resolution earth observation systems, object detection in high spatial resolution remote sensing images (HSRIs) is the key technology for automatic extraction, analysis and understanding of image information. With respect to the multi-angle features of object orientation in HSRIs object detection, this paper presents a novel HSRIs object detection method based on convolutional neural networks (CNN) with adaptive object orientation features. First, an adaptive object orientation regression method is proposed to obtain object regions in any direction. In the adaptive object orientation regression method, five coordinate parameters are used to regress the object region with any direction. Then, a CNN framework for object detection of HSRIs is designed using the adaptive object orientation regression method. Using multiple object detection datasets, the proposed method is compared with some state-of-the-art object detection methods. The experimental results show that the proposed method can more accurately detect objects with large aspect ratios and densely distributed objects than some state-of-the-art object detection methods using a horizontal bounding box, and obtain better object detection results for HSRIs.

Keywords: high spatial resolution remote sensing image; convolutional neural network; object detection; adaptive object orientation features; deep learning


1. Introduction

In high-resolution earth observation systems, object detection in high spatial resolution remote sensing images (HSRIs) is the key technology for automatic extraction, analysis and understanding of image information [1,2,3]. It also plays an important role in the application of high-resolution earth observation systems to ocean monitoring, precision strike and military reconnaissance [4,5,6]. Object detection for HSRIs refers to the process of determining whether there are objects of interest and locating the objects of interest in the image [7]. In this paper, the objects detected are artificial geographical objects (e.g., storage-tanks, cars or airplanes) that have a clear boundary and have nothing to do with the HSRI background.

Object detection in HSRIs has been studied extensively. Most object detection methods use a three-stage mode of ① extracting object candidate regions, ② obtaining the features of the object candidate regions, and ③ classifying the object candidate regions using these features to detect objects in HSRIs [7]. For example, Xiao et al. [8] extracted object candidate regions using sliding windows at various scales. Then, histograms of oriented gradients (HOG) of the object candidate regions were obtained. Finally, the object candidate regions were classified using support vector machines (SVM) to realize airplane and car detection in HSRIs. Cheng et al. [9] extracted object candidate regions using a sliding window. Then, the HOG feature pyramid of the object candidate regions was obtained. Finally, according to the HOG feature pyramid, the object candidate regions were classified using SVM to detect airplanes in HSRIs. Diao et al. [10] used visual saliency to extract object candidate regions. Then, the features of the object candidate regions were obtained and classified using deep belief networks to realize airplane detection in HSRIs. Han et al. [11] extracted object candidate regions based on a saliency map and a visual attention computational model. Then, the features of the object candidate regions were obtained and classified using Fisher discrimination dictionary learning to realize multi-class object detection in HSRIs. The three-stage mode can achieve acceptable object detection results for object detection tasks in specific scenes [7]. However, remote sensing satellites in near-earth orbit acquire a large number of HSRIs from a top-down perspective every day, and this acquisition mode is susceptible to illumination and weather conditions [12]. The three-stage mode cannot be effectively applied to detect objects in a large number of images of different complex scenes [3]. The universality and robustness of this mode are poor.

In recent years, many studies on deep learning have been carried out. The convolutional neural network (CNN) model is the most widely used deep learning model [13,14,15,16,17,18,19,20,21,22]. CNN does not need artificially designed features and can learn and extract effective image features from massive images and annotations. Moreover, with sufficient training data, CNN has good generalization ability and maintains good universality and robustness in different complex scenes [3]. Therefore, object detection methods based on CNN have been widely adopted to detect objects in HSRIs [23,24,25,26,27]. Long et al. [28] used regional CNN (R-CNN) to detect multi-class objects in HSRIs. Han et al. [12] applied transfer learning and Faster R-CNN to detect multi-class objects in HSRIs. Guo et al. [29] proposed a multi-scale CNN and obtained acceptable object detection results for HSRIs using an object detection framework based on it. Chen et al. [30] applied a single shot multi-box detector (SSD) to detect airplanes in HSRIs. Li et al. [31] proposed an object detection framework based on CNN for remote sensing images. The framework contains two networks: a region proposal network (RPN) and a local-contextual feature fusion network. In the RPN, multiangle anchors were added. The double-channel feature fusion network was used to learn local and contextual properties. Deng et al. [32] proposed an object detection framework based on CNN for simultaneously detecting multi-class objects in remote sensing images with large scale variability. The framework contained two subnetworks. Object-like regions were generated in different scale layers using a multi-scale object proposal network, and were then classified based on fused feature maps using an accurate object detection network. Dong et al. [33] proposed an object detection framework based on CNN with suitable object scale features (CNN-SOSF) for multi-class object detection in HSRIs. CNN-SOSF provided acceptable multi-class object detection results in HSRIs. Liu et al. [34] designed an effective multi-class object detection framework based on You Only Look Once version 2 (YOLOv2) to obtain acceptable object detection results for HSRIs. Ma et al. [35] proposed an improved YOLOv3 framework to detect collapsed buildings in HSRIs. However, the CNN-based object detection frameworks in the above literature all use the horizontal bounding box to detect objects in HSRIs, so they adapt poorly to densely packed objects with large aspect ratios, as shown in Figure 1a. In Figure 1a, one of the adjacent ships with a large aspect ratio is missed, and the detection result of the ship contains a large redundant area. Therefore, for object detection in HSRIs, the object detection framework based on CNN needs to use the oriented bounding box (OBB) to improve the detection accuracy of densely packed objects with large aspect ratios, as shown in Figure 1b.

Some studies have used OBB to detect objects in HSRIs. For example, Ding et al. [36] designed a Rotated Region of Interest (RRoI) learner to transform a Horizontal Region of Interest (HRoI) into an RRoI. The designed RRoI transformer was embedded into an object detector for oriented object detection. Li et al. [37] proposed a feature-attentioned object detection framework to detect oriented objects in HSRIs. The proposed framework consisted of three components: feature-attentioned feature pyramid networks, a multiple receptive fields attention-based RPN, and a proposal-level attention-based ROI module. Yang et al. [38] proposed a multi-category rotation detector for small, cluttered and rotated objects. In the rotation detector, the supervised pixel attention network and the channel attention network were jointly explored for small and cluttered object detection by suppressing the noise and highlighting the object's features. For more accurate rotation estimation, the IOU constant factor was added to the smooth L1 loss to address the boundary problem of the rotating bounding box. Wang et al. [39] provided a semantic attention-based mask oriented bounding box representation for multi-category object detection in HSRIs. In the proposed oriented object detector, an inception lateral connection network was used to enhance the feature pyramid network (FPN). Furthermore, a semantic attention network was adopted to provide semantic features that help distinguish the objects of interest from the cluttered background. Compared with object detectors based on the horizontal bounding box (HBB), object detectors based on OBB are more suitable for object detection in HSRIs. Therefore, object detection based on OBB has become a research hotspot.

With respect to the multi-angle features of object orientation in HSRIs object detection, this paper presents a novel HSRIs object detection method based on CNN with adaptive object orientation features (CNN-AOOF). First, an adaptive object orientation regression method is proposed. Then, a CNN framework for object detection in HSRIs is designed using the adaptive object orientation regression method.

The main contributions of this paper are as follows:

An HSRI object detection dataset with OBB, WHU-RSONE-OBB, is established and published to promote the development of object detection for HSRIs.
An adaptive object orientation regression method is proposed to obtain object regions in any direction.
An object detection framework based on CNN with adaptive object orientation features is designed to detect various objects for HSRIs.
The proposed method can more accurately detect objects with large aspect ratios and densely distributed objects than object detectors using a horizontal bounding box.

The rest of this paper is organized as follows. In Section 2, the CNN-AOOF framework is introduced in detail. In Section 3 and Section 4, the datasets are described, and experimental results are discussed and analyzed. In Section 5, the experimental results are summarized, and the conclusions are drawn.

2. Materials and Methods

In this paper, the CNN-AOOF framework is obtained using two steps. First, the adaptive object orientation regression method is proposed. Second, the CNN-AOOF framework is designed using the adaptive object orientation regression method.

2.1. The Adaptive Object Orientation Regression Method

At present, single-stage object detectors (such as YOLO [17], YOLOv2 [20] and SSD [18]) and two-stage object detectors (such as Fast-RCNN [15] and Faster-RCNN [16]) use four parameters (x, y, w, h) to train and regress the coordinates of the object region. x and y are the coordinates of the center point of the object region. w and h are the width and height of the object region. The object region that is trained and regressed using the four parameters (x, y, w, h) is a horizontal bounding box, which cannot fit the object region in HSRIs tightly, as shown in Figure 1a. To fit the object region in HSRIs tightly, the adaptive object orientation regression method is proposed in this paper.

In the adaptive object orientation regression method, five parameters (x, y, w, h, θ) are used to train and regress the object region, as shown in Figure 2a. x and y are the coordinates of the center point of the object region. w and h are the width and height of the object region. In remote sensing image processing, the upper left corner is the coordinate origin (0, 0), the horizontal axis is the X axis, and the vertical axis is the Y axis, as shown in Figure 3. θ is measured as a clockwise rotation from the X axis toward the Y axis. Therefore, in the adaptive object orientation regression method, θ is the angle between the corner point with the smallest y value among the four corner points of the object region and the X axis. The value range of θ is (0, π/2]. When θ is π/2, θ is the angle between the point P₁ and the X axis, as shown in Figure 2b. w and h are the lengths of |P₃P₄| and |P₄P₁|, respectively. The object region that is trained and regressed using the five parameters (x, y, w, h, θ) is an arbitrary-oriented bounding box, which can fit the object region in HSRIs tightly.

In the adaptive object orientation regression method, the five parameters (x, y, w, h, θ) of the object region are trained and regressed based on the anchor, as shown in Figure 4. In Figure 4, the dotted rectangle is the anchor at the position (i, j) of the output feature map. In the process of training and regressing the object region, the five parameters (x, y, w, h, θ) of the object region at the position (i, j) are calculated as follows:

O_{x} = i + \frac{1}{1 + e^{- x_{0}}}

(1)

O_{y} = j + \frac{1}{1 + e^{- x_{1}}}

(2)

O_{w} = a_{w} e^{x_{2}}

(3)

O_{h} = a_{h} e^{x_{3}}

(4)

O_{θ} = \frac{π}{2} \times \frac{1}{1 + e^{- x_{4}}}

(5)

where (O_{x}, O_{y}, O_{w}, O_{h}, O_{θ}) are the five regressed parameters (x, y, w, h, θ) of the object region. a_{w} and a_{h} are the width and height of the anchor, respectively, as shown in Figure 4. (x_{0}, x_{1}, x_{2}, x_{3}, x_{4}) are the CNN output values that correspond to the anchor at the position (i, j) of the output feature map and are used to regress the coordinates of the object region, as shown in Figure 5. In Figure 5, at each position of the output feature map of 13 pixels × 13 pixels, three object regions are trained and regressed based on the anchors. At each position of the output feature map, there are 3 × (5 + 1 + class number) output values x. The first five output values x of each anchor are used to calculate (O_{x}, O_{y}, O_{w}, O_{h}, O_{θ}).
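As an illustration of Formulas (1)–(5), the decoding of one anchor's first five outputs can be sketched as follows. This is a minimal sketch, not the authors' implementation: the function name `decode_oriented_box` and the argument layout are assumptions made here for illustration. O_{x} and O_{y} come out in feature-map cell units, while O_{w} and O_{h} are in the same units as the anchor size.

```python
import math


def sigmoid(v: float) -> float:
    """Logistic function used in Formulas (1), (2) and (5)."""
    return 1.0 / (1.0 + math.exp(-v))


def decode_oriented_box(raw, i, j, anchor_w, anchor_h):
    """Map five raw outputs (x0..x4) of one anchor at cell (i, j)
    to (O_x, O_y, O_w, O_h, O_theta).

    Hypothetical helper: name and signature are illustrative only.
    """
    x0, x1, x2, x3, x4 = raw[:5]
    o_x = i + sigmoid(x0)                      # Formula (1): offset inside cell i
    o_y = j + sigmoid(x1)                      # Formula (2): offset inside cell j
    o_w = anchor_w * math.exp(x2)              # Formula (3): scale the anchor width
    o_h = anchor_h * math.exp(x3)              # Formula (4): scale the anchor height
    o_theta = (math.pi / 2.0) * sigmoid(x4)    # Formula (5): angle within (0, pi/2)
    return o_x, o_y, o_w, o_h, o_theta


if __name__ == "__main__":
    # All-zero outputs place the box at the cell centre with the anchor's size.
    print(decode_oriented_box([0.0, 0.0, 0.0, 0.0, 0.0], 6, 6, 116, 90))
```

Because the sigmoid is bounded, the center always stays inside cell (i, j) and the angle stays within the stated range of θ, which is presumably why these three quantities are squashed while w and h are not.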

In the adaptive object orientation regression method, the four corner point coordinates of the object region can be obtained using the five parameters (x, y, w, h, θ) of the object region. The oriented bounding box can be drawn using the four corner point coordinates of the object region. The calculation formula is as follows:

\left\{ \begin{matrix} (x_{P1}, y_{P1}) = (x + \frac{h \sin θ - w \cos θ}{2}, y - \frac{h \sin θ + w \cos θ}{2}) \\ (x_{P2}, y_{P2}) = (x + \frac{h \sin θ + w \cos θ}{2}, y + \frac{h \sin θ - w \cos θ}{2}) \\ (x_{P3}, y_{P3}) = (x + \frac{h \sin θ - w \cos θ}{2}, y + \frac{h \sin θ + w \cos θ}{2}) \\ (x_{P4}, y_{P4}) = (x - \frac{h \sin θ + w \cos θ}{2}, y + \frac{h \sin θ - w \cos θ}{2}) \end{matrix} \right.

(6)

where (x_{P1}, y_{P1}), (x_{P2}, y_{P2}), (x_{P3}, y_{P3}) and (x_{P4}, y_{P4}) are the coordinates of the four points P_{1}, P_{2}, P_{3} and P_{4} of the object region, respectively.
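For readers who want to draw an oriented box from (x, y, w, h, θ), the following sketch computes four corners with the generic rotation convention: the side of length w runs along the direction at angle θ from the X axis, and the Y axis points down as in image coordinates. This is an assumption made here for illustration; it does not reproduce the term layout or the P₁ to P₄ ordering of Formula (6), and the function name `obb_corners` is hypothetical.

```python
import math


def obb_corners(x, y, w, h, theta):
    """Return the four corners of a rotated rectangle centred at (x, y).

    Assumed convention (not taken from the paper): the w side points along
    (cos(theta), sin(theta)) and the h side along (-sin(theta), cos(theta)).
    Corners are returned in consecutive order around the rectangle.
    """
    ux, uy = math.cos(theta), math.sin(theta)      # unit vector along the w side
    vx, vy = -math.sin(theta), math.cos(theta)     # unit vector along the h side
    hw, hh = w / 2.0, h / 2.0
    return [
        (x - hw * ux - hh * vx, y - hw * uy - hh * vy),
        (x + hw * ux - hh * vx, y + hw * uy - hh * vy),
        (x + hw * ux + hh * vx, y + hw * uy + hh * vy),
        (x - hw * ux + hh * vx, y - hw * uy + hh * vy),
    ]


if __name__ == "__main__":
    for corner in obb_corners(10.0, 10.0, 4.0, 2.0, math.pi / 6):
        print(corner)
```

Whatever convention is used, two quick sanity checks apply: the mean of the four corners must equal the center (x, y), and consecutive corners must be w and h apart.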

2.2. CNN-AOOF Framework Design

Using the adaptive object orientation regression method, a novel object detection framework, CNN-AOOF, is proposed for object detection in HSRIs. The CNN-AOOF framework is a single-stage object detector, as shown in Figure 6. In the CNN-AOOF framework, darknet-53 [21] is used to generate the feature maps. The size of the input image in darknet-53 is 416 pixels × 416 pixels. In the CNN-AOOF framework, the object region is trained and regressed based on anchors on feature maps of three different scales. On the feature map with the size of 13 pixels × 13 pixels, object candidate regions are trained and regressed based on three anchors at each position of the feature map. The sizes of the three anchors are 116 pixels × 90 pixels, 156 pixels × 198 pixels, and 373 pixels × 326 pixels, respectively, as shown by the blue dotted rectangles in Figure 6. The feature map with the size of 13 pixels × 13 pixels is upsampled and combined with the feature map with the size of 26 pixels × 26 pixels to form a new feature map with the size of 26 pixels × 26 pixels. On the new feature map with the size of 26 pixels × 26 pixels, object candidate regions are trained and regressed based on three anchors at each position of the feature map. The sizes of the three anchors are 30 pixels × 61 pixels, 62 pixels × 45 pixels, and 59 pixels × 119 pixels, respectively, as shown by the green dotted rectangles in Figure 6. The new feature map with the size of 26 pixels × 26 pixels is upsampled and combined with the feature map with the size of 52 pixels × 52 pixels to form a new feature map with the size of 52 pixels × 52 pixels. On the new feature map with the size of 52 pixels × 52 pixels, object candidate regions are trained and regressed based on three anchors at each position of the feature map. The sizes of the three anchors are 10 pixels × 13 pixels, 16 pixels × 30 pixels, and 33 pixels × 23 pixels, respectively, as shown by the red dotted rectangles in Figure 6.
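The scale and anchor layout described above can be summarized in a few lines of arithmetic. The sketch below is illustrative only: the names `ANCHORS` and `head_summary` are assumptions, and the per-position output count uses the 3 × (5 + 1 + class number) rule stated in Section 2.1. The stride is simply the input size divided by the grid size.

```python
INPUT_SIZE = 416  # input image side length in pixels, as stated in the text

# Anchor sizes (width, height) in pixels for each feature-map size.
ANCHORS = {
    13: [(116, 90), (156, 198), (373, 326)],
    26: [(30, 61), (62, 45), (59, 119)],
    52: [(10, 13), (16, 30), (33, 23)],
}


def head_summary(num_classes):
    """Return (grid, stride, outputs_per_position, candidate_regions) per scale.

    Hypothetical helper: each anchor predicts five box parameters,
    one confidence value and one score per class.
    """
    rows = []
    for grid, anchors in ANCHORS.items():
        stride = INPUT_SIZE // grid
        outputs_per_position = len(anchors) * (5 + 1 + num_classes)
        candidate_regions = grid * grid * len(anchors)
        rows.append((grid, stride, outputs_per_position, candidate_regions))
    return rows


if __name__ == "__main__":
    for row in head_summary(num_classes=3):
        print(row)
    print(sum(r[3] for r in head_summary(3)))
```

For a three-class dataset such as WHU-RSONE-OBB, this gives 27 output values per position on every scale and 10,647 candidate regions per image in total.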

In the CNN-AOOF framework, a multi-scale training method is used. Three object candidate regions are generated based on three anchors at each position of the three feature maps of different scales. If the intersection-over-union (IOU) overlap between an anchor and a ground truth box is the greatest among all anchors for that ground truth box, a positive label is assigned to the anchor; otherwise, a negative label is assigned to the anchor. In training the CNN-AOOF framework, there are m × m × n × (r + 6) predicted output values on each feature map. The loss function of the CNN-AOOF framework is calculated as follows:

Loss = L_{coord} + L_{class} + L_{obj}

(7)

L_{coord} = \sum_{i}^{m} \sum_{j}^{m} \sum_{k}^{n} I_{ijk}^{obj} {(2 - w_{ij} \times h_{ij})}^{2} \left[ \begin{array}{l} {(x_{ij} - i - s(x_{ij}^{*}))}^{2} + {(y_{ij} - j - s(y_{ij}^{*}))}^{2} \\ + {(\ln (\frac{w_{ij}}{w_{a}}) - w_{ij}^{*})}^{2} + {(\ln (\frac{h_{ij}}{h_{a}}) - h_{ij}^{*})}^{2} \\ + {(\frac{2 θ_{ij}}{π} - s(θ_{ij}^{*}))}^{2} \end{array} \right]

(8)

L_{class} = \left\{ \begin{array}{ll} \sum_{i}^{m} \sum_{j}^{m} \sum_{k}^{n} \sum_{l}^{r} I_{ijk}^{obj} {(1 - s(c_{ijl}^{*}))}^{2} & l = truth_{class} \\ \sum_{i}^{m} \sum_{j}^{m} \sum_{k}^{n} \sum_{l}^{r} I_{ijk}^{obj} {(0 - s(c_{ijl}^{*}))}^{2} & l \neq truth_{class} \end{array} \right.

(9)

L_{obj} = \left\{ \begin{array}{ll} \sum_{i}^{m} \sum_{j}^{m} \sum_{k}^{n} {(1 - s(u_{ijk}^{*}))}^{2} & I_{ijk}^{obj} = 1 \\ \sum_{i}^{m} \sum_{j}^{m} \sum_{k}^{n} {(0 - s(u_{ijk}^{*}))}^{2} & I_{ijk}^{obj} \neq 1 \end{array} \right.

(10)

s(x) = \frac{1}{1 + e^{- x}}

(11)

where Loss is the training loss of the CNN-AOOF framework. L_{coord}, L_{class} and L_{obj} are the training losses of the coordinates, class and confidence of the object regions generated based on anchors, respectively. m is the width and height of the feature map. n is the number of anchors at each position of the feature map. I_{ijk}^{obj} indicates whether a positive label is assigned to anchor k at position (i, j) of the feature map. If a positive label is assigned to anchor k, I_{ijk}^{obj} is 1; otherwise, I_{ijk}^{obj} is 0. w_{ij} and h_{ij} are the width and height of the ground truth box corresponding to anchor k at position (i, j) of the feature map, respectively. (x_{ij}, y_{ij}, w_{ij}, h_{ij}, θ_{ij}) are the five parameters of the ground truth box. (x_{ij}^{*}, y_{ij}^{*}, w_{ij}^{*}, h_{ij}^{*}, θ_{ij}^{*}) are the framework output values used to calculate the five parameters of the object region generated based on anchor k. w_{a} and h_{a} are the width and height of anchor k, respectively. r is the number of object classes. c_{ijl}^{*} is the framework output value for class l of the object region generated based on anchor k. u_{ijk}^{*} is the framework output value for the object confidence of the object region generated based on anchor k.
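To make the coordinate term of Formula (8) concrete, the sketch below evaluates it for a single positive anchor (I_{ijk}^{obj} = 1). It is an illustration under stated assumptions rather than the authors' code: the function name `coord_loss_term` is hypothetical, the ground truth center is assumed to be in feature-map cell units, and w, h, w_{a}, h_{a} are assumed to be normalized by the image size so that the weight (2 - w × h) stays positive.

```python
import math


def sigmoid(v: float) -> float:
    """Formula (11)."""
    return 1.0 / (1.0 + math.exp(-v))


def coord_loss_term(gt, raw, i, j, anchor_w, anchor_h):
    """Coordinate loss of Formula (8) for one positive anchor at cell (i, j).

    gt  = (x, y, w, h, theta) of the ground truth box
    raw = (x*, y*, w*, h*, theta*) raw framework outputs for this anchor
    Assumption: w, h, anchor_w, anchor_h share the same normalized units.
    """
    x, y, w, h, theta = gt
    rx, ry, rw, rh, rt = raw
    weight = (2.0 - w * h) ** 2                      # emphasises small boxes
    squared_errors = (
        (x - i - sigmoid(rx)) ** 2                   # centre x offset
        + (y - j - sigmoid(ry)) ** 2                 # centre y offset
        + (math.log(w / anchor_w) - rw) ** 2         # log width ratio
        + (math.log(h / anchor_h) - rh) ** 2         # log height ratio
        + (2.0 * theta / math.pi - sigmoid(rt)) ** 2 # angle scaled to (0, 1]
    )
    return weight * squared_errors


if __name__ == "__main__":
    gt = (6.5, 6.5, 0.25, 0.2, math.pi / 4)
    print(coord_loss_term(gt, (0.0, 0.0, 0.0, 0.0, 0.0), 6, 6, 0.25, 0.2))
```

Note that the targets mirror the decoding in Formulas (1)–(5): the sigmoid is applied to the x, y and θ outputs, while the w and h outputs are compared directly with log ratios.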

In the testing process of the CNN-AOOF framework, all x, y, θ, confidence and classification values among the output values of the CNN-AOOF framework are processed using Formula (11). Then, the five parameters of the object region generated based on the anchor at each position of the feature map are obtained using Formulas (1)–(5). If the confidence of a generated object region is greater than the threshold, the region is retained; otherwise, it is removed. The confidence threshold for the object detection results of CNN-AOOF is set to 0.05 for quantitative evaluation. The class of each retained object region is determined based on the classification output values of the CNN-AOOF framework. To reduce redundancy, the non-maximum suppression (NMS) algorithm is applied to the retained object regions based on their confidence. The IOU threshold is set to 0.3 in the NMS algorithm. After NMS, the object detection result of an HSRI is obtained.
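The post-processing described above (confidence filtering at 0.05 followed by NMS at an IOU threshold of 0.3) can be sketched as a greedy loop. The sketch makes several assumptions that the text does not specify: suppression is class-agnostic, a box is dropped when its IOU with a kept box strictly exceeds the threshold, and the IOU function is passed in as a parameter. For brevity the demonstration uses an axis-aligned IOU (`hbb_iou`) as a stand-in; an oriented detector would need an IOU computed on rotated rectangles instead. All names are hypothetical.

```python
def hbb_iou(a, b):
    """IOU of two axis-aligned boxes (x1, y1, x2, y2); stand-in for a rotated IOU."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def filter_and_nms(detections, iou_fn, conf_thresh=0.05, iou_thresh=0.3):
    """Keep detections above conf_thresh, then apply greedy NMS.

    detections: list of dicts with keys "box" and "conf".
    Returns the kept detections sorted by descending confidence.
    """
    candidates = sorted(
        (d for d in detections if d["conf"] > conf_thresh),
        key=lambda d: d["conf"],
        reverse=True,
    )
    kept = []
    for det in candidates:
        # Keep the box only if it does not overlap too much with any kept box.
        if all(iou_fn(det["box"], k["box"]) <= iou_thresh for k in kept):
            kept.append(det)
    return kept


if __name__ == "__main__":
    dets = [
        {"box": (0, 0, 10, 10), "conf": 0.9},
        {"box": (1, 1, 11, 11), "conf": 0.8},    # heavy overlap with the first box
        {"box": (20, 20, 30, 30), "conf": 0.7},
        {"box": (40, 40, 50, 50), "conf": 0.01},  # below the confidence threshold
    ]
    print([d["conf"] for d in filter_and_nms(dets, hbb_iou)])
```

Passing the IOU function as an argument keeps the suppression loop unchanged when the axis-aligned stand-in is replaced by a rotated-rectangle IOU.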

3. Results

Some state-of-the-art object detection algorithms (Faster-RCNN, CNN-SOSF, YOLOv2 and YOLOv3) have been effectively applied to object detection for HSRIs. To examine the object detection effectiveness of CNN-AOOF, four HSRI object detection datasets (WHU-RSONE-OBB, UCAS-AOD, HRSC2016 and DOTA) are used to compare CNN-AOOF with Faster-RCNN, CNN-SOSF, YOLOv2 and YOLOv3. CNN-AOOF is based on the darknet framework and programmed using C++. The experiments are carried out on a server with an Intel(R) Xeon(R) CPU E5-2667 v4 @ 3.20 GHz, an NVIDIA Quadro M4000 (8 GB GPU memory), 16 GB RAM, and the Windows 10 operating system.

3.1. Object Detection for WHU-RSONE-OBB

Large-scale object detection datasets are the basis and key for supporting object detection methods based on CNN to achieve high performance [40,41]. Therefore, in this paper, an object detection dataset with OBB for HSRIs, WHU-RSONE-OBB, is established and published to promote the development of HSRI object detection. The images in WHU-RSONE-OBB were obtained from SuperView1 images, Tianditu, and Google Earth images. There are 5977 images in WHU-RSONE-OBB. The size of the images ranges from 600 pixels × 600 pixels to 1372 pixels × 1024 pixels. The spatial resolution of the images ranges from 0.5 m to 0.8 m. There are three kinds of geospatial objects (airplane, storage-tank and ship) in WHU-RSONE-OBB, and object samples are labeled using OBB. The number of each of the three kinds of geospatial objects in WHU-RSONE-OBB is shown in Table 1.

In this paper, mean average precision (mAP) is used as the evaluation criterion for the detection results of object detectors [42]. If the IOU between the bounding box of a detection result and the bounding box of the ground truth is equal to or greater than 0.5, the detection result is considered correct; otherwise, it is considered incorrect. The larger the mAP value, the higher the accuracy of the object detector. The mAP is obtained using the following formula:

mAP = \frac{1}{n} \sum_{i}^{n} AP_{i}

(12)

where i is the label of an object class, n is the number of detected object classes, and AP_{i} is the average precision of class i. Its value is the area under the precision–recall curve (PRC), as shown in Figure 7.
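A minimal sketch of how AP and Formula (12) could be computed is given below. The text only states that AP is the area under the PRC, so the integration rule is an assumption: here the area is accumulated with a simple step rule (recall increment times current precision), without the precision interpolation that some benchmarks apply. The function names and the input layout are hypothetical; each detection is assumed to have already been marked as correct or incorrect with the IOU ≥ 0.5 rule.

```python
def average_precision(scored_hits, num_gt):
    """Area under the precision-recall curve for one class.

    scored_hits: list of (confidence, is_true_positive) pairs, one per detection.
    num_gt: number of ground truth objects of this class.
    Assumption: step-rule integration without precision interpolation.
    """
    if num_gt == 0:
        return 0.0
    tp = fp = 0
    ap = 0.0
    prev_recall = 0.0
    # Walk down the ranked list, from the most to the least confident detection.
    for _, hit in sorted(scored_hits, key=lambda s: s[0], reverse=True):
        if hit:
            tp += 1
        else:
            fp += 1
        recall = tp / num_gt
        precision = tp / (tp + fp)
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


def mean_average_precision(ap_per_class):
    """Formula (12): arithmetic mean of the per-class AP values."""
    return sum(ap_per_class.values()) / len(ap_per_class)


if __name__ == "__main__":
    hits = [(0.9, True), (0.8, False), (0.7, True)]
    print(average_precision(hits, num_gt=2))
    print(mean_average_precision({"airplane": 0.9857, "storage-tank": 0.8831, "ship": 0.792}))
```

As a usage example, the arithmetic mean of the three AP values quoted for CNN-AOOF on WHU-RSONE-OBB in Section 3.1 (0.9857, 0.8831 and 0.792) is about 0.887.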

In WHU-RSONE-OBB, we randomly select 4781 images as the training set, 598 images as the validation set, and 598 images as the testing set. Using WHU-RSONE-OBB, CNN-AOOF and some state-of-the-art object detection algorithms are trained and tested. Table 2 shows the quantitative comparison results of CNN-AOOF and some state-of-the-art object detection algorithms. In Table 2, the AP values of airplane, storage-tank and ship are 0.9857, 0.8831 and 0.792, respectively, in the object detection results using CNN-AOOF. The AP values of various objects using CNN-AOOF are greater than those of other object detection algorithms. Moreover, the mAP value of CNN-AOOF is the largest among the five object detection algorithms. These show that CNN-AOOF can obtain more accurate object detection results than other object detection algorithms for HSRIs in WHU-RSONE-OBB.

Figure 8 shows the PRCs of the five object detection algorithms for object detection results of WHU-RSONE-OBB. In Figure 8, for airplane, storage-tank and ship, the PRC areas of CNN-AOOF are greater than those of the other object detection algorithms. The experimental results show that CNN-AOOF outperforms the other object detection algorithms, and can obtain more accurate object detection results.

Table 3 shows the average time consumption per image of the five object detection algorithms for object detection in WHU-RSONE-OBB. In Table 3, the average time consumptions of the five object detection algorithms are 0.467 s, 0.528 s, 0.102 s, 0.139 s and 0.233 s, respectively, all of which are less than 1 s. The experimental results show that the objects in HSRIs can be detected efficiently using CNN-AOOF.

3.2. Object Detection for UCAS-AOD

To further verify the object detection effectiveness of CNN-AOOF, UCAS-AOD [43] is used to compare CNN-AOOF with some state-of-the-art object detection algorithms (Faster-RCNN, CNN-SOSF, YOLOv2 and YOLOv3). UCAS-AOD is an HSRI object detection dataset that contains two kinds of objects: airplane and car. Object samples are labeled using OBB. Images are cropped from Google Earth. There are 1510 images in UCAS-AOD, and the size of images ranges from 1280 pixels × 659 pixels to 1372 pixels × 941 pixels. In line with [44,45], we randomly select 1060 images for training and 450 images for testing.

Table 4 shows the quantitative comparison results of the five object detection algorithms for UCAS-AOD. In Table 4, the AP values of airplane and car are 0.9488 and 0.8996, respectively, in the object detection results using CNN-AOOF. The AP values of two kinds of objects using CNN-AOOF are greater than those of other object detection algorithms. Moreover, the mAP value of CNN-AOOF is the largest among the five object detection algorithms. The experimental results show that CNN-AOOF is superior to the other four object detection algorithms for UCAS-AOD.

Figure 9a,b are the PRCs of the five object detection algorithms for airplane and car in UCAS-AOD, respectively. In Figure 9, we can see that for airplane and car, the PRC areas of CNN-AOOF are greater than those of the other object detection algorithms. The experimental results show that CNN-AOOF outperforms the other object detection algorithms, and can obtain more accurate airplane and car detection results for UCAS-AOD dataset.

3.3. Object Detection for HRSC2016

The HRSC2016 [46] dataset is used to compare CNN-AOOF with the other object detection algorithms (Faster-RCNN, CNN-SOSF, YOLOv2 and YOLOv3) to verify the object detection effectiveness of CNN-AOOF. HRSC2016 is a ship detection dataset of HSRIs. Ship samples are labeled using OBB. Images are cropped from Google Earth. The size of the images ranges from 300 pixels × 300 pixels to 1500 pixels × 900 pixels. There are 1061 images in HRSC2016, including 436 images for training, 181 images for validation, and 444 images for testing.

Table 5 shows the quantitative comparison results of the five object detection algorithms for HRSC2016. In Table 5, the AP values of the ship in the object detection results of the five algorithms are 0.8349, 0.8301, 0.423, 0.8144 and 0.8567, respectively. The AP value of the ship using CNN-AOOF is greater than that of the other four object detection algorithms. The experimental results show that CNN-AOOF outperforms the other four object detection algorithms for HRSC2016.

Figure 10 shows the PRCs of the five object detection algorithms for ship in HRSC2016. In Figure 10, we can see that for ship, the PRC area of CNN-AOOF is greater than that of the other four object detection algorithms. The experimental results show that CNN-AOOF is superior to Faster-RCNN, CNN-SOSF, YOLOv2 and YOLOv3, and can obtain more accurate ship detection results for the HRSC2016 dataset.

3.4. Object Detection for DOTA

DOTA [40] is a multi-category object detection dataset for HSRIs. There are 2806 images in the dataset. The training set, validation set and test set account for 1/3, 1/6 and 1/2 of the data set, respectively. The images range from about 800 pixels × 800 pixels to 4000 pixels × 4000 pixels. There are 15 kinds of objects (plane (PL), ship (SH), storage-tank (ST), baseball diamond (BD), tennis court (TC), basketball court (BC), ground track field (GTF), harbor (HA), bridge (BR), large vehicle (LV), small vehicle (SV), helicopter (HC), roundabout (RA), soccer ball field (SBF) and swimming pool (SP)) in the dataset. In DOTA, object samples are labeled using OBB.

Table 6 shows the quantitative comparison results of CNN-AOOF and some state-of-the-art object detection algorithms (RoI Trans, SCRDet, Li et al., Mask OBB). In Table 6, the mAP values of the five object detection algorithms are 0.6956, 0.7261, 0.7328, 0.7533 and 0.7571, respectively. The mAP of CNN-AOOF is 0.7571, which is the largest value among the five object detection algorithms. The experimental results show that CNN-AOOF outperforms the other four object detection algorithms, and can obtain better object detection results for the DOTA dataset.

4. Discussion

In this section, CNN-AOOF is compared with other object detection algorithms by visual evaluation. Among the two-stage object detectors, the object detection accuracy of CNN-SOSF is better than that of Faster-RCNN for the WHU-RSONE-OBB, UCAS-AOD and HRSC2016 datasets. Among the single-stage object detectors, the object detection accuracy of YOLOv3 is better than that of YOLOv2. Therefore, CNN-AOOF is visually compared with CNN-SOSF and YOLOv3, which have the higher detection accuracy in their respective groups. Figure 11 shows some object detection result samples of CNN-SOSF, YOLOv3 and CNN-AOOF for the WHU-RSONE-OBB, UCAS-AOD and HRSC2016 datasets. Figure 11a–c are the object detection results of CNN-SOSF, YOLOv3 and CNN-AOOF, respectively.

In Figure 11(a1,b1), due to the dense distribution of the airplanes, the airplanes indicated by the yellow arrow cannot be detected correctly using CNN-SOSF and YOLOv3. In Figure 11(c1), airplanes are correctly detected using CNN-AOOF.

In Figure 11(a2,b2), because the ship has a large aspect ratio, the ship indicated by the yellow arrow cannot be detected correctly using CNN-SOSF and YOLOv3. There are large redundant areas in the detection results of other ships. In Figure 11(c2), ships are correctly detected using CNN-AOOF.

In Figure 11(a3), due to the large aspect ratios and dense distribution of the ships, the ships indicated by the yellow arrow cannot be detected correctly using CNN-SOSF. In Figure 11(b3), the storage-tank indicated by the yellow arrow cannot be detected correctly using YOLOv3. In Figure 11(c3), ships and storage-tanks are correctly detected using CNN-AOOF.

In Figure 11(a4,b4), due to the ships with large aspect ratios, the ships indicated by the yellow arrow cannot be detected correctly using CNN-SOSF and YOLOv3. There are large redundant areas in the detection results of other ships. In Figure 11(c4), ships are correctly detected using CNN-AOOF.

In Figure 11(a5,b5), due to the ships with large aspect ratios, the ships indicated by the yellow arrow cannot be detected correctly using CNN-SOSF and YOLOv3. There are large redundant areas in the detection results of other ships. In Figure 11(c5), ships are correctly detected using CNN-AOOF.

In Figure 11(a6,b6), dense ships cannot be detected accurately using CNN-SOSF and YOLOv3. In Figure 11(c6), dense ships are detected accurately using CNN-AOOF.

In Figure 11(a7,b7), due to the dense distribution of cars, the cars indicated by the yellow arrow cannot be detected accurately using CNN-SOSF and YOLOv3. In Figure 11(c7), dense cars are detected accurately using CNN-AOOF.

In Figure 11(a8), due to the dense distribution of cars, the cars indicated by the yellow arrow cannot be detected accurately using CNN-SOSF. In Figure 11(b8,c8), cars are detected accurately using CNN-AOOF.

The experimental results show that CNN-SOSF and YOLOv3 have difficulty accurately detecting objects with large aspect ratios and densely distributed objects because they describe objects with horizontal bounding boxes. In contrast, CNN-AOOF describes objects with OBBs and can accurately detect both kinds of objects. Therefore, CNN-AOOF is superior to CNN-SOSF and YOLOv3 for the WHU-RSONE-OBB, UCAS-AOD and HRSC2016 datasets.
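The geometric reason behind this observation can be illustrated with a small calculation. The sketch below is not the authors' code. It assumes a rectangle of width w and height h rotated by θ radians about its center, with hypothetical sizes chosen only for illustration. It computes (i) how much larger the tightest horizontal bounding box is than the OBB, and (ii) the IoU between the horizontal boxes of two parallel neighbors whose OBBs do not touch. A high IoU between neighboring horizontal boxes is one plausible way for dense objects to be merged or suppressed by NMS; the NMS settings of the compared detectors are not given here, so this is an illustration only.

```python
import math


def hbb_size(w, h, theta):
    """Width and height of the tightest horizontal box around a w x h
    rectangle rotated by theta (radians) about its center."""
    c, s = abs(math.cos(theta)), abs(math.sin(theta))
    return w * c + h * s, w * s + h * c


def redundancy_ratio(w, h, theta):
    """Horizontal-box area divided by OBB area (1.0 means no redundant area)."""
    bw, bh = hbb_size(w, h, theta)
    return (bw * bh) / (w * h)


def hbb_iou(center_a, center_b, size):
    """IoU of two equally sized horizontal boxes given their centers."""
    bw, bh = size
    overlap_x = max(0.0, bw - abs(center_a[0] - center_b[0]))
    overlap_y = max(0.0, bh - abs(center_a[1] - center_b[1]))
    inter = overlap_x * overlap_y
    return inter / (2 * bw * bh - inter)


# Hypothetical scene: two 100 x 10 pixel objects rotated by 45 degrees,
# lying side by side with centers 15 pixels apart across their short axis,
# so the two OBBs are separated by a 5 pixel gap and do not overlap.
theta = math.pi / 4
w, h, spacing = 100.0, 10.0, 15.0
offset = (-spacing * math.sin(theta), spacing * math.cos(theta))

ratio = redundancy_ratio(w, h, theta)
iou = hbb_iou((0.0, 0.0), offset, hbb_size(w, h, theta))
print(f"horizontal box area / OBB area: {ratio:.2f}")
print(f"IoU of the two horizontal boxes: {iou:.2f}")
```

In this hypothetical configuration the horizontal box covers about six times the area of the object, and the two horizontal boxes overlap with an IoU of roughly 0.59 even though the oriented boxes are disjoint. At θ = 0 the ratio is 1, so the effect depends on both the aspect ratio and the orientation.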

5. Conclusions and Future Work

With respect to the multi-angle features of object orientation in HSRIs object detection, a novel HSRIs object detection method based on convolutional neural networks with adaptive object orientation features (CNN-AOOF) is proposed in this paper. First, an adaptive object orientation regression method is proposed to obtain object regions in any direction. Then, a CNN framework for object detection of HSRIs is designed using the adaptive object orientation regression method. To verify the effectiveness of CNN-AOOF, the WHU-RSONE-OBB, UCAS-AOD, HRSC2016 and DOTA datasets are used to compare CNN-AOOF qualitatively and quantitatively with some state-of-the-art object detection algorithms. The experimental results show that CNN-AOOF is superior to the compared state-of-the-art object detection algorithms and can accurately detect objects with large aspect ratios and densely distributed objects in different HSRIs object detection datasets. Object anchor scales are a vital factor affecting the object detection results of HSRIs. In future work, adaptive adjustment of the object anchor scales in the proposed method for different object detection tasks will be studied to obtain more accurate object detection results.

Author Contributions

Methodology, Software, Writing—original draft, Writing—review and editing, Z.D. Funding acquisition, Supervision, Writing—review and editing, M.W. Writing—review and editing, Y.W. and Y.L. Data collection, Y.F. and W.X. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China under Grant [No. 61825103 and 41871381]; the Fundamental Research Funds for the Central Universities under Grant [No. 2042021kf1030]; and the Key Laboratory of Ocean Geomatics, Ministry of Natural Resources, China under Grant [No. 2021A01].

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

WHU-RSONE-OBB can be downloaded at https://pan.baidu.com/s/1_Gdeedwo9dcEJqIh4eHHMA (password: 1234), UCAS-AOD at https://github.com/fireae/UCAS-AOD-benchmark, HRSC2016 at https://sites.google.com/site/hrsc2016/ and DOTA at https://captain-whu.github.io/DOTA/index.html (all accessed on 1 January 2022).

Acknowledgments

The authors would like to thank the editors and anonymous reviewers for their valuable comments, which helped improve this paper.

Conflicts of Interest

We declare that we have no conflict of interest.

Abbreviations

HSRIs	High Spatial Resolution Remote Sensing Images
CNN	Convolutional Neural Networks
SVM	Support Vector Machines
HOG	Histograms of Oriented Gradients
R-CNN	Regional Convolutional Neural Networks
RPN	Region Proposal Network
CNN-SOSF	Convolutional Neural Networks with Suitable Object Scale Features
OBB	Oriented Bounding Box
CNN-AOOF	Convolutional Neural Networks with Adaptive Object Orientation Features
SSD	Single Shot Multibox Detector
YOLO	You Only Look Once
PRC	Precision–Recall Curve
IOU	Intersection-Over-Union
NMS	Non-Maximum Suppression
mAP	mean Average Precision

References

[1] Li, D.; Wang, M.; Dong, Z.; Shen, X.; Shi, L. Earth observation brain (EOB): An intelligent earth observation system. Geo-Spatial Inf. Sci. 2017, 20, 134–140.
[2] Li, K.; Wan, G.; Cheng, G.; Meng, L.; Han, J. Object detection in optical remote sensing images: A survey and a new benchmark. ISPRS J. Photogramm. Remote Sens. 2020, 159, 296–307.
[3] Wang, Y.; Dong, Z.; Zhu, Y. Multiscale block fusion object detection method for large-scale high-resolution remote sensing imagery. IEEE Access 2019, 7, 99530–99539.
[4] Schilling, H.; Bulatov, D.; Niessner, R.; Middelmann, W.; Soergel, U. Detection of vehicles in multisensor data via multibranch convolutional neural networks. IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens. 2018, 11, 4299–4316.
[5] Yang, X.; Sun, H.; Fu, K.; Yang, J.; Sun, X.; Yan, M.; Guo, Z. Automatic ship detection in remote sensing images from google earth of complex scenes based on multiscale rotation dense feature pyramid networks. Remote Sens. 2018, 10, 132.
[6] Wang, C.; Bai, X.; Wang, S.; Zhou, J.; Ren, P. Multiscale visual attention networks for object detection in vhr remote sensing images. IEEE Geosci. Remote Sens. Lett. 2019, 16, 310–314.
[7] Cheng, G.; Han, J. A survey on object detection in optical remote sensing images. ISPRS J. Photogramm. Remote Sens. 2016, 117, 11–28.
[8] Xiao, Z.; Liu, Q.; Tang, G.; Zhai, X. Elliptic fourier transformation-based histograms of oriented gradients for rotationally invariant object detection in remote-sensing images. Int. J. Remote Sens. 2015, 36, 618–644.
[9] Cheng, G.; Han, J.; Guo, L.; Qian, X.; Zhou, P.; Yao, X.; Hu, X. Object detection in remote sensing imagery using a discriminatively trained mixture model. ISPRS J. Photogramm. Remote Sens. 2018, 85, 32–43.
[10] Diao, W.; Sun, X.; Zheng, X.; Dou, F.; Wang, H.; Fu, K. Efficient saliency-based object detection in remote sensing images using deep belief networks. IEEE Trans. Geosci. Remote Sens. 2016, 13, 137–141.
[11] Han, J.; Zhou, P.; Zhang, D.; Cheng, G.; Guo, L.; Liu, Z.; Bu, S.; Wu, J. Efficient, simultaneous detection of multi-class geospatial targets based on visual saliency modeling and discriminative learning of sparse coding. ISPRS J. Photogramm. Remote Sens. 2014, 89, 37–48.
[12] Han, X.; Zhong, Y.; Zhang, L. An efficient and robust integrated geospatial object detection framework for high spatial resolution remote sensing imagery. Remote Sens. 2017, 9, 666.
[13] Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Region-based convolutional networks for accurate object detection and segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 38, 142–158.
[14] He, K.; Zhang, X.; Ren, S.; Sun, J. Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2014, 37, 1904–1916.
[15] Girshick, R. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision (CVPR), Boston, MA, USA, 8–12 June 2015; pp. 1440–1448.
[16] Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149.
[17] Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 26 June–1 July 2016; pp. 779–788.
[18] Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single shot multibox detector. In Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 26 June–1 July 2016; pp. 21–37.
[19] Dai, J.; Li, Y.; He, K.; Sun, J. R-FCN: Object detection via region based fully convolutional networks. In Proceedings of the Neural Information Processing Systems (NIPS), Barcelona, Spain, 4–9 December 2016; pp. 379–387.
[20] Redmon, J.; Farhadi, A. YOLO9000: Better, faster, stronger. In Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 6517–6525.
[21] Redmon, J.; Farhadi, A. YOLOv3: An incremental improvement. arXiv 2018, arXiv:1804.02767.
[22] Wen, N.; Guo, R.; Ma, D.; Ye, X.; He, B. AIoU: Adaptive bounding box regression for accurate oriented object detection. Int. J. Intell. Syst. 2022, 37, 748–769.
[23] Zhang, Y.; Fu, K.; Sun, H.; Sun, X.; Zheng, X.; Wang, H. A multi-model ensemble method based on convolutional neural networks for aircraft detection in large remote sensing images. Remote Sens. Lett. 2018, 9, 11–20.
[24] Cheng, G.; Zhou, P.; Han, J. Learning rotation-invariant convolutional neural networks for object detection in vhr optical remote sensing images. IEEE Trans. Geosci. Remote Sens. 2016, 54, 7405–7415.
[25] Wu, X.; Hong, D.; Tian, J.; Chanussot, J.; LI, W.; Tao, R. Orsim detector: A novel object detection framework in optical remote sensing imagery using spatial-frequency channel features. IEEE Trans. Geosci. Remote Sens. 2019, 57, 5146–5158.
[26] Ren, Y.; Zhu, C.; Xiao, S. Small object detection in optical remote sensing images via modified faster R-CNN. Appl. Sci. 2018, 8, 813.
[27] Yang, C.; Li, W.; Lin, Z. Vehicle object detection in remote sensing imagery based on multi-perspective convolutional neural network. ISPRS Int. J. Geo-Inf. 2018, 7, 249.
[28] Long, Y.; Gong, Y.; Xiao, Z.; Liu, Q. Accurate object localization in remote sensing images based on convolutional neural networks. IEEE Trans. Geosci. Remote Sens. 2017, 55, 2486–2498.
[29] Guo, W.; Yang, W.; Zhang, H.; Hua, G. Geospatial object detection in high resolution satellite images based on multi-scale convolutional neural network. Remote Sens. 2018, 10, 131.
[30] Chen, Z.; Zhang, T.; Ouyang, C. End-to-end airplane detection using transfer learning in remote sensing images. Remote Sens. 2018, 10, 139.
[31] Li, K.; Cheng, G.; Bu, S.; You, X. Rotation-insensitive and context augmented object detection in remote sensing images. IEEE Trans. Geosci. Remote Sens. 2018, 56, 2337–2348.
[32] Deng, Z.; Sun, H.; Zhou, S.; Zhao, J.; Lei, L.; Zou, H. Multi-scale object detection in remote sensing imagery with convolutional neural networks. ISPRS J. Photogramm. Remote Sens. 2018, 145, 3–22.
[33] Dong, Z.; Wang, M.; Wang, Y.; Zhu, Y.; Zhang, Z. Object detection in high resolution remote sensing imagery based on convolutional neural networks with suitable object scale features. IEEE Trans. Geosci. Remote Sens. 2020, 58, 2104–2114.
[34] Liu, W.; Ma, L.; Wang, J.; Chen, H. Detection of multiclass objects in optical remote sensing images. IEEE Geosci. Remote Sens. Lett. 2019, 16, 791–795.
[35] Ma, H.; Liu, Y.; Ren, Y.; Yu, J. Detection of collapsed building in post-earthquake remote sensing images based on the improved YOLOv3. Remote Sens. 2019, 12, 44.
[36] Ding, J.; Xue, N.; Long, Y.; Xia, G.S.; Lu, Q. Learning RoI Transformer for Detecting Oriented Objects in Aerial Images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Los Angeles, CA, USA, 16–19 June 2019.
[37] Li, C.; Xu, C.; Cui, Z.; Wang, D.; Zhang, T.; Yang, J. Feature-Attentioned Object Detection in Remote Sensing Imagery. In Proceedings of the IEEE International Conference on Image Processing, Taipei, Taiwan, 22–25 September 2019; pp. 3886–3890.
[38] Yang, X.; Yang, J.; Yan, J.; Zhang, Y.; Zhang, T.; Guo, Z.; Sun, X.; Fu, K. SCRDet: Towards More Robust Detection for Small, Cluttered and Rotated Objects. In Proceedings of the IEEE International Conference on Computer Vision, Seoul, Korea, 27 October–2 November 2019.
[39] Wang, J.; Ding, J.; Guo, H.; Cheng, W.; Pan, T.; Yang, W. Mask OBB: A Semantic Attention-Based Mask Oriented Bounding Box Representation for Multi-Category Object Detection in Aerial Images. Remote Sens. 2019, 11, 2930.
[40] Xia, G.; Bai, X.; Ding, J.; Zhu, Z.; Belongie, S.; Luo, J.; Datcu, M.; Pelillo, M.; Zhang, L. Dota: A large-scale dataset for object detection in aerial images. In Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–22 June 2018; pp. 3974–3983.
[41] Qian, X.; Lin, S.; Cheng, G.; Yao, X.; Ren, H.; Wang, W. Object detection in remote sensing images based on improved bounding box regression and multi-level features fusion. Remote Sens. 2020, 12, 143.
[42] Henderson, P.; Ferrari, V. End-to-end training of object class detectors for mean average precision. In Proceedings of the Asian Conference on Computer Vision (ACCV), Kyoto, Japan, 30 November–4 December 2016; pp. 198–213.
[43] Zhu, H.; Chen, X.; Dai, W.; Fu, K.; Ye, Q.; Jiao, J. Orientation robust object detection in aerial images using deep convolutional neural network. In Proceedings of the IEEE International Conference Image Processing, Quebec City, QC, Canada, 27–30 September 2015; pp. 3735–3739.
[44] Qian, W.; Yang, X.; Peng, S.; Guo, Y.; Yan, J. Learning modulated loss for rotated object detection. arXiv 2019, arXiv:1911.08299.
[45] Yang, X.; Yan, J.; Feng, Z.; He, T. R3Det: Refined single-stage detector with feature refinement for rotating object. In Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 14–19 June 2020.
[46] Liu, Z.; Wang, H.; Weng, L.; Yang, Y. Ship rotated bounding box space for ship extraction from high-resolution optical satellite images with complex backgrounds. IEEE Geosci. Remote Sens. Lett. 2016, 13, 1074–1078.

Figure 1. (a) Horizontal bounding box. (b) Oriented bounding box.

Figure 2. (a) Oriented bounding box using five parameters (x, y, w, h, θ). (b) Oriented bounding box with θ of π/2.

Figure 3. The coordinate system of remote sensing image processing.

Figure 4. Coordinate regression of the object region based on the anchor.

Figure 5. The output values of CNN.

Figure 6. The object detection framework based on convolutional neural networks with suitable object scale features.

Figure 7. Precision–recall curve diagram.

Figure 8. PRCs of the five object detection algorithms for (a) airplane, (b) storage-tank and (c) ship in WHU-RSONE-OBB.

Figure 9. PRCs of the five object detection algorithms for (a) airplane, and (b) car in the UCAS-AOD dataset.

Figure 10. PRCs of the five object detection algorithms for ship in the HRSC2016 dataset.

Figure 11. (a–c) are object detection results of CNN-SOSF, YOLOv3 and CNN-AOOF, respectively.

Table 1. The number of three kinds of objects in WHU-RSONE-OBB.

Object	Number
Airplane	15,703
Storage-tank	24,692
Ship	10,263

Table 2. Performance comparisons of the five object detection algorithms in terms of AP values for WHU-RSONE-OBB dataset. The bold number represents the maximum value of each column.

	Airplane	Storage-Tank	Ship	mAP
Faster-RCNN [16]	0.9486	0.5634	0.7638	0.7586
CNN-SOSF [33]	0.9521	0.7461	0.7520	0.8167
YOLOv2 [20]	0.7116	0.3166	0.4422	0.4901
YOLOv3 [21]	0.9776	0.8709	0.7865	0.8784
CNN-AOOF	0.9857	0.8831	0.7920	0.8869

Table 3. The average time consumption per image of the five object detection algorithms.

	Time per Image (s)
Faster-RCNN [16]	0.467
CNN-SOSF [33]	0.528
YOLOv2 [20]	0.102
YOLOv3 [21]	0.139
CNN-AOOF	0.233

Table 4. Performance comparisons of the five object detection algorithms in terms of AP values for UCAS-AOD dataset.

	Airplane	Car	mAP
Faster-RCNN [16]	0.9270	0.7582	0.8426
CNN-SOSF [33]	0.9339	0.7965	0.8652
YOLOv2 [20]	0.7426	0.1501	0.4463
YOLOv3 [21]	0.9414	0.8805	0.9109
CNN-AOOF	0.9488	0.8996	0.9242

Table 5. Performance comparisons of the five object detection algorithms in terms of AP values for the HRSC2016 dataset.

	Ship
Faster-RCNN [16]	0.8349
CNN-SOSF [33]	0.8301
YOLOv2 [20]	0.4230
YOLOv3 [21]	0.8144
CNN-AOOF	0.8567

Table 6. The quantitative comparison results of different object detection algorithms for the DOTA dataset.

	PL	SH	ST	BD	TC	BC	GTF	HA	BR	LV	SV	HC	RA	SBF	SP	mAP
RoI Trans [36]	0.8864	0.8359	0.8146	0.7852	0.9074	0.7727	0.7592	0.6283	0.4344	0.7368	0.6881	0.4767	0.5354	0.5839	0.5893	0.6956
SCRDet [37]	0.8998	0.7241	0.8686	0.8065	0.9085	0.8794	0.6836	0.6625	0.5209	0.6032	0.6836	0.6521	0.6668	0.6502	0.6824	0.7261
Li et al. [38]	0.9021	0.7956	0.8468	0.7958	0.9083	0.834	0.7641	0.7417	0.4549	0.6827	0.7318	0.6486	0.6542	0.534	0.6969	0.7328
Mask OBB [39]	0.8956	0.8563	0.8648	0.8595	0.8985	0.8381	0.729	0.7394	0.5421	0.7416	0.7652	0.6332	0.6964	0.5489	0.6906	0.7533
CNN-AOOF	0.8821	0.7763	0.8612	0.8162	0.8954	0.8531	0.7293	0.8063	0.588	0.7882	0.7102	0.6361	0.6092	0.6263	0.7784	0.7571

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

