Article

HCA-RFLA: A SAR Remote Sensing Ship Detection Based on Hierarchical Collaborative Attention Method and Gaussian Receptive Field-Driven Label Assignment Strategy

Shaanxi Key Laboratory of Clothing Intelligence and State-Province Joint Engineering and Research Center of Advanced Networking and Intelligent Information Services, School of Computer Science, Xi’an Polytechnic University, Xi’an 710048, China
*
Author to whom correspondence should be addressed.
Electronics 2024, 13(22), 4470; https://doi.org/10.3390/electronics13224470
Submission received: 14 October 2024 / Revised: 8 November 2024 / Accepted: 11 November 2024 / Published: 14 November 2024

Abstract

Ensuring safety at sea has become a primary focus of marine monitoring, driving the increasing adoption of ship detection technology in the maritime industry. Detecting small ship targets in SAR images presents challenges, as they occupy only a small portion of the image and exhibit subtle features, reducing detection efficiency. To address these challenges, we propose the HCA-RFLA algorithm for ship detection in SAR remote sensing. To better capture small targets, we design a hierarchical collaborative attention (HCA) mechanism that enhances feature representation by integrating multi-level features with contextual information. Additionally, due to the scarcity of positive samples for small targets under IoU and center sampling strategies, we propose a label assignment strategy based on Gaussian receptive fields, known as RFLA. RFLA assigns positive samples to small targets based on the Gaussian distribution between feature points and ground truth, increasing the model’s sensitivity to small samples. The HCA-RFLA was experimentally validated using the SSDD, HRSID, and SSD datasets. Compared to other state-of-the-art methods, HCA-RFLA improves detection accuracy by 6.2%, 4.4%, and 3.6%, respectively. These results demonstrate that HCA-RFLA outperforms existing algorithms in SAR remote sensing ship detection.

1. Introduction

Remote sensing ship detection methods are now widely applied in marine monitoring [1,2,3], including maritime transportation regulation [4], real-time tracking of military ship movements, and search and rescue operations for vessels in distress [5]. Accurate identification of ship targets, particularly in military applications, forms a critical foundation for tracking ship movements and supporting national political and defense strategies.
For the past several years, convolutional neural networks (CNNs) have garnered significant attention from researchers for their robust feature extraction capabilities. In the area of small ship target detection, particularly for addressing the challenges of multi-scale ship detection in SAR images, researchers have proposed methods based on feature fusion and enhanced target feature representation. For example, A Khan et al. [6] proposed a contextual refinement module (CRM) to enhance the refinement of object features, considering multi-scale features, further refining the features, and generating more accurate prediction maps. Ren et al. [7] proposed embedding the Channel and Location-enhanced Attention (CPEA) module and customizing the Enhanced Spatial Pyramid Pooling (EnSPP) module in the YOLOv5 backbone network. Chen et al. [8] proposed the SAS-FPN module, which integrates atrous spatial pyramid pooling and shuffle attention, allowing the model to concentrate on critical information. Gong et al. [9] proposed the SSPNet module, an enhancement of the FPN, enabling networks to select more representative features. Zhang et al. [10] proposed an FPN with deformable convolutional enhancement, improving the model’s feature extraction and fusion capabilities. U Khan et al. [11] proposed a diabetic retinopathy detection (DRD) method combining jump-joining and an upgraded feature block (UFB), which integrates residual learning strategies to extract and aggregate features across different levels, thereby improving the model’s classification performance. Sun et al. [12] proposed DANet, a two-branch activation network that extracts small target features through refinement and a pyramid structure. Cheng et al. [13] enhanced ship detection accuracy by introducing the MBA module, which strengthens ship feature representation and increases their saliency in SAR images. While these methods effectively fuse information across feature layers, they rely on a direct layer-by-layer accumulation and fail to take into account the contribution of the target at each layer.
Anchor-based methods determine positive and negative samples using IoU thresholding [14,15,16,17], while anchor-free methods rely on center sampling to identify positive and negative samples [18,19]. Both approaches determine samples based on the overlap between the ground truth and the predicted anchor frame. However, for small targets, the ground truth and most predicted frames may not overlap at all. This results in a scarcity of positive samples for small targets under both methods, leading to frequent missed detections and misidentifications of small targets.
Firstly, although existing SAR ship detection algorithms that incorporate attention mechanisms can improve detection accuracy, they treat all layers of the feature map as equally important, performing cumulative operations. This approach can lead to feature leakage and misdetection, particularly in complex environments or when detecting dense and small targets. During feature extraction, as the network deepens, its ability to extract deeper semantic information increases, but this often comes at the cost of shallow, high-resolution features. Furthermore, existing models fail to account for the contribution of different feature map layers to the output during extraction, and they do not utilize inter-layer collaboration to dynamically assign weights for feature fusion. Consequently, the network’s inability to effectively distinguish between interference and ship features in SAR images undermines performance. Secondly, ship targets occupy a very small portion of the entire SAR image, and in existing models using the IoU label assignment criterion, most preset anchor boxes and ground truths (GTs) do not overlap, resulting in their classification as negative samples. This leads to a severe imbalance between positive and negative samples for small targets, degrading detection performance. To address the scarcity of positive samples for ship targets in SAR scenarios, this paper proposes a label assignment strategy based on the Gaussian receptive field, where the assignment rule relies on a Gaussian distribution to measure the similarity between ground truth (GT) and predicted values, instead of overlap. This strategy can still be effective even when there is no overlap between the two anchor boxes.
Current attention mechanisms primarily focus on assigning weights to information at individual levels, often overlooking inter-level relationships. To address this, we propose the Hierarchical Collaborative Attention (HCA) module, which captures the relationships between neighboring hierarchical feature maps and assigns weights according to their contributions at each scale. Additionally, to further improve the model’s ability to focus on small targets and ensure a balanced distribution of positive samples, we propose a label assignment strategy based on the Gaussian receptive field, which enhances detection performance for small targets.
Specifically, the main contributions of our proposed HCA-RFLA method are:
  • To address the issue that existing attention mechanisms overlook the correlation between adjacent layers in small target detection, we propose the Hierarchical Collaborative Attention (HCA) module, which enables the network to extract effective features of small targets and significantly improves detection accuracy.
  • To enhance the network’s detection speed for ships in SAR images and address the imbalance between positive and negative samples for small targets, we propose a Gaussian Receptive Field-based Label Assignment (RFLA) strategy to effectively improve the network’s focus on small targets.
  • We conduct experiments on the SSDD, HRSID, and SSD datasets. The results show that, compared to the traditional model, the HCA-RFLA achieves higher accuracy in small target detection.

2. Related Work

2.1. Traditional SAR Ship Detection Methods

Traditional object detection algorithms are typically manually designed and fine-tuned using feature parameters. These methods primarily include threshold-based detection, template matching, statistical feature-based detection, and similar approaches.
Threshold-based detection algorithms are generally divided into global and local threshold methods. Zhu et al. [20] proposed an algorithm that determines the global threshold based on target pixels and then locates the ship within the entire SAR image. The algorithm models the distribution of unchanged and changed pixels in advance, yielding a non-integer threshold solution to determine the ship’s position in the SAR image. However, sea surface clutter influences the threshold, reducing the algorithm’s detection performance in complex sea conditions.
Liu et al. [21] introduced a template matching algorithm utilizing a self-attentive interactive fusion network. The algorithm first calculates a ship target template, then fuses extracted features through the self-attentive layer of a transformer for information interaction, and finally matches the result with the ship target in the SAR image. Chen et al. [22] proposed integrating deep learning with pattern matching techniques to enhance feature extraction accuracy.
One of the most widely used statistical detection algorithms is the constant false-alarm rate (CFAR). Researchers have developed various CFAR-based statistical models tailored to specific task requirements. Zeng et al. [23] introduced the CFAR-DP-FW algorithm, enhancing the network’s ability to detect small maritime targets by using dual-polarized feature representation and CFAR to suppress background noise. Yang et al. [24] applied CFAR to detect ship targets and cluster small target pixels based on spatial distances and detection thresholds, enhancing SAR image target detection accuracy. Li et al. [25] proposed a CFAR approach based on superpixel merging, which filtered noise from candidate superpixels and selected remaining candidate superpixels for large target merging. The effectiveness of the CFAR algorithm is largely dependent on the statistical model of sea surface clutter and the chosen model parameters. Specifically, CFAR relies on the designer’s a priori knowledge.
In summary, traditional SAR ship detection methods are heavily dependent on the designer’s a priori knowledge. Although these methods perform well in relatively simple conditions, they tend to produce more false alarms and exhibit reduced performance in complex offshore environments.

2.2. Deep-Learning-Based SAR Ship Detection Methods

According to the literature [26,27,28], existing deep learning-based ship detection methods are divided into two-stage and single-stage detection algorithms. Two-stage detection models primarily include region-based convolutional neural networks (R-CNN) [29], Fast R-CNN [30], and Mask R-CNN [31]. These algorithms first generate candidate regions using region proposal methods, then perform feature extraction on these regions in the convolutional neural network, and finally predict the target’s location and category probabilities through classification and regression operations [32]. While these models offer high detection accuracy, they impact the network’s image processing speed. In contrast, single-stage algorithms like single-shot multibox detector (SSD) [33,34], RetinaNet [35,36,37], and You Only Look Once (YOLO) [38] bypass the candidate region selection stage, simplifying detection into a regression problem [39]. Although they generally exhibit lower detection accuracy, they achieve faster computation speeds than two-stage models, making them suitable for real-time detection applications.
In recent years, many studies have enhanced the YOLO series algorithms, leading to substantial improvements in detection performance. For example, Xi et al. [40] proposed an improved SSD method that employs two feature pyramids—one from the inverse convolution module and one from the feature fusion module—for feature prediction, thereby increasing detection accuracy. Chen et al. [41] developed a high-resolution feature pyramid (HR-FPN) that improves small target detection accuracy while reducing redundant features. Zhang et al. [42] created an enhanced layered feature fusion network based on YOLOv7, using a 1 × 1 convolutional kernel to reduce feature dimensionality and lower computational load during inference. Cheng et al. [43] incorporated full-dimensional convolution into the YOLOv5 backbone to boost overall model accuracy without expanding network width or depth. Wang et al. [44] proposed a feature pyramid design incorporating multiscale feature attention, which extracts and enhances principal components with higher intraclass cohesion and better interclass separability, addressing issues of high intraclass variance and interclass overlap in ship features. Lu et al. [45] integrated ShuffleNetV2 and CBAM attention mechanisms into the network, substantially reducing model weight while noticeably improving detection accuracy. While these methods generally meet remote sensing detection requirements, they still struggle with misdetection and under-detection of small targets.
In summary, unlike traditional SAR ship detection methods, deep-learning-based algorithms are data-driven and do not depend on a priori knowledge like manually defined thresholds or sea clutter distributions. This offers a novel solution for ship detection across various scenarios. Hence, we propose the HCA-RFLA algorithm for detecting small targets in remotely sensed SAR ship images.

2.3. Label Assignment Strategy

Currently, both anchor-based and anchor-free methods assign positive and negative samples by measuring the overlap between ground truth (GT) and predicted anchors. For small targets, GT and most predicted anchors may not overlap due to the limited size of feature points. As a result, researchers have proposed various label matching rules. Wang et al. [46] developed the DeIoU paradigm, which enhances the differentiation of small target features. DeIoU effectively reduces overlap between prediction frames and minimizes bias in prediction results. Feng et al. [47] introduced task alignment learning, which dynamically selects high-quality anchors while optimizing anchor assignment and weighting. This collaborative approach enhances the accuracy of classification and localization tasks, improving overall target detection accuracy. Xu et al. [48] introduced the NWD-RKA strategy, which can be embedded into various anchored detectors to enhance supervisory information during network training and improve small target detection performance. Xu et al. [49] proposed a Gaussian receptive field-based label assignment (RFLA) strategy that replaces IoU and central sampling with a Gaussian distribution method to assign samples. It measures the similarity between the receptive field and GT using a Gaussian distribution, ensuring balanced learning for small objects.

3. Algorithmic Improvements

The HCA-RFLA method is illustrated in Figure 1. YOLOv8 is chosen as the baseline network primarily because its CSP architecture enhances feature representation by partially segmenting and sharing computational feature maps, making it especially suitable for detecting small targets. Furthermore, CSPDarknet [50] improves gradient stability during training by reducing redundant computations in the gradient stream, which is especially important for small target detection. Specifically, images are input into CSPDarknet to extract target features, which are then fed into the FPN. Since the element-wise accumulation method of the FPN assumes that the feature maps at each level have equal importance, it does not account for the varying contributions of different levels to the target, leading to issues of feature redundancy and loss, particularly for small targets. To address this issue, we designed the Hierarchical Collaborative Attention (HCA) module in the neck part of the FPN. The HCA module determines the final feature fusion result by leveraging spatial and inter-channel hierarchical correlations to enhance small-target features in each hierarchical feature map while suppressing redundant and invalid features. Next, the features extracted by the HCA module are fed into the RFLA module for positive and negative sample allocation of ship targets, and the results are passed to the Decoupled Head. The decoupled head reduces channel dimensions through convolutional layers and generates ship target positions and classifications by predicting the bounding box and confidence level of the target. Finally, the optimal detection results from HCA-RFLA are produced using non-maximal suppression (NMS).
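As a rough illustration of this data flow, a hedged PyTorch-style sketch is given below; the component names (CSPDarknet backbone, HCA-based neck, RFLA assigner, decoupled head) mirror the description above, but the wrapper itself is a placeholder rather than the authors' released code.

```python
# Sketch of the HCA-RFLA forward pass described above (placeholder wrapper,
# not the authors' released code).
import torch.nn as nn

class HCARFLADetector(nn.Module):
    def __init__(self, backbone, neck, assigner, head):
        super().__init__()
        self.backbone = backbone   # CSPDarknet: extracts C3, C4, C5
        self.neck = neck           # FPN whose fusion nodes use the HCA module
        self.assigner = assigner   # RFLA label assignment (used during training)
        self.head = head           # decoupled classification / regression head

    def forward(self, images, targets=None):
        c3, c4, c5 = self.backbone(images)      # multi-scale backbone features
        p3, p4, p5 = self.neck([c3, c4, c5])    # HCA-weighted feature fusion
        preds = self.head([p3, p4, p5])         # per-level boxes and class scores
        if self.training and targets is not None:
            # assign positive/negative samples via Gaussian receptive fields
            return preds, self.assigner(preds, targets)
        return preds  # at inference, NMS is applied to preds afterwards
```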

3.1. Integrated Attention Mechanism Module

To enhance the model’s capability of detecting targets at different scales, the FPN uses a top-down, element-wise summation method to integrate deep semantic information into shallow feature maps. However, this element-wise summation assumes equal importance for feature maps at each level. In small target detection, the resolution of feature maps decreases progressively with downsampling, potentially leading to the loss of small target information in higher-level feature maps. To address this issue, this paper proposes a Hierarchical Collaborative Attention (HCA) module. The module fully accounts for the correlation between neighboring feature maps in small target detection and the influence of surrounding environmental noise on the ship target. The HCA processes three feature layers, C3, C4, and C5, extracted by the backbone network, and assigns weights to them through channel and spatial attention mechanisms to suppress irrelevant features. This enables the network to retain high-resolution features in the shallow layers while combining semantic information from the upper layers, resulting in a fused feature map with richer positional and semantic information.
In this paper, the HCA module captures the correlation information between adjacent levels during feature fusion and determines the weights for feature fusion to generate a feature map that adapts to the target size.
The internal structure of the HCA is shown in Figure 2. It consists of two components: channel attention and spatial attention. SENet [51] is a widely used channel attention mechanism that significantly improves model accuracy by computing the weights of two input feature maps to enhance the original features. However, SENet focuses on a single channel, whereas the channel attention mechanism in this paper captures correlations between feature maps from different levels within the channel. Specifically, the channel attention block cascades feature maps from two layers, applies average pooling, and compresses them into a vector $V$. The vector is then split into two parts and sent through two fully connected layers to obtain the weight vectors $W_{t_1}$ and $W_{t_2}$ for the two channels. Finally, the input feature maps are dot-multiplied by $W_{t_1}$ and $W_{t_2}$, respectively, to produce the output of the channel attention block. The calculation processes are illustrated in Equations (1)–(3).
$V = \frac{1}{N}\sum_{i=1}^{N}\mathrm{Avgpool}(F_1, F_2)$  (1)
In Equation (1), $F_1$ and $F_2$ are the two input feature maps, and $N$ is the total number of feature channels. $\mathrm{Avgpool}(F_1, F_2)$ denotes the average pooling operation applied to the input features $F_1$ and $F_2$ to obtain the compression vector $V$.
$W_{t_i} = \mathrm{FC}(V_i)$  (2)
In Equation (2), $V_i$ is the segmentation vector of $V$, $\mathrm{FC}(V_i)$ denotes the fully connected operation, and $W_{t_i}$ is the vector of channel weights obtained after processing by the fully connected layer.
$T = F_i \odot W_{t_i}$  (3)
In Equation (3), $\odot$ is the dot product operation and $T$ is the output of the channel attention module. $F_i$ is the $i$-th input feature map, where $i \in \{1, 2\}$.
Unlike the spatial attention module in CBAM [52], which focuses solely on the characteristics of a single feature map, the spatial attention block proposed in this paper computes spatial weights for two input feature maps to capture their spatial correlation. Specifically, the spatial attention block cascades the two output feature maps from the channel attention block, splits them into two parts, and applies Sigmoid activation to generate two spatial attention weights, $W_{k_1}$ and $W_{k_2}$. Finally, the feature maps at the two levels are dot-multiplied by $W_{k_1}$ and $W_{k_2}$, respectively. The final output of the spatial attention block is obtained by summing these results. The computational steps are illustrated in Equations (4) and (5).
$W_{k_i} = \mathrm{Sigmoid}\left(\mathrm{Split}\left(\mathrm{Concat}_{i=1}^{2}\,\mathrm{Channel}(F_i)\right)\right)$  (4)
In Equation (4), $\mathrm{Channel}(F_i)$ denotes the channel attention result of the $i$-th feature map. $\mathrm{Split}(\cdot)$ refers to the vector-splitting operation, which separates the concatenated two-part channel feature map into two segments, after which the spatial attention weights $W_{k_i}$ are calculated using the Sigmoid activation function.
$K = \sum_{i=1}^{2} W_{k_i} \odot F_i$  (5)
In Equation (5), $K$ is the output of the spatial attention module.
Through the cross-calculation of feature maps across channels and spatial dimensions, the HCA module derives the spatial and channel weights for two input feature maps from adjacent levels, thereby enhancing relevant features and suppressing irrelevant ones during feature fusion.
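For illustration, a minimal PyTorch sketch of the HCA idea is given below. It assumes the two adjacent-level inputs have already been resized to a common shape, and it realizes the spatial-weight step with a 1 × 1 convolution; the class name HCABlock and these layer choices are ours, not a released implementation.

```python
# Minimal PyTorch sketch of the HCA block: channel attention (Eqs. (1)-(3))
# followed by spatial attention (Eqs. (4)-(5)). Inputs F1, F2 are adjacent-level
# feature maps already resized to the same shape (B, C, H, W).
import torch
import torch.nn as nn

class HCABlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # one fully connected branch per input map for the channel weights (Eq. (2))
        self.fc1 = nn.Sequential(nn.Linear(channels, channels), nn.Sigmoid())
        self.fc2 = nn.Sequential(nn.Linear(channels, channels), nn.Sigmoid())
        # 1x1 conv producing two spatial weight maps from the cascaded features (Eq. (4))
        self.spatial = nn.Conv2d(2 * channels, 2, kernel_size=1)

    def forward(self, f1: torch.Tensor, f2: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = f1.shape
        # channel attention: cascade, average-pool to a vector V, split, FC (Eqs. (1)-(2))
        v = torch.cat([f1, f2], dim=1).mean(dim=(2, 3))       # (B, 2C)
        wt1 = self.fc1(v[:, :c]).view(b, c, 1, 1)             # channel weights for F1
        wt2 = self.fc2(v[:, c:]).view(b, c, 1, 1)             # channel weights for F2
        t1, t2 = f1 * wt1, f2 * wt2                           # Eq. (3) per input map
        # spatial attention: cascade, split, Sigmoid (Eqs. (4)-(5))
        wk = torch.sigmoid(self.spatial(torch.cat([t1, t2], dim=1)))  # (B, 2, H, W)
        return t1 * wk[:, 0:1] + t2 * wk[:, 1:2]              # fused output K
```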

3.2. Gaussian Receptive Field Based Label Assignment Strategy

For small target detection, YOLOv8’s task-aligned assigner [47] must continuously monitor model performance on the training set and adjust the positive-to-negative sample ratio based on the performance, thus increasing the computational and time costs during model training. Taking these factors into account, we introduce the RFLA strategy. This approach assigns labels based on the receptive field of feature points, directly reflecting their ability to detect targets without adding extra computational or time costs, making it more suitable for detecting small and fine-grained targets.
In addition, box prior methods determine positive and negative samples using an IoU threshold, and point prior methods use a central sampling strategy to determine positive and negative samples. Both sampling methods determine the sample according to the overlap between the GT and the predicted anchor. For tiny targets, there is often no overlap between the GT and most of the predicted anchors, which largely leads to the misdetection of small targets. Therefore, we introduce the RFLA strategy, which uses a Gaussian distribution to measure the similarity between GT and predicted anchor. Even if the two anchors do not have overlapping regions, the strategy can still learn effectively. In brief, the IoU and center sampling-based strategies focus more on large target detection tasks; therefore, RFLA is introduced to achieve balanced learning for small targets.
In the first stage, the ground truths (GTs) of the SAR ship dataset are initially constrained to the smallest possible rectangular bounding boxes. The GTs are then transformed into Gaussian representations, and the receptive fields are modeled as Gaussian distributions. As illustrated in Figure 3, $x_p$ and $y_p$ represent the coordinates of each feature point, while $x_g$ and $y_g$ denote the center coordinates of the GT. Additionally, $w_g$ and $h_g$ correspond to the width and height of the GT, and $r_p$ refers to the radius of the Gaussian effective receptive field.
Equation (6) represents the interior tangent ellipse of the minimal rectangular box defining the GT in Figure 3.
$\frac{(x - x_g)^2}{(w_g/2)^2} + \frac{(y - y_g)^2}{(h_g/2)^2} = 1$  (6)
Equation (7) represents the GT as a two-dimensional Gaussian distribution, where $\mu_g$ is the mean (the centroid coordinates of the GT), $\Sigma_g$ is the covariance matrix, and $T$ is the matrix transpose symbol. The receptive field is likewise converted to a Gaussian distribution $N_p(\mu_p, \Sigma_p)$.
$\mu_g = [x_g, y_g]^T, \qquad \Sigma_g = \begin{bmatrix} \dfrac{w_g^2}{4} & 0 \\ 0 & \dfrac{h_g^2}{4} \end{bmatrix}$  (7)
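A small sketch of this first stage is given below, assuming a simple NumPy representation; the diagonal covariance of the ERF ($r_p^2$ on both axes) is our reading of the radius $r_p$ and should be treated as an assumption.

```python
# Sketch of stage one: converting a GT box and a feature point's effective
# receptive field (ERF) into 2-D Gaussians, following Eq. (7).
import numpy as np

def gt_to_gaussian(xg: float, yg: float, wg: float, hg: float):
    """GT box (center xg, yg; size wg, hg) -> N(mu_g, Sigma_g)."""
    mu_g = np.array([xg, yg])
    sigma_g = np.diag([wg ** 2 / 4.0, hg ** 2 / 4.0])
    return mu_g, sigma_g

def erf_to_gaussian(xp: float, yp: float, rp: float):
    """Feature point (xp, yp) with ERF radius rp -> N(mu_p, Sigma_p) (assumed diagonal)."""
    mu_p = np.array([xp, yp])
    sigma_p = np.diag([rp ** 2, rp ** 2])
    return mu_p, sigma_p
```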
In the second stage, the Wasserstein distance is employed to quantify the difference between two distributions. The Wasserstein distance is a metric used to evaluate the similarity between two probability distributions. In tiny target detection, the GT and most predicted anchors usually do not overlap, which hinders the performance of subsequent detection tasks. Therefore, the Wasserstein distance is particularly useful in ship detection for measuring the distance between non-overlapping distributions. The two-dimensional Wasserstein distance is expressed in Equation (8).
$W_2^2(n_e, n_g) = \left\| \left[x_p, y_p, r_p, r_p\right]^T - \left[x_g, y_g, \dfrac{w_g}{2}, \dfrac{h_g}{2}\right]^T \right\|_2^2$  (8)
In Equation (8), $n_e = N_e(\mu_e, \Sigma_e)$ and $n_g = N_g(\mu_g, \Sigma_g)$ are the given Gaussian ERF and Gaussian GT, respectively.
As indicated in Equation (8), the position of the center point and the dimensions (length and width) of the anchor frame are closely related to the Wasserstein distance. The relationship between feature points and GTs is influenced by the specific characteristics of tiny targets, such as the edge details of ship targets. This complex, nonlinear relationship cannot be accurately captured by a simple linear distance metric, thus necessitating a nonlinear transformation of the Wasserstein distance to better reflect the similarity between feature points and GTs. Finally, the Gaussian receptive field distance (RFD), normalized to a range between 0 and 1, is derived. It is computed as shown in Equation (9).
$RFD = \dfrac{1}{1 + W_2^2(n_e, n_g)}$  (9)
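For clarity, Equations (8) and (9) can be computed directly from the box and feature-point parameters, as in the following sketch (variable names are illustrative):

```python
# Sketch of Eqs. (8)-(9): squared 2-Wasserstein distance between the Gaussian
# ERF and the Gaussian GT, and its normalisation to the RFD score in (0, 1].
import numpy as np

def rfd_score(xp, yp, rp, xg, yg, wg, hg):
    erf_vec = np.array([xp, yp, rp, rp])                 # ERF centre and radius
    gt_vec = np.array([xg, yg, wg / 2.0, hg / 2.0])      # GT centre and half-sizes
    w2_sq = float(np.sum((erf_vec - gt_vec) ** 2))       # Eq. (8)
    return 1.0 / (1.0 + w2_sq)                           # Eq. (9)
```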
Finally, to accurately determine the positional relationship between a feature point and the ground truth (GT), we propose the hierarchical label assignment (HLA) strategy, specifically designed for ship detection tasks. The HLA framework is depicted in Figure 4. In this strategy, the first step is to compute the RFD score matrix for each feature point relative to all ground truths (GTs) using the aforementioned method and then rank the RFD scores. Next, the feature points with the top three RFD scores are selected and labeled as positive samples, denoted as $P_1$, $P_2$, and $P_3$, respectively; the number of positive samples is denoted by $P$. After labeling the samples, the preliminary allocation result $c_1$ and a mask $y$ identifying these selected feature points are obtained. Here, $y$ is a binary value of 1 or 0, indicating positive and negative ship samples, respectively.
To further improve the model’s recall for positive samples and reduce interference from outliers, we introduce an adaptive factor $\mu$, with a value between 0 and 1, to adjust the radius of the effective receptive field, thereby generating a new radius $R$.
$R = \mu \cdot r_p$  (10)
The radius of the effective receptive field is thus adjusted, enabling the network to reorder the RFD scores over a more precise range. The initial allocation result $c_1$, feature point radius $r_p$, and mask $y$ from the first stage are then used as inputs for the second stage. This reordering strategy is repeated, with one positive sample added for each GT to obtain the updated allocation result $c_2$. This method not only refines the selection of high-scoring feature points but also introduces an additional level to the final allocation result $C$, thereby enhancing the efficiency and accuracy of the overall matching strategy. The final result is given by Equation (11).
$C = c_1 \cdot y + c_2 \cdot (1 - y)$  (11)
The bounding boxes of small targets cover limited areas and contain fewer feature points, often resulting in missed and false detections. The RFLA strategy proposed in this paper effectively addresses the issue of insufficient positive samples for small targets, significantly enhancing model accuracy.
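A simplified sketch of the two-stage HLA procedure is given below, assuming an RFD score matrix has already been computed; for brevity the second stage reuses the stage-one scores instead of recomputing them with the shrunken radius $R$, so it should be read as an approximation of the strategy rather than the authors' implementation.

```python
# Simplified sketch of the two-stage HLA assignment. `rfd` is a (num_points,
# num_gts) score matrix (e.g. from rfd_score); variable names are illustrative.
import numpy as np

def hla_assign(rfd: np.ndarray, top_k: int = 3):
    num_points, num_gts = rfd.shape
    labels = np.full(num_points, -1, dtype=int)   # -1 = negative sample
    mask = np.zeros(num_points, dtype=bool)       # y: points chosen in stage one

    # Stage one: for every GT, take the top-k feature points by RFD as positives.
    for g in range(num_gts):
        for p in np.argsort(-rfd[:, g])[:top_k]:
            labels[p], mask[p] = g, True
    c1 = labels.copy()

    # Stage two: the paper recomputes RFD with a shrunken radius R = mu * r_p;
    # this sketch simply reuses the stage-one scores and adds one extra positive
    # per GT among the points not yet selected.
    for g in range(num_gts):
        for p in np.argsort(-rfd[:, g]):
            if not mask[p]:
                labels[p] = g
                break
    c2 = labels

    # Eq. (11): keep stage-one choices where y = 1, stage-two additions otherwise.
    return np.where(mask, c1, c2)
```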

3.3. Loss Function

The loss function evaluates the model’s accuracy in target localization. HCA-RFLA uses CIoU [53] as the position loss function. CIoU considers the centroid distance, aspect ratio difference, and area difference between target frames. Therefore, the CIoU loss function can better assess the similarity of target frames, thereby improving model accuracy in target localization. The formula for the CIoU loss function is as follows:
$Loss_{CIoU} = 1 - IoU + \dfrac{\rho^2(b_p, b_g)}{c^2} + \alpha v$  (12)
In Equation (12), $IoU$ denotes the intersection over union of the predicted frame and the ground truth frame; $\rho^2(b_p, b_g)$ is the squared distance between the centroids of the predicted and ground truth frames; $c$ is the diagonal length of the smallest enclosing box; $v$ measures the difference in aspect ratio between the two boxes; and $\alpha$ is its trade-off weight.
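A compact reference implementation of Equation (12) for axis-aligned boxes in $(x_1, y_1, x_2, y_2)$ form is sketched below; it follows the standard CIoU formulation and is not taken from the authors' training code.

```python
# Reference sketch of the CIoU loss in Eq. (12) for boxes in (x1, y1, x2, y2)
# form, shape (N, 4); a standard formulation, not the authors' training code.
import math
import torch

def ciou_loss(pred: torch.Tensor, gt: torch.Tensor, eps: float = 1e-7) -> torch.Tensor:
    # IoU term
    lt = torch.max(pred[:, :2], gt[:, :2])
    rb = torch.min(pred[:, 2:], gt[:, 2:])
    wh = (rb - lt).clamp(min=0)
    inter = wh[:, 0] * wh[:, 1]
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_g = (gt[:, 2] - gt[:, 0]) * (gt[:, 3] - gt[:, 1])
    iou = inter / (area_p + area_g - inter + eps)
    # squared centre distance rho^2 and enclosing-box diagonal c^2
    rho2 = (((pred[:, :2] + pred[:, 2:]) - (gt[:, :2] + gt[:, 2:])) ** 2).sum(dim=1) / 4
    enc = torch.max(pred[:, 2:], gt[:, 2:]) - torch.min(pred[:, :2], gt[:, :2])
    c2 = (enc ** 2).sum(dim=1) + eps
    # aspect-ratio consistency v and trade-off weight alpha
    wp, hp = pred[:, 2] - pred[:, 0], pred[:, 3] - pred[:, 1]
    wg, hg = gt[:, 2] - gt[:, 0], gt[:, 3] - gt[:, 1]
    v = (4 / math.pi ** 2) * (torch.atan(wg / (hg + eps)) - torch.atan(wp / (hp + eps))) ** 2
    alpha = v / (1 - iou + v + eps)
    return (1 - iou + rho2 / c2 + alpha * v).mean()
```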
HCA-RFLA uses DFL Loss [54] as the target loss function, which integrates location and categorization loss weights. By adjusting these weights, DFL Loss increases focus on difficult samples and improves regression accuracy. The DFL Loss formula is as follows:
$Loss_{DFL} = -\left( (y_{i+1} - y)\log(S_i) + (y - y_i)\log(S_{i+1}) \right)$  (13)
In Equation (13), $y$ is the continuous regression target, $y_i$ and $y_{i+1}$ are the two neighboring discrete labels that bracket it, and $S_i$ and $S_{i+1}$ are the corresponding predicted probabilities.
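The following sketch illustrates Equation (13), assuming the regression target has been mapped onto discrete bins and $S_i$, $S_{i+1}$ are obtained from a softmax over the bin logits:

```python
# Sketch of the DFL term in Eq. (13). `logits` holds the per-bin scores for one
# regression coordinate, `target` the continuous target expressed in bin units.
import torch
import torch.nn.functional as F

def dfl_loss(logits: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    yi = target.long().clamp(max=logits.size(1) - 2)    # left neighbour y_i
    yi1 = yi + 1                                         # right neighbour y_{i+1}
    wl = yi1.float() - target                            # weight (y_{i+1} - y)
    wr = target - yi.float()                             # weight (y - y_i)
    log_s = F.log_softmax(logits, dim=1)                 # log S over the bins
    return -(wl * log_s.gather(1, yi[:, None]).squeeze(1)
             + wr * log_s.gather(1, yi1[:, None]).squeeze(1)).mean()
```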
Class loss calculates classification error in target detection. The HCA-RFLA method in this paper employs BCE Loss to evaluate the classification performance of the model [55]. BCE Loss effectively addresses classification challenges, particularly in the presence of unbalanced classes. Its formula is as follows:
$Loss_{BCE} = -\left( y\log\hat{y} + (1 - y)\log(1 - \hat{y}) \right)$  (14)
In Equation (14), $y$ is the sample label, with the positive sample labeled as 1, and $\hat{y}$ is the predicted probability of a positive sample. By minimizing BCE Loss, the model better adjusts its parameters and improves classification accuracy. This loss function effectively penalizes misclassification and helps the model learn more accurate classification boundaries.
For SAR remote sensing ship detection, the CIoU loss, DFL loss, and BCE loss described above are key loss functions that enhance model detection accuracy. The overall loss function is:
$Loss = Loss_{CIoU} + Loss_{DFL} + Loss_{BCE}$  (15)
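Assuming the CIoU and DFL sketches above, the three terms can be combined as in Equation (15); the unit weighting reflects the plain sum stated here, and any reweighting would be an implementation choice.

```python
# Combining the three terms as in Eq. (15), reusing the ciou_loss and dfl_loss
# sketches above; the logits-based BCE is a numerically stable form of Eq. (14).
import torch.nn.functional as F

def total_loss(pred_boxes, gt_boxes, box_logits, box_targets, cls_logits, cls_labels):
    loss_ciou = ciou_loss(pred_boxes, gt_boxes)                       # Eq. (12)
    loss_dfl = dfl_loss(box_logits, box_targets)                      # Eq. (13)
    loss_bce = F.binary_cross_entropy_with_logits(cls_logits,
                                                  cls_labels.float()) # Eq. (14)
    return loss_ciou + loss_dfl + loss_bce                            # Eq. (15)
```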

4. Experiments and Analysis

4.1. Experimental Environment and Data Preparation

The experimental environment is the Windows 11 operating system with the PyTorch deep learning framework based on Python 3.8 and CUDA 11.1. The GPU is an NVIDIA GeForce RTX 4060, and the RAM is 32 GB. The initial learning rate of the network is set to 0.01, the AdamW optimizer is used, the non-maximum suppression (NMS) threshold is set to 0.5, the batch size is set to 16, and the number of epochs is set to 100. To avoid overfitting, we use an early stopping mechanism in our experiments with the patience set to 10 epochs: if the validation loss does not significantly decrease over 10 epochs, training is stopped early.
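A hedged sketch of this training setup is given below; model, train_loader, val_loader, train_one_epoch, and evaluate are placeholders, so the snippet only illustrates the optimizer, learning rate, epoch budget, and early stopping logic described above.

```python
# Placeholder training-loop sketch: AdamW with lr = 0.01, up to 100 epochs,
# early stopping with patience 10 (model, loaders, and step functions are stubs).
import torch

optimizer = torch.optim.AdamW(model.parameters(), lr=0.01)
best_val, patience, wait = float("inf"), 10, 0

for epoch in range(100):
    train_one_epoch(model, train_loader, optimizer)   # placeholder training step
    val_loss = evaluate(model, val_loader)            # placeholder validation step
    if val_loss < best_val:
        best_val, wait = val_loss, 0
        torch.save(model.state_dict(), "best.pt")     # keep the best checkpoint
    else:
        wait += 1
        if wait >= patience:                          # no improvement for 10 epochs
            break                                     # early stopping
```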
This experiment uses the SAR Ship Detection Dataset (SSDD) [56], which contains a total of 1160 images and 2456 ships, with an average of 2.12 ships per image. The average size of the ship samples is approximately 35 × 35 pixels. The dataset is split into training and test sets in an 8:2 ratio.
To evaluate the algorithm’s performance in complex scenarios, we also used the High Resolution SAR Images Dataset (HRSID) [57] and SAR-Ship-Dataset (SSD) [58]. The HRSID dataset includes 5604 high-resolution SAR images with 16,951 ship targets of varying sizes, each image of size 800 × 800 pixels. The SSD dataset includes 210 SAR images and 39,729 ship targets. Both datasets contain scenarios with different resolutions and complexities, such as harbors and near-shore locations, enabling the algorithms to effectively learn from diverse scenarios. The training and test sets for both datasets are allocated in a 7:3 ratio.

4.2. Evaluation Criterion

To test the performance of the improved algorithm, we conduct a comparative evaluation based on the following indicators: precision (P), recall (R), average precision (AP), mean average precision (mAP), and frames per second (FPS). The related equations are as follows:
$Precision = \dfrac{TP}{TP + FP}$  (16)
$Recall = \dfrac{TP}{TP + FN}$  (17)
$AP = \int_0^1 P(R)\,dR$  (18)
$mAP = \dfrac{1}{c}\sum_{j=1}^{c} AP_j$  (19)
$FPS = \dfrac{N}{T}$  (20)
In the equations above, TP (true positive) refers to the positive class correctly classified as positive, FP (false positive) refers to the negative class incorrectly classified as positive, FN (false negative) refers to the positive class incorrectly classified as negative, and TN (true negative) refers to the negative class correctly classified as negative. According to Equation (18), the value of average precision (AP) is the integral of the precision–recall (PR) curve between 0 and 1, where the PR curve is generated from the computed precision (P) and recall (R). Additionally, c represents the number of classes, N is the total number of samples, and T is the time required for data detection.
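The metrics in Equations (16)–(20) can be computed as in the following sketch (a straightforward transcription, with the AP integral approximated by trapezoidal integration over the PR curve):

```python
# Sketch of the metrics in Eqs. (16)-(20); AP is approximated by trapezoidal
# integration over the precision-recall curve.
import numpy as np

def precision_recall(tp, fp, fn):
    precision = tp / (tp + fp) if (tp + fp) else 0.0   # Eq. (16)
    recall = tp / (tp + fn) if (tp + fn) else 0.0      # Eq. (17)
    return precision, recall

def average_precision(precisions, recalls):
    # Eq. (18): area under the PR curve (recalls assumed sorted ascending)
    return float(np.trapz(precisions, recalls))

def mean_average_precision(ap_per_class):
    return sum(ap_per_class) / len(ap_per_class)       # Eq. (19)

def fps(num_images, total_seconds):
    return num_images / total_seconds                  # Eq. (20)
```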

4.3. Experimental Results and Analysis

Comparative Experiments

To validate the effectiveness of the proposed method, we compare the improved approach with several state-of-the-art target detection algorithms. For example, the Faster R-CNN [59], YOLOv5 [60], YOLOv7 [61], YOLOv8, HFPNET [62], DLAHSD [63], and VS-LSDet [64] algorithms are compared with the HCA-RFLA algorithm proposed in this paper on the SSDD dataset.
The detection results on the SSDD dataset are presented in Table 1. The performance metrics of HCA-RFLA, including precision, recall, and average precision, achieved the highest results. The precision is 97.1%, recall is 95.8%, and average precision is 98.3%, which are 4.3%, 6.3%, and 6.2% higher than the baseline model, respectively. The mAP of HCA-RFLA is 5.5%, 4%, and 2.6% higher than the latest algorithms HFPNET, DLAHSD, and VS-LSDet, respectively. Although the recall is 0.3% lower than HFPNet, the FPS of the proposed algorithm is the highest, demonstrating superior overall performance. Thus, the comparison results validate the effectiveness of the proposed HCA-RFLA algorithm.
To further demonstrate the effectiveness of HCA-RFLA and its superior performance across different datasets, we compare its performance with other algorithms on the HRSID dataset. As shown in Table 2, the HCA-RFLA algorithm proposed in this paper improves the mAP of the benchmark model by 4.4% and significantly enhances the FPS, achieving the highest FPS among all compared algorithms.
To further assess the generalization performance of our method, we evaluate the SSD dataset. As shown in Table 3, HCA-RFLA achieved a 96.3% mAP and the highest recall rate of 95.4%, outperforming other methods. HCA-RFLA improved mAP and recall (R) by 4.6% and 5.9%, respectively, compared to the baseline model YOLOv8. Compared to the latest algorithms HFPNET, DLAHSD, and VS-LSDet, the proposed method shows a 2% to 3% improvement in mAP. Furthermore, the FPS of HCA-RFLA is 37.9, significantly outperforming the other methods in FPS. Thus, HCA-RFLA achieves real-time detection without compromising accuracy. In summary, the HCA-RFLA method outperforms all other methods in terms of both mAP and FPS.
Figure 5 shows the PR curves for Faster R-CNN, YOLOv5, YOLOv7, YOLOv8, HFPNET, DLAHSD, VS-LSDet, and HCA-RFLA on the HRSID dataset. It is evident that YOLOv7 performs poorly in detecting small targets in remote sensing images of ships. This may be because YOLOv7 uses anchor boxes of fixed size and proportion to predict the position and size of targets, which may not be suitable for small targets of all sizes and proportions, resulting in poor detection performance for small targets. Apart from YOLOv7, the PR curves of the other methods are quite similar. Additionally, the brown curve representing HCA-RFLA consistently stays above the other curves. Figure 5 indicates that the performance of the other seven models on the same dataset is lower than that of the HCA-RFLA model. The superior performance of HCA-RFLA can be attributed to its adaptive fusion of features from adjacent levels during feature extraction, which emphasizes features relevant to small targets and thereby improves the accuracy of small target detection.
To provide a clearer comparison of the algorithms discussed, Figure 6 shows their detection results on the HRSID dataset.
Figure 6b–e clearly show that algorithms such as Faster R-CNN, YOLOv5, YOLOv7, and YOLOv8 frequently misidentify background noise and island features as targets. These methods exhibit low overall classification accuracy for ships, resulting in missed detections and false positives. As shown in Figure 7a, the HFPNET method also exhibits some missed detections, because the network fails to capture subtle ship features at the edges, leading to missed detections of ships near the port. Although the detection performance in (b) and (c) is significantly improved over the baseline network, the accuracy in the nearshore region is much lower than in the offshore region. This may be because the nearshore scene is more complex than the offshore area, so certain ship features are not captured and are confused with the nearshore background. In contrast, the HCA-RFLA method proposed in this paper effectively detects the majority of ship targets, even amidst strong noise interference, demonstrating its effectiveness for remote sensing of small targets. The HCA-RFLA method showcases robust feature extraction capabilities and precise judgment. Compared to other methods, HCA-RFLA better focuses on and captures small targets.
The detection of inshore ships in complex scenes highlights the superiority of the HCA-RFLA method. As scene complexity increases, the number of missed detections and false positives also rises. The similarity between port and nearshore buildings and ship targets leads to the problem of redundant frames, which reduces algorithm accuracy and further affects the effectiveness of maritime safety detection. Despite these challenges, HCA-RFLA still effectively distinguishes between targets and backgrounds. As shown in Figure 7d, inshore vessels are accurately detected and captured. The HCA and RFLA components ensure accurate classification and matching of ship samples. In summary, the comparison of detection results further confirms the effectiveness of the HCA-RFLA method for SAR remote sensing ship detection.

4.4. Ablation Experiments

4.4.1. Module Ablation Experiment

This section evaluates the effectiveness of the improved modules. YOLOv8 is used as the baseline network, with SSDD serving as the ablation dataset for various experiments. We assess the impact of each improved module on detection accuracy. The results are presented in Table 4, where “✔” denotes the inclusion of the module and “×” denotes its exclusion.
According to the analysis of Table 4, strong scattering interference from land scenes results in the baseline network having poor detection ability for nearshore ship targets. The mean average precision (mAP) of the baseline network is only 92.1%. Incorporating HCA and RFLA significantly enhances the detection performance of the overall algorithm network.
Figure 8 illustrates the comparison of precision, recall, and mAP results from the ablation experiment. The results clearly show that incorporating HCA significantly improves the detection of small targets on remote sensing ships in SAR scenarios, enhancing precision and mAP by 1.7% and 3.8%, respectively. The complex background on the sea surface and increased speckle noise in SAR images may cause the network to mistakenly identify small pieces of coastline as ship targets during training. The baseline network’s feature extraction method, lacking detail and accuracy, is affected by various factors, leading to significant fluctuations and rendering it unsuitable for detecting small targets in remote sensing. Thus, incorporating the HCA module improves layer interconnections, enabling the network to better learn deep feature representations and preserve more critical information. The further integration of the RFLA module enhances the model’s capability to detect small targets. The Gaussian priors employed by this module align more closely with the characteristics of the Gaussian effective receptive field, mitigating the issue of receptive field mismatch for small targets. This enhancement improves the network’s localization performance for small targets, raising the mAP to 98.9%. The HCA-RFLA method addresses the issues of missed and false detections, proving superior to the baseline method in detecting small targets.
Figure 9 presents a comparison of heat maps for different methods across various scenarios. It is evident that the baseline model performs poorly in complex scenarios and with numerous small ship targets. This is likely due to the indistinct features of nearshore ships in complex backgrounds and the strong scattering interference from land scenes, which impair the network’s ability to detect nearshore ship targets. In contrast, Figure 9c shows that incorporating the HCA module enhances the capture of small target features and effectively reduces the network’s false detection rate. Figure 9d places greater emphasis on small target areas compared to the baseline model, leading to a reduction in the missed detection rate. Nevertheless, some background information contributing to false detections may still persist under these conditions. The HCA-RFLA method proposed in this paper, however, demonstrates exceptional performance and high accuracy in both complex backgrounds and scenarios with dense small targets. Figure 10 specifically presents a comparison of detection results from various methods on the SSDD dataset.
When the image contains complex noise and the ship’s profile is incomplete (Column 1), the baseline network is adversely affected by this noise, leading to the incomplete ship profile being misidentified as a reef. Figure 10c demonstrates that the detection accuracy of the network using the HCA module is significantly improved.
In dense small target scenarios (Column 2), many targets suffer from low resolution and a lack of textural features, further exacerbated by substantial noise interference. This results in the baseline network’s inability to detect small target features, leading to a high miss rate for these targets. The HCA-RFLA method described in this study substantially improves the network’s ability to identify small targets and significantly reduces both missed and false detections.
In the case of complex docking backgrounds (Column 3), the baseline network struggles to distinguish between docked ships and buildings due to texture blurring in the image. Figure 10b shows that the baseline network has low prediction accuracy in complex harbor environments: when a ship is docked, the network cannot distinguish between the ship and the surrounding buildings, leading to significant omissions and false detections. This occurs because the baseline network uses an IoU strategy for positive and negative sample allocation, which heavily relies on the degree of overlap between predicted and true values. For small targets, it is common for there to be no overlap between the predicted and true values, leading to a lack of positive samples during the detection of small ship targets. In contrast, Figure 10d shows significantly fewer missed and false ship detections compared to Figure 10b, due to the use of the Gaussian label assignment strategy. This strategy dynamically assigns positive samples through a Gaussian distribution, adjusting the weights of candidate frames based on their distance from the target center, thus avoiding the hard boundary issue associated with IoU threshold assignment. Finally, Figure 10e demonstrates the effectiveness of the proposed method. The HCA-RFLA algorithm enhances the network’s feature extraction capabilities for small targets in complex backgrounds and introduces a label matching scheme, which significantly reduces the probability of omissions and false detections of ship targets.
To further optimize the hyperparameters in the RFLA module, we conducted an ablation experiment to evaluate the impact of different settings of the adaptive factor $\mu$ and the number of positive samples $P$ on model performance. The experimental results are shown in Figure 11.
On the test set, HCA-RFLA achieved a maximum mAP of 97.2% at a $\mu$ value of 0.9, indicating that an appropriate receptive field range enhances the capture of small object features. Additionally, the number of positive samples $P$ has little impact on mAP, and $P = 4$ is chosen to balance detection accuracy and model convergence speed. This experiment further confirms the significant influence of hyperparameters on small object detection.
To ensure training stability and fast convergence, we carefully selected and tuned the learning rate during model training. In preliminary experiments, we selected an initial learning rate of 0.01. An appropriate learning rate typically balances convergence speed with training stability, preventing oscillations and instability.
To verify the optimality of this setting, we conducted additional experiments with smaller learning rates (e.g., 0.001 and 0.0001). As shown in Figure 12, while smaller learning rates lead to a smoother training process, convergence is significantly slower and training time increases. Therefore, we selected 0.01 as the final learning rate.

4.4.2. Attention Module Ablation Experiment

To evaluate the impact of the attention module on model performance, we conducted ablation experiments. To assess the effectiveness of HCA, we integrated ECA [65], Coordinate Attention [66], Bi-former Attention [67], and SimAM [68] modules into the FPN of the YOLOv8 model separately. Experiments were conducted on the HRSID dataset.
Table 5 shows that all attention mechanisms enhance detection efficiency to some extent. The HCA module achieves the highest mAP of 95.9%, demonstrating its superior feature extraction capability for SAR small target detection. Further analysis reveals that the HCA module effectively focuses on key image regions, particularly for detecting small and medium-sized targets in complex backgrounds.
In summary, the comprehensive ablation experiments demonstrate that, compared to the baseline network, the HCA-RFLA method effectively addresses issues related to image blurring and a lack of texture features in SAR image ship target detection, and it performs robustly across various resolutions and scenes.

5. Conclusions

Detecting ship targets in SAR images presents several challenges, including image blurriness and significant noise, high target density in distant sea areas, and the presence of mixed targets and buildings in complex offshore backgrounds. These challenges lead to a high rate of both missed and false detections. To address these challenges, we propose HCA-RFLA, an improved YOLOv8-based algorithm for ship target detection in SAR remote sensing images. First, the HCA module is designed using attention mechanisms in both the channel and spatial domains, improving feature integration across adjacent levels and enhancing the network’s ability to extract features for small targets. Second, we introduce the RFLA strategy. By utilizing a Gaussian receptive field as prior information, the RFLA strategy models the positional relationships between feature points and ground truths throughout the image, facilitates the generation of balanced positive samples of various sizes, and addresses the mismatch in receptive fields for small targets between point and box priors. To validate the performance of the HCA-RFLA method, experiments were conducted on the SSDD, HRSID, and SSD datasets. The results demonstrate that the HCA-RFLA method outperforms other state-of-the-art detection models in terms of both speed and accuracy.
The HCA-RFLA method significantly enhances feature extraction and improves SAR small target detection performance while also providing valuable technical support for marine security monitoring, illegal activity detection, and marine resource management, thus contributing to the advancement of remote sensing-based ship detection. However, HCA-RFLA still has certain limitations. While it performs well in simulations across various datasets, it has yet to be deployed in real-world satellite or space station monitoring missions, which limits the ability to effectively test and verify its robustness and stability. In the future, we will prioritize testing and refining this model in real-world space environments, as well as developing more precise target extraction techniques for SAR images. Additionally, we aim to optimize the model architecture to improve its adaptability and interpretability in complex scenarios.

Author Contributions

Writing—original draft, J.Z.; Writing—review & editing, T.X.; Assistance, W.L., L.X. and X.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded in part by the Xi’an Major Scientific and Technological Achievements Transformation Industrialization Project under grant number 23CGZHCYH0008, and in part by the Xi’an Polytechnic University 2024 Graduate Innovation Fund Project under grant number chx2024026.

Data Availability Statement

Both SSDD and HRSID datasets used in this study are publicly available.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Sun, S.; Wang, J. Ship Detection in SAR Images Based on Steady CFAR Detector and Knowledge-Oriented GBDT Classifier. Electronics 2024, 13, 2692. [Google Scholar] [CrossRef]
  2. Hou, Q.; Wang, Z.; Tan, F.; Zhao, Y.; Zheng, H.; Zhang, W. RISTDnet: Robust Infrared Small Target Detection Network. IEEE Geosci. Remote Sens. Lett. 2022, 19, 7000805. [Google Scholar] [CrossRef]
  3. Tian, T.; Zhou, F.; Li, Y.; Sun, B.; Fan, W.; Gong, C.; Yang, S. Performance Evaluation of Deception Against Synthetic Aperture Radar Based on Multifeature Fusion. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 103–115. [Google Scholar] [CrossRef]
  4. Zheng, H.; Xue, X.; Yue, R.; Liu, C.; Liu, Z. SAR Image Ship Target Detection Based on Receptive Field Enhancement Module and Cross-Layer Feature Fusion. Electronics 2024, 13, 167. [Google Scholar] [CrossRef]
  5. Zhou, X.; Li, T. Ship Detection in PolSAR Images Based on a Modified Polarimetric Notch Filter. Electronics 2023, 12, 2683. [Google Scholar] [CrossRef]
  6. Khan, A.; Khan, M.; Gueaieb, W.; El Saddik, A.; De Masi, G.; Karray, F. CamoFocus: Enhancing Camouflage Object Detection With Split-Feature Focal Modulation and Context Refinement. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA, 3–8 January 2024. [Google Scholar]
  7. Ren, X.; Bai, Y.; Liu, G.; Zhang, P. YOLO-Lite: An Efficient Lightweight Network for SAR Ship Detection. Remote Sens. 2023, 15, 3771. [Google Scholar] [CrossRef]
  8. Chen, Z.; Liu, C.; Filaretov, V.F.; Yukhimets, D.A. Multi-Scale Ship Detection Algorithm Based on YOLOv7 for Complex Scene SAR Images. Remote Sens. 2023, 15, 2071. [Google Scholar] [CrossRef]
  9. Gong, Y.; Zhang, Z.; Wen, J.; Lan, G.; Xiao, S. Small Ship Detection of SAR Images Based on Optimized Feature Pyramid and Sample Augmentation. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2023, 16, 7385–7392. [Google Scholar] [CrossRef]
  10. Zhang, Z.T.; Zhang, X.; Shao, Z. Deform-FPN: A Novel FPN with Deformable Convolution for Multi-Scale SAR Ship Detection. In Proceedings of the IGARSS 2023–2023 IEEE International Geoscience and Remote Sensing Symposium, Pasadena, CA, USA, 16–21 July 2023; pp. 5273–5276. [Google Scholar] [CrossRef]
  11. Khan, U.; Khan, M.; Elsaddik, A.; Gueaieb, W. Ddnet: Diabetic Retinopathy Detection System Using Skip Connection-Based Upgraded Feature Block. In Proceedings of the 2023 IEEE International Symposium on Medical Measurements and Applications (MeMeA), Jeju, Republic of Korea, 14–16 June 2023. [Google Scholar]
  12. Sun, Y.; Su, L.; Yuan, S.; Meng, H. DANet: Dual-Branch Activation Network for Small Object Instance Segmentation of Ship Images. IEEE Trans. Circuits Syst. Video Technol. 2023, 33, 6708–6720. [Google Scholar] [CrossRef]
  13. Zha, C.; Min, W.; Han, Q.; Xiong, X.; Wang, Q.; Xiang, H. SAR Ship Detection Based on Salience Region Extraction and Multi-Branch Attention. Int. J. Appl. Earth Obs. Geoinf. 2023, 123, 103489. [Google Scholar] [CrossRef]
  14. Dong, C.; Duoqian, M. Control Distance IoU and Control Distance IoU Loss for Better Bounding Box Regression. Pattern Recognit. 2023, 137, 109256. [Google Scholar] [CrossRef]
  15. Cai, D.; Zhang, Z.; Zhang, Z. Corner-Point and Foreground-Area IoU Loss: Better Localization of Small Objects in Bounding Box Regression. Sensors 2023, 23, 4961. [Google Scholar] [CrossRef] [PubMed]
  16. Li, L.; Yao, X.; Wang, X.; Hong, D.; Cheng, G.; Han, J. Robust Few-Shot Aerial Image Object Detection via Unbiased Proposals Filtration. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5617011. [Google Scholar] [CrossRef]
  17. Zhang, S.; Li, C.; Jia, Z.; Liu, L.; Zhang, Z.; Wang, L. Diag-IoU Loss for Object Detection. IEEE Trans. Circuits Syst. Video Technol. 2023, 33, 7671–7683. [Google Scholar] [CrossRef]
  18. Li, Z.; Hou, B.; Wu, Z.; Ren, B.; Yang, C. FCOSR: A Simple Anchor-Free Rotated Detector for Aerial Object Detection. Remote Sens. 2023, 15, 5499. [Google Scholar] [CrossRef]
  19. Liang, Y.; Feng, J.; Zhang, X.; Zhang, J.; Jiao, L. MidNet: An Anchor-and-Angle-Free Detector for Oriented Ship Detection in Aerial Images. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5612113. [Google Scholar] [CrossRef]
  20. Zhu, J.; Wang, F.; You, H. Unsupervised SAR Image Change Detection Based on Structural Consistency and CFAR Threshold Estimation. Remote Sens. 2023, 15, 1422. [Google Scholar] [CrossRef]
  21. Liu, M.; Zhou, G.; Ma, L.; Li, L.; Mei, Q. SIFNet: A Self-Attention Interaction Fusion Network for Multisource Satellite Imagery Template Matching. Int. J. Appl. Earth Obs. Geoinf. 2023, 118, 103247. [Google Scholar] [CrossRef]
  22. Chen, J.; Xie, H.; Zhang, L.; Hu, J.; Jiang, H.; Wang, G. SAR and Optical Image Registration Based on Deep Learning with Co-Attention Matching Module. Remote Sens. 2023, 15, 3879. [Google Scholar] [CrossRef]
  23. Zeng, T.; Zhang, T.; Shao, Z.; Xu, X.; Zhang, W.; Shi, J.; Jun, W.; Zhang, X. CFAR-DP-FW: A CFAR-Guided Dual-Polarization Fusion Framework for Large-Scene SAR Ship Detection. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2024, 17, 7242–7259. [Google Scholar] [CrossRef]
  24. Li, Y.; Wang, Z.; Chen, H.; Li, Y. A Density Clustering-Based CFAR Algorithm for Ship Detection in SAR Images. IEEE Geosci. Remote Sens. Lett. 2024, 21, 4009505. [Google Scholar] [CrossRef]
  25. Li, J.; Liu, Y.; Wang, X.; Jiang, Z.; Li, Y. A Robust CFAR Algorithm Based on Superpixel Merging Operation for SAR Ship Detection. In Proceedings of the ACM Conference, New York, NY, USA, 13–17 May 2024. [Google Scholar] [CrossRef]
  26. Yasir, M.; Jianhua, W.; Mingming, X.; Hui, S.; Zhe, Z.; Shanwei, L.; Colak, A.T.I.; Hossain, M.S. Ship Detection Based on Deep Learning Using SAR Imagery: A Systematic Literature Review. Soft Comput. 2023, 27, 63–84. [Google Scholar] [CrossRef]
  27. Li, J.; Chen, J.; Cheng, P.; Yu, Z.; Yu, L.; Chi, C. A Survey on Deep-Learning-Based Real-Time SAR Ship Detection. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2023, 16, 3218–3247. [Google Scholar] [CrossRef]
  28. Er, M.J.; Zhang, Y.; Chen, J.; Gao, W. Ship Detection with Deep Learning: A Survey. Artif. Intell. Rev. 2023, 56, 11825–11865. [Google Scholar] [CrossRef]
  29. Xu, X.; Zhang, X.; Zeng, T.; Shi, J.; Shao, Z.; Zhang, T. Group-Wise Feature Fusion R-CNN for Dual-Polarization SAR Ship Detection. In Proceedings of the 2023 IEEE Radar Conference (RadarConf23), San Antonio, TX, USA, 1–5 May 2023; pp. 1–5. [Google Scholar] [CrossRef]
  30. Jiang, M.; Gu, L.; Li, X.; Gao, F.; Jiang, T. Ship Contour Extraction From SAR Images Based on Faster R-CNN and Chan–Vese Model. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5203414. [Google Scholar] [CrossRef]
  31. Yang, J.-R.; Hao, L.-Y.; Liu, Y.; Zhang, Y. SLT-Net: Enhanced Mask R-CNN Network for Ship Long-Tailed Detection. In Proceedings of the 2023 IEEE 2nd Industrial Electronics Society Annual On-Line Conference (ONCON), Virtual, 8–10 December 2023; pp. 1–6. [Google Scholar] [CrossRef]
  32. Wang, H.; Xiao, N. Underwater Object Detection Method Based on Improved Faster R-CNN. Appl. Sci. 2023, 13, 2746. [Google Scholar] [CrossRef]
  33. Wen, G.; Cao, P.; Wang, H.; Chen, H.; Liu, X.; Xu, J.; Zaiane, O. MS-SSD: Multi-Scale Single Shot Detector for Ship Detection in Remote Sensing Images. Appl. Intell. 2023, 53, 1586–1604. [Google Scholar] [CrossRef]
  34. Yang, Y.; Chen, P.; Ding, K.; Chen, Z.; Hu, K. Object Detection of Inland Waterway Ships Based on Improved SSD Model. Ships Offshore Struct. 2022, 18, 1192–1200. [Google Scholar] [CrossRef]
  35. Dhorajiya, A.; Rakhi, A.M.; Saranya, P. Ship Detection from Satellite Imagery Using RetinaNet with Instance Segmentation. In Proceedings of the International Conference on Recent Trends in Computing; Mahapatra, R.P., Peddoju, S.K., Roy, S., Parwekar, P., Eds.; Lecture Notes in Networks and Systems. Springer: Singapore, 2023; Volume 600. [Google Scholar] [CrossRef]
  36. Cheng, J.; Wang, R.; Lin, A.; Jiang, D.; Wang, Y. A Feature Enhanced RetinaNet-Based Method for Instance-Level Ship Recognition. Eng. Appl. Artif. Intell. 2023, 126 Pt D, 107133. [Google Scholar] [CrossRef]
  37. Jian, P.; Guo, F.; Pan, C.; Wang, Y.; Yang, Y.; Li, Y. Interpretable Geometry Problem Solving Using Improved RetinaNet and Graph Convolutional Network. Electronics 2023, 12, 4578. [Google Scholar] [CrossRef]
  38. Li, J.; Xu, C.; Su, H.; Gao, L.; Wang, T. Deep Learning for SAR Ship Detection: Past, Present and Future. Remote Sens. 2022, 14, 2712. [Google Scholar] [CrossRef]
  39. Wang, B.; Li, Y.-Y.; Xu, W.; Wang, H.; Hu, L. Vehicle–Pedestrian Detection Method Based on Improved YOLOv8. Electronics 2024, 13, 2149. [Google Scholar] [CrossRef]
  40. Liang, X.; Zhang, J.; Zhuo, L.; Li, Y.; Tian, Q. Small Object Detection in Unmanned Aerial Vehicle Images Using Feature Fusion and Scaling-Based Single Shot Detector With Spatial Context Analysis. IEEE Trans. Circuits Syst. Video Technol. 2020, 30, 1758–1770. [Google Scholar] [CrossRef]
  41. Chen, Z.; Ji, H.; Zhang, Y.; Zhu, Z.; Li, Y. High-Resolution Feature Pyramid Network for Small Object Detection on Drone View. IEEE Trans. Circuits Syst. Video Technol. 2024, 34, 475–489. [Google Scholar] [CrossRef]
  42. Zhang, H.; Wu, Y. CSEF-Net: Cross-Scale SAR Ship Detection Network Based on Efficient Receptive Field and Enhanced Hierarchical Fusion. Remote Sens. 2024, 16, 622. [Google Scholar] [CrossRef]
  43. Cheng, S.; Zhu, Y.; Wu, S. Deep Learning Based Efficient Ship Detection from Drone-Captured Images for Maritime Surveillance. Ocean Eng. 2023, 285 Pt 2, 115440. [Google Scholar] [CrossRef]
  44. Wang, C.; Liu, J.; Zhang, L.; Chen, M.; Li, Y. SAR Ship Target Recognition via Multiscale Feature Attention and Adaptive-Weighed Classifier. IEEE Geosci. Remote Sens. Lett. 2023, 20, 4003905. [Google Scholar] [CrossRef]
  45. Lyu, Z.; Wang, C.; Sun, X.; Zhou, Y.; Ni, X.; Yu, P. Real-Time Ship Detection System for Wave Glider Based on YOLOv5s-Lite-CBAM Model. Appl. Ocean Res. 2024, 144, 103833. [Google Scholar] [CrossRef]
  46. Wang, L.; Zhan, Y.; Lan, L.; Lin, X.; Tao, D.; Gao, X. DeIoU: Towards Distinguishable Box Prediction in Densely Packed Object Detection. IEEE Trans. Circuits Syst. Video Technol. 2024. [Google Scholar] [CrossRef]
  47. Feng, C.; Zhong, Y.; Gao, Y.; Scott, M.; Huang, W. TOOD: Task-Aligned One-Stage Object Detection. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; pp. 3490–3499. [Google Scholar] [CrossRef]
  48. Xu, C.; Wang, J.; Yang, W.; Yu, H.; Yu, L.; Xia, G.-S. Detecting Tiny Objects in Aerial Images: A Normalized Wasserstein Distance and a New Benchmark. ISPRS J. Photogramm. Remote Sens. 2022, 190, 79–93. [Google Scholar] [CrossRef]
  49. Xu, C.; Wang, J.; Yang, W.; Yu, H.; Yu, L.; Xia, G.S. RFLA: Gaussian Receptive Field Based Label Assignment for Tiny Object Detection. In Computer Vision—ECCV 2022; Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T., Eds.; Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2022; Volume 13669, pp. 526–543. [Google Scholar] [CrossRef]
  50. Li, F.; Sun, T.; Dong, P.; Wang, Q.; Li, Y.; Sun, C. MSF-CSPNet: A Specially Designed Backbone Network for Faster R-CNN. IEEE Access 2024, 12, 52390–52399. [Google Scholar] [CrossRef]
  51. Hu, J.; Shen, L.; Sun, G. Squeeze-and-Excitation Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018. [Google Scholar]
  52. Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. CBAM: Convolutional Block Attention Module. In Proceedings of the European Conference on Computer Vision, Munich, Germany, 8–14 September 2018; pp. 3–19. [Google Scholar]
  53. Zheng, Z.; Wang, P.; Ren, D.; Liu, W.; Ye, R.; Hu, Q.; Zuo, W. Enhancing Geometric Factors in Model Learning and Inference for Object Detection and Instance Segmentation. IEEE Trans. Cybern. 2022, 52, 8574–8586. [Google Scholar] [CrossRef] [PubMed]
  54. Li, X.; Wang, W.; Wu, L.; Chen, S.; Hu, X.; Li, J.; Tang, J.; Yang, J. Generalized Focal Loss: Learning Qualified and Distributed Bounding Boxes for Dense Object Detection. In Advances in Neural Information Processing Systems; Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H., Eds.; Curran Associates, Inc.: Vancouver, BC, Canada, 2020; Volume 33, pp. 21002–21012. [Google Scholar]
  55. Mao, A.; Mohri, M.; Zhong, Y. Cross-Entropy Loss Functions: Theoretical Analysis and Applications. In Proceedings of the 40th International Conference on Machine Learning; Krause, A., Brunskill, E., Cho, K., Engelhardt, B., Sabato, S., Scarlett, J., Eds.; PMLR: Honolulu, HI, USA, 2023; Volume 202, pp. 23803–23828. [Google Scholar]
  56. Li, J.; Qu, C.; Shao, J. Ship Detection in SAR Images Based on an Improved Faster R-CNN. In Proceedings of the 2017 SAR in Big Data Era: Models, Methods and Applications (BIGSARDATA), Beijing, China, 13–14 November 2017; pp. 1–6. [Google Scholar] [CrossRef]
  57. Wei, S.; Zeng, X.; Qu, Q.; Wang, M.; Su, H.; Shi, J. HRSID: A High-Resolution SAR Images Dataset for Ship Detection and Instance Segmentation. IEEE Access 2020, 8, 120234–120254. [Google Scholar] [CrossRef]
  58. Wang, Y.; Wang, C.; Zhang, H.; Dong, Y.; Wei, S. A SAR Dataset of Ship Detection for Deep Learning under Complex Backgrounds. Remote Sens. 2019, 11, 765. [Google Scholar] [CrossRef]
  59. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149. [Google Scholar] [CrossRef]
  60. Kim, J.-H.; Kim, N.; Park, Y.W.; Won, C.S. Object Detection and Classification Based on YOLO-V5 with Improved Maritime Dataset. J. Mar. Sci. Eng. 2022, 10, 377. [Google Scholar] [CrossRef]
  61. Wang, C.-Y.; Bochkovskiy, A.; Liao, H.-Y.M. YOLOv7: Trainable Bag-of-Freebies Sets New State-of-the-Art for Real-Time Object Detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023; pp. 7464–7475. [Google Scholar]
  62. Chen, C.; Zeng, W.; Zhang, X. HFPNet: Super Feature Aggregation Pyramid Network for Maritime Remote Sensing Small-Object Detection. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2023, 16, 5973–5989. [Google Scholar] [CrossRef]
  63. Yin, X.; Lan, S.; Huang, W.; Ma, Y.; Wang, W.; Yang, H.; Zheng, Y. DLAHSD: Dynamic Label Adopted in Auxiliary Head for SAR Detection. In Proceedings of the 2023 IEEE International Conference on Image Processing (ICIP), Kuala Lumpur, Malaysia, 8–11 October 2023; pp. 3434–3438. [Google Scholar] [CrossRef]
  64. Yu, H.; Yang, S.; Zhou, S.; Sun, Y. VS-LSDet: A Multiscale Ship Detector for Spaceborne SAR Images Based on Visual Saliency and Lightweight CNN. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2024, 17, 1137–1154. [Google Scholar] [CrossRef]
  65. Wang, Q.; Wu, B.; Zhu, P.; Li, P.; Zuo, W.; Hu, Q. ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020. [Google Scholar]
  66. Yu, C.; Shin, Y. SAR Ship Detection Based on Improved YOLOv5 and BiFPN. ICT Express 2024, 10, 28–33. [Google Scholar] [CrossRef]
  67. Liu, Y.; Ma, Y.; Chen, F.; Shang, E.; Yao, W.; Zhang, S.; Yang, J. YOLOv7oSAR: A Lightweight High-Precision Ship Detection Model for SAR Images Based on the YOLOv7 Algorithm. Remote Sens. 2024, 16, 913. [Google Scholar] [CrossRef]
  68. Zhao, H.; Zhang, H.; Zhao, Y. YOLOv7-Sea: Object Detection of Maritime UAV Images Based on Improved YOLOv7. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA, 2–7 January 2023; pp. 233–238. [Google Scholar]
Figure 1. HCA-RFLA overall architecture diagram. (The yellow box represents the Neck component of HCA-RFLA, the red box indicates the HCA module proposed in this paper, the blue box highlights the RFLA module introduced here, and the green box denotes the detection head).
Figure 2. HCA Module.
Figure 3. Gaussian transformation of GT and predicted value for small targets.
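For readers who want to see the box-to-Gaussian idea behind Figure 3 in code, the sketch below models a ground-truth box and a feature point's effective receptive field as 2D Gaussians and scores their match with a Wasserstein-style distance in the spirit of [48,49]. It is only an illustration: the function names and the normalizing constant c are assumptions, and RFLA itself builds its receptive field distance on a Kullback-Leibler divergence rather than the Wasserstein form used here.

```python
import numpy as np

def box_to_gaussian(cx, cy, w, h):
    """Model a box (or receptive field) as an axis-aligned 2D Gaussian:
    mean at the box centre, per-axis std dev equal to half the extent."""
    mu = np.array([cx, cy], dtype=float)
    sigma = np.array([w / 2.0, h / 2.0], dtype=float)
    return mu, sigma

def wasserstein_distance(mu1, sig1, mu2, sig2):
    """2-Wasserstein distance between two axis-aligned 2D Gaussians."""
    return np.sqrt(np.sum((mu1 - mu2) ** 2) + np.sum((sig1 - sig2) ** 2))

def gaussian_match_score(gt_box, rf_box, c=12.8):
    """Map the distance to (0, 1]; larger means a better match.
    The constant c is a dataset-dependent scale (assumed here)."""
    mu_g, sig_g = box_to_gaussian(*gt_box)
    mu_r, sig_r = box_to_gaussian(*rf_box)
    return float(np.exp(-wasserstein_distance(mu_g, sig_g, mu_r, sig_r) / c))

# Example: a 12x12-pixel ship GT vs. a feature point whose effective
# receptive field covers a 16x16 region centred 3 pixels away.
print(round(gaussian_match_score((50, 50, 12, 12), (53, 50, 16, 16)), 3))
```

Scores of this kind can then be ranked per ground truth in the two-phase assignment sketched in Figure 4, so that small targets left without positives after the first phase still receive at least one in the second.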
Figure 4. HLA Framework. (The yellow box illustrates the initial allocation phase of the HLA strategy for positive ship target samples, while the blue box represents the second allocation phase for these samples).
Figure 5. Precision–Recall (PR) curves for different methods.
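The mAP50 values reported in Tables 1–3 correspond to areas under curves like those in Figure 5. As a reference, a minimal all-point-interpolation AP computation is sketched below; the toy recall/precision arrays are illustrative only and are not taken from the paper's experiments.

```python
import numpy as np

def average_precision(recall, precision):
    """Area under the precision-recall curve (all-point interpolation)."""
    r = np.concatenate(([0.0], recall, [1.0]))
    p = np.concatenate(([0.0], precision, [0.0]))
    # Make precision monotonically non-increasing from right to left.
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    # Accumulate area wherever recall increases.
    idx = np.where(r[1:] != r[:-1])[0]
    return float(np.sum((r[idx + 1] - r[idx]) * p[idx + 1]))

# Toy example with a handful of PR points (illustrative values only).
rec = np.array([0.1, 0.4, 0.7, 0.9])
prec = np.array([1.0, 0.95, 0.90, 0.80])
print(round(average_precision(rec, prec), 3))
```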
Figure 6. Comparison of experimental results of different algorithms. The green box indicates GT, the red box indicates the predicted value, the yellow box indicates the misdetected ship, and the blue box indicates the missed ship. (a) GT; (b) Faster R-CNN; (c) YOLOv5; (d) YOLOv7; (e) YOLOv8.
Figure 7. Comparison of experimental results of different algorithms. The green box indicates GT, the red box indicates the predicted value, the yellow box indicates the misdetected ship, and the blue box indicates the missed ship. (a) HFPNET; (b) DLAHSD; (c) VS-LSDet; (d) HCA-RFLA.
Figure 8. Comparison of ablation experiment indicators.
Figure 9. Comparison of heat maps for different scenarios. (a) GT; (b) YOLOv8; (c) YOLOv8 with HCA; (d) YOLOv8 with RFLA; (e) HCA-RFLA.
Figure 10. Results of ablation experiments in different scenarios. The green box indicates GT, the red box indicates the predicted value, the yellow box indicates the misdetected ship, and the blue box indicates the missed ship. (a) GT; (b) YOLOv8; (c) YOLOv8 with HCA; (d) YOLOv8 with RFLA; (e) HCA-RFLA.
Figure 11. mAP vs. P for different values of μ.
Figure 12. Effect of different learning rates on Loss.
Table 1. Comparison of Different Methods on SSDD.

Model          P/%    R/%    mAP50/%   FPS
Faster R-CNN   89.3   86.4   90.2      17.9
YOLOv5         92.6   90.6   91.2      19.5
YOLOv7         91.1   85.0   91.0      16.8
YOLOv8         92.8   89.5   92.1      28.1
HFPNET         95.2   96.1   92.8      36.5
DLAHSD         93.6   92.8   94.3      40.2
VS-LSDet       95.1   91.9   95.7      33.6
HCA-RFLA       97.1   95.8   98.3      42.3
Table 2. Comparison of Different Methods on HRSID.

Model          P/%    R/%    mAP50/%   FPS
Faster R-CNN   88.4   81.4   89.3      17.3
YOLOv5         92.1   90.2   92.4      18.4
YOLOv7         85.9   62.1   72.6      11.6
YOLOv8         91.8   84.3   92.8      23.5
HFPNET         93.8   91.2   94.7      38.6
DLAHSD         91.4   90.1   92.4      37.9
VS-LSDet       92.6   89.4   93.1      32.1
HCA-RFLA       96.2   93.8   97.2      39.0
Table 3. Comparison of Different Methods on SSD.

Model          P/%    R/%    mAP50/%   FPS
Faster R-CNN   81.5   82.4   85.3      19.1
YOLOv5         83.8   84.2   91.4      20.2
YOLOv7         79.7   69.3   82.7      19.6
YOLOv8         84.1   89.5   91.7      23.5
HFPNET         92.8   86.4   92.7      30.7
DLAHSD         91.9   88.6   92.4      32.6
VS-LSDet       93.4   90.2   93.3      28.8
HCA-RFLA       93.9   95.4   95.3      37.9
Table 4. Results of Ablation Experiments.

Model          HCA   RFLA   P/%    R/%    mAP50/%
YOLOv8         ×     ×      92.8   89.5   92.1
YOLOv8+HCA     ✓     ×      94.5   90.6   95.9
YOLOv8+RFLA    ×     ✓      93.2   92.1   94.2
HCA-RFLA       ✓     ✓      97.1   95.8   98.9
Table 5. Attention Module Comparison Results.

Model                  P/%    R/%    mAP50/%   mAP50–95/%
ECA                    92.1   83.3   91.6      66.6
Coordinate attention   92.9   82.1   91.7      67.4
Bi-former attention    93.3   85.4   93.8      67.2
SimAM                  92.5   84.1   93.9      66.9
HCA                    94.5   87.1   95.9      70.1