Article

A Multi-Scale Mask Convolution-Based Blind-Spot Network for Hyperspectral Anomaly Detection

1 Faculty of Electrical Engineering and Computer Science, Ningbo University, Ningbo 315211, China
2 Department of Geography and Spatial Information Techniques, Ningbo University, Ningbo 315211, China
* Author to whom correspondence should be addressed.
Remote Sens. 2024, 16(16), 3036; https://doi.org/10.3390/rs16163036
Submission received: 24 July 2024 / Revised: 11 August 2024 / Accepted: 16 August 2024 / Published: 18 August 2024

Abstract:
Existing methods of hyperspectral anomaly detection still face several challenges: (1) due to the limitations of self-supervision, avoiding the identity mapping of anomalies remains difficult; (2) the ineffective interaction between spatial and spectral features leads to insufficient utilization of spatial information; and (3) current methods are not adaptable to the detection of multi-scale anomaly targets. To address these challenges, we propose a blind-spot network based on multi-scale mask convolution for HAD. The multi-scale mask convolution module is employed to adapt to diverse scales of anomaly targets, while the dynamic fusion module is introduced to integrate the advantages of mask convolutions at different scales. The proposed approach includes a spatial–spectral joint module and a background feature attention mechanism to enhance the interaction between spatial and spectral features, with a specific emphasis on highlighting the significance of background features within the network. Furthermore, we propose a preprocessing technique that combines pixel-shuffle down-sampling (PD) with spatial–spectral joint screening. This approach suppresses anomalous identity mapping and enables finite-scale mask convolutions to detect targets at various scales. The proposed approach was assessed on four real hyperspectral datasets comprising anomaly targets of different scales. The experimental results demonstrate the effectiveness and superior performance of the proposed methodology compared with nine state-of-the-art methods.

1. Introduction

Hyperspectral images (HSIs), which comprise numerous contiguous narrow bands, offer an abundance of spectral information. This rich spectral data enhances the accuracy and precision of target recognition and discrimination. As one of the most popular tasks, the objective of hyperspectral anomaly detection (HAD) is to identify targets based on their distinctive spectral and spatial characteristics, without any prior knowledge [1,2,3]. Supervised target detection, as in infrared small target detection [4,5,6], requires complex manual annotation of samples, and existing algorithms are often built on large-scale datasets [7,8,9]. In contrast, hyperspectral anomaly detection algorithms, given the smaller size of their datasets, place a greater emphasis on lightweight network design; since no labeled data are needed, HAD is also more efficient. At present, hyperspectral anomaly detection has been widely used in urban vehicle recognition [10], military target detection [11], and border surveillance [12], and HAD methods have evolved significantly over the past decade, transitioning from earlier statistical and representation-based approaches to current deep learning-based approaches.
The Reed–Xiaoli detector (RXD) [13] serves as a crucial starting point in the field of anomaly detection. This algorithm utilizes the mean value and covariance matrix of all pixels to statistically model the background and calculates the Mahalanobis distance to measure each pixel's degree of anomaly. Building upon this foundation, numerous improved algorithms have been derived; for instance, the local RX (LRXD) [14] method innovatively leverages the local background for background modeling; KRXD [15] employs kernel functions to map the original data into high-dimensional feature spaces, addressing the limitations posed by a single-distribution hypothesis; and subspace RX (SSRXD) [16] reduces the impact of anomaly pollution on background estimation through projection onto subspaces. However, real background distributions are highly complex, severely limiting background estimation based on statistical methods. Consequently, representation-based methods such as collaborative representation (CR) [17,18,19], sparse representation (SR) [20,21,22,23], and low-rank representation (LRR) [24,25,26] began to be developed. The CR-based method (CRD) primarily emphasizes the cooperative relationship among all dictionary atoms, aiming to determine whether each pixel can be linearly represented by its surrounding pixels. The SR-based method posits that normal background samples can be well represented by a few atoms from an overcomplete background dictionary, while anomaly samples cannot. Both methods take the reconstruction residuals as the anomaly scores of the test pixels. The LRR-based method decomposes the HSI into a low-rank component and a sparse component, assuming the background is low-rank and anomalies are sparse. Recently, Cheng et al. [27] proposed a novel low-rank representation method based on dual-graph regularization and an adaptive dictionary (DGRAD-LRR) for hyperspectral anomaly detection, achieving promising results. This method further underscores the importance of integrating spatial and spectral features in HAD. Zhang et al. [28] proposed a self-determined progress probability co-representation detector (SP-ProCRD), which effectively minimizes the negative impact of abnormal atomic contamination in the background dictionary. However, these methods often require extensive parameter adjustments that are difficult to determine in advance, which hinders their practical application.
Furthermore, anomalies can be detected using spatial information alone. For example, the recently proposed attribute and edge-preserving filtering (AED) [29] method, along with STGD [30], exhibits exceptional performance in HAD by employing local filtering operations. Nevertheless, these methods overlook the importance of spectral information.
In summary, traditional methods primarily face the following issues: (1) insufficient representation capability and inability to adapt to real complex hyperspectral scenes; (2) susceptibility to parameter variations and a large number of parameters that are challenging to determine; and (3) difficulty in effectively integrating spatial information with spectral information.
The utilization of deep learning methods in HAD is prevalent due to their ability to effectively capture distribution features and extract deep characteristics, thereby facilitating better adaptation to the intricate background distributions of real-world scenarios [31,32,33]. Currently, deep learning-based approaches primarily employ the reconstruction paradigms of autoencoders (AEs) and generative adversarial networks (GANs) for anomaly detection, where the background can be efficiently reconstructed while the anomalies cannot [34,35,36]. Jiang et al. [37] were the first to introduce a GAN into HAD, proposing the HADGAN network, which learns the multivariate normal distribution features of the hyperspectral background in the hidden layer. Xiang et al. [38] introduced a guidance autoencoder (GAED) that incorporates image-based guidance modules into deep networks to suppress anomalous reconstruction. Fan et al. [39] developed a robust graph AE (RGAE) detector by introducing graph regularization and demonstrated that spatial information in HSI is crucial for anomaly detection. Wang et al. [40] then presented an adaptive HAD method that utilized full convolution to extract spatial–spectral features for the first time, thereby improving the utilization rate of spatial information. Additionally, Wang et al. [41] and Cheng et al. [42] combined low-rank priors with fully convolutional AEs to make optimal use of prior information and spatial details, proposing, respectively, a method based on a deep low-rank prior (DeepLR) and a deep self-representation learning framework (DSLF). To further exploit spatial information, Wang et al. [43] put forward a residual-self-attention-based HAD autoencoder (RSAAE), which employs residual self-attention to focus on the spatial features of HSI. Another deep learning branch involves the various blind-spot reconstruction anomaly detection networks proposed by Wang et al. [44,45,46], where surrounding pixel features are used to reconstruct blind-spot pixels, introducing a new paradigm for HAD known as the blind-spot network, with its unique blind-spot-centered receptive field. This architecture means that the blind-spot network must rely solely on the surrounding pixels to predict the value of the blind spot, effectively utilizing spatial information. When a central pixel deviates significantly from its neighbors, indicating an anomaly, accurate prediction or reconstruction becomes challenging for the blind-spot network; consequently, the reconstruction error increases and the anomaly score rises. Hence, blind-spot networks are inherently suitable for joint spatial–spectral hyperspectral anomaly detection tasks.
In summary, deep learning-based methods primarily face the following issues: (1) avoiding the identity mapping of anomalies is difficult under the self-supervised reconstruction paradigm, where anomalies supervise anomalies and the background supervises the background; (2) effective interaction between spatial and spectral information remains difficult; and (3) existing deep learning methods still struggle to adapt to the detection of large and multi-scale anomaly targets. For the blind-spot network in particular, the local spatial similarity of large anomaly targets and the scale differences among multi-scale anomaly targets are especially damaging.
The existing methods thus have certain limitations, including inadequate utilization of spatial information, subpar detection performance for large-scale and multi-scale targets, and anomalous identity mapping resulting from self-supervision. Therefore, we pose three challenging research questions in this field: (Q1) How can anomaly identity mapping be effectively mitigated during the supervision process? (Q2) How can anomaly targets across various scales be efficiently detected while adapting to scale variations? (Q3) How can the utilization of spatial information be improved and the joint interaction of spatial–spectral information be strengthened? To answer these questions, we propose a multi-scale mask convolution-based blind-spot network (MMBSN) for detecting anomalous objects at multiple scales. Our contributions in this article are as follows:
(1)
For (Q1), the purpose of the proposed preprocessing module is to effectively eliminate the local spatial correlation among multi-scale objects while maximizing preservation of the spatial correlation in the background (thus reducing the anomaly scale). Additionally, we can ensure that the background pixels sampled from the original abnormal area are dominant in the screened samples through a specific screening process. Simultaneously, a strategy of partial sample training and full image testing is adopted to effectively prevent anomaly identity mapping and overfitting.
(2)
For (Q2), we propose a mask convolution module combined with the PD operation to adapt to the detection of anomaly targets at different scales. Additionally, we introduce a dynamic, learnable fusion module to effectively integrate detection results from mask convolutions at different scales.
(3)
For (Q3), the proposed approach aims to enhance the interaction between spatial and spectral information and amplify the disparity between background and target by incorporating a spatial–spectral joint module and a background feature attention module.
The rest of this article is organized as follows. In Section 2, we present a comprehensive overview of the implementation details for the proposed MMBSN method. In Section 3, extensive experimental results of the proposed method compared with state-of-the-art approaches are conducted to evaluate the performance of MMBSN. Finally, Section 4 draws our conclusions.

2. Proposed Method: MMBSN

2.1. Overview

In this paper, we propose a novel MMBSN method for HAD, as illustrated in Figure 1. Specifically, the proposed method comprises three distinct phases:
(1)
Sample Preparation Stage: The raw HSI initially undergoes a down-sampling process using pixel shuffle down-sampling (PD) to obtain a set of down-sampled HSI samples. Subsequently, spectral and spatial screening modules are employed to select a specific proportion of training samples for network training.
(2)
Training Stage: The selected samples are sequentially fed into the blind-spot network. The blind-spot network comprises a multi-scale mask convolution module, a spatial–spectral joint module, a background feature attention module, and a dynamic learnable fusion module. Ultimately, the reconstructed HSI is obtained through the supervised reconstruction of multiple training samples using the $L_1$ loss.
(3)
Detection Stage: The original HSI is down-sampled by the PD operation to obtain a set of down-sampled HSIs. Each down-sampled HSI is sequentially input into the trained MMBSN model. Finally, the background HSI is reconstructed using the PD inversion operation, and the resulting reconstruction error serves as the result of HAD.
The subsequent sections offer a thorough exposition of these facets.

2.2. Sample Preparation Stage

Due to the inherent limitations of self-supervision, anomaly samples are reconstructed under the supervision of anomaly samples, which during training inevitably results in a proficient reconstruction of anomalies, i.e., the identity mapping of anomalies. Although the blind-spot network partially mitigates this issue, its impact remains substantial. This phase primarily aims to address this problem (as depicted in Figure 1). Through PD down-sampling, we can obtain samples whose local anomaly correlation is disrupted while the background spatial correlation is preserved. However, due to the characteristics of PD down-sampling, a down-sampled sample may contain either anomaly pixels or background pixels within the original anomaly region. Therefore, we propose a combined spatial–spectral screening method to extract purer samples. The screened samples contain more background pixels than anomaly pixels in the original anomaly region, as shown in Figure 2. During the training phase, emphasis is placed on learning how to reconstruct background pixels rather than preserving the identity mapping of anomalies. To effectively address overfitting, we adopt a strategy of partial-sample training and full-image testing.
(1)
Pixel-Shuffle Down-Sampling: The primary objective of the PD operation is to disrupt the spatial correlation among anomalies while preserving the spatial correlation among backgrounds as much as possible, thereby enhancing the distinction between backgrounds and anomalies. Since all HSIs obtained after the PD operation exhibit remarkably strong correlations, we can train with only a subset of the samples. Figure 3 illustrates the PD and PD$^{-1}$ operations with a stride factor of 2. In the visualization, the blue box signifies the sampling box with a stride factor of 2, and each number inside represents the index of a pixel. A given HSI $X \in \mathbb{R}^{H \times W \times B}$, where $H$, $W$, and $B$ are the row number, column number, and spectral dimension (the number of spectral channels) of the HSI, respectively, is decomposed into four sub-images by PD$(\cdot)$, and the four sub-images are recovered into an HSI by PD$^{-1}(\cdot)$. In the sub-images obtained through the PD operation, the scale of the anomaly target is effectively reduced in the original anomaly region. However, due to the inherent characteristics of this process, it is difficult to determine whether this region is sampled as anomaly or background pixels, and which of the two dominates.
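As a concrete illustration, the following minimal NumPy sketch (ours, not the authors' released code; function names are hypothetical) implements PD down-sampling and its inverse for an HSI cube:

```python
import numpy as np

def pd_downsample(X, f):
    """Decompose an HSI X of shape (H, W, B) into f*f sub-images of shape
    (H/f, W/f, B) by strided spatial sampling (the PD operation)."""
    H, W, _ = X.shape
    assert H % f == 0 and W % f == 0, "H and W must be divisible by f"
    return [X[i::f, j::f, :] for i in range(f) for j in range(f)]

def pd_inverse(subs, f):
    """Reassemble the f*f sub-images onto the original grid (PD^-1)."""
    h, w, B = subs[0].shape
    X = np.empty((h * f, w * f, B), dtype=subs[0].dtype)
    for idx, sub in enumerate(subs):
        i, j = divmod(idx, f)
        X[i::f, j::f, :] = sub
    return X

# Round-trip check on a toy cube: PD^-1(PD(X)) recovers X exactly
X = np.random.rand(64, 64, 100)
assert np.allclose(pd_inverse(pd_downsample(X, 2), 2), X)
```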
(2)
Spatial–Spectral Joint Screening: Samples are selected based on their spatial and spectral characteristics. The classical GRX method is utilized to obtain the spectral distribution deviation score, which indicates the degree of deviation from the background distribution. A lower score implies fewer pixels in the sample deviate from the background distribution. We aim to obtain samples with an overall minimum deviation in pixel distribution. The overall bias score on spectral characteristics can be expressed as follows:
$$\mathrm{score}_{\mathrm{spectral}} = \sum_{i=1}^{(H/f) \times (W/f)} (y_i - \mu)^{\top} \Gamma^{-1} (y_i - \mu)$$
where $f$ is the stride factor for down-sampling, $y_i \in \mathbb{R}^{B \times 1}$ is the $i$th spectral vector in the down-sampled hyperspectral sample $Y \in \mathbb{R}^{H/f \times W/f \times B}$, and $\mu \in \mathbb{R}^{B \times 1}$ and $\Gamma^{-1}$ are the mean vector and the inverse covariance matrix of $Y$, respectively.
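A hedged sketch of this spectral score for one down-sampled sample follows; the regularization term eps is our addition for numerical stability, not part of the paper's formulation:

```python
import numpy as np

def spectral_score(Y, eps=1e-6):
    """GRX-style overall deviation score of a sample Y with shape (h, w, B):
    the sum of Mahalanobis distances (y_i - mu)^T Gamma^{-1} (y_i - mu)."""
    h, w, B = Y.shape
    pixels = Y.reshape(-1, B)               # (h*w, B) spectral vectors
    mu = pixels.mean(axis=0)                # mean spectrum of the sample
    cov = np.cov(pixels, rowvar=False) + eps * np.eye(B)  # regularized covariance
    cov_inv = np.linalg.inv(cov)
    diff = pixels - mu
    # One Mahalanobis distance per pixel, summed over the whole sample
    return float(np.einsum("ij,jk,ik->i", diff, cov_inv, diff).sum())
```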
The proposed method utilizes a spatial domain-based screening approach to calculate the spatial structural similarity between test pixels and their neighborhood background, inspired by the local mean filtering algorithm. This can be mathematically expressed as follows:
$$\mathrm{score}_{\mathrm{spatial}} = \sum_{i=1}^{(H/f) \times (W/f)} \sum_{j=1}^{8} (w_c^i - w_j^i)^{\top} (w_c^i - w_j^i)$$
As shown in Figure 4, the pixel to be measured is selected as the center of an outer window, which is then divided into a central inner window and eight neighborhood background inner windows (with sizes 9 and 3, respectively). Subsequently, the Euclidean distance between the central inner window $w_c$ and the eight neighboring background inner windows $w_1$–$w_8$ is calculated to quantify the spatial structural similarity between them. Since background regions are highly similar to one another, a low distance score indicates a high likelihood that the central window represents background, while a high score suggests potential local spatial anomalies within the central inner window. By calculating this measure for all central inner windows, it becomes possible to assess the number of locally dissimilar anomalies present in the sample. Combined with the spectral analysis, a comprehensive screening score for each sample can be obtained:
$$\mathrm{score} = \mathrm{norm}(\mathrm{score}_{\mathrm{spectral}}) + \mathrm{norm}(\mathrm{score}_{\mathrm{spatial}})$$
where $\mathrm{norm}(\cdot)$ stands for normalization. Finally, according to the comprehensive score, a certain proportion of samples is selected in ascending order as the training samples $Y_i \in \mathbb{R}^{H/f \times W/f \times B}$, $i = 1, 2, 3, \ldots, f^2 \times rate$, where $rate$ is the proportion of screened samples.
With the spatial–spectral joint screening, we can ensure that the background pixels sampled from the original anomaly regions are dominant in the filtered samples. In the process of learning supervision, since the background pixels are more dominant in the supervised learning samples, the network is more inclined to reconstruct the background. Even if there are a few abnormal pixels present, they will engage in a supervised competition with the background pixels from other samples, resulting in a larger reconstruction error.
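Putting the two scores together, the joint screening step might look like the following sketch (a simplified illustration under our assumptions; window handling at image borders is ignored for brevity, and helper names are hypothetical):

```python
import numpy as np

def spatial_score(Y, inner=3):
    """Sum over all positions of the squared Euclidean distances between the
    central inner window w_c and its eight neighboring inner windows w_1..w_8."""
    h, w, _ = Y.shape
    half, r = inner // 2, inner          # r: offset to each neighbor window
    score = 0.0
    for y in range(r + half, h - r - half):
        for x in range(r + half, w - r - half):
            wc = Y[y-half:y+half+1, x-half:x+half+1, :].ravel()
            for dy in (-r, 0, r):
                for dx in (-r, 0, r):
                    if dy == 0 and dx == 0:
                        continue       # skip the central window itself
                    wj = Y[y+dy-half:y+dy+half+1,
                           x+dx-half:x+dx+half+1, :].ravel()
                    score += float((wc - wj) @ (wc - wj))
    return score

def select_training_samples(samples, spectral_scores, spatial_scores, rate=0.5):
    """Normalize both score lists, sum them, and keep the lowest-scoring rate."""
    norm = lambda s: (s - s.min()) / (s.max() - s.min() + 1e-12)
    total = norm(np.asarray(spectral_scores)) + norm(np.asarray(spatial_scores))
    keep = np.argsort(total)[: max(1, int(round(len(samples) * rate)))]
    return [samples[k] for k in keep]
```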

2.3. Training Stage

(1)
Multi-Scale Mask Convolution Module (MMCM): Currently, there are two main methods for constructing blind spots. One approach uses masking to directly obscure input samples. For instance, as described in the literature [47], masking is used to obscure highly mixed areas, allowing the network to focus on learning the mapping relationship between endmembers and relatively pure region abundances, which reduces noise and yields more accurate unmixing results. The other approach employs central mask convolution, where blind spots are constructed using sliding center-masked convolutions. The former requires preprocessing steps like clustering to identify candidate blind-spot regions, whereas the latter constructs blind spots adaptively through convolution. Therefore, we adopt the second approach and design a multi-scale mask convolution module to detect anomaly targets at different scales. Owing to the nature of the blind-spot network, the center pixel is reconstructed from surrounding pixel information. We therefore use small-scale mask convolutions to mask small targets at the center and large-scale mask convolutions to isolate similar anomalous pixels around large targets. As illustrated in Figure 1, the multi-scale mask convolution module comprises a 1 × 1 × B × 128 convolution and six mask convolutions of varying scales, with (inner, outer) window sizes of (1,3), (1,5), (1,7), (3,5), (3,7), and (3,9), respectively. These mask convolutions combine two inner-window scales with background receptive fields of different sizes for the outer window. Given a training sample $Y_i \in \mathbb{R}^{H/f \times W/f \times B}$, $i = 1, 2, 3, \ldots, f^2 \times rate$, we first extract features using the 1 × 1 × B × 128 convolution, then divide them into six branches and use background features from the various receptive fields to reconstruct the obscured center pixels. The number of output channels becomes 64 after the mask convolution module. Because the center masks of different-scale mask convolutions vary, their detection performance for anomalous objects also varies significantly: small-scale mask convolutions are more suitable for detecting small targets, while large-scale mask convolutions are better suited for detecting large targets.
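A center-masked convolution can be realized in PyTorch by zeroing the inner window of the kernel before each forward pass. This sketch (ours, with hypothetical class and parameter names) shows one such branch and the six (inner, outer) combinations listed above:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedConv2d(nn.Conv2d):
    """Convolution whose kernel center (the inner window) is zeroed, so an
    output pixel never sees its own location -- the blind-spot property."""
    def __init__(self, in_ch, out_ch, outer, inner):
        super().__init__(in_ch, out_ch, outer, padding=outer // 2)
        mask = torch.ones(1, 1, outer, outer)
        c, m = outer // 2, inner // 2
        mask[:, :, c - m:c + m + 1, c - m:c + m + 1] = 0.0  # zero inner window
        self.register_buffer("mask", mask)

    def forward(self, x):
        return F.conv2d(x, self.weight * self.mask, self.bias,
                        padding=self.padding)

# Six branches with (inner, outer) = (1,3), (1,5), (1,7), (3,5), (3,7), (3,9),
# each mapping the 128 shared feature channels down to 64.
branches = nn.ModuleList(
    MaskedConv2d(128, 64, outer, inner)
    for inner, outer in [(1, 3), (1, 5), (1, 7), (3, 5), (3, 7), (3, 9)]
)
```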
(2)
Spatial–Spectral Joint Module (SSJM): To improve the utilization of spatial information and the interaction between spatial and spectral information, we propose a spatial–spectral joint module (shown in Figure 5) that leverages depth-wise convolution (DWConv) to extract features from different bands. Additionally, depth-wise dilated convolution (DWDConv) is employed to capture background features at greater distances, aiding the reconstruction of the center pixel. On the other branch, DWConv is utilized to estimate the importance of the various band features; these importance values are transformed by a sigmoid activation into weights that enhance the features with significant contributions. By focusing on the most influential features, redundancy in the spectral characteristics can be reduced while the utilization of spatial attributes is improved. Finally, point-wise convolution (PWConv) is applied to enhance the interaction between spatial and spectral features. To prevent focus polarization caused by self-supervision, a skip-connection-based feature fusion approach is adopted. SSJM can effectively facilitate the interaction of spatial information across different bands. The entire process can be summarized as follows:
$$F' = F + \mathrm{PWConv}\big(\mathrm{ReLU}(\mathrm{DWDConv}(\mathrm{ReLU}(\mathrm{DWConv}(F)))) \odot \mathrm{Sigmoid}(\mathrm{DWConv}(F))\big)$$
where $F \in \mathbb{R}^{H/f \times W/f \times 64}$ is the feature extracted by the multi-scale mask convolution module, $F' \in \mathbb{R}^{H/f \times W/f \times 64}$ is the feature enhanced by the spatial–spectral joint module, and $\odot$ denotes element-wise multiplication.
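Read as a module, the formula above might be implemented as the following PyTorch sketch (the dilation rate is our assumption; the paper does not state it):

```python
import torch
import torch.nn as nn

class SSJM(nn.Module):
    """Spatial-spectral joint module: a depth-wise / dilated depth-wise spatial
    branch gated by sigmoid band-importance weights, fused by point-wise conv."""
    def __init__(self, ch=64, dilation=2):
        super().__init__()
        self.dw = nn.Conv2d(ch, ch, 3, padding=1, groups=ch)       # DWConv
        self.dwd = nn.Conv2d(ch, ch, 3, padding=dilation,
                             dilation=dilation, groups=ch)         # DWDConv
        self.gate = nn.Conv2d(ch, ch, 3, padding=1, groups=ch)     # DWConv
        self.pw = nn.Conv2d(ch, ch, 1)                             # PWConv
        self.relu = nn.ReLU(inplace=True)

    def forward(self, f):
        spatial = self.relu(self.dwd(self.relu(self.dw(f))))  # long-range features
        weights = torch.sigmoid(self.gate(f))                 # band importance
        return f + self.pw(spatial * weights)                 # skip connection
```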
(3)
Background Feature Attention Module (BFAM): The function of the background feature attention mechanism is to address the problem that mask convolutions with large background receptive fields may introduce adjacent anomaly features. We need the network to pay more attention to background features so as to ignore the small number of introduced anomalous features. The fundamental idea is to compute the similarity between the feature vector at a given position and all other feature vectors, sum these cosine similarities, and then obtain a background confidence through a sigmoid layer. Since the background accounts for most of an HSI, a background feature vector has high similarity with the other background feature vectors, while an anomaly feature behaves in the opposite way. We thus obtain the background confidence of each position, which is used to weight the input features and enhance the expression of background features. To prevent information loss in extreme cases of attention, we also add a skip connection. The specific implementation is illustrated in Figure 6. As shown in Figure 6, the features extracted by the mask convolutions with the same inner-window scale but different background receptive fields are first dimensionally reduced using a 1 × 1 × 64 × 32 convolution, which halves the number of channels and thereby decreases the computational load. Next, channel transformations are applied and matrix multiplication is performed to compute the dot product between the feature vector at each position and those at all other positions; this result is then normalized to obtain cosine similarities. The cosine similarities are summed row-wise to obtain the total similarity of each position with all other positions, and then mapped to a background feature confidence through a sigmoid layer. Finally, the enhanced features are obtained by weighting the input features, and the background-enhanced features from the three background receptive fields are fused. Figure 6 shows the background feature enhancement process for the mask convolutions with an inner window of 3. It can be expressed as:
$$X'_{3,j} = \mathrm{Conv1}(\mathrm{Flatten}(X_{3,j}))$$
$$X^{r}_{3,j} = X'_{3,j} \otimes X'^{\top}_{3,j}$$
$$X^{r1}_{3,j}(k) = \sum_{q=1}^{h \times w} \left( \frac{X^{r}_{3,j}}{\lVert X^{r}_{3,j} \rVert_2} \right)(k, q), \quad k = 1, 2, \ldots, h \times w$$
$$X^{r2}_{3,j} = \mathrm{Sigmoid}(\mathrm{Reshape}(X^{r1}_{3,j}))$$
$$X^{out}_{3,j} = \mathrm{Repeat}(X^{r2}_{3,j}) \odot X_{3,j} + X_{3,j}$$
$$X^{out}_{3} = \mathrm{Conv1}(\mathrm{Concat}(X^{out}_{3,5}, X^{out}_{3,7}, X^{out}_{3,9}))$$
where $X'_{3,j}$ and $X'^{\top}_{3,j}$ are the feature matrix and its transpose, respectively, and $\otimes$ is matrix multiplication. $X^{r}_{3,j} \in \mathbb{R}^{(h \times w) \times (h \times w)}$ contains the inner products between the spectral vectors at the $h \times w$ positions and those at all other positions, where $h = H/f$ and $w = W/f$.
Finally, the background-enhanced features under the three different background receptive fields are fused by concatenation and 1 × 1 convolution. Through background feature attention, the network pays more attention to background features, thereby widening the distance between anomalies and the background. Only the feature enhancement process for the mask convolutions with inner window 3 is shown here; the process for the mask convolutions with inner window 1 is identical.
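The following sketch (ours; shapes follow the text) condenses the BFAM computation for one branch feature of shape (batch, 64, h, w):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BFAM(nn.Module):
    """Background feature attention: per-position cosine similarities to all
    other positions are summed and squashed into a background confidence."""
    def __init__(self, ch=64):
        super().__init__()
        self.reduce = nn.Conv2d(ch, ch // 2, 1)   # 1 x 1 x 64 x 32 reduction

    def forward(self, x):
        b, _, h, w = x.shape
        feat = self.reduce(x).flatten(2).transpose(1, 2)   # (b, h*w, 32)
        feat = F.normalize(feat, dim=-1)                   # unit-length vectors
        sim = feat @ feat.transpose(1, 2)                  # cosine similarities
        conf = torch.sigmoid(sim.sum(dim=-1))              # (b, h*w) confidence
        return x * conf.view(b, 1, h, w) + x               # weighting + skip
```

In a full implementation, one such block would be applied to each of the three receptive-field branches before the concatenation and 1 × 1 fusion described above.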
(4)
Dynamic Learnable Fusion Module (DLFM): The detection performance of mask convolution varies with scale. Small-scale mask convolution exhibits good detection performance for small anomalous targets, but its performance degrades when detecting large anomalous targets because of interference from neighboring anomalous pixels. Conversely, the large-scale mask convolution incorporates a wider center-shielding window and demonstrates excellent detection performance for large anomaly targets; however, excessive center shielding hinders the utilization of background information in the surrounding area, making it difficult to detect smaller anomaly targets. To meet the requirements of multi-scale anomaly target detection, we propose a dynamic learnable fusion module, as shown in Figure 7. Specifically, small-scale mask convolution effectively identifies small-scale anomalous targets yet reconstructs large-scale ones, whereas large-scale mask convolution efficiently detects large-scale anomalous targets yet reconstructs small-scale ones. Therefore, we introduce three dynamic learnable parameters $\alpha$, $\beta$, and $\gamma$ to fuse the advantages of mask convolutions at different scales: $\alpha$ is the weight of the feature $X_1^{out}$ extracted by the mask convolutions with a mask window of 1, $\beta$ is the weight of the feature $X_3^{out}$ extracted by the mask convolutions with a mask window of 3, and $\gamma$ is the weight of the difference feature $|X_1^{out} - X_3^{out}|$. The features extracted by the first two mask convolutions are weighted and summed, and the weighted difference feature is then subtracted. Finally, the resulting features undergo dynamic weight learning and adaptation to obtain the final fused features, which enhance the background and suppress anomalies. The dynamic fusion process can be expressed as:
$$X^{out} = \mathrm{Conv3}(\mathrm{ReLU}(\mathrm{DWConv}(X_1^{out}))) \times \alpha + \mathrm{Conv3}(\mathrm{ReLU}(\mathrm{DWConv}(X_3^{out}))) \times \beta - \mathrm{Conv3}(\mathrm{ReLU}(\mathrm{DWConv}(|X_1^{out} - X_3^{out}|))) \times \gamma$$
where $X_1^{out} \in \mathbb{R}^{H/f \times W/f \times 64}$ is the feature extracted by the small-scale mask convolutions, $X_3^{out} \in \mathbb{R}^{H/f \times W/f \times 64}$ is the feature extracted by the large-scale mask convolutions, and $X^{out} \in \mathbb{R}^{H/f \times W/f \times 64}$ is the output of the DLFM.
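A direct PyTorch transcription of this fusion might read as follows (a sketch under our assumptions; the initial values of the learnable scalars are not specified in the paper):

```python
import torch
import torch.nn as nn

class DLFM(nn.Module):
    """Dynamic learnable fusion of small- and large-scale branch features."""
    def __init__(self, ch=64):
        super().__init__()
        def branch():
            return nn.Sequential(
                nn.Conv2d(ch, ch, 3, padding=1, groups=ch),    # DWConv
                nn.ReLU(inplace=True),
                nn.Conv2d(ch, ch, 3, padding=1))               # Conv3
        self.b1, self.b3, self.bd = branch(), branch(), branch()
        self.alpha = nn.Parameter(torch.tensor(1.0))   # weight of X1_out
        self.beta = nn.Parameter(torch.tensor(1.0))    # weight of X3_out
        self.gamma = nn.Parameter(torch.tensor(1.0))   # weight of |X1 - X3|

    def forward(self, x1_out, x3_out):
        return (self.b1(x1_out) * self.alpha
                + self.b3(x3_out) * self.beta
                - self.bd((x1_out - x3_out).abs()) * self.gamma)
```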
The reconstructed background sample $\tilde{Y}_i \in \mathbb{R}^{H/f \times W/f \times B}$, $i = 1, 2, 3, \ldots, f^2 \times rate$, is finally obtained through the 1 × 1 × 64 × 128 and 1 × 1 × 128 × B convolutions, which can be expressed as:
$$\tilde{Y}_i = \mathrm{Conv1}(\mathrm{Conv1}(X^{out}))$$
We adopt the $L_1$ loss as the reconstruction loss. During reconstruction, our supervised samples have undergone the combined spatial–spectral screening, so a higher proportion of background pixels than anomalous pixels is sampled within the original anomaly regions. Consequently, during the supervised reconstruction of anomalous positions, background and anomalous pixels coexist as supervision; because background pixels dominate, the network learns to prioritize their reconstruction while inhibiting the reconstruction of anomaly pixels. The loss function can be expressed as:
$$L_R = \lVert \tilde{Y}_i - Y_i \rVert_1, \quad i = 1, 2, 3, \ldots, f^2 \times rate$$

2.4. Detection Stage

In the detection phase, all samples are tested. First, the PD operation is conducted with the chosen stride factor $f$ to acquire the $f^2$ down-sampled test samples in the same manner. Subsequently, these samples are fed one by one into the trained network to generate $f^2$ reconstructed samples. Finally, by applying the inverse PD operation with the same stride factor, the reconstructed background HSI is obtained as follows:
$$\{Y_1, Y_2, Y_3, \ldots, Y_{f^2}\} = PD(X)$$
$$\tilde{Y}_i = \mathrm{MMBSN}(Y_i), \quad i = 1, 2, 3, \ldots, f^2$$
$$\tilde{X} = PD^{-1}(\{\tilde{Y}_1, \tilde{Y}_2, \tilde{Y}_3, \ldots, \tilde{Y}_{f^2}\})$$
The combination of the aforementioned self-supervision strategy and the blind-spot network mechanism effectively suppresses anomaly identity mapping, making anomaly reconstruction difficult and producing large reconstruction errors for anomalies. Through training, a background reconstruction network is obtained that reconstructs the desired background HSI without reproducing anomalies. Compared with the background, anomalies therefore inevitably exhibit larger reconstruction errors. Finally, based on these reconstruction errors, the final detection result is obtained as follows:
$$D(i, j) = \lVert x_{i,j} - \tilde{x}_{i,j} \rVert_2$$
where $x_{i,j} \in \mathbb{R}^{B \times 1}$ and $\tilde{x}_{i,j} \in \mathbb{R}^{B \times 1}$ represent the pixels of the original HSI $X$ and the reconstructed HSI $\tilde{X} \in \mathbb{R}^{H \times W \times B}$, respectively. $D(i, j)$ denotes the anomaly score of the pixel at position $(i, j)$, and these scores form the final detection map $D = \{D(i,j)\}_{i=1,j=1}^{i=H,j=W} \in \mathbb{R}^{H \times W}$. Algorithm 1 provides a detailed description of the main steps of the proposed method.
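The detection map itself reduces to a one-liner; a NumPy sketch, assuming X and X_tilde are the original and reconstructed cubes:

```python
import numpy as np

def detection_map(X, X_tilde):
    """Per-pixel L2 reconstruction error D(i, j) = ||x_ij - x_tilde_ij||_2,
    computed over the spectral axis of two (H, W, B) cubes."""
    return np.linalg.norm(X - X_tilde, axis=-1)   # (H, W) anomaly score map
```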
Algorithm 1 Proposed MMBSN for HAD
Input: the original HSI $X \in \mathbb{R}^{H \times W \times B}$
Parameters: epochs, learning rate $lr$, stride factor $f$, screening ratio $rate$
Output: final detection map $D = \{D(i,j)\}_{i=1,j=1}^{i=H,j=W} \in \mathbb{R}^{H \times W}$
Stage 1: Sample Preparation Stage
  Obtain $f^2$ candidate samples $Y_1, Y_2, Y_3, \ldots, Y_{f^2} \in \mathbb{R}^{H/f \times W/f \times B}$ by the PD operation
  Obtain $f^2 \times rate$ training samples $Y_1, Y_2, Y_3, \ldots, Y_{f^2 \times rate} \in \mathbb{R}^{H/f \times W/f \times B}$ by spatial–spectral joint screening
Stage 2: Training Stage
  Initialize the network with random weights
  for each epoch do:
    update MMCM, SSJM, BFAM, and DLFM by back-propagating $L_R = \lVert \tilde{Y}_i - Y_i \rVert_1$, $i = 1, 2, 3, \ldots, f^2 \times rate$
  end
Stage 3: Detection Stage
  Obtain the test samples $Y_1, Y_2, Y_3, \ldots, Y_{f^2} \in \mathbb{R}^{H/f \times W/f \times B}$ by the PD operation
  Obtain the reconstructed samples $\tilde{Y}_1, \tilde{Y}_2, \tilde{Y}_3, \ldots, \tilde{Y}_{f^2} \in \mathbb{R}^{H/f \times W/f \times B}$ by feeding the test samples into the well-trained MMBSN
  Obtain the reconstructed HSI $\tilde{X} \in \mathbb{R}^{H \times W \times B}$ by the inverse PD operation
  Calculate the anomaly score $D(i, j)$ for each pixel of $X$ using the reconstruction-error equation above

3. Experiments and Analysis

In this section, we validate the effectiveness and superiority of MMBSN in multi-scale anomaly detection through a series of qualitative and quantitative experiments. These include comparative analysis, parameter analysis, ablation study, and mask convolution scale characteristic experiment conducted on four hyperspectral datasets with anomalies of different scales.

3.1. Datasets

A total of four real HSI scenes were utilized in this experiment. The first is the Pavia dataset, collected by the Reflective Optics System Imaging Spectrometer (ROSIS-03) hyperspectral sensor over the central region of Pavia in northern Italy; the anomalous targets primarily consist of various vehicles. The second is the Gainesville dataset, obtained by the AVIRIS hyperspectral sensor over the urban area of Gainesville in north-central Florida, USA. Additionally, the experiments employed the San Diego-1 dataset, acquired by the AVIRIS sensor over the San Diego Airport area in California, USA. Finally, the Gulfport dataset was gathered with the AVIRIS sensor over Gulfport in the southern United States; notably, it contains three anomalous aircraft targets of clearly different scales. Detailed information on the four datasets can be found in Table 1, and Figure 8 shows false-color images and ground-truth maps of the anomaly targets.

3.2. Evaluation Metrics

We quantitatively investigate the detection performance of the proposed method and the comparative approaches using three widely adopted evaluation metrics for anomaly detection in hyperspectral remote sensing imagery: background–anomaly separation analysis (boxplot) [48], the receiver operating characteristic (ROC) [49], and the area under the ROC curve (AUC) [50]. If the ROC curve of an anomaly detector exhibits a higher true positive rate (TPR, $P_d$) at a lower false alarm rate (FAR, $P_f$), i.e., the ROC curve lies closer to the top left corner, the detector performs better. However, if the ROC curves of two detectors interleave under different FARs, it becomes difficult to judge their performance solely from the visual ROC results. In such cases, the AUC serves as an alternative quantitative criterion: the closer the AUC score is to 1, the better the detection performance. The boxplot can be used to assess the degree of separation between background and anomalies for different anomaly detectors; a detector with a higher degree of separation exhibits superior detection performance.

3.3. Comparison Algorithms and Evaluation Metrics

(1)
Comparison Algorithms: The comparison algorithms employed in the experiment encompass four conventional methods (GRX [13], FRFE [48], CRD [17], and LRASR [51]) as well as five deep learning-based approaches (GAED [38], RGAE [39], Auto-AD [40], PDBSNet [44], and BockNet [45]). GRX and FRFE are typical statistics-based methods, with FRFE being an improved version of GRX that extends traditional statistical modeling to the frequency domain. CRD and LRASR are classical representation-based methods covering collaborative, sparse, and low-rank representation. GAED and RGAE are early pixel-wise reconstruction deep learning methods. Auto-AD is the earliest fully convolutional hyperspectral anomaly detection network, while PDBSNet and BockNet are the latest hyperspectral anomaly detection networks using blind-spot networks. Our selection of comparison methods covers almost all existing approaches and includes mainstream methods from different periods, which effectively supports the validity of our comparisons. It is noteworthy that both PDBSNet and BockNet are devised on the blind-spot network architecture, which aligns with our proposed MMBSN anomaly detection model; hence, particular attention should be paid to the differences in the detection results of these three techniques to accentuate the advantages of MMBSN. All algorithms were executed on a computer equipped with an Intel Core i7-12700H CPU, 16 GB of RAM, and a GeForce RTX 3090 GPU, using MATLAB 2018a and Python 3.8.18 with PyTorch 1.7.1 and CUDA 11.0.
(2)
Evaluation Metrics: In our experiments, widely adopted HAD criteria were utilized to assess the detection performance of the various methods. Specifically, we employed the statistical separability map [48] and receiver operating characteristic (ROC) [51] analysis, including the area under the ROC curve (AUC) [52], as evaluation metrics. The ROC curve effectively visualizes the relationship between the detection rate and the false alarm rate for different methods, while the AUC quantitatively measures their detection accuracy. We expect a high detection rate and a low false alarm rate from a detector; therefore, closer proximity of the ROC curve to the upper left corner and an AUC value approaching 1 indicate superior performance of the respective algorithm. Furthermore, statistical analysis can reveal differences in background suppression and anomaly separation capabilities among the methods, where the blue box represents the background and the red box represents the anomalies, each with a statistical range of 10–90%. Naturally, if an algorithm has a strong ability to suppress the background and separate anomalies, the blue box will be low and narrow, with a large gap from the red box.
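For reference, the ROC and AUC metrics used here are commonly computed from a score map and a binary ground-truth mask as in the following scikit-learn sketch (our illustration, not the authors' evaluation code):

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

def evaluate_detector(det_map, gt_mask):
    """Return (P_f, P_d, AUC) for a detection score map and ground truth."""
    scores = det_map.ravel()
    labels = gt_mask.ravel().astype(int)       # 1 = anomaly, 0 = background
    p_f, p_d, _ = roc_curve(labels, scores)    # false alarm vs. detection rate
    return p_f, p_d, roc_auc_score(labels, scores)
```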

3.4. Detection Performance for Different Methods

The heat maps in Figures 9–12 illustrate the detection results on the four datasets, with the first image in each figure serving as a reference. Proximity to yellow indicates a higher degree of anomaly, while closeness to blue indicates stronger suppression. Figures 13 and 14 present the ROC curves and separability boxplots for each method, respectively, while Table 2 lists the AUC values of the quantitative detection results, with the best results highlighted in red and the second-best in blue. From the visualizations in Figures 9–12, it can be observed that our approach achieves a satisfactory balance between anomaly detection and background suppression at different scales. In the Pavia dataset, MMBSN evidently detected all anomalies, whereas the other methods exhibited significant missed detections and false alarms: the four traditional methods failed to suppress the background effectively, and the other deep learning methods displayed high false alarm rates. Existing methods still struggle to capture the spatial structure of anomalous targets; visual inspection across the four datasets shows that the anomalies detected by the comparison methods are often incomplete, lacking parts of their spatial structure. This issue is particularly evident in the Gulfport dataset, which contains multi-scale anomalies with specific shapes, as current methods find it difficult to adapt to the scale differences of multi-scale targets. However, MMBSN, with its mask convolution module combined with PD operations and a preprocessing module that prevents identity mapping, effectively adapts to the detection of multi-scale targets: the detected anomalies are notably prominent, and their spatial structures are well preserved.
To comprehensively evaluate the performance of the different algorithms, we conduct qualitative and quantitative evaluations from three perspectives. First, Figure 13 illustrates the ROC curves of the various methods on the four datasets. In most cases, the ROC curve of MMBSN surpasses the others and lies closest to the top left corner, indicating a remarkably high detection rate and an exceptionally low false alarm rate compared with the other methods. For further quantitative evaluation, Table 2 presents the AUC values (i.e., the area under the ROC curve) of the different methods' detection results. The AUC score of MMBSN clearly outperforms those of the nine comparison algorithms by a large margin; particularly noteworthy is its outstanding AUC score of 0.9983 on the Gulfport dataset.
Additionally, Figure 14 displays a separability boxplot demonstrating anomaly separation and background suppression capabilities among different methods. The anomaly box for the MMBSN method consistently shows the greatest distance from the background box, highlighting our method’s superior anomaly separation capability. However, the background box is not the narrowest among all methods, indicating that MMBSN does not excel in background suppression. This can be attributed to our training strategy, which involves training with only a portion of the samples. Consequently, the network has not seen some of the test samples and cannot make precise predictions. Nonetheless, this approach helps prevent the identity mapping of anomalies to some extent, showing that our method sacrifices some background suppression capability to enhance anomaly separation. The anomalies detected by MMBSN are always the most prominent, and the background box is relatively narrow (though not the narrowest). It can be said that MMBSN achieves a near-perfect balance between background suppression and anomaly separation.
In summary, our proposed MMBSN has shown outstanding results in visual representation as well as in qualitative and quantitative assessments. Notably, the average AUC score even surpassed that of the second-place method by 0.047 points, reaching an impressive value of 0.9963. These results unequivocally highlight the robust competitiveness of MMBSN among contemporary approaches.

3.5. Validation

The validation section aims to verify the effectiveness of the proposed modules in combination with the experimental results. We now provide a comprehensive validation analysis of our method based on the detection results for each dataset.
For the Pavia dataset, GRX suffered from severe missed detections because not all anomalies exhibit global statistical deviations. Some may only have local deviations. CRD, FRFE, and LRASR mitigated the issue of missed detections to some extent, but their background suppression methods were inadequate, leading to high false alarm rates. This is primarily due to the limited representational capabilities of traditional methods. When it comes to deep learning methods like GAED, RGAE, and Auto-AD, background suppression improved significantly, but the anomalies were still not prominent, and missed detections were common. Blind-spot networks such as PDBSNet and BockNet alleviated the issue of weak anomaly detection to a large extent, but missed detections still occurred. In contrast, MMBSN almost entirely eliminated missed detections, excelled in background suppression, and made anomalies highly prominent, achieving an AUC of 0.9945. This success can be attributed to the background feature attention module and the spatial–spectral joint module, which fully leverage spatial–spectral features for detection while enhancing anomaly detection by focusing on background features.
For the Gainesville dataset, high false alarm rates are a common issue, primarily due to the presence of small rectangular buildings with global spectral statistical deviations that are prone to misdetection. GRX, FRFE, and LRASR exhibit severe false alarms, while CRD, using a dual-window strategy, reduces false alarms but suffers from significant missed detections. Early pixel-wise reconstruction deep learning methods, such as GAED and RGAE, also show high false alarm rates due to their lack of spatial information utilization. It was not until the introduction of fully convolutional networks like Auto-AD that false alarm rates began to decrease, as spatial information was better utilized. Blind-spot networks like PDBSNet and BockNet further reduced false alarm rates by masking the central region and thereby enhancing spatial information utilization, yet missed detections persisted. MMBSN, an advancement of blind-spot networks, improves spatial information utilization and enhances spatial–spectral interaction through the spatial–spectral joint module. Additionally, the background feature attention module makes anomalies more prominent. As a result, our method achieves the most noticeable anomalies, the lowest false alarm rate, and an AUC of 0.9963, demonstrating strong discriminative ability.
For the San Diego dataset, the primary challenge lies in maintaining the integrity of the spatial structure of anomalies. Large-scale anomalies often exhibit local spatial correlations, making them susceptible to identity mapping, which can result in missed detections. This issue is evident in GRX and FRFE, where high miss rates lead to near-complete failure in detecting anomalies. CRD and LRASR suffer from both high false alarm rates and severe spatial loss of anomalies. Methods like GAED, RGAE, and Auto-AD also struggle with partial miss detections due to the impact of identity mapping on anomalies. Blind-spot networks such as PDBSNet and BockNet mitigate the identity mapping issue by masking central anomalies, but they still face challenges with making anomalies prominent and preserving spatial structure. In contrast, MMBSN effectively addresses identity mapping through the integration of PD preprocessing and the application of multi-scale masked convolutions, which enhance the prominence and spatial completeness of anomalies. This approach results in an AUC of 0.9961. However, since MMBSN uses a strategy of training on partial samples and testing on the full image, it does not fit the data as perfectly as some other methods, leading to some limitations in background suppression.
For the Gulfport dataset, the key challenge was adapting to the detection of multi-scale anomaly targets. Methods like GRX, FRFE, CRD, and RGAE barely detected any targets, while LRASR had extremely high false alarm rates, detecting only the most obvious anomalies. Methods like GAED, Auto-AD, PDBSNet, and BockNet, particularly PDBSNet and BockNet, struggled to adapt to multi-scale target detection, either missing large-scale targets or small-scale targets. In contrast, MMBSN effectively detected targets of different scales and had a significant advantage in highlighting anomalies, achieving an ideal AUC value of 0.9983. This demonstrates that the multi-scale masked convolution module and dynamic learnable fusion module can perfectly integrate the advantages of masked convolution at different scales, enabling effective detection of multi-scale anomaly targets.

3.6. Parameter Analysis

The proposed MMBSN involves two parameters for analysis: the stride factor used in PD preprocessing and the screening rate of the training samples. It is noteworthy that the stride factor of the inverse PD operation matches that of the PD operation. The stride factor for each of the four datasets is varied from 1 to 5. To maintain variable control, the screening rate of the training samples is fixed at 0.5; when the stride factor is 1, only one sample exists, so the screening rate is set to 1.
The AUC values for different stride factors are presented in Figure 15 and Table 3. Evidently, to achieve the optimal AUC, the stride factor is typically small for datasets comprising small objects and large for datasets containing large objects. This stems from our use of a protective mask with only two scales in the mask convolution process; the PD operation allows the stride factor to be adjusted to diverse target scales, thereby reducing the required scale of the mask convolution.
Through experimental analysis of the stride factor, we determine the optimal stride factor for each dataset. Subsequently, while keeping the stride factor constant, we vary the screening rate from 0.1 to 1. As depicted in Figure 16 and Table 4, superior detection performance can be achieved when the screening rate of training samples across all datasets is approximately 0.5. This phenomenon arises due to PD sampling not only disrupting anomaly structures but also causing spatial disruptions in certain small background regions. When the screening rate is set at 0.1, an insufficient number of samples leads to a significant loss of background details and subsequently results in a high false alarm rate. By increasing the sample screening rate, more background details are incorporated while simultaneously reducing the number of screened anomaly samples, thereby gradually enhancing detection performance. However, once the screening rate surpasses 0.5, overall detection performance significantly declines due to the excessive inclusion of anomaly samples into training datasets. Nevertheless, when trained with all available samples, detecting anomalies becomes challenging due to overfitting and anomalous identity mapping.
Based on the experimental analysis, the optimal parameters are determined as follows: For the Pavia dataset, the stride factor is set to 2 and the screening rate to 0.5; for the Gainesville dataset, the stride factor is set to 3 and the screening rate to 0.4; for the San Diego dataset, the stride factor is set to 3 and the screening rate to 0.4; and for the Gulfport dataset, the stride factor is set to 4 and the screening rate to 0.5.
In general, the preprocessing module not only disrupts the local correlation of anomalies to enhance the blind-spot network’s ability to reconstruct the background but also leverages the benefits of background samples to amplify reconstruction errors in anomaly regions, effectively preventing overfitting and anomaly identity mapping [44]. Additionally, with the inclusion of a pre-processing module, our network can adapt to multi-scale anomaly targets through finite-scale mask convolution.

3.7. Ablation Study

To validate the efficacy of each component, we conducted ablation experiments on the four datasets and categorized the ablation study into five scenarios. The first scenario substitutes the mask convolutions in the original MMCM with regular convolutions of identical dimensions. In the second scenario, the three SSJ modules are replaced with three standard 3 × 3 convolutions. The third scheme employs MMBSN without BFAM. The fourth scheme integrates the detection results from the different-scale mask convolutions by sequentially adding 1 × 1 convolutions instead of the DLFM. Finally, we employ the complete MMBSN as the benchmark for comparison against the other scenarios. Note that no ablation is required for the preprocessing module, as its effectiveness was already verified in the preceding parameter analysis when both the stride factor and the rate are set to 1. Additionally, we introduced the dual-window mask encoder block (DWMEB) from S2DWMTrans [53] and added case 5 to perform an ablation study on SSJM.
The experimental results are presented in Table 5, demonstrating a significant reduction in AUC after the removal of each component. Particularly for the Gulfport dataset, the absence of MMCM resulted in a decrease of 0.0309 in the AUC score, emphasizing the crucial importance and effectiveness of MMCM in multi-scale target detection. Since fully utilizing and integrating spatial information with spectral information is pivotal for anomaly detection, a decline in the AUC score is also observed for case 2, although the improvement attainable by exploiting spatial and spectral features to their fullest extent remains limited. The ablation analysis of case 3 reveals that even though the network is already highly sparse, it inevitably attends to anomaly features; hence, incorporating the background feature attention module enhances the focus on background features while reducing attention to anomalous ones. Furthermore, case 4 demonstrates that not all mask convolutions yield similar detection performance; thus, losing an effective fusion module leads to rapid degradation of detection performance, which is especially evident on datasets containing multiple scales, while its impact on the single-scale Pavia dataset is relatively minor. The case 5 results indicate that SSJM outperforms the DWMEB from S2DWMTrans: SSJM accounts for both global and local feature interactions, which effectively improves the utilization of spatial information and enhances the interaction of spatial–spectral features. Moreover, the computation time of case 6 is roughly half that of case 5, which shows that SSJM is a lightweight design. Overall, each module designed in this study plays an indispensable role, and the modules positively reinforce one another.

3.8. Mask Convolution Scale Characteristic Experiment

To analyze the characteristics of masked convolution at different scales, we conducted tests on the Pavia dataset, which contains small anomaly targets, and the San Diego dataset, which features large-scale anomaly targets. We modified the dual-branch multi-scale masked convolution module into two single-branch masked convolution modules with inner mask sizes of 1 and 3, respectively, and removed the dynamic fusion module. Using the same parameters, we performed detection on both datasets. As shown in Table 6 and as previously discussed, the small-scale masked convolution demonstrated superior performance in detecting small-scale anomaly targets but struggled with large-scale targets owing to its incomplete masking of them, whereas the large-scale masked convolution performed better for large-scale anomalies. This confirms our earlier hypothesis and further validates the effectiveness of the multi-scale masked convolution module in integrating the advantages of masked convolutions at different scales.

3.9. Comparison of Inference Times

Table 7 presents the inference times of the different methods on all datasets to evaluate their computational burden. Among the traditional methods, GRX achieved the shortest inference time owing to its simple design. Among the deep learning methods, Auto-AD achieved the shortest average inference time thanks to its lightweight network design. In contrast, BockNet requires rotating the original HSI and feeding all four rotated HSIs into the network, resulting in longer inference times. Similarly, the proposed MMBSN requires feeding the PD-processed HSI into the network, plus an additional inverse PD operation after inference to obtain the final reconstructed HSI. However, because our network relies primarily on lightweight designs such as depth-wise separable convolutions, its inference time is much shorter than that of BockNet. Although MMBSN does not achieve the shortest inference time, its detection performance makes the trade-off clearly competitive.
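For context, the PD operation and its inverse mentioned above can be written in a few lines. The sketch below is our own generic rendition of pixel-shuffle down-sampling for a (batch, band, height, width) tensor, not the paper's code, and it assumes the spatial size is divisible by the stride factor f.

```python
import torch

def pd(x: torch.Tensor, f: int) -> torch.Tensor:
    """Pixel-shuffle down-sampling: split an HSI cube into f*f spatially
    sub-sampled mosaics, which breaks large anomalies into isolated pixels."""
    b, c, h, w = x.shape
    x = x.reshape(b, c, h // f, f, w // f, f)
    x = x.permute(0, 3, 5, 1, 2, 4)                   # phase offsets to front
    return x.reshape(b * f * f, c, h // f, w // f)

def pd_inverse(x: torch.Tensor, f: int) -> torch.Tensor:
    """Inverse PD: reassemble the full-resolution cube after inference."""
    bf, c, h, w = x.shape
    b = bf // (f * f)
    x = x.reshape(b, f, f, c, h, w).permute(0, 3, 4, 1, 5, 2)
    return x.reshape(b, c, h * f, w * f)

cube = torch.randn(1, 102, 150, 150)                  # Pavia-sized cube
assert torch.equal(pd_inverse(pd(cube, 2), 2), cube)  # exact round trip
```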

4. Conclusions and Perspectives

This paper proposes a novel blind-spot network for multi-scale anomaly targets. Specifically, MMBSN fully exploits the advantages of mask convolutions at different scales within a blind-spot anomaly detection network and introduces a new training strategy: a preprocessing method combining the PD operation with spatial–spectral joint screening to select training samples. This enables MMBSN to adapt to targets of varying scales and to prevent anomaly identity mapping. Additionally, we propose a spatial–spectral joint module and a background feature attention module, which effectively enhance the interaction between spatial and spectral features and the utilization of background features. Experimental results on four datasets demonstrate that MMBSN delivers superior and comprehensive detection performance across different scenarios. For multi-scale targets in particular, our approach effectively prevents anomaly identity mapping and thereby recovers their spatial form more faithfully.
However, our current work still has limitations, chief among them generalization. Owing to the scarcity of hyperspectral anomaly detection datasets, our method remains tied to the training dataset and cannot perform cross-scene detection. In addition, the method is confined to a single detection paradigm, namely reconstruction. In future work, we will focus on establishing a larger hyperspectral anomaly detection dataset and on developing new deep learning paradigms for hyperspectral anomaly detection with strong generalization capabilities.

Author Contributions

Methodology, Z.Y.; investigation, G.Y., W.S., S.Z. and J.L.; supervision, R.Z. and X.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China, 42301376; the National Natural Science Foundation of China, 42171326; the Zhejiang Province “Pioneering Soldier” and “Leading Goose” R&D Project, 2023C01027; the Zhejiang Provincial Natural Science Foundation of China, LR23D010001; the Zhejiang Provincial Natural Science Foundation of China, LY22F010014; and the Ningbo Natural Science Foundation, 2022J076.

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Sun, X.; Qu, Y.; Gao, L.; Sun, X.; Qi, H.; Zhang, B.; Shen, T. Target detection through tree-structured encoding for hyperspectral images. IEEE Trans. Geosci. Remote Sens. 2020, 59, 4233–4249.
2. Gao, L.; Sun, X.; Sun, X.; Zhuang, L.; Du, Q.; Zhang, B. Hyperspectral anomaly detection based on chessboard topology. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5505016.
3. Cheng, X.; Zhang, M.; Lin, S.; Zhou, K.; Zhao, S.; Wang, H. Two-stream isolation forest based on deep features for hyperspectral anomaly detection. IEEE Geosci. Remote Sens. Lett. 2023, 20, 3271899.
4. Zhang, M.; Zhang, R.; Yang, Y.; Bai, H.; Zhang, J.; Guo, J. ISNet: Shape matters for infrared small target detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 877–886.
5. Zhang, M.; Bai, H.; Zhang, J.; Zhang, R.; Wang, C.; Guo, J.; Gao, X. RKformer: Runge–Kutta transformer with random-connection attention for infrared small target detection. In Proceedings of the 30th ACM International Conference on Multimedia, Lisboa, Portugal, 10–14 October 2022; pp. 1730–1738.
6. Zhang, M.; Yue, K.; Zhang, J.; Li, Y.; Gao, X. Exploring feature compensation and cross-level correlation for infrared small target detection. In Proceedings of the 30th ACM International Conference on Multimedia, Lisboa, Portugal, 10–14 October 2022; pp. 1857–1865.
7. Zhang, M.; Zhang, R.; Zhang, J.; Guo, J.; Li, Y.; Gao, X. Dim2Clear network for infrared small target detection. IEEE Trans. Geosci. Remote Sens. 2023, 61, 3263848.
8. Zhang, M.; Li, B.; Wang, T.; Bai, H.; Yue, K.; Li, Y. CHFNet: Curvature half-level fusion network for single-frame infrared small target detection. Remote Sens. 2023, 15, 1573.
9. Zhang, M.; Yang, H.; Yue, K.; Zhang, X.; Zhu, Y.; Li, Y. Thermodynamics-inspired multi-feature network for infrared small target detection. Remote Sens. 2023, 15, 4716.
10. Tejasree, G.; Agilandeeswari, L. An extensive review of hyperspectral image classification and prediction: Techniques and challenges. Multimedia Tools Appl. 2024, 1–98.
11. Su, H.; Wu, Z.; Zhang, H.; Du, Q. Hyperspectral anomaly detection: A survey. IEEE Geosci. Remote Sens. Mag. 2021, 10, 64–90.
12. Racetin, I.; Krtalić, A. Systematic review of anomaly detection in hyperspectral remote sensing applications. Appl. Sci. 2021, 11, 4878.
13. Reed, I.S.; Yu, X. Adaptive multiple-band CFAR detection of an optical pattern with unknown spectral distribution. IEEE Trans. Acoust. Speech Signal Process. 1990, 38, 1760–1770.
14. Molero, J.M.; Garzon, E.M.; Garcia, I.; Plaza, A. Analysis and optimizations of global and local versions of the RX algorithm for anomaly detection in hyperspectral data. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2013, 6, 801–814.
15. Kwon, H.; Nasrabadi, N.M. Kernel RX-algorithm: A nonlinear anomaly detector for hyperspectral imagery. IEEE Trans. Geosci. Remote Sens. 2005, 43, 388–397.
16. Schaum, A. Joint subspace detection of hyperspectral targets. In Proceedings of the 2004 IEEE Aerospace Conference Proceedings (IEEE Cat. No. 04TH8720), Big Sky, MT, USA, 6–13 March 2004.
17. Li, W.; Du, Q. Collaborative representation for hyperspectral anomaly detection. IEEE Trans. Geosci. Remote Sens. 2014, 53, 1463–1474.
18. Tu, B.; Li, N.; Liao, Z.; Ou, X.; Zhang, G. Hyperspectral anomaly detection via spatial density background purification. Remote Sens. 2019, 11, 2618.
19. Vafadar, M.; Ghassemian, H. Anomaly detection of hyperspectral imagery using modified collaborative representation. IEEE Geosci. Remote Sens. Lett. 2018, 15, 577–581.
20. Ling, Q.; Guo, Y.; Lin, Z.; An, W. A constrained sparse representation model for hyperspectral anomaly detection. IEEE Trans. Geosci. Remote Sens. 2018, 57, 2358–2371.
21. Ren, L.; Ma, Z.; Bovolo, F.; Bruzzone, L. A nonconvex framework for sparse unmixing incorporating the group structure of the spectral library. IEEE Trans. Geosci. Remote Sens. 2021, 60, 3081101.
22. Yuan, Y.; Ma, D.; Wang, Q. Hyperspectral anomaly detection via sparse dictionary learning method of capped norm. IEEE Access 2019, 7, 16132–16144.
23. Zhuang, L.; Ng, M.K.; Liu, Y. Cross-track illumination correction for hyperspectral pushbroom sensor images using low-rank and sparse representations. IEEE Trans. Geosci. Remote Sens. 2023, 61, 3236818.
24. Cheng, T.; Wang, B. Graph and total variation regularized low-rank representation for hyperspectral anomaly detection. IEEE Trans. Geosci. Remote Sens. 2019, 58, 391–406.
25. Yang, Y.; Zhang, J.; Song, S.; Liu, D. Hyperspectral anomaly detection via dictionary construction-based low-rank representation and adaptive weighting. Remote Sens. 2019, 11, 192.
26. Qu, Y.; Wang, W.; Guo, R.; Ayhan, B.; Kwan, C.; Vance, S.; Qi, H. Hyperspectral anomaly detection through spectral unmixing and dictionary-based low-rank decomposition. IEEE Trans. Geosci. Remote Sens. 2018, 56, 4391–4405.
27. Cheng, X.; Mu, R.; Lin, S.; Zhang, M.; Wang, H. Hyperspectral anomaly detection via low-rank representation with dual graph regularizations and adaptive dictionary. Remote Sens. 2024, 16, 1837.
28. Zhang, C.; Su, H.; Wang, X.; Wu, Z.; Yang, Y.; Xue, Z.; Du, Q. Self-paced probabilistic collaborative representation for anomaly detection of hyperspectral images. IEEE Trans. Geosci. Remote Sens. 2024, 62, 3393303.
29. Kang, X.; Zhang, X.; Li, S.; Li, K.; Li, J.; Benediktsson, J.A. Hyperspectral anomaly detection with attribute and edge-preserving filters. IEEE Trans. Geosci. Remote Sens. 2017, 55, 5600–5611.
30. Xie, W.; Jiang, T.; Li, Y.; Jia, X.; Lei, J. Structure tensor and guided filtering-based algorithm for hyperspectral anomaly detection. IEEE Trans. Geosci. Remote Sens. 2019, 57, 4218–4230.
31. Wang, Z.; Wang, X.; Tan, K.; Han, B.; Ding, J.; Liu, Z. Hyperspectral anomaly detection based on variational background inference and generative adversarial network. Pattern Recognit. 2023, 143, 109795.
32. Zhang, S.; Meng, X.; Liu, Q.; Yang, G.; Sun, W. Feature-decision level collaborative fusion network for hyperspectral and LiDAR classification. Remote Sens. 2023, 15, 4148.
33. Cheng, X.; Huo, Y.; Lin, S.; Dong, Y.; Zhao, S.; Zhang, M.; Wang, H. Deep feature aggregation network for hyperspectral anomaly detection. IEEE Trans. Instrum. Meas. 2024, 2024, 3403211.
34. Xiang, P.; Ali, S.; Zhang, J.; Jung, S.K.; Zhou, H. Pixel-associated autoencoder for hyperspectral anomaly detection. Int. J. Appl. Earth Obs. Geoinf. 2024, 129, 103816.
35. Wang, D.; Zhuang, L.; Gao, L.; Sun, X.; Zhao, X.; Plaza, A. Sliding dual-window-inspired reconstruction network for hyperspectral anomaly detection. IEEE Trans. Instrum. Meas. 2024, 62, 3351179.
36. Lian, J.; Wang, L.; Sun, H.; Huang, H. GT-HAD: Gated transformer for hyperspectral anomaly detection. IEEE Trans. Neural Networks Learn. Syst. 2024, 2024, 3355166.
37. Jiang, T.; Li, Y.; Xie, W.; Du, Q. Discriminative reconstruction constrained generative adversarial network for hyperspectral anomaly detection. IEEE Trans. Geosci. Remote Sens. 2020, 58, 4666–4679.
38. Xiang, P.; Ali, S.; Jung, S.K.; Zhou, H. Hyperspectral anomaly detection with guided autoencoder. IEEE Trans. Geosci. Remote Sens. 2022, 60, 3207165.
39. Fan, G.; Ma, Y.; Mei, X.; Fan, F.; Huang, J.; Ma, J. Hyperspectral anomaly detection with robust graph autoencoders. IEEE Trans. Geosci. Remote Sens. 2021, 60, 3097097.
40. Wang, S.; Wang, X.; Zhang, L.; Zhong, Y. Auto-AD: Autonomous hyperspectral anomaly detection network based on fully convolutional autoencoder. IEEE Trans. Geosci. Remote Sens. 2021, 60, 3057721.
41. Wang, S.; Wang, X.; Zhang, L.; Zhong, Y. Deep low-rank prior for hyperspectral anomaly detection. IEEE Trans. Geosci. Remote Sens. 2022, 60, 3165833.
42. Cheng, X.; Zhang, M.; Lin, S.; Li, Y.; Wang, H. Deep self-representation learning framework for hyperspectral anomaly detection. IEEE Trans. Instrum. Meas. 2023, 73, 3330225.
43. Wang, L.; Wang, X.; Vizziello, A.; Gamba, P. RSAAE: Residual self-attention-based autoencoder for hyperspectral anomaly detection. IEEE Trans. Geosci. Remote Sens. 2023, 61, 3271719.
44. Wang, D.; Zhuang, L.; Gao, L.; Sun, X.; Huang, M.; Plaza, A. PDBSNet: Pixel-shuffle down-sampling blind-spot reconstruction network for hyperspectral anomaly detection. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5511914.
45. Wang, D.; Zhuang, L.; Gao, L.; Sun, X.; Huang, M.; Plaza, A. BockNet: Blind-block reconstruction network with a guard window for hyperspectral anomaly detection. IEEE Trans. Geosci. Remote Sens. 2023, 61, 3335484.
46. Gao, L.; Wang, D.; Zhuang, L.; Sun, X.; Huang, M.; Plaza, A. BS³LNet: A new blind-spot self-supervised learning network for hyperspectral anomaly detection. IEEE Trans. Geosci. Remote Sens. 2023, 61, 3246565.
47. Xu, M.; Xu, J.; Liu, S.; Sheng, H.; Yang, Z. Multi-scale convolutional mask network for hyperspectral unmixing. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2024, 17, 3687–3700.
48. Manolakis, D.; Shaw, G. Detection algorithms for hyperspectral imaging applications. IEEE Signal Process. Mag. 2002, 19, 29–43.
49. Bradley, A.P. The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recognit. 1997, 30, 1145–1159.
50. Ferri, C.; Hernández-Orallo, J.; Flach, P.A. A coherent interpretation of AUC as a measure of aggregated classification performance. In Proceedings of the 28th International Conference on Machine Learning (ICML-11), Bellevue, WA, USA, 28 June–2 July 2011; pp. 657–664.
51. Xu, Y.; Wu, Z.; Li, J.; Plaza, A.; Wei, Z. Anomaly detection in hyperspectral images based on low-rank and sparse representation. IEEE Trans. Geosci. Remote Sens. 2015, 54, 1990–2000.
52. Chang, C.-I. An effective evaluation tool for hyperspectral target detection: 3D receiver operating characteristic curve analysis. IEEE Trans. Geosci. Remote Sens. 2020, 59, 5131–5153.
53. Xiao, S.; Zhang, T.; Xu, Z.; Qu, J.; Hou, S.; Dong, W. Anomaly detection of hyperspectral images based on transformer with spatial–spectral dual-window mask. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2023, 16, 1414–1426.
Figure 1. Flowchart of the proposed MMBSN method for HAD.
Figure 2. Visualization of the preprocessing process.
Figure 3. Diagram of PD and its inverse PD⁻¹ with a stride factor of 2.
Figure 4. Diagram of the screening method based on spatial domain (SSD).
Figure 5. Detailed deep network architecture of SSJM.
Figure 6. Detailed deep network architecture of BFAM.
Figure 7. Detailed deep network architecture of DLFM.
Figure 8. Pseudo-color images and ground truth maps of the HSI datasets in our experiments. (a) Pavia; (b) Gainesville; (c) San Diego; (d) Gulfport.
Figure 9. Heat maps obtained by different algorithms on the Pavia dataset: (a) Ground truth; (b) GRX; (c) FRFE; (d) CRD; (e) LRASR; (f) GAED; (g) RGAE; (h) Auto-AD; (i) PDBSNet; (j) BockNet; (k) MMBSN.
Figure 10. Heat maps obtained by different algorithms on the Gainesville dataset: (a) Ground truth; (b) GRX; (c) FRFE; (d) CRD; (e) LRASR; (f) GAED; (g) RGAE; (h) Auto-AD; (i) PDBSNet; (j) BockNet; (k) MMBSN.
Figure 11. Heat maps obtained by different algorithms on the San Diego dataset: (a) Ground truth; (b) GRX; (c) FRFE; (d) CRD; (e) LRASR; (f) GAED; (g) RGAE; (h) Auto-AD; (i) PDBSNet; (j) BockNet; (k) MMBSN.
Figure 12. Heat maps obtained by different algorithms on the Gulfport dataset: (a) Ground truth; (b) GRX; (c) FRFE; (d) CRD; (e) LRASR; (f) GAED; (g) RGAE; (h) Auto-AD; (i) PDBSNet; (j) BockNet; (k) MMBSN.
Figure 13. ROC curves for different anomaly detectors on different datasets. (a) Pavia; (b) Gainesville; (c) San Diego; (d) Gulfport.
Figure 14. Separability boxplots for different anomaly detectors on different datasets. (a) Pavia; (b) Gainesville; (c) San Diego; (d) Gulfport.
Figure 15. Effects of the different factors with the AUC values on each dataset.
Figure 16. Effects of the different rates with the AUC values on each dataset. (a) Pavia; (b) Gainesville; (c) San Diego; (d) Gulfport.
Table 1. Details of the experimental datasets.

Dataset      Sensor   Image Size        Resolution
Pavia        ROSIS    150 × 150 × 102   1.3 m
Gainesville  AVIRIS   100 × 100 × 191   3.5 m
San Diego    AVIRIS   100 × 100 × 189   3.5 m
Gulfport     AVIRIS   100 × 100 × 191   3.4 m
Table 2. AUC (Pd, Pf) values of the different algorithms on the four considered datasets.

Dataset      GRX     FRFE    CRD     LRASR   GAED    RGAE    Auto-AD  PDBSNet  BockNet  MMBSN
Pavia        0.9538  0.9457  0.9510  0.9380  0.9398  0.9688  0.9925   0.9915   0.9905   0.9945 1
Gainesville  0.9684  0.9633  0.9536  0.7283  0.9829  0.9647  0.9808   0.9863   0.9901   0.9963 1
San Diego    0.8736  0.9787  0.9768  0.9824  0.9861  0.9854  0.9794   0.9892   0.9901   0.9961 1
Gulfport     0.9526  0.9722  0.9342  0.9120  0.9705  0.9842  0.9825   0.9895   0.9955   0.9983 1
Average      0.9371  0.9650  0.9539  0.8902  0.9698  0.9758  0.9838   0.9891   0.9916   0.9963 1
1 Best performance on each dataset (red font in the original).
Table 3. AUC values of the proposed MMBSN with different factors on different datasets.

Factor          Pavia   Gainesville  San Diego  Gulfport
1 (Rate = 1.0)  0.9925  0.9842       0.9896     0.9767
2 (Rate = 0.5)  0.9941  0.9886       0.9923     0.9913
3 (Rate = 0.5)  0.9884  0.9936       0.9953     0.9960
4 (Rate = 0.5)  0.9763  0.9826       0.9947     0.9971
5 (Rate = 0.5)  0.9552  0.9765       0.9921     0.9915
Table 4. AUC values of the proposed MMBSN with different rates on different datasets.

Rate  Pavia (Factor = 2)  Gainesville (Factor = 3)  San Diego (Factor = 3)  Gulfport (Factor = 4)
0.1   0.9920              0.9881                    0.9928                  0.9945
0.2   0.9920              0.9881                    0.9928                  0.9961
0.3   0.9920              0.9917                    0.9944                  0.9940
0.4   0.9920              0.9941                    0.9962                  0.9951
0.5   0.9938              0.9898                    0.9949                  0.9974
0.6   0.9938              0.9902                    0.9941                  0.9950
0.7   0.9938              0.9891                    0.9923                  0.9933
0.8   0.9906              0.9832                    0.9917                  0.9927
0.9   0.9906              0.9740                    0.9906                  0.9912
1.0   0.9882              0.9681                    0.9899                  0.9837
Table 5. The AUC values of the ablation study on different datasets (✓ = component included; × = component removed or replaced).

Component  Case 1  Case 2  Case 3  Case 4  Case 5  Case 6
MMCM       ×       ✓       ✓       ✓       ✓       ✓
SSJM       ✓       ×       ✓       ✓       ×       ✓
BFAM       ✓       ✓       ×       ✓       ✓       ✓
DWMEB      ×       ×       ×       ×       ✓       ×
DLFM       ✓       ✓       ✓       ×       ✓       ✓

Dataset      Case 1  Case 2  Case 3  Case 4  Case 5  Case 6
Pavia        0.9862  0.9923  0.9901  0.9938  0.9931  0.9943 1
Gainesville  0.9772  0.9920  0.9882  0.9923  0.9955  0.9960 1
San Diego    0.9820  0.9929  0.9910  0.9927  0.9944  0.9952 1
Gulfport     0.9667  0.9917  0.9889  0.9911  0.9961  0.9976 1
1 Best performance on each dataset (red font in the original).
Table 6. The AUC values of mask convolution at different scales on the Pavia and San Diego datasets.

Scale  Pavia   San Diego
1 × 1  0.9936  0.9876
3 × 3  0.9897  0.9946
ALL    0.9945  0.9960
Table 7. The inference time (s) of different detectors.

Dataset      GRX     FRFE     CRD     LRASR    GAED      RGAE    Auto-AD   PDBSNet   BockNet  MMBSN
Pavia        0.9823  33.5833  5.3146  61.1938  0.1072    0.0476  0.0305 1  0.0419    0.9258   0.4580 2
Gainesville  0.1155  14.3275  9.9203  55.5105  0.0347 2  0.0608  0.0209 1  0.0270    0.9440   0.4285
San Diego    0.2146  9.8865   3.9145  46.3339  0.0305    0.0335  0.0274 2  0.0266 1  0.9594   0.3595
Gulfport     0.5988  13.6096  5.0287  63.4349  0.0620    0.0373  0.0220 1  0.0281    0.9238   0.4975 2
Average      0.4778  17.8517  6.0445  56.6183  0.0586    0.0448  0.0252 1  0.0309 2  0.9383   0.4359
1 Best result (red font in the original). 2 Second-best result (bold font in the original).