Dual Hyperspectral Mamba for Efficient Spectral Compressive Imaging
Abstract
Deep unfolding methods have made impressive progress in restoring 3D hyperspectral images (HSIs) from 2D measurements through convolutional neural networks or Transformers in spectral compressive imaging. However, they cannot efficiently capture long-range dependencies using global receptive fields, which significantly limits their performance in HSI reconstruction. Moreover, these methods may suffer from local context neglect if we directly utilize Mamba to unfold a 2D feature map as a 1D sequence for modeling global long-range dependencies. To address these challenges, we propose a novel Dual Hyperspectral Mamba (DHM) to explore both global long-range dependencies and local contexts for efficient HSI reconstruction. After learning informative parameters to estimate degradation patterns of the CASSI system, we use them to scale the linear projection and provide the noise level for the denoiser (i.e., our proposed DHM). Specifically, our DHM consists of multiple dual hyperspectral S4 blocks (DHSBs) to restore original HSIs. Each DHSB contains a global hyperspectral S4 block (GHSB) to model long-range dependencies across the entire high-resolution HSIs using global receptive fields, and a local hyperspectral S4 block (LHSB) to address local context neglect by establishing structured state-space sequence (S4) models within local windows. Experiments verify the benefits of our DHM for HSI reconstruction. The source codes and models will be available at https://github.com/JiahuaDong/DHM.
1 Introduction
Unlike standard RGB images with only three spectral bands, hyperspectral images (HSIs) [54, 18, 45, 23] comprise multiple contiguous bands, providing detailed spectral information for each pixel. In recent decades, HSIs have achieved remarkable successes in a wide range of applications such as remote sensing [3, 47, 75], object detection [32, 55], vehicle tracking [62, 63, 21], and medical image analysis [1, 41, 50]. With the development of compressive sensing theory, the coded aperture snapshot spectral imaging (CASSI) [49, 22], one of the snapshot compressive imaging systems [74, 44, 58, 64], has shown impressive performance in capturing HSIs at video rate. The CASSI system modulates HSI signals at various wavelengths, and mixes all modulated spectra to output a 2D compressed measurement. Then, numerous HSI reconstruction methods [20, 78, 57] are developed to restore original HSIs from 2D compressed measurements (i.e., the CASSI inverse problem [8]).
Different from natural image restoration, HSI reconstruction deals with substantially degraded measurements caused by uncertain system noise and spectral compression [49, 14]. Thus, it is more challenging to learn the underlying HSI properties than in natural image restoration. Generally, existing HSI reconstruction methods can be divided into four categories. To solve the CASSI inverse problem, model-based methods [68, 78, 60] depend heavily on hand-crafted image priors (e.g., low-rank [39] and sparsity [34]), suffering from limited generalization capability. Some plug-and-play works [77, 80, 54] plug pretrained denoisers into model-based methods [57, 73], while end-to-end algorithms [49, 27, 52] ignore the working mechanism of the CASSI system and instead model a brute-force projection from 2D compressed measurements to HSIs via convolutional neural networks (CNNs). Moreover, deep unfolding methods [79, 20, 66, 67] introduce a multi-stage unfolding framework to iteratively learn a linear projection and a denoiser. They possess the interpretability of model-based methods [72] as well as the powerful encoding capability of deep learning, thereby achieving state-of-the-art performance and leading the development of the HSI reconstruction task.
Many deep unfolding methods [79, 72] rely on CNNs as the denoiser to capture local contexts, showing significant limitations in exploiting the global contexts that are crucial for HSI reconstruction. To tackle this issue, some works employ Transformers [17] to model wide-range dependencies [14, 5, 4, 7], but their complexity is quadratic in the number of tokens. Therefore, there is a trade-off between computational complexity and effective receptive fields, hindering these methods from exploring long-range dependencies, especially in high-resolution HSIs. Recently, structured state space sequence (S4) models [25, 65, 46] have emerged as a promising backbone to address the limitations of Transformers and CNNs. Visual Mamba models [81, 69] then introduce a cross-scan module to apply S4 models to vision tasks by unfolding 2D features into 1D arrays along four directions. They can use global receptive fields to capture long-range contexts while reducing the quadratic complexity to linear. However, existing Mamba models [81, 69] face a crucial challenge of local context neglect when directly applied to high-resolution HSI reconstruction. Since Mamba unfolds a 2D feature map as a 1D sequence, spatially close pixels may end up at distant positions in the flattened sequence. For example, in a row-major scan of a feature map with W columns, two vertically adjacent pixels are W positions apart. The excessive distance among nearby pixels leads to the problem of local context neglect (i.e., significant loss of critical local textures), thereby degrading the performance of HSI reconstruction.
To resolve the above challenges, we develop a novel Dual Hyperspectral Mamba (DHM) for efficient HSI reconstruction. Our DHM relies on structured state-space sequence (S4) models to reconstruct HSIs from 2D degraded measurements, which can capture both global long-range dependencies and local contexts with linear computational complexity. It is the first attempt to address HSI reconstruction via S4 models in the field of hyperspectral compressive imaging. After learning informative parameters from the physical mask and degraded measurement of the CASSI system, we feed them into a multi-stage unfolding framework by scaling the linear projection and estimating the noise level for the denoiser (i.e., our proposed DHM). The core component of our DHM is the dual hyperspectral S4 block (DHSB), which is mainly composed of a global hyperspectral S4 block (GHSB) and a local hyperspectral S4 block (LHSB). More specifically, the GHSB focuses on understanding global long-range dependencies by modeling discrete state-space equations on the entire high-resolution HSIs, which can effectively balance computational complexity and global receptive fields. Besides, the LHSB aims to surmount the challenge of local context neglect by constructing S4 models within different local windows. As shown in Fig. 1, experiments show that our DHM significantly surpasses existing HSI reconstruction methods. The novel contributions of our paper are listed as follows:
- We propose a new Dual Hyperspectral Mamba (DHM) for HSI reconstruction, capable of capturing both global long-range dependencies and local contexts with linear computational complexity. To the best of our knowledge, our DHM is the first Mamba-based deep unfolding method for HSI reconstruction.
- We develop a global hyperspectral S4 block (GHSB) to explore long-range dependencies across the entire high-resolution HSIs using global receptive fields, and design a local hyperspectral S4 block (LHSB) to tackle local context neglect by constructing S4 models within different local windows.
- We conduct comprehensive experiments to illustrate that our DHM significantly surpasses SOTA deep unfolding methods, while requiring a smaller model size and lower computational complexity.
2 Related Work
Hyperspectral Image Reconstruction: Traditional model-based HSI reconstruction methods [21, 60, 68, 71, 78] utilize hand-crafted priors such as sparsity [34], total variation [71] and low-rank constraints to address the CASSI inverse problem. Unfortunately, they rely heavily on manual parameter tuning, leading to unsatisfactory reconstruction performance. In light of this, some plug-and-play methods [77, 54, 36] focus on integrating convex optimization with pretrained denoising networks for HSI reconstruction. They have limited generalization performance due to the overreliance on the pretrained denoiser. Besides, end-to-end (E2E) algorithms [5, 27, 49] rely on convolutional neural networks (CNNs) [13, 12] or Transformers [17] to learn a brute-force projection function for HSI restoration. They can improve the HSI reconstruction performance but lack robustness and interpretability. To address these limitations, deep unfolding methods [8, 20, 28, 42, 67] are developed to restore HSI cubes from 2D compressed measurements via a multi-stage framework, showcasing both interpretability and strong encoding ability. The methods in [42, 79, 67] employ CNNs to estimate degradation patterns, which limits their ability to explore long-range contexts. After Cai et al. [8] employ a Transformer to capture non-local dependencies, many Transformer-based methods [29, 15, 38, 14, 70] are proposed to design the denoisers. However, the above methods suffer from a trade-off between computational complexity and effective receptive fields, preventing them from understanding long-range dependencies with global receptive fields to achieve better HSI reconstruction performance.
State Space Models (SSMs) [25, 26, 59, 35] have attracted increasing attention recently due to their capability to scale linearly with sequence length in long-range dependency modeling. After the structured state space sequence (S4) model [25] shows impressive performance on long-range sequence modeling tasks, the S5 model [59] introduces an efficient parallel scan and a general MIMO SSM based on S4. Then [19, 46] are proposed to narrow the performance gap between Transformers and SSMs. Mamba [24], an enhanced SSM with efficient hardware design and a selective mechanism, has surpassed Transformers in natural language processing [30, 53]. Due to its ability to model long-range dependencies with linear complexity, Mamba has been widely applied to diverse vision tasks, such as image/video understanding [40, 76, 37] and biomedical image analysis [43]. However, these Mamba models [24, 76, 37, 40, 53] may face the challenge of local context neglect (i.e., substantial loss of critical local textures) when directly applied to the high-resolution HSI reconstruction task.
3 The Proposed Model
3.1 The CASSI System
Degradation Model: In the coded aperture snapshot spectral imaging (CASSI) system [61, 49, 22], the camera can capture the vectorized degraded measurement $\mathbf{y} \in \mathbb{R}^{n}$, where $n = H(W + d(N_\lambda - 1))$. $N_\lambda$, $d$, $H$ and $W$ represent the number of wavelengths, shifting step of dispersion, height and width in hyperspectral images (HSIs), respectively. As introduced in [8], after vectorizing the shifted HSI signal as $\mathbf{x} \in \mathbb{R}^{nN_\lambda}$, we express the degradation model of the CASSI system as follows:
$$\mathbf{y} = \mathbf{\Phi}\mathbf{x} + \mathbf{n}, \tag{1}$$
where $\mathbf{n} \in \mathbb{R}^{n}$ is the vectorized imaging noise on $\mathbf{y}$. $\mathbf{\Phi} \in \mathbb{R}^{n \times nN_\lambda}$ indicates the sparse and fat sensing matrix which is determined via the physical mask in the CASSI system [16, 31]. Given $\mathbf{y}$ and $\mathbf{\Phi}$ in the CASSI system, the goal of HSI reconstruction is to restore the HSI signal $\mathbf{x}$ by removing the imaging noise $\mathbf{n}$.
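To make the degradation model concrete, the following is a minimal noise-free sketch of a single-disperser CASSI forward pass in NumPy. It is an illustration under stated assumptions, not the authors' simulation code: the function name `cassi_forward`, the binary random mask, the toy cube size and the shifting step of 2 are all hypothetical choices made here.

```python
import numpy as np

def cassi_forward(hsi, mask, step=2):
    """Simulate a noise-free CASSI measurement (illustrative sketch).

    hsi:  (H, W, N) spectral cube
    mask: (H, W) coded aperture
    step: dispersion shift in pixels per band (assumed value)
    Returns a 2D measurement of shape (H, W + step * (N - 1)).
    """
    H, W, N = hsi.shape
    meas = np.zeros((H, W + step * (N - 1)), dtype=hsi.dtype)
    for b in range(N):
        # Modulate band b with the mask, shift it by b * step columns,
        # and accumulate it on the detector plane.
        meas[:, b * step : b * step + W] += mask * hsi[:, :, b]
    return meas

rng = np.random.default_rng(0)
H, W, N, d = 8, 8, 4, 2
hsi = rng.random((H, W, N))
mask = (rng.random((H, W)) > 0.5).astype(hsi.dtype)
y = cassi_forward(hsi, mask, step=d)
print(y.shape)  # (8, 14), i.e. H x (W + d * (N - 1))
```

Every band is modulated by the same mask and only shifted before summation, which is why the measurement is wider than the scene and why the sensing matrix is sparse.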
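Estimation of Degradation Patterns: As analyzed in previous deep unfolding methods [8, 15, 14, 70], the estimation of degradation patterns is crucial to improve HSI reconstruction performance in the multi-stage unfolding framework, by adaptively scaling the linear projection and offering information about imaging noise for the denoiser. Thus, motivated by [8, 15], we use maximum a posteriori (MAP) theory to restore the original HSI signal $\mathbf{x}$ in Eq. (1) via optimizing the following energy function:
$$\hat{\mathbf{x}} = \arg\min_{\mathbf{x}} \frac{1}{2}\|\mathbf{y} - \mathbf{\Phi}\mathbf{x}\|_2^2 + \lambda R(\mathbf{x}), \tag{2}$$
where $R(\mathbf{x})$ denotes the prior term about $\mathbf{x}$, and $\lambda$ is the hyperparameter to balance the importance of the prior term. In order to solve Eq. (2), we define an auxiliary variable as $\mathbf{z}$, and then utilize the half-quadratic splitting algorithm to minimize the following loss $\mathcal{L}$:
$$\mathcal{L}(\mathbf{x}, \mathbf{z}) = \frac{1}{2}\|\mathbf{y} - \mathbf{\Phi}\mathbf{x}\|_2^2 + \lambda R(\mathbf{z}) + \frac{\mu}{2}\|\mathbf{z} - \mathbf{x}\|_2^2, \tag{3}$$
where $\mu$ is a penalty parameter. We decouple $\mathbf{x}$ and $\mathbf{z}$ into two iterative subproblems to solve Eq. (3):
$$\mathbf{x}_{k+1} = \arg\min_{\mathbf{x}} \frac{1}{2}\|\mathbf{y} - \mathbf{\Phi}\mathbf{x}\|_2^2 + \frac{\mu}{2}\|\mathbf{x} - \mathbf{z}_{k}\|_2^2, \qquad \mathbf{z}_{k+1} = \arg\min_{\mathbf{z}} \frac{\mu}{2}\|\mathbf{z} - \mathbf{x}_{k+1}\|_2^2 + \lambda R(\mathbf{z}), \tag{4}$$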
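where $k$ denotes the iterative stage index in the multi-stage unfolding framework, as shown in Fig. 2. Since the subproblem of solving $\mathbf{x}_{k+1}$ in Eq. (4) is a quadratic regularized least-squares problem, we can derive its closed solution as $\mathbf{x}_{k+1} = (\mathbf{\Phi}^\top\mathbf{\Phi} + \mu\mathbf{I})^{-1}(\mathbf{\Phi}^\top\mathbf{y} + \mu\mathbf{z}_{k})$. Considering the high computational overhead of $(\mathbf{\Phi}^\top\mathbf{\Phi} + \mu\mathbf{I})^{-1}$ brought by the fat sensing matrix $\mathbf{\Phi}$, we resort to the matrix inversion formula to simplify it: $(\mathbf{\Phi}^\top\mathbf{\Phi} + \mu\mathbf{I})^{-1} = \mu^{-1}\mathbf{I} - \mu^{-1}\mathbf{\Phi}^\top(\mathbf{I} + \mu^{-1}\mathbf{\Phi}\mathbf{\Phi}^\top)^{-1}\mathbf{\Phi}\mu^{-1}$. As a result, we can reformulate the closed solution of $\mathbf{x}_{k+1}$ in Eq. (4) as follows:
$$\mathbf{x}_{k+1} = \frac{\mathbf{\Phi}^\top\mathbf{y}}{\mu} + \mathbf{z}_{k} - \frac{1}{\mu^{2}}\mathbf{\Phi}^\top(\mathbf{I} + \mu^{-1}\mathbf{\Phi}\mathbf{\Phi}^\top)^{-1}\mathbf{\Phi}\mathbf{\Phi}^\top\mathbf{y} - \frac{1}{\mu}\mathbf{\Phi}^\top(\mathbf{I} + \mu^{-1}\mathbf{\Phi}\mathbf{\Phi}^\top)^{-1}\mathbf{\Phi}\mathbf{z}_{k}. \tag{5}$$
As introduced in [4, 8], $\mathbf{\Phi}\mathbf{\Phi}^\top$ is a diagonal matrix in the CASSI system. After defining $\mathbf{\Phi}\mathbf{\Phi}^\top = \mathrm{diag}\{\psi_1, \ldots, \psi_n\}$, we plug $\mathbf{\Phi}\mathbf{\Phi}^\top$ into Eq. (5):
$$\mathbf{x}_{k+1} = \mathbf{z}_{k} + \mathbf{\Phi}^\top\Big[(\mathbf{y} - \mathbf{\Phi}\mathbf{z}_{k}) \odot \Big(\frac{1}{\mu + \psi_1}, \ldots, \frac{1}{\mu + \psi_n}\Big)^{\top}\Big], \tag{6}$$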
where is the element-wise multiplication. Since is precomputed and stored in , the value of in Eq. (6) can affect the output of each iterative stage in the multi-stage unfolding framework. To eliminate negative influence of manually determining , we set to be learnable in the multi-stage framework, and denote as the value of at the -th iterative stage. Besides, we also define a learnable parameter at the -th stage, and express the subproblem of solving in Eq. (4) as:
(7) |
In Eq. (7), the subproblem of solving is equivalent to denoising the image with a Gaussian noise level of , according to Bayesian probability [9]. Given and , we can introduce the following iterative optimization scheme to estimate degradation patterns of the CASSI system and reconstruct original HSI signal in Eq. (1):
(8) |
where is the parameter learner. is equivalent to Eq. (6), which is a linear projection used for mapping to . indicates the Gaussian denoiser to solve Eq. (7). As shown in Fig. 2, we depict our unfolding framework with iterative training stages to restore original HSI signal in Eq. (1). Specifically, we first concatenate the given sensing matrix and compressed measurement , and input it into a convolution block to initialize . At the -th () stage, the parameter learner contains two degradation-aware blocks (DABs), an average pooling layer and three fully connected layers to encode and , and then outputs learnable parameters . The DAB has three convolution layers and two GELU functions. Then and use the parameters to iteratively update and in Eq. (8) until the -th stage. Particularly, learned by can effectively scale the linear projection in Eq. (6), while offering accurate noise level for the denoiser to solve Eq. (7). In the CASSI system, they are essential to estimate the ill-posedness degree and degradation patterns, thereby substantially improving HSI reconstruction performance.
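The alternation between a closed-form linear projection and a denoiser can be sketched in a few lines of NumPy. This is a toy illustration under explicit assumptions, not the authors' network: the symbols `Phi`, `mu` and `sigma`, the hand-picked per-stage values (which the paper instead learns with a parameter learner), and the shrinkage `toy_denoiser` standing in for the DHM denoiser are all hypothetical. The sketch also checks numerically that the projection, which relies on the sensing matrix having a diagonal Gram matrix, matches a direct solve of the regularized least-squares problem.

```python
import numpy as np

def project(y, z, Phi, mu):
    """Closed-form x-update, valid when Phi @ Phi.T is diagonal."""
    psi = np.sum(Phi * Phi, axis=1)            # diagonal of Phi @ Phi.T
    return z + Phi.T @ ((y - Phi @ z) / (mu + psi))

def project_direct(y, z, Phi, mu):
    """Reference: solve (Phi^T Phi + mu I) x = Phi^T y + mu z directly."""
    n_x = Phi.shape[1]
    return np.linalg.solve(Phi.T @ Phi + mu * np.eye(n_x),
                           Phi.T @ y + mu * z)

def toy_denoiser(x, sigma):
    """Placeholder for a learned denoiser: simple shrinkage toward zero."""
    return x / (1.0 + sigma)

rng = np.random.default_rng(0)
n_y, n_x = 6, 18
# Each column is nonzero in exactly one row, so the rows have disjoint
# supports and Phi @ Phi.T is diagonal, mimicking the CASSI structure.
Phi = np.zeros((n_y, n_x))
Phi[np.arange(n_x) % n_y, np.arange(n_x)] = rng.random(n_x) + 0.1
x_true = rng.random(n_x)
y = Phi @ x_true

z = Phi.T @ y                                  # simple initialization
mus = [1.0, 0.5, 0.25]                         # hand-picked, not learned
sigmas = [0.1, 0.05, 0.01]                     # hand-picked, not learned
for mu, sigma in zip(mus, sigmas):
    x = project(y, z, Phi, mu)                 # linear projection step
    z = toy_denoiser(x, sigma)                 # denoising step

z0 = rng.random(n_x)
gap = np.max(np.abs(project(y, z0, Phi, 1.0) - project_direct(y, z0, Phi, 1.0)))
print(gap)  # close to machine precision
```

The closed form needs only matrix-vector products and an element-wise division, whereas the direct solve factorizes a matrix whose side length equals the full signal dimension.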
3.2 Dual Hyperspectral Mamba (DHM)
Generally, existing deep unfolding methods [7, 14, 70, 15] mainly utilize CNNs or Transformers to design the denoiser . However, these methods struggle to capture long-range dependencies using global receptive fields, thereby limiting their HSI reconstruction performance. Besides, directly applying Mamba to high-resolution HSI reconstruction suffers from local context neglect (i.e., substantial loss of critical local details). To resolve the above challenges, we develop a novel Dual Hyperspectral Mamba (DHM) as the denoiser in Eq. (8). Our DHM uses global receptive fields to model long-range dependencies while tackling local context neglect via capturing local contexts.
Fig. 3a shows the architecture of our DHM (i.e., the denoiser ) at the -th () iterative stage in Fig. 2. Specifically, given the scalar and at the -th stage, we first reshape to , and concatenate with the reshaped to extract shallow feature via a convolutional layer, where , and is the feature dimension. Then we forward to the encoder, bottleneck and decoder to obtain the deep feature . The encoder and decoder comprise pairs of dual hyperspectral S4 block (DHSB) and the resizing module, while the bottleneck only has DHSBs. In Fig. 3a, we visualize the pipeline of our DHM when and for better demonstration. In Fig. 3b, the DHSB includes a global hyperspectral S4 block (GHSB), a local hyperspectral S4 block (LHSB), a gated feed-forward network (GFFN) and three layer normalization (LN). Fig. 3c presents the components of GHSB and LHSB, which are the two most important modules in our DHM. Apart from the reshaping operation, they have the same architectures. Particularly, the GHSB can use global receptive fields to model long-range dependencies, and the LHSB aims to address local context neglect by constructing structured state space sequence (S4) model within local windows. Besides, Fig. 3d shows the design of GFFN module. Then we perform a convolution operation on to obtain . Finally, we sum and to generate the denoised image at the -th iterative stage. In the following subsections, we introduce the detailed components of the GHSB and LHSB.
Global Hyperspectral S4 Block (GHSB) constructs S4 model on the entire high-resolution HSIs to capture global contexts using global receptive fields. As shown in Fig. 3c, we forward a given feature into two branches, where denotes the feature dimensions at different levels of encoder, bottleneck and decoder. Specifically, the upper branch encodes to via a linear projection , a depth-wise convolution and a SILU activation function . Then we reshape as , and input it into the to model long-range dependencies using global receptive fields. As a result, we can formulate the output feature of the GHSB module as follows:
(9) |
where denotes the element-wise multiplication. denotes the output of lower branch in Fig. 3c, and is the linear mapping. is the layer normalization (LN), can reshape the given feature to , and is the linear projection to obtain . Moreover, denotes the proposed hyperspectral image state space module (HSI-SSM).
HyperSpectral Image State Space Module (HSI-SSM) can model long-range cross-pixel interactions to explore global contexts of using global receptive fields. As shown in Fig. 3e, given the input feature , we unfold the entire hyperspectral image (HSI) that includes pixels, into four one-dimensional sequences with a size of , by scanning these pixels along four distinct traversal paths: from the top-left to the bottom-right, from the top-right to the bottom-left, from the bottom-right to the top-left, and from the bottom-left to the top-right. We denote four sequence features as , where , and denotes the sequence length in the GHSB. Motivated by Mamba [24, 40, 69], we construct some enhanced discrete state space equations on the -th () sequence feature . Specifically, after defining the learnable variables: and , we can formulate some continuous parameters such as and a timescale parameter as:
(10) |
where is the latent feature dimension, and is the softplus activation function. and are the linear projection matrices. Inspired by the zero-order hold (ZOH) discretization rule [24], we reshape the parameter as , and utilize it to transform the continuous parameters and into the discrete parameters and :
(11) |
After obtaining the discrete and via Eq. (11), we reshape the parameter as , and formulate the semantic encoding of as the form of recurrent neural networks (RNNs) to extract a new sequence feature . Then we denote as the latent features of the -th and -th hidden states in the RNNs, and define as follows:
(12) |
where denotes the scale parameter. Inspired by [24], we use the broadcasting mechanism to match the dimensions of different matrices for matrix multiplication operations in Eqs. (11)(12). Then we merge all sequence features to get the final output map of the HSI-SSM. In the GHSB, we utilize the HSI-SSM to encode the entire high-resolution HSI in a recursive manner. It can explore long-range dependencies of the input feature using global receptive fields.
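The scanning, discretization and recurrence described above can be illustrated with a small NumPy sketch. Several simplifications are assumptions made here and are not taken from the paper: the state matrix is diagonal, the parameters `A`, `B`, `C`, `D` and the step `delta` are fixed rather than predicted from the input as in Mamba, the feature map has a single channel, the four paths are realized as row-wise scans of flipped index grids (one possible reading of the four traversal paths), and the four outputs are merged by summation. All function names are hypothetical.

```python
import numpy as np

def scan_indices(h, w):
    """Four traversal orders over an h x w grid, as flat pixel indices."""
    idx = np.arange(h * w).reshape(h, w)
    return [
        idx.ravel(),              # top-left     -> bottom-right
        idx[:, ::-1].ravel(),     # top-right    -> bottom-left
        idx[::-1, ::-1].ravel(),  # bottom-right -> top-left
        idx[::-1, :].ravel(),     # bottom-left  -> top-right
    ]

def zoh_discretize(A, B, delta):
    """Zero-order hold for a diagonal state matrix.

    A: (N,) diagonal entries (negative), B: (N,), delta: scalar step > 0.
    """
    dA = delta * A
    A_bar = np.exp(dA)
    B_bar = (A_bar - 1.0) / dA * (delta * B)   # (dA)^-1 (exp(dA) - I) dB
    return A_bar, B_bar

def ssm_scan(u, A, B, C, D, delta):
    """Run h_t = A_bar * h_{t-1} + B_bar * u_t, y_t = C . h_t + D * u_t."""
    A_bar, B_bar = zoh_discretize(A, B, delta)
    h = np.zeros_like(A)
    out = np.empty_like(u)
    for t, u_t in enumerate(u):
        h = A_bar * h + B_bar * u_t
        out[t] = C @ h + D * u_t
    return out

def hsi_ssm(feature, A, B, C, D, delta):
    """Scan an (h, w) map along four paths and sum the four outputs."""
    h, w = feature.shape
    flat = feature.ravel()
    merged = np.zeros(h * w)
    for order in scan_indices(h, w):
        y = ssm_scan(flat[order], A, B, C, D, delta)
        merged[order] += y        # write outputs back at their pixel positions
    return merged.reshape(h, w)

rng = np.random.default_rng(0)
N = 4
A = -np.arange(1, N + 1, dtype=float)
B, C = rng.random(N), rng.random(N)
feat = rng.random((3, 5))
out = hsi_ssm(feat, A, B, C, D=1.0, delta=0.1)
print(out.shape)  # (3, 5)
```

Each path costs one pass over the sequence, so the total work grows linearly with the number of pixels, in contrast to the quadratic cost of full self-attention.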
Local Hyperspectral S4 Block (LHSB) aims to explore local contexts within position-specific windows. Different from the GHSB that uses the HSI-SSM to unfold and scan the entire high-resolution HSI containing pixels, the LHSB scans each local window, including pixels, to capture local contexts. Specifically, as shown in Fig. 3c, after encoding the given feature to via the upper branch, we partition to non-overlapping windows, then reshape it as , and input into the HSI-SSM, where denotes the number of windows and each window includes pixels. In the HSI-SSM, we flatten each window including pixels and scan them along four distinctive directions to obtain four sequence features . Note that we set and in the LHSB, which are different from the GHSB. After encoding each sequence under a recursive manner to get , we sum them to get the output map of the HSI-SSM. The LHSB can capture local contexts of HSI by encoding different local windows of the given feature in a recursive manner. Thus, we formulate the final feature outputted by the LHSB as follows:
(13) |
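The window partition that distinguishes the local block from the global one can be sketched as follows. This is a generic non-overlapping window split written for illustration; the function names, the toy feature size and the window size `m = 4` are assumptions rather than the paper's settings.

```python
import numpy as np

def window_partition(x, m):
    """Split an (H, W, C) map into non-overlapping m x m windows.

    Returns an array of shape (num_windows, m * m, C).
    H and W must be divisible by m.
    """
    H, W, C = x.shape
    x = x.reshape(H // m, m, W // m, m, C).transpose(0, 2, 1, 3, 4)
    return x.reshape(-1, m * m, C)

def window_reverse(windows, m, H, W):
    """Inverse of window_partition: rebuild the (H, W, C) map."""
    C = windows.shape[-1]
    x = windows.reshape(H // m, W // m, m, m, C).transpose(0, 2, 1, 3, 4)
    return x.reshape(H, W, C)

H, W, C, m = 8, 8, 2, 4
x = np.arange(H * W * C, dtype=float).reshape(H, W, C)
wins = window_partition(x, m)
print(wins.shape)  # (4, 16, 2)

# Row-major distance between two vertically adjacent pixels:
# W positions in the global sequence, but only m inside one window.
print(W, m)        # 8 4
```

Each window is then flattened and scanned as its own short sequence, so vertically adjacent pixels stay within `m` steps of each other instead of a full row width apart.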
Optimization: As shown in Fig. 2, we utilize and (i.e., our DHM) to iteratively update and in Eq. (8) until the -th stage. After getting at the -th stage, we follow [14, 15] to train our DHM by minimizing the Charbonnier loss between the groundtruth and reconstructed HSI .
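For reference, a common form of the Charbonnier loss is sketched below in NumPy. The smoothing constant `eps = 1e-3` and the mean reduction are assumptions made for illustration; the source does not state which values the authors use.

```python
import numpy as np

def charbonnier_loss(pred, target, eps=1e-3):
    """Mean of sqrt((pred - target)^2 + eps^2); eps is an assumed value."""
    diff = pred - target
    return np.mean(np.sqrt(diff * diff + eps * eps))

gt = np.zeros((4, 4, 3))
print(charbonnier_loss(gt, gt))          # equals eps when the error is zero
print(charbonnier_loss(gt + 1.0, gt))    # close to 1.0 for a unit error
```

The loss behaves like a squared error near zero and like an absolute error for large residuals, which makes it smooth everywhere while staying less sensitive to outliers than a pure squared error.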
4 Experiments
4.1 Implementation Details
For fair comparisons, we use exactly the same experimental configurations as existing HSI reconstruction methods [7, 70, 10, 14, 6, 27] to validate the effectiveness of our DHM. Following the settings of [27, 48, 6, 49], we perform spectral interpolation on the original HSIs and choose a wide spectral range from 450 nm to 650 nm for comparisons on both the simulation and real datasets. The simulation dataset is composed of two subsets: KAIST [11] and CAVE [56]. We employ the CAVE subset to train our DHM, and select 10 HSIs from the KAIST subset to evaluate performance. Moreover, the real dataset [49] consists of five HSI cubes, which are captured by a practical CASSI system [49].
During training, we employ the Adam optimizer [33] to train all variants of our DHM on a single NVIDIA A100 GPU, where the initial learning rate is , and the number of training epochs is set to 300. Following [7, 70, 14, 27], we randomly crop HSI cubes to for the simulation dataset, and for the real dataset. The shifting step of dispersion in the CASSI system is set to . Moreover, we set and in this paper. Motivated by baseline HSI reconstruction methods [14, 15], we share the network weights of our DHM across different stages, and use exactly the same data augmentation to train our DHM.
Comparison Methods | #Params | GFLOPS | S1 | S2 | S3 | S4 | S5 | S6 | S7 | S8 | S9 | S10 | Avg |
TwIST [2] | - | - | | | | | | | | | | | |
λ-Net [52] | 62.64M | 117.98 | | | | | | | | | | | |
DNU [67] | 1.19M | 163.48 | | | | | | | | | | | |
DIP-HSI [51] | 33.85M | 64.42 | | | | | | | | | | | |
DGSMP [29] | 3.76M | 646.65 | | | | | | | | | | | |
GAP-Net [48] | 4.27M | 78.58 | | | | | | | | | | | |
ADMM-Net [42] | 4.27M | 78.58 | | | | | | | | | | | |
HDNet [27] | 2.37M | 154.76 | | | | | | | | | | | |
MST-L [5] | 2.03M | 28.15 | | | | | | | | | | | |
MST++ [6] | 1.33M | 19.42 | | | | | | | | | | | |
CST-L [4] | 3.00M | 40.01 | | | | | | | | | | | |
BIRNAT [10] | 4.40M | 2122.66 | | | | | | | | | | | |
LDMUN [70] | - | - | | | | | | | | | | | |
DAUHST [8] | 6.15M | 79.50 | | | | | | | | | | | |
PADUT [38] | 5.38M | 90.46 | | | | | | | | | | | |
RDLUF [15] | 1.89M | 115.34 | | | | | | | | | | | |
DERNN (3stg) [14] | 0.65M | 27.41 | | | | | | | | | | | |
DERNN (5stg) [14] | 0.65M | 45.60 | | | | | | | | | | | |
DERNN (7stg) [14] | 0.65M | 63.80 | | | | | | | | | | | |
DERNN (9stg) [14] | 0.65M | 81.99 | | | | | | | | | | | |
DERNN (9stg∗) [14] | 1.09M | 134.18 | | | | | | | | | | | |
DHM-light (3stg) | 0.66M | 26.42 | | | | | | | | | | | |
DHM-light (5stg) | 0.66M | 43.96 | | | | | | | | | | | |
DHM-light (7stg) | 0.66M | 61.50 | | | | | | | | | | | |
DHM-light (9stg) | 0.66M | 79.04 | | | | | | | | | | | |
DHM (3stg) | 0.92M | 36.34 | | | | | | | | | | | |
DHM (5stg) | 0.92M | 60.50 | | | | | | | | | | | |
DHM (7stg) | 0.92M | 84.65 | | | | | | | | | | | |
DHM (9stg) | 0.92M | 108.80 | | | | | | | | | | | |
4.2 Quantitative Performance Comparisons
As shown in Tab. 1, we present comprehensive quantitative comparisons between our DHM and SOTA HSI reconstruction methods on the simulation dataset with 10 scenes (S1–S10). From the results in Tab. 1, we observe that the proposed DHM (9stg) (i.e., our DHM at the 9-th stage) achieves the best HSI reconstruction performance (i.e., 40.50 dB in PSNR and 0.982 in SSIM). Our DHM (9stg) substantially surpasses existing methods [2, 39, 51, 4, 28], especially several recent SOTA comparison models (e.g., DAUHST [8], LDMUN [70], RDLUF-Mix [15], DERNN [14]) by dB. Such improvements verify the effectiveness of our DHM in exploring long-range dependencies across the entire high-resolution HSIs using global receptive fields, while capturing local contexts within local windows. More importantly, our DHM outperforms existing methods with a smaller model size and lower computational cost. Compared with the SOTA DERNN (9stg∗) [14], our DHM (9stg) improves PSNR by 0.17 dB and SSIM by 0.005, while using only 84.40% (0.92M / 1.09M) of the parameters and 81.09% (108.80 / 134.18) of the GFLOPs. Moreover, we propose a light model (i.e., DHM-light) where each DHSB contains a single global hyperspectral S4 block (GHSB) and a GFFN. In Tab. 1, our DHM-light with 3/5/7/9 stages improves significantly over other comparison methods (e.g., DERNN [14]) with the same number of stages, while retaining a comparable model size and fewer GFLOPs. This illustrates the effectiveness of our DHM for the HSI reconstruction task.
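The cost ratios quoted above follow directly from the parameter counts and GFLOPs listed in Tab. 1, as the short check below shows. The dictionary names are illustrative only; the numbers are copied from the table. The last lines also show that the GFLOPs of DHM grow almost linearly with the number of stages while the parameter count stays fixed, which is consistent with sharing weights across stages.

```python
# Values copied from Tab. 1 (parameters in millions, cost in GFLOPs).
params = {"DHM (9stg)": 0.92, "DERNN (9stg*)": 1.09}
gflops = {"DHM (9stg)": 108.80, "DERNN (9stg*)": 134.18}

param_ratio = 100 * params["DHM (9stg)"] / params["DERNN (9stg*)"]
flops_ratio = 100 * gflops["DHM (9stg)"] / gflops["DERNN (9stg*)"]
print(f"{param_ratio:.2f}% {flops_ratio:.2f}%")  # 84.40% 81.09%

# GFLOPs of DHM for 3, 5, 7 and 9 stages, also from Tab. 1.
dhm_gflops = {3: 36.34, 5: 60.50, 7: 84.65, 9: 108.80}
stages = sorted(dhm_gflops)
per_stage = [(dhm_gflops[b] - dhm_gflops[a]) / (b - a)
             for a, b in zip(stages, stages[1:])]
print([round(p, 2) for p in per_stage])  # roughly 12.08 GFLOPs per extra stage
```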
4.3 Qualitative Performance Comparisons
Simulation Dataset: As depicted in Fig. 4, we select 4 out of the 28 spectral channels to visualize some qualitative comparisons of HSI reconstruction on Scene 7 (S7) of the simulation dataset. For better visibility, we zoom in on the regions within the yellow boxes of the original HSIs (bottom), and show the comparison of these regions in the top-right part. In Fig. 4, previous methods suffer from blotchy textures, distortions and blurring artifacts. In contrast, our DHM (9stg) can effectively restore HSIs with fewer artifacts and finer details. Besides, the spectral density curves corresponding to the green boxes in the top-left RGB image are depicted in the top-middle part. Our DHM (9stg) exhibits the best correlation with the groundtruth, which illustrates the effectiveness of our DHM.
Real Dataset: To verify the superiority of our model in real HSI reconstruction, we follow [49, 8, 70, 14] to retrain our DHM-light (5stg) jointly on the KAIST [11] and CAVE [56] datasets. Besides, we introduce 11-bit shot noise into the training samples to simulate real imaging scenarios. As shown in Fig. 5, our DHM-light (5stg) can effectively restore the plant region corresponding to the yellow box. Compared with SOTA methods [8, 38, 14], our DHM-light (5stg) restores clearer contents and structural details with fewer artifacts, verifying the robustness of our model in real HSI restoration.
GHSB | LHSB | GFFN | #Params | GFLOPs | PSNR | SSIM |
✓ | ✓ | 0.66M | 43.96 | 39.81 | 0.979 | |
✓ | ✓ | 0.66M | 43.96 | 38.76 | 0.973 | |
✓ | ✓ | 0.92M | 60.26 | 39.93 | 0.979 | |
✓ | ✓ | ✓ | 0.92M | 60.50 | 40.16 | 0.980 |
GHSB→GA | LHSB→LA | #Params | GFLOPs | PSNR | SSIM |
0.92M | 60.50 | 40.16 | 0.980 | ||
✓ | 0.79M | 53.05 | 39.11 | 0.975 | |
✓ | 0.79M | 53.05 | 40.08 | 0.980 | |
✓ | ✓ | 0.65M | 45.60 | 39.38 | 0.973 |
GS→LS | LS→GS | SPs | #Params | GFLOPs | PSNR | SSIM |
✓ | 4.59M | 60.50 | 39.23 | 0.977 | ||
✓ | ✓ | 0.92M | 60.50 | 40.12 | 0.980 | |
✓ | 4.59M | 60.50 | 39.28 | 0.977 | ||
✓ | ✓ | 0.92M | 60.50 | 40.16 | 0.980 |
Variants | #Params | GFLOPs | PSNR | SSIM | ||
Baseline | 0.90M | 59.11 | 39.86 | 0.979 | ||
DHM w/o | ✓ | 0.92M | 60.50 | 39.71 | 0.978 | |
DHM w/o | ✓ | 0.92M | 60.42 | 39.92 | 0.979 | |
DHM | ✓ | ✓ | 0.92M | 60.50 | 40.16 | 0.980 |
GHSB | LHSB | GFFN | #Params | GFLOPS | S1 | S2 | S3 | S4 | S5 | S6 | S7 | S8 | S9 | S10 | Avg |
✓ | ✓ | 0.66M | 43.96 | | | | | | | | | | | |
✓ | ✓ | 0.66M | 43.96 | | | | | | | | | | | |
✓ | ✓ | 0.92M | 60.26 | | | | | | | | | | | |
✓ | ✓ | ✓ | 0.92M | 60.50 | | | | | | | | | | | |
4.4 Ablation Studies
This subsection analyzes the effectiveness of all proposed modules on the simulation dataset using our DHM (5stg) as an example. 1) DHSB: As shown in Tab. 2a, when we remove the GHSB or the LHSB, or replace the GFFN with a traditional feed-forward network (FFN) [8] in the DHSB, the performance of our DHM (5stg) significantly decreases by dB in PSNR and in SSIM. Tab. 3 presents ablation results of our DHM (5stg) on 10 scenes (S1–S10) to verify the effectiveness of the DHSB. 2) Variants: In Tab. 2b, our model decreases by dB in PSNR when we replace the GHSB with non-local MSA [14] (GHSB→GA) or substitute the LHSB with local MSA [14] (LHSB→LA), where MSA is the multi-head self-attention [17]. It verifies the effectiveness of our DHM in using global receptive fields to model long-range dependencies while capturing local contexts. 3) Block Orders: In Tab. 2c, we perform ablation studies on shared parameters (SPs) across different stages, and on the order of the GHSB and LHSB: from GHSB to LHSB (GS→LS) or from LHSB to GHSB (LS→GS). The ablation results validate the effectiveness of our DHM. 4) Parameters: Tab. 2d shows ablation studies on the learnable parameters , which validates their effectiveness in estimating degradation patterns. Fig. 6 visualizes as examples to verify the effectiveness of our unfolding framework in HSI reconstruction, when we use our DHM (9stg) as the denoiser .
5 Conclusion
In this paper, we propose a novel Dual Hyperspectral Mamba (DHM) to model both global and local dependencies for efficient HSI reconstruction. After estimating degradation patterns of the CASSI system via the learnable parameters, we utilize these parameters to scale the linear projection and provide the noise level for the denoiser (i.e., our DHM) in the multi-stage unfolding framework. The proposed DHM mainly consists of a global hyperspectral S4 block (GHSB) and a local hyperspectral S4 block (LHSB). The GHSB can explore long-range dependencies across the entire high-resolution HSIs using global receptive fields, while the LHSB constructs S4 models within different local windows to capture local contexts. We conduct extensive quantitative and qualitative comparison experiments on both the simulation and real datasets to demonstrate the effectiveness of our DHM.
References
- [1] V. Backman, M. B. Wallace, L. Perelman, J. Arendt, R. Gurjar, M. Muller, Q. Zhang, G. Zonios, E. Kline, and T. McGillican. Detection of preinvasive cancer cells. Nature, 2000.
- [2] José M. Bioucas-Dias and Mário A. T. Figueiredo. A new twist: Two-step iterative shrinkage/thresholding algorithms for image restoration. IEEE Transactions on Image Processing, 16(12):2992–3004, 2007.
- [3] M. Borengasser, W. S. Hungate, and R. Watkins. Hyperspectral remote sensing: principles and applications. CRC press, 2007.
- [4] Yuanhao Cai, Jing Lin, Xiaowan Hu, Haoqian Wang, Xin Yuan, Yulun Zhang, Radu Timofte, and Luc Van Gool. Coarse-to-fine sparse transformer for hyperspectral image reconstruction. In European Conference on Computer Vision, pages 686–704. Springer, 2022.
- [5] Yuanhao Cai, Jing Lin, Xiaowan Hu, Haoqian Wang, Xin Yuan, Yulun Zhang, Radu Timofte, and Luc Van Gool. Mask-guided spectral-wise transformer for efficient hyperspectral image reconstruction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 17502–17511, 2022.
- [6] Yuanhao Cai, Jing Lin, Zudi Lin, Haoqian Wang, Yulun Zhang, Hanspeter Pfister, Radu Timofte, and Luc Van Gool. Mst++: Multi-stage spectral-wise transformer for efficient spectral reconstruction. In 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pages 744–754, 2022.
- [7] Yuanhao Cai, Jing Lin, Haoqian Wang, Xin Yuan, Henghui Ding, Yulun Zhang, Radu Timofte, and Luc Van Gool. Degradation-aware unfolding half-shuffle transformer for spectral compressive imaging. Advances in Neural Information Processing Systems, 35:37749–37761, 2022.
- [8] Yuanhao Cai, Jing Lin, Haoqian Wang, Xin Yuan, Henghui Ding, Yulun Zhang, Radu Timofte, and Luc Van Gool. Degradation-aware unfolding half-shuffle transformer for spectral compressive imaging. In Advances in Neural Information Processing Systems, 2022.
- [9] Stanley H. Chan, Xiran Wang, and Omar A. Elgendy. Plug-and-play admm for image restoration: Fixed-point convergence and applications. IEEE Transactions on Computational Imaging, 3(1):84–98, 2017.
- [10] Ziheng Cheng, Bo Chen, Ruiying Lu, Zhengjue Wang, Hao Zhang, Ziyi Meng, and Xin Yuan. Recurrent neural networks for snapshot compressive imaging. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(2):2264–2281, 2023.
- [11] Inchang Choi, MH Kim, D Gutierrez, DS Jeon, and G Nam. High-quality hyperspectral reconstruction using a spectral prior. In Technical report, 2017.
- [12] Jiahua Dong, Yang Cong, Gan Sun, Zhen Fang, and Zhengming Ding. Where and how to transfer: Knowledge aggregation-induced transferability perception for unsupervised domain adaptation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 46(3):1664–1681, 2024.
- [13] Jiahua Dong, Yang Cong, Gan Sun, Bineng Zhong, and Xiaowei Xu. What can be transferred: Unsupervised domain adaptation for endoscopic lesions segmentation. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 4022–4031, June 2020.
- [14] Yubo Dong, Dahua Gao, Yuyan Li, Guangming Shi, and Danhua Liu. Degradation estimation recurrent neural network with local and non-local priors for compressive spectral imaging. arXiv preprint arXiv:2311.08808, 2024.
- [15] Yubo Dong, Dahua Gao, Tian Qiu, Yuyan Li, Minxi Yang, and Guangming Shi. Residual degradation learning unfolding framework with mixing priors across spectral and spatial for compressive spectral imaging. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 22262–22271, 2023.
- [16] D.L. Donoho. Compressed sensing. IEEE Transactions on Information Theory, 52(4):1289–1306, 2006.
- [17] Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, and Neil Houlsby. An image is worth 16x16 words: Transformers for image recognition at scale. In International Conference on Learning Representations, 2021.
- [18] Mathieu Fauvel, Yuliya Tarabalka, Jón Atli Benediktsson, Jocelyn Chanussot, and James C. Tilton. Advances in spectral-spatial classification of hyperspectral images. Proceedings of the IEEE, 101(3):652–675, 2013.
- [19] Daniel Y Fu, Tri Dao, Khaled K Saab, Armin W Thomas, Atri Rudra, and Christopher Ré. Hungry hungry hippos: Towards language modeling with state space models. arXiv preprint arXiv:2212.14052, 2022.
- [20] Ying Fu, Zhiyuan Liang, and Shaodi You. Bidirectional 3d quasi-recurrent neural network for hyperspectral image super-resolution. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 14:2674–2688, 2021.
- [21] Ying Fu, Yinqiang Zheng, Imari Sato, and Yoichi Sato. Exploiting spectral-spatial correlation for coded hyperspectral image restoration. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 3727–3736, 2016.
- [22] M. E. Gehm, R. John, D. J. Brady, R. M. Willett, and T. J. Schulz. Single-shot compressive spectral imaging with a dual-disperser architecture. Opt. Express, 15(21):14013–14027, Oct 2007.
- [23] Alexander F. H. Goetz, Gregg Vane, Jerry E. Solomon, and Barrett N. Rock. Imaging spectrometry for earth remote sensing. Science, 228(4704):1147–1153, 1985.
- [24] Albert Gu and Tri Dao. Mamba: Linear-time sequence modeling with selective state spaces. arXiv preprint arXiv:2312.00752, 2023.
- [25] Albert Gu, Karan Goel, and Christopher Ré. Efficiently modeling long sequences with structured state spaces. arXiv preprint arXiv:2111.00396, 2021.
- [26] Albert Gu, Isys Johnson, Karan Goel, Khaled Saab, Tri Dao, Atri Rudra, and Christopher Ré. Combining recurrent, convolutional, and continuous-time models with linear state space layers. Advances in Neural Information Processing Systems, 34:572–585, 2021.
- [27] Xiaowan Hu, Yuanhao Cai, Jing Lin, Haoqian Wang, Xin Yuan, Yulun Zhang, Radu Timofte, and Luc Van Gool. Hdnet: High-resolution dual-domain learning for spectral compressive imaging. In 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 17521–17530, 2022.
- [28] Tao Huang, Weisheng Dong, Xin Yuan, Jinjian Wu, and Guangming Shi. Deep gaussian scale mixture prior for spectral compressive imaging. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 16216–16225, 2021.
- [29] Tao Huang, Xin Yuan, Weisheng Dong, Jinjian Wu, and Guangming Shi. Deep gaussian scale mixture prior for image reconstruction. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023.
- [30] Md Mohaiminul Islam, Mahmudul Hasan, Kishan Shamsundar Athrey, Tony Braskich, and Gedas Bertasius. Efficient movie scene detection using state-space transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 18749–18758, 2023.
- [31] Shirin Jalali and Xin Yuan. Snapshot compressed sensing: Performance bounds and algorithms. IEEE Transactions on Information Theory, 65(12):8005–8024, 2019.
- [32] Min H. Kim, Todd Alan Harvey, David S. Kittle, Holly Rushmeier, Julie Dorsey, Richard O. Prum, and David J. Brady. 3d imaging spectroscopy for measuring hyperspectral patterns on solid objects. ACM Transactions on Graphics, 31(4), Jul 2012.
- [33] Diederik P. Kingma and Jimmy Lei Ba. Adam: A method for stochastic optimization. In ICLR, 2015.
- [34] David Kittle, Kerkil Choi, Ashwin Wagadarikar, and David J. Brady. Multiframe image estimation for coded aperture snapshot spectral imagers. Appl. Opt., 49(36):6824–6833, Dec 2010.
- [35] Daphne Koller and Nir Friedman. Probabilistic graphical models: principles and techniques. MIT Press, 2009.
- [36] Zeqiang Lai, Kaixuan Wei, and Ying Fu. Deep plug-and-play prior for hyperspectral image restoration. Neurocomputing, 481:281–293, 2022.
- [37] Kunchang Li, Xinhao Li, Yi Wang, Yinan He, Yali Wang, Limin Wang, and Yu Qiao. Videomamba: State space model for efficient video understanding. arXiv preprint arXiv:2403.06977, 2024.
- [38] Miaoyu Li, Ying Fu, Ji Liu, and Yulun Zhang. Pixel adaptive deep unfolding transformer for hyperspectral image reconstruction. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 12959–12968, 2023.
- [39] Yang Liu, Xin Yuan, Jinli Suo, David J. Brady, and Qionghai Dai. Rank minimization for snapshot compressive imaging. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(12):2990–3006, 2019.
- [40] Yue Liu, Yunjie Tian, Yuzhong Zhao, Hongtian Yu, Lingxi Xie, Yaowei Wang, Qixiang Ye, and Yunfan Liu. Vmamba: Visual state space model. arXiv preprint arXiv:2401.10166, 2024.
- [41] Guolan Lu and Baowei Fei. Medical hyperspectral imaging: a review. Journal of Biomedical Optics, 2014.
- [42] Jiawei Ma, Xiao-Yang Liu, Zheng Shou, and Xin Yuan. Deep tensor admm-net for snapshot compressive imaging. In 2019 IEEE/CVF International Conference on Computer Vision (ICCV), pages 10222–10231, 2019.
- [43] Jun Ma, Feifei Li, and Bo Wang. U-mamba: Enhancing long-range dependency for biomedical image segmentation. arXiv preprint arXiv:2401.04722, 2024.
- [44] Xiao Ma, Xin Yuan, Chen Fu, and Gonzalo R. Arce. Led-based compressive spectral-temporal imaging. Opt. Express, 29(7):10698–10715, Mar 2021.
- [45] Emmanuel Maggiori, Guillaume Charpiat, Yuliya Tarabalka, and Pierre Alliez. Recurrent neural networks to correct satellite image classification maps. IEEE Transactions on Geoscience and Remote Sensing, 55(9):4962–4971, 2017.
- [46] Harsh Mehta, Ankit Gupta, Ashok Cutkosky, and Behnam Neyshabur. Long range language modeling via gated state spaces. arXiv preprint arXiv:2206.13947, 2022.
- [47] F. Melgani and L. Bruzzone. Classification of hyperspectral remote sensing images with support vector machines. IEEE Transactions on Geoscience and Remote Sensing, 42(8):1778–1790, 2004.
- [48] Ziyi Meng, Shirin Jalali, and Xin Yuan. Gap-net for snapshot compressive imaging. arXiv preprint arXiv:2012.08364, 2020.
- [49] Ziyi Meng, Jiawei Ma, and Xin Yuan. End-to-end low cost compressive spectral imaging with spatial-spectral self-attention. In European Conference on Computer Vision, pages 187–204. Springer, 2020.
- [50] Ziyi Meng, Mu Qiao, Jiawei Ma, Zhenming Yu, Kun Xu, and Xin Yuan. Snapshot multispectral endomicroscopy. Optics Letters, 2020.
- [51] Ziyi Meng, Zhenming Yu, Kun Xu, and Xin Yuan. Self-supervised neural networks for spectral snapshot compressive imaging. In 2021 IEEE/CVF International Conference on Computer Vision (ICCV), pages 2602–2611, 2021.
- [52] Xin Miao, Xin Yuan, Yunchen Pu, and Vassilis Athitsos. λ-net: Reconstruct hyperspectral images from a snapshot measurement. In IEEE International Conference on Computer Vision, pages 4058–4068, Oct. 2019.
- [53] Eric Nguyen, Karan Goel, Albert Gu, Gordon Downs, Preey Shah, Tri Dao, Stephen Baccus, and Christopher Ré. S4nd: Modeling images and videos as multidimensional signals with state spaces. Advances in Neural Information Processing Systems, 35:2846–2861, 2022.
- [54] Hien Van Nguyen, Amit Banerjee, and Rama Chellappa. Tracking via object reflectance using a hyperspectral video camera. In 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Workshops, pages 44–51, 2010.
- [55] Zhihong Pan, G. Healey, M. Prasad, and B. Tromberg. Face recognition in hyperspectral images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(12):1552–1560, 2003.
- [56] Jong-Il Park, Moon-Hyun Lee, Michael D. Grossberg, and Shree K. Nayar. Multispectral imaging using multiplexed illumination. In ICCV, 2007.
- [57] Mu Qiao, Ziyi Meng, Jiawei Ma, and Xin Yuan. Deep learning for video compressive sensing. APL Photonics, 2020.
- [58] Mu Qiao and Xin Yuan. Coded aperture compressive temporal imaging using complementary codes and untrained neural networks for high-quality reconstruction. Opt. Lett., 48(1):109–112, Jan 2023.
- [59] Jimmy TH Smith, Andrew Warrington, and Scott W Linderman. Simplified state space layers for sequence modeling. arXiv preprint arXiv:2208.04933, 2022.
- [60] Jin Tan, Yanting Ma, Hoover Rueda, Dror Baron, and Gonzalo R. Arce. Compressive hyperspectral imaging via approximate message passing. IEEE Journal of Selected Topics in Signal Processing, 10(2):389–401, 2016.
- [61] Joel A. Tropp and Anna C. Gilbert. Signal recovery from random measurements via orthogonal matching pursuit. IEEE Transactions on Information Theory, 53(12):4655–4666, 2007.
- [62] Burak Uzkent, Matthew J. Hoffman, and Anthony Vodacek. Real-time vehicle tracking in aerial video using hyperspectral features. In 2016 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pages 1443–1451, 2016.
- [63] Burak Uzkent, Aneesh Rangnekar, and Matthew J. Hoffman. Aerial vehicle tracking by adaptive fusion of hyperspectral likelihood maps. In 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pages 233–242, 2017.
- [64] Ashwin Wagadarikar, Renu John, Rebecca Willett, and David Brady. Single disperser design for coded aperture snapshot spectral imaging. Appl. Opt., 47(10):B44–B51, Apr 2008.
- [65] Jue Wang, Wentao Zhu, Pichao Wang, Xiang Yu, Linda Liu, Mohamed Omar, and Raffay Hamid. Selective structured state-spaces for long-form video understanding. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 6387–6397, 2023.
- [66] Lizhi Wang, Chen Sun, Ying Fu, Min H. Kim, and Hua Huang. Hyperspectral image reconstruction using a deep spatial-spectral prior. In 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 8024–8033, 2019.
- [67] Lizhi Wang, Chen Sun, Maoqing Zhang, Ying Fu, and Hua Huang. Dnu: Deep non-local unrolling for computational spectral imaging. In 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 1658–1668, 2020.
- [68] Lizhi Wang, Zhiwei Xiong, Guangming Shi, Feng Wu, and Wenjun Zeng. Adaptive nonlocal sparse representation for dual-camera compressive hyperspectral imaging. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(10):2104–2111, 2016.
- [69] Ziyang Wang, Jian-Qing Zheng, Yichi Zhang, Ge Cui, and Lei Li. Mamba-unet: Unet-like pure visual mamba for medical image segmentation. arXiv preprint arXiv:2402.05079, 2024.
- [70] Zongliang Wu, Ruiying Lu, Ying Fu, and Xin Yuan. Latent diffusion prior enhanced deep unfolding for spectral image reconstruction. arXiv preprint arXiv:2311.14280, 2023.
- [71] Xin Yuan. Generalized alternating projection based total variation minimization for compressive sensing. In 2016 IEEE International Conference on Image Processing (ICIP), pages 2539–2543, 2016.
- [72] Xin Yuan, David J. Brady, and Aggelos K. Katsaggelos. Snapshot compressive imaging: Theory, algorithms, and applications. IEEE Signal Processing Magazine, 38(2):65–88, 2021.
- [73] Xin Yuan, Yang Liu, Jinli Suo, and Qionghai Dai. Plug-and-play algorithms for large-scale snapshot compressive imaging. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1447–1457, 2020.
- [74] Xin Yuan, Tsung-Han Tsai, Ruoyu Zhu, Patrick Llull, David Brady, and Lawrence Carin. Compressive hyperspectral imaging with side information. IEEE Journal of Selected Topics in Signal Processing, 9(6):964–976, 2015.
- [75] Yuan Yuan, Xiangtao Zheng, and Xiaoqiang Lu. Hyperspectral image superresolution by transfer learning. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 10(5):1963–1974, 2017.
- [76] Yubiao Yue and Zhenzhang Li. Medmamba: Vision mamba for medical image classification. arXiv preprint arXiv:2403.03849, 2024.
- [77] Kai Zhang, Yawei Li, Wangmeng Zuo, Lei Zhang, Luc Van Gool, and Radu Timofte. Plug-and-play image restoration with deep denoiser prior. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(10):6360–6376, 2021.
- [78] Shipeng Zhang, Lizhi Wang, Ying Fu, Xiaoming Zhong, and Hua Huang. Computational hyperspectral imaging based on dimension-discriminative low-rank tensor recovery. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 10183–10192, 2019.
- [79] Xuanyu Zhang, Yongbing Zhang, Ruiqin Xiong, Qilin Sun, and Jian Zhang. Herosnet: Hyperspectral explicable reconstruction and optimal sampling deep network for snapshot compressive imaging. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 17532–17541, 2022.
- [80] Siming Zheng, Yang Liu, Ziyi Meng, Mu Qiao, Zhishen Tong, Xiaoyu Yang, Shensheng Han, and Xin Yuan. Deep plug-and-play priors for spectral snapshot compressive imaging. Photon. Res., 9(2):B18–B29, Feb 2021.
- [81] Lianghui Zhu, Bencheng Liao, Qian Zhang, Xinlong Wang, Wenyu Liu, and Xinggang Wang. Vision mamba: Efficient visual representation learning with bidirectional state space model. arXiv preprint arXiv:2401.09417, 2024.