
Multiscale spatial–spectral transformer network for hyperspectral and multispectral image fusion

Published: 01 August 2023

Abstract

Fusing hyperspectral images (HSIs) and multispectral images (MSIs) is an economical and feasible way to obtain images with both high spectral and high spatial resolution. Because of the limited receptive field of convolution kernels, fusion methods based on convolutional neural networks (CNNs) fail to exploit the global relationships within a feature map. In this paper, to exploit the Transformer’s powerful capability of extracting global information from a whole feature map, we propose a novel Multiscale Spatial–spectral Transformer Network (MSST-Net). The proposed network is a two-branch network that integrates the self-attention mechanism of the Transformer to extract spectral features from the HSI and spatial features from the MSI, respectively. Before feature extraction, cross-modality concatenations are performed to achieve cross-modality information interaction between the two branches. We then propose a spectral Transformer (SpeT) to extract spectral features and introduce multiscale band/patch embeddings to obtain multiscale features through SpeTs and spatial Transformers (SpaTs). To further improve the network’s performance and generalization, we propose a self-supervised pre-training strategy in which a masked bands autoencoder (MBAE) and a masked patches autoencoder (MPAE) are specially designed for self-supervised pre-training of the SpeTs and SpaTs. Extensive experiments on simulated and real datasets show that the proposed network achieves better performance than other state-of-the-art fusion methods. The code of MSST-Net will be made available at http://www.jiasen.tech/papers/ for reproducibility.
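
For concreteness, the following is a minimal PyTorch sketch of the spectral self-attention idea described above: each band of the low-resolution HSI is embedded as a token, so multi-head self-attention operates across the whole spectrum. The module names, dimensions, and the single Transformer block are illustrative assumptions, not the authors’ exact MSST-Net implementation.

```python
# Minimal sketch (assumption, not the authors' exact code) of a spectral
# Transformer block in the spirit of SpeT: every band of the low-resolution
# HSI becomes one token, so multi-head self-attention models dependencies
# across the whole spectrum.
import torch
import torch.nn as nn


class SpectralTransformerBlock(nn.Module):
    def __init__(self, embed_dim: int = 64, num_heads: int = 4):
        super().__init__()
        # Band embedding: each band, flattened over its spatial support, becomes a token.
        self.band_embed = nn.LazyLinear(embed_dim)
        self.norm1 = nn.LayerNorm(embed_dim)
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(embed_dim)
        self.mlp = nn.Sequential(
            nn.Linear(embed_dim, 4 * embed_dim),
            nn.GELU(),
            nn.Linear(4 * embed_dim, embed_dim),
        )

    def forward(self, hsi: torch.Tensor) -> torch.Tensor:
        # hsi: (B, C, H, W) low-spatial-resolution hyperspectral cube.
        b, c, h, w = hsi.shape
        tokens = self.band_embed(hsi.reshape(b, c, h * w))    # (B, C, D): one token per band
        x = self.norm1(tokens)
        tokens = tokens + self.attn(x, x, x, need_weights=False)[0]  # spectral self-attention
        tokens = tokens + self.mlp(self.norm2(tokens))
        return tokens                                          # (B, C, D) spectral features


# Example: a batch of two 31-band HSI patches of size 16x16.
features = SpectralTransformerBlock()(torch.randn(2, 31, 16, 16))
```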

Highlights

A multiscale spatial–spectral Transformer network is proposed.
Spectral multi-head self-attention is designed to extract spectral features.
Multiscale band/patch embeddings are introduced to extract multiscale features.
A self-supervised pre-training strategy is developed (see the sketch below).
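
The self-supervised pre-training strategy highlighted above can be illustrated with a hedged sketch of masked-band autoencoding: a random subset of band tokens is replaced by a learnable mask token, and the network is trained to reconstruct the masked bands. The mask ratio, the generic Transformer encoder standing in for the SpeT stack, and the linear decoder are assumptions for illustration only.

```python
# Hedged sketch of masked-band pre-training (in the spirit of the paper's MBAE):
# mask a random subset of band tokens, encode the corrupted sequence, and
# reconstruct the original bands at the masked positions. The mask ratio,
# the generic encoder, and the linear decoder are illustrative assumptions.
import torch
import torch.nn as nn


def masked_band_pretraining_step(encoder: nn.Module,
                                 decoder: nn.Module,
                                 mask_token: torch.Tensor,
                                 band_tokens: torch.Tensor,
                                 mask_ratio: float = 0.5) -> torch.Tensor:
    """band_tokens: (B, C, D) band embeddings; returns the reconstruction loss."""
    b, c, d = band_tokens.shape
    num_masked = int(mask_ratio * c)
    # Choose a different random subset of bands to mask in every sample.
    mask = torch.zeros(b, c, dtype=torch.bool, device=band_tokens.device)
    for i in range(b):
        mask[i, torch.randperm(c)[:num_masked]] = True
    # Replace the selected band tokens with a learnable mask token, then encode.
    corrupted = torch.where(mask.unsqueeze(-1), mask_token.expand(b, c, d), band_tokens)
    reconstructed = decoder(encoder(corrupted))                # (B, C, D)
    # As in masked autoencoding, only the masked positions contribute to the loss.
    return nn.functional.mse_loss(reconstructed[mask], band_tokens[mask])


# Example usage with a generic Transformer encoder standing in for the SpeT stack.
dim = 64
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True), num_layers=2)
decoder = nn.Linear(dim, dim)
mask_token = nn.Parameter(torch.zeros(1, 1, dim))
loss = masked_band_pretraining_step(encoder, decoder, mask_token, torch.randn(2, 31, dim))
loss.backward()
```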

Published In

Information Fusion, Volume 96, Issue C, August 2023, 329 pages

Publisher

Elsevier Science Publishers B.V., Netherlands

Author Tags

  1. Hyperspectral image (HSI)
  2. Multispectral image (MSI)
  3. Transformer
  4. Pre-training
  5. Spectral multi-head self-attention
  6. Image fusion

Qualifiers

  • Research-article
