
Multiscale spatial–spectral transformer network for hyperspectral and multispectral image fusion

Published: 01 August 2023

Abstract

Fusing hyperspectral images (HSIs) and multispectral images (MSIs) is an economical and feasible way to obtain images with both high spectral and high spatial resolution. Because of the limited receptive field of convolution kernels, fusion methods based on convolutional neural networks (CNNs) fail to exploit the global relationships within a feature map. In this paper, to exploit the Transformer’s powerful capability of extracting global information from a whole feature map, we propose a novel Multiscale Spatial–spectral Transformer Network (MSST-Net). The proposed network is a two-branch network that integrates the self-attention mechanism of the Transformer to extract spectral features from the HSI and spatial features from the MSI, respectively. Before feature extraction, cross-modality concatenations are performed to achieve cross-modality information interaction between the two branches. We then propose a spectral Transformer (SpeT) to extract spectral features and introduce multiscale band/patch embeddings to obtain multiscale features through SpeTs and spatial Transformers (SpaTs). To further improve the network’s performance and generalization, we propose a self-supervised pre-training strategy in which a masked bands autoencoder (MBAE) and a masked patches autoencoder (MPAE) are specially designed for self-supervised pre-training of the SpeTs and SpaTs. Extensive experiments on simulated and real datasets show that the proposed network achieves better performance than other state-of-the-art fusion methods. The code of MSST-Net will be made available at http://www.jiasen.tech/papers/ for reproducibility.
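
For concreteness, the following is a minimal PyTorch sketch of the spectral self-attention idea described above: each band of the low-resolution HSI is embedded as a token, so multi-head self-attention operates across the whole spectrum. The module names, dimensions, and the single Transformer block are illustrative assumptions, not the authors’ exact MSST-Net implementation.

```python
# Minimal sketch (assumption, not the authors' exact code) of a spectral
# Transformer block in the spirit of SpeT: every band of the low-resolution
# HSI becomes one token, so multi-head self-attention models dependencies
# across the whole spectrum.
import torch
import torch.nn as nn


class SpectralTransformerBlock(nn.Module):
    def __init__(self, embed_dim: int = 64, num_heads: int = 4):
        super().__init__()
        # Band embedding: each band, flattened over its spatial support, becomes a token.
        self.band_embed = nn.LazyLinear(embed_dim)
        self.norm1 = nn.LayerNorm(embed_dim)
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(embed_dim)
        self.mlp = nn.Sequential(
            nn.Linear(embed_dim, 4 * embed_dim),
            nn.GELU(),
            nn.Linear(4 * embed_dim, embed_dim),
        )

    def forward(self, hsi: torch.Tensor) -> torch.Tensor:
        # hsi: (B, C, H, W) low-spatial-resolution hyperspectral cube.
        b, c, h, w = hsi.shape
        tokens = self.band_embed(hsi.reshape(b, c, h * w))    # (B, C, D): one token per band
        x = self.norm1(tokens)
        tokens = tokens + self.attn(x, x, x, need_weights=False)[0]  # spectral self-attention
        tokens = tokens + self.mlp(self.norm2(tokens))
        return tokens                                          # (B, C, D) spectral features


# Example: a batch of two 31-band HSI patches of size 16x16.
features = SpectralTransformerBlock()(torch.randn(2, 31, 16, 16))
```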

Highlights

A multiscale spatial–spectral Transformer network is proposed.
Spectral multi-head self-attention is designed to extract spectral features.
Multiscale band/patch embeddings are introduced to extract multiscale features.
A self-supervised pre-training strategy is developed (see the sketch below).
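
The self-supervised pre-training strategy highlighted above can be illustrated with a hedged sketch of masked-band autoencoding: a random subset of band tokens is replaced by a learnable mask token, and the network is trained to reconstruct the masked bands. The mask ratio, the generic Transformer encoder standing in for the SpeT stack, and the linear decoder are assumptions for illustration only.

```python
# Hedged sketch of masked-band pre-training (in the spirit of the paper's MBAE):
# mask a random subset of band tokens, encode the corrupted sequence, and
# reconstruct the original bands at the masked positions. The mask ratio,
# the generic encoder, and the linear decoder are illustrative assumptions.
import torch
import torch.nn as nn


def masked_band_pretraining_step(encoder: nn.Module,
                                 decoder: nn.Module,
                                 mask_token: torch.Tensor,
                                 band_tokens: torch.Tensor,
                                 mask_ratio: float = 0.5) -> torch.Tensor:
    """band_tokens: (B, C, D) band embeddings; returns the reconstruction loss."""
    b, c, d = band_tokens.shape
    num_masked = int(mask_ratio * c)
    # Choose a different random subset of bands to mask in every sample.
    mask = torch.zeros(b, c, dtype=torch.bool, device=band_tokens.device)
    for i in range(b):
        mask[i, torch.randperm(c)[:num_masked]] = True
    # Replace the selected band tokens with a learnable mask token, then encode.
    corrupted = torch.where(mask.unsqueeze(-1), mask_token.expand(b, c, d), band_tokens)
    reconstructed = decoder(encoder(corrupted))                # (B, C, D)
    # As in masked autoencoding, only the masked positions contribute to the loss.
    return nn.functional.mse_loss(reconstructed[mask], band_tokens[mask])


# Example usage with a generic Transformer encoder standing in for the SpeT stack.
dim = 64
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True), num_layers=2)
decoder = nn.Linear(dim, dim)
mask_token = nn.Parameter(torch.zeros(1, 1, dim))
loss = masked_band_pretraining_step(encoder, decoder, mask_token, torch.randn(2, 31, dim))
loss.backward()
```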

Published In

Information Fusion, Volume 96, Issue C, August 2023, 329 pages

Publisher

Elsevier Science Publishers B.V., Netherlands

Author Tags

  1. Hyperspectral image (HSI)
  2. Multispectral image (MSI)
  3. Transformer
  4. Pre-training
  5. Spectral multi-head self-attention
  6. Image fusion

Qualifiers

  • Research-article
