CrossDiff: Exploring Self-Supervised Representation of Pansharpening via Cross-Predictive Diffusion Model

Published: 20 September 2024

Abstract

Fusion of a panchromatic (PAN) image and the corresponding multispectral (MS) image, known as pansharpening, aims to combine the abundant spatial details of the PAN image with the spectral information of the MS image. Because high-resolution MS ground truth is unavailable, existing deep-learning-based methods usually follow the paradigm of training at reduced resolution and testing at both reduced and full resolution. When the original MS and PAN images are taken as inputs, these methods tend to produce sub-optimal results because of the scale variation. In this paper, we propose to explore a self-supervised representation for pansharpening by designing a cross-predictive diffusion model, named CrossDiff. Training proceeds in two stages. In the first stage, we introduce a cross-predictive pretext task to pre-train the UNet structure based on a conditional Denoising Diffusion Probabilistic Model (DDPM). In the second stage, the UNet encoders are frozen to directly extract spatial and spectral features from the PAN and MS images, and only the fusion head is trained to adapt to the pansharpening task. Extensive experiments show the effectiveness and superiority of the proposed model compared with state-of-the-art supervised and unsupervised methods. In addition, cross-sensor experiments verify the generalization ability of the proposed self-supervised representation learner on other satellite datasets. Code is available at https://github.com/codgodtao/CrossDiff.
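To make the two-stage scheme concrete, below is a minimal PyTorch sketch of the training procedure the abstract describes. The tiny CondUNet, the linear noise schedule, the channel widths, the single-conv fusion head, and the t = 0 feature-extraction timestep are all illustrative assumptions, not the authors' architecture; see the linked repository for the official implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

T = 1000  # number of diffusion steps (assumed)
betas = torch.linspace(1e-4, 2e-2, T)               # linear noise schedule (assumed)
alpha_bar = torch.cumprod(1.0 - betas, dim=0)       # cumulative product of (1 - beta_t)

class CondUNet(nn.Module):
    """Toy conditional noise predictor standing in for the paper's UNet."""
    def __init__(self, x_ch, cond_ch, width=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(x_ch + cond_ch + 1, width, 3, padding=1), nn.SiLU(),
            nn.Conv2d(width, width, 3, padding=1), nn.SiLU())
        self.decoder = nn.Conv2d(width, x_ch, 3, padding=1)

    def forward(self, x_t, t, cond):
        # Broadcast the normalized timestep as an extra input channel.
        t_map = (t.float() / T).view(-1, 1, 1, 1).expand(-1, 1, *x_t.shape[2:])
        h = self.encoder(torch.cat([x_t, cond, t_map], dim=1))
        return self.decoder(h)

def ddpm_loss(model, x0, cond):
    """Standard conditional DDPM objective: predict the injected noise."""
    b = x0.size(0)
    t = torch.randint(0, T, (b,))
    eps = torch.randn_like(x0)
    a = alpha_bar[t].view(b, 1, 1, 1)
    x_t = a.sqrt() * x0 + (1.0 - a).sqrt() * eps    # forward diffusion q(x_t | x0)
    return F.mse_loss(model(x_t, t, cond), eps)

ms_up = torch.rand(2, 4, 64, 64)   # 4-band MS image, upsampled to PAN resolution
pan = torch.rand(2, 1, 64, 64)     # single-band PAN image

# Stage 1: cross-predictive pretext task. One UNet denoises PAN conditioned on
# MS (learning spatial detail); the other denoises MS conditioned on PAN
# (learning spectral content).
pan_net = CondUNet(x_ch=1, cond_ch=4)
ms_net = CondUNet(x_ch=4, cond_ch=1)
pretext_loss = ddpm_loss(pan_net, pan, ms_up) + ddpm_loss(ms_net, ms_up, pan)
pretext_loss.backward()

# Stage 2: freeze both pretrained encoders; only the fusion head is trained.
for p in list(pan_net.encoder.parameters()) + list(ms_net.encoder.parameters()):
    p.requires_grad = False
fusion_head = nn.Conv2d(2 * 32, 4, 3, padding=1)   # concatenated features -> sharpened MS
zeros = torch.zeros(2, 1, 64, 64)                  # t = 0 timestep channel (assumed)
feats = torch.cat([pan_net.encoder(torch.cat([pan, ms_up, zeros], 1)),
                   ms_net.encoder(torch.cat([ms_up, pan, zeros], 1))], dim=1)
sharpened = fusion_head(feats)                     # supervised by a pansharpening loss (omitted)

Freezing the encoders means the pansharpening stage inherits the spatial and spectral representations learned by the pretext task, so only the lightweight fusion head needs task-specific training.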

Published in: IEEE Transactions on Image Processing, Volume 33, 2024, 6889 pages
Publisher: IEEE Press
