
MixFuse: An iterative mix-attention transformer for multi-modal image fusion

Published: 01 February 2025

Abstract

Multi-modal image fusion plays a crucial role in various visual systems. However, existing methods typically involve a multi-stage pipeline, i.e., feature extraction, integration, and reconstruction, which limits the effectiveness and efficiency of feature interaction and aggregation. In this paper, we propose MixFuse, a compact multi-modal image fusion framework based on Transformers that smoothly unifies the processes of feature extraction and integration. At its core, the Mix Attention Transformer Block (MATB) integrates the Cross-Attention Transformer Module (CATM) and the Self-Attention Transformer Module (SATM). The CATM introduces a symmetrical cross-attention mechanism to identify modality-specific and general features while filtering out irrelevant and redundant information. Meanwhile, the SATM refines the combined features via a self-attention mechanism, enhancing their internal consistency and preserving them properly. These successive cross- and self-attention modules work together to generate more accurate and refined feature maps, which are essential for the later reconstruction. Extensive evaluation of MixFuse on five public datasets shows its superior performance and adaptability over state-of-the-art methods. The code and model will be released at https://github.com/Bitlijinfu/MixFuse.
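
The abstract's description of the MATB, a symmetrical cross-attention exchange between the two modalities (CATM) followed by self-attention refinement of the merged features (SATM), can be illustrated with a minimal sketch. The layer layout, embedding sizes, and the concatenate-then-project fusion step below are assumptions made for illustration only, not the authors' implementation.

```python
# Minimal PyTorch sketch of the MATB idea: CATM (symmetrical cross-attention)
# followed by SATM (self-attention refinement). Hypothetical configuration.
import torch
import torch.nn as nn


class CATM(nn.Module):
    """Symmetrical cross-attention: each modality attends to the other."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.attn_a = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.attn_b = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_a = nn.LayerNorm(dim)
        self.norm_b = nn.LayerNorm(dim)

    def forward(self, feat_a: torch.Tensor, feat_b: torch.Tensor):
        # Modality A queries modality B, and vice versa (symmetrical design).
        a2b, _ = self.attn_a(feat_a, feat_b, feat_b)
        b2a, _ = self.attn_b(feat_b, feat_a, feat_a)
        return self.norm_a(feat_a + a2b), self.norm_b(feat_b + b2a)


class SATM(nn.Module):
    """Self-attention refinement of the fused feature sequence."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(nn.Linear(dim, dim * 4), nn.GELU(),
                                 nn.Linear(dim * 4, dim))

    def forward(self, fused: torch.Tensor):
        refined, _ = self.attn(fused, fused, fused)
        fused = self.norm(fused + refined)
        return fused + self.ffn(fused)


class MATB(nn.Module):
    """Cross-attention exchange followed by self-attention refinement."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.catm = CATM(dim, num_heads)
        self.satm = SATM(dim, num_heads)
        self.merge = nn.Linear(2 * dim, dim)  # assumed fusion by projection

    def forward(self, feat_a: torch.Tensor, feat_b: torch.Tensor):
        a, b = self.catm(feat_a, feat_b)
        fused = self.merge(torch.cat([a, b], dim=-1))
        return self.satm(fused)


if __name__ == "__main__":
    # Two modalities (e.g. infrared and visible) as token sequences: B x N x C.
    a = torch.randn(1, 64, 32)
    b = torch.randn(1, 64, 32)
    print(MATB(dim=32)(a, b).shape)  # torch.Size([1, 64, 32])
```

In this reading, the block can be stacked iteratively so that extraction and integration happen jointly rather than in separate pipeline stages, which is the compactness argument made in the abstract.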

Highlights

Proposing a compact framework for multi-modal image fusion.
Proposing a symmetrical attention Transformer to extract and integrate features.
Achieving superior performance in five typical scenarios.


Published In

Expert Systems with Applications: An International Journal, Volume 261, Issue C
Feb 2025
1514 pages

Publisher

Pergamon Press, Inc.

United States

Author Tags

  1. Multi-modal image fusion
  2. Feature extraction
  3. Feature integration
  4. Self-attention transformer
  5. Cross-attention transformer

Qualifiers

  • Research-article
