
MixFuse: An iterative mix-attention transformer for multi-modal image fusion

Published: 01 February 2025

Abstract

Multi-modal image fusion plays a crucial role in various visual systems. However, existing methods typically involve a multi-stage pipeline, i.e., feature extraction, integration, and reconstruction, which limits the effectiveness and efficiency of feature interaction and aggregation. In this paper, we propose MixFuse, a compact multi-modal image fusion framework based on Transformers that smoothly unifies the processes of feature extraction and integration. At its core, the Mix Attention Transformer Block (MATB) integrates the Cross-Attention Transformer Module (CATM) and the Self-Attention Transformer Module (SATM). The CATM introduces a symmetrical cross-attention mechanism to identify modality-specific and general features while filtering out irrelevant and redundant information. Meanwhile, the SATM refines the combined features via a self-attention mechanism, enhancing their internal consistency and preserving them properly. These successive cross- and self-attention modules work together to generate more accurate and refined feature maps, which are essential for the later reconstruction. Extensive evaluation of MixFuse on five public datasets shows its superior performance and adaptability over state-of-the-art methods. The code and model will be released at https://github.com/Bitlijinfu/MixFuse.
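
The abstract's description of the MATB, a symmetrical cross-attention exchange between the two modalities (CATM) followed by self-attention refinement of the merged features (SATM), can be illustrated with a minimal sketch. The layer layout, embedding sizes, and the concatenate-then-project fusion step below are assumptions made for illustration only, not the authors' implementation.

```python
# Minimal PyTorch sketch of the MATB idea: CATM (symmetrical cross-attention)
# followed by SATM (self-attention refinement). Hypothetical configuration.
import torch
import torch.nn as nn


class CATM(nn.Module):
    """Symmetrical cross-attention: each modality attends to the other."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.attn_a = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.attn_b = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_a = nn.LayerNorm(dim)
        self.norm_b = nn.LayerNorm(dim)

    def forward(self, feat_a: torch.Tensor, feat_b: torch.Tensor):
        # Modality A queries modality B, and vice versa (symmetrical design).
        a2b, _ = self.attn_a(feat_a, feat_b, feat_b)
        b2a, _ = self.attn_b(feat_b, feat_a, feat_a)
        return self.norm_a(feat_a + a2b), self.norm_b(feat_b + b2a)


class SATM(nn.Module):
    """Self-attention refinement of the fused feature sequence."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(nn.Linear(dim, dim * 4), nn.GELU(),
                                 nn.Linear(dim * 4, dim))

    def forward(self, fused: torch.Tensor):
        refined, _ = self.attn(fused, fused, fused)
        fused = self.norm(fused + refined)
        return fused + self.ffn(fused)


class MATB(nn.Module):
    """Cross-attention exchange followed by self-attention refinement."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.catm = CATM(dim, num_heads)
        self.satm = SATM(dim, num_heads)
        self.merge = nn.Linear(2 * dim, dim)  # assumed fusion by projection

    def forward(self, feat_a: torch.Tensor, feat_b: torch.Tensor):
        a, b = self.catm(feat_a, feat_b)
        fused = self.merge(torch.cat([a, b], dim=-1))
        return self.satm(fused)


if __name__ == "__main__":
    # Two modalities (e.g. infrared and visible) as token sequences: B x N x C.
    a = torch.randn(1, 64, 32)
    b = torch.randn(1, 64, 32)
    print(MATB(dim=32)(a, b).shape)  # torch.Size([1, 64, 32])
```

In this reading, the block can be stacked iteratively so that extraction and integration happen jointly rather than in separate pipeline stages, which is the compactness argument made in the abstract.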

Highlights

Proposing a compact framework for multi-modal image fusion.
Proposing a symmetrical attention Transformer to extract and integrate features.
Achieving superior performance in five typical scenarios.


Published In

Expert Systems with Applications: An International Journal, Volume 261, Issue C
Feb 2025
1514 pages

Publisher

Pergamon Press, Inc.

United States

Author Tags

  1. Multi-modal image fusion
  2. Feature extraction
  3. Feature integration
  4. Self-attention transformer
  5. Cross-attention transformer

Qualifiers

  • Research-article
