DOI: 10.1145/3671151.3671286
DCTNet: A Fusion of Transformer and CNN for Advanced Multimodal Medical Image Segmentation

Published: 23 July 2024

Abstract

We propose DCTNet, a multimodal breast cancer segmentation framework that combines Convolutional Neural Networks (CNNs) and Transformers. The approach integrates complementary information and features from multiple imaging modalities to improve the accuracy of automated breast cancer lesion delineation. DCTNet employs two CNN-based feature-learning branches that extract modality-specific features independently, reducing cross-modality interference through an encoder-decoder structure with skip connections. In addition, a Transformer-based encoder performs cross-modal shared learning, extracting unified representations from the multimodal inputs; these are combined with the modality-specific features by a Cross-Modal Feature Fusion Module (CFM) and refined through the CNN decoder to produce the final segmentation. Experiments on the DCI breast cancer dataset show that DCTNet matches or surpasses the segmentation performance of existing advanced multimodal models. These results demonstrate the effectiveness of combining CNN and Transformer structures, the cross-modal feature fusion module, and a multimodal contrastive loss for breast cancer segmentation, and suggest new directions for research in multimodal medical image analysis.
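The abstract mentions a multimodal contrastive loss but does not specify its form. As an illustration only, the sketch below shows a common InfoNCE-style formulation over paired modality embeddings: each sample's embedding from modality A is pulled toward the same sample's embedding from modality B and pushed away from other samples'. The function names (`cosine`, `multimodal_contrastive_loss`) and the temperature value are hypothetical, not taken from the paper.

```python
import math

def cosine(u, v):
    # Cosine similarity between two equal-length vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def multimodal_contrastive_loss(z_a, z_b, temperature=0.1):
    """InfoNCE-style loss over paired embeddings from two modalities.

    For each sample i, the positive pair is (z_a[i], z_b[i]); all other
    (z_a[i], z_b[j]) pairs in the batch act as negatives.
    """
    n = len(z_a)
    loss = 0.0
    for i in range(n):
        sims = [cosine(z_a[i], z_b[j]) / temperature for j in range(n)]
        # Numerically stable log-sum-exp over the similarity row.
        m = max(sims)
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims))
        loss += -(sims[i] - log_denom)
    return loss / n
```

With toy 2-D embeddings, correctly aligned modality pairs yield a much smaller loss than misaligned ones, which is the behavior a contrastive term exploits to align shared representations across modalities.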



    Published In

    CIBDA '24: Proceedings of the 5th International Conference on Computer Information and Big Data Applications
    April 2024
    1285 pages
    ISBN:9798400718106
    DOI:10.1145/3671151

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    CIBDA 2024

