Cross-Modal Conditioned Reconstruction for Language-guided Medical Image Segmentation

Huang, Xiaoshuang; Li, Hongxiang; Cao, Meng; Chen, Long; You, Chenyu; An, Dong

arXiv:2404.02845v1 (cs)

[Submitted on 3 Apr 2024 (this version), latest version 7 Jul 2024 (v2)]

Abstract: Recent developments underscore the potential of textual information in enhancing learning models for a deeper understanding of medical visual semantics. However, language-guided medical image segmentation still faces a challenging issue. Previous works employ implicit and ambiguous architectures to embed textual information. This leads to segmentation results that are inconsistent with the semantics represented by the language, sometimes even diverging significantly. To this end, we propose a novel cross-modal conditioned Reconstruction for Language-guided Medical Image Segmentation (RecLMIS) to explicitly capture cross-modal interactions, which assumes that well-aligned medical visual features and medical notes can effectively reconstruct each other. We introduce conditioned interaction to adaptively predict patches and words of interest. Subsequently, they are utilized as conditioning factors for mutual reconstruction to align with regions described in the medical notes. Extensive experiments demonstrate the superiority of our RecLMIS, surpassing LViT by 3.74% mIoU on the publicly available MosMedData+ dataset and achieving an average increase of 1.89% mIoU for cross-domain tests on our QATA-CoV19 dataset. Simultaneously, we achieve a relative reduction of 20.2% in parameter count and a 55.5% decrease in computational load. The code will be available at this https URL.
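
The gains quoted in the abstract are stated in mIoU (mean intersection-over-union). As a reading aid, the following is a minimal sketch of how that metric is commonly computed for binary segmentation masks. It is not taken from the RecLMIS code: the names `iou` and `mean_iou` are hypothetical, and the averaging convention (a per-image mean, with two empty masks scored as a perfect match) is an assumption, since the abstract does not say how the authors aggregate.

```python
from typing import Sequence

# Hypothetical type alias for this sketch: a 2D binary mask, 1 = foreground.
Mask = Sequence[Sequence[int]]


def iou(pred: Mask, target: Mask) -> float:
    """Intersection-over-union of two same-shaped binary masks."""
    inter = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if union == 0 else inter / union


def mean_iou(preds: Sequence[Mask], targets: Sequence[Mask]) -> float:
    """Average the per-image IoU over a set of predictions (assumed convention)."""
    scores = [iou(p, t) for p, t in zip(preds, targets)]
    return sum(scores) / len(scores)
```

Under this convention, a difference of 3.74% mIoU corresponds to a 0.0374 difference in the value returned by `mean_iou` over the test set.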

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2404.02845 [cs.CV]
	(or arXiv:2404.02845v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2404.02845

Submission history

From: Xiaoshuang Huang
[v1] Wed, 3 Apr 2024 16:23:37 UTC (9,529 KB)
[v2] Sun, 7 Jul 2024 17:57:36 UTC (7,683 KB)
