Anatomical Structure-Guided Medical Vision-Language Pre-training

Li, Qingqiu; Yan, Xiaohan; Xu, Jilan; Yuan, Runtian; Zhang, Yuejie; Feng, Rui; Shen, Quanli; Zhang, Xiaobo; Wang, Shujun

arXiv:2403.09294 (cs)

[Submitted on 14 Mar 2024]

Abstract: Learning medical visual representations through vision-language pre-training has made remarkable progress. Despite its promising performance, two challenges remain: local alignment lacks interpretability and clinical relevance, and representation learning is insufficient both within image-report pairs (internal) and across them (external). To address these issues, we propose an Anatomical Structure-Guided (ASG) framework. Specifically, we parse raw reports into triplets <anatomical region, finding, existence> and use each element as supervision to enhance representation learning. For the anatomical region, we design an automatic anatomical region-sentence alignment paradigm in collaboration with radiologists, treating regions and sentences as the minimum semantic units for exploring fine-grained local alignment. For finding and existence, we regard them as image tags: an image-tag recognition decoder associates image features with their respective tags within each sample, and soft labels constructed for contrastive learning improve the semantic association between different image-report pairs. We evaluate the proposed ASG framework on two downstream tasks covering five public benchmarks. Experimental results demonstrate that our method outperforms state-of-the-art methods.
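The abstract does not spell out how the soft labels are built, so the following is only a minimal sketch of one plausible reading: findings marked as present become image tags, and tag overlap between reports is turned into soft contrastive targets. The example triplets, the function names (`tag_vector`, `cosine`, `soft_labels`), the use of cosine similarity, and the row normalization are all assumptions made for illustration, not the authors' method.

```python
from math import sqrt

# Hypothetical triplets <anatomical region, finding, existence> for three reports.
reports = [
    [("left lung", "pleural effusion", True), ("heart", "cardiomegaly", False)],
    [("right lung", "pleural effusion", True), ("heart", "cardiomegaly", True)],
    [("left lung", "pneumothorax", True)],
]

def tag_vector(triplets, vocab):
    """Multi-hot vector over findings; only findings marked as present count."""
    present = {finding for _, finding, exists in triplets if exists}
    return [1.0 if finding in present else 0.0 for finding in vocab]

def cosine(u, v):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def soft_labels(reports):
    """Row-normalized tag similarity, usable as soft targets in a contrastive loss."""
    vocab = sorted({finding for report in reports for _, finding, _ in report})
    vectors = [tag_vector(report, vocab) for report in reports]
    sims = [[cosine(u, v) for v in vectors] for u in vectors]
    for i in range(len(sims)):
        # Assumption: a sample always matches its own report, even with no positive tags.
        sims[i][i] = 1.0
    return [[s / sum(row) for s in row] for row in sims]

labels = soft_labels(reports)
for row in labels:
    print([round(x, 3) for x in row])
```

In this toy setup the first two reports share the finding "pleural effusion", so each receives part of the other's target mass, while the third report shares no tags and keeps a one-hot target. A standard contrastive loss would instead treat every off-diagonal pair as a pure negative.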

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
Cite as:	arXiv:2403.09294 [cs.CV]
	(or arXiv:2403.09294v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2403.09294

Submission history

From: Qingqiu Li
[v1] Thu, 14 Mar 2024 11:29:47 UTC (1,367 KB)
