Image-Text Co-Decomposition for Text-Supervised Semantic Segmentation

Wu, Ji-Jia; Chang, Andy Chia-Hao; Chuang, Chieh-Yu; Chen, Chun-Pei; Liu, Yu-Lun; Chen, Min-Hung; Hu, Hou-Ning; Chuang, Yung-Yu; Lin, Yen-Yu

arXiv:2404.04231 (cs)

[Submitted on 5 Apr 2024]

Abstract: This paper addresses text-supervised semantic segmentation, aiming to learn a model capable of segmenting arbitrary visual concepts within images by using only image-text pairs without dense annotations. Existing methods have demonstrated that contrastive learning on image-text pairs effectively aligns visual segments with the meanings of texts. We notice that there is a discrepancy between text alignment and semantic segmentation: a text often consists of multiple semantic concepts, whereas semantic segmentation strives to create semantically homogeneous segments. To address this issue, we propose a novel framework, Image-Text Co-Decomposition (CoDe), where the paired image and text are jointly decomposed into a set of image regions and a set of word segments, respectively, and contrastive learning is developed to enforce region-word alignment. To work with a vision-language model, we present a prompt learning mechanism that derives an extra representation to highlight an image segment or a word segment of interest, with which more effective features can be extracted from that segment. Comprehensive experimental results demonstrate that our method performs favorably against existing text-supervised semantic segmentation methods on six benchmark datasets.
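
As a rough illustration of what contrastive region-word alignment can look like, the sketch below computes a generic symmetric InfoNCE loss over paired region and word-segment embeddings. It is not the CoDe implementation: the function name `region_word_contrastive_loss`, the `temperature` value, and the assumption that row i of each matrix forms a matched pair are all hypothetical choices made for this example, and the decomposition and prompt learning steps described in the abstract are not modeled here.

```python
import numpy as np


def _log_softmax(logits, axis):
    # Numerically stable log-softmax along the given axis.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def region_word_contrastive_loss(region_emb, word_emb, temperature=0.07):
    """Generic symmetric InfoNCE loss (illustrative assumption, not the paper's code).

    region_emb: (N, D) array, one embedding per image region.
    word_emb:   (N, D) array, one embedding per word segment.
    Row i of each array is assumed to be a matched region-word pair.
    """
    # L2-normalize so the dot product is a cosine similarity.
    r = region_emb / np.linalg.norm(region_emb, axis=1, keepdims=True)
    w = word_emb / np.linalg.norm(word_emb, axis=1, keepdims=True)

    # (N, N) similarity matrix; the diagonal holds the matched pairs.
    logits = r @ w.T / temperature
    idx = np.arange(len(r))

    # Region-to-word: each region should pick its own word segment.
    loss_r2w = -_log_softmax(logits, axis=1)[idx, idx].mean()
    # Word-to-region: each word segment should pick its own region.
    loss_w2r = -_log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_r2w + loss_w2r)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    regions = rng.normal(size=(4, 16))
    # Matched pairs should score a lower loss than mismatched ones.
    print(region_word_contrastive_loss(regions, regions))
    print(region_word_contrastive_loss(regions, np.roll(regions, 1, axis=0)))
```

In this toy setup the loss is lower when each region is paired with its own word segment than when the pairing is shifted by one row, which is the behavior a contrastive alignment objective is meant to reward.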

Comments:	CVPR 2024
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2404.04231 [cs.CV]
	(or arXiv:2404.04231v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2404.04231

Submission history

From: Ji-Jia Wu
[v1] Fri, 5 Apr 2024 17:25:17 UTC (1,876 KB)
