Decoupling Zero-Shot Semantic Segmentation

Ding, Jian; Xue, Nan; Xia, Gui-Song; Dai, Dengxin

arXiv:2112.07910 (cs)

[Submitted on 15 Dec 2021 (v1), last revised 15 Apr 2022 (this version, v2)]

Abstract: Zero-shot semantic segmentation (ZS3) aims to segment novel categories that have not been seen during training. Existing works formulate ZS3 as a pixel-level zero-shot classification problem and transfer semantic knowledge from seen classes to unseen ones with the help of language models pre-trained only on text. While simple, the pixel-level ZS3 formulation has limited capability to integrate vision-language models, which are often pre-trained with image-text pairs and currently demonstrate great potential for vision tasks. Inspired by the observation that humans often perform segment-level semantic labeling, we propose to decouple ZS3 into two sub-tasks: 1) a class-agnostic grouping task that groups pixels into segments, and 2) a zero-shot classification task on segments. The former task does not involve category information and can be directly transferred to group pixels for unseen classes. The latter task operates at the segment level and provides a natural way to leverage large-scale vision-language models pre-trained with image-text pairs (e.g., CLIP) for ZS3. Based on the decoupling formulation, we propose a simple and effective zero-shot semantic segmentation model, called ZegFormer, which outperforms previous methods on standard ZS3 benchmarks by large margins, e.g., 22 points on PASCAL VOC and 3 points on COCO-Stuff in terms of mIoU for unseen classes. Code will be released at this https URL.
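The two-stage formulation in the abstract can be illustrated with a small sketch. This is not the ZegFormer implementation: the segment masks and embeddings below are hand-made toy values standing in for the output of a class-agnostic grouping network and a vision-language model such as CLIP, and the function names (`classify_segments`, `segments_to_semantic_map`) and the `temperature` value are assumptions made for illustration. The sketch only shows how segment-level class probabilities might be combined with segment masks to produce a per-pixel label map.

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-8)

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def classify_segments(segment_embs, text_embs, temperature=0.01):
    """Sub-task 2: zero-shot classification of each segment.

    Each segment embedding is compared with every class text embedding;
    unseen classes only need a text embedding, not training masks.
    The temperature is a hypothetical value chosen for this toy example.
    """
    return [
        softmax([cosine(seg, txt) / temperature for txt in text_embs])
        for seg in segment_embs
    ]

def segments_to_semantic_map(masks, class_probs):
    """Combine class-agnostic masks (sub-task 1) with segment-level
    class probabilities into a per-pixel class index map."""
    n_segments = len(masks)
    n_classes = len(class_probs[0])
    height, width = len(masks[0]), len(masks[0][0])
    semantic_map = []
    for y in range(height):
        row = []
        for x in range(width):
            # Score of class c at this pixel: sum over segments of
            # (mask value at pixel) * (probability segment is class c).
            scores = [
                sum(masks[n][y][x] * class_probs[n][c] for n in range(n_segments))
                for c in range(n_classes)
            ]
            row.append(max(range(n_classes), key=scores.__getitem__))
        semantic_map.append(row)
    return semantic_map

# Toy inputs (assumed, not from the paper): 3 classes, 2 segments, 2x4 image.
text_embs = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
segment_embs = [[1.0, 0.1, 0.0, 0.0], [0.0, 0.1, 1.0, 0.0]]
masks = [
    [[1.0, 1.0, 0.0, 0.0], [1.0, 1.0, 0.0, 0.0]],  # segment 0 covers the left half
    [[0.0, 0.0, 1.0, 1.0], [0.0, 0.0, 1.0, 1.0]],  # segment 1 covers the right half
]

class_probs = classify_segments(segment_embs, text_embs)
semantic_map = segments_to_semantic_map(masks, class_probs)
print(semantic_map)
```

In this toy setup the grouping stage never sees class names, so the same masks could be reused for any label set; only `text_embs` changes when new, unseen classes are introduced.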

Comments:	Accepted by CVPR 2022
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2112.07910 [cs.CV]
	(or arXiv:2112.07910v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2112.07910

Submission history

From: Ding Jian
[v1] Wed, 15 Dec 2021 06:21:47 UTC (21,554 KB)
[v2] Fri, 15 Apr 2022 10:28:07 UTC (22,193 KB)
