Open-vocabulary Panoptic Segmentation with Embedding Modulation

Chen, Xi; Li, Shuang; Lim, Ser-Nam; Torralba, Antonio; Zhao, Hengshuang

arXiv:2303.11324 (cs)

[Submitted on 20 Mar 2023 (v1), last revised 15 Jul 2023 (this version, v2)]

Abstract: Open-vocabulary image segmentation is attracting increasing attention due to its critical applications in the real world. Traditional closed-vocabulary segmentation methods cannot characterize novel objects, whereas several recent open-vocabulary attempts obtain unsatisfactory results, i.e., a notable performance reduction on the closed vocabulary and a massive demand for extra data. To this end, we propose OPSNet, an omnipotent and data-efficient framework for Open-vocabulary Panoptic Segmentation. Specifically, the Embedding Modulation module, together with several supporting components, enables embedding enhancement and information exchange between the segmentation model and the CLIP encoder, whose visual and linguistic features are well aligned, resulting in superior segmentation performance under both open- and closed-vocabulary settings with much less need for additional data. Extensive experimental evaluations are conducted across multiple datasets (e.g., COCO, ADE20K, Cityscapes, and PascalContext) under various settings, where OPSNet achieves state-of-the-art results, demonstrating the effectiveness and generality of the approach. The code and trained models will be made publicly available.

Comments:	ICCV2023
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2303.11324 [cs.CV]
	(or arXiv:2303.11324v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2303.11324

Submission history

From: Xi Chen
[v1] Mon, 20 Mar 2023 17:58:48 UTC (1,431 KB)
[v2] Sat, 15 Jul 2023 11:04:26 UTC (1,431 KB)
