A Tale of Two Features: Stable Diffusion Complements DINO for Zero-Shot Semantic Correspondence

Zhang, Junyi; Herrmann, Charles; Hur, Junhwa; Cabrera, Luisa Polania; Jampani, Varun; Sun, Deqing; Yang, Ming-Hsuan

arXiv:2305.15347 (cs)

[Submitted on 24 May 2023 (v1), last revised 28 Nov 2023 (this version, v2)]

Abstract: Text-to-image diffusion models have made significant advances in generating and editing high-quality images. As a result, numerous approaches have explored the ability of diffusion model features to understand and process single images for downstream tasks, e.g., classification, semantic segmentation, and stylization. However, significantly less is known about what these features reveal across multiple, different images and objects. In this work, we exploit Stable Diffusion (SD) features for semantic and dense correspondence and discover that with simple post-processing, SD features can perform quantitatively similar to SOTA representations. Interestingly, the qualitative analysis reveals that SD features have very different properties compared to existing representation learning features, such as the recently released DINOv2: while DINOv2 provides sparse but accurate matches, SD features provide high-quality spatial information but sometimes inaccurate semantic matches. We demonstrate that a simple fusion of these two features works surprisingly well, and a zero-shot evaluation using nearest neighbors on these fused features provides a significant performance gain over state-of-the-art methods on benchmark datasets, e.g., SPair-71k, PF-Pascal, and TSS. We also show that these correspondences can enable interesting applications such as instance swapping in two images.
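
The abstract describes fusing SD and DINOv2 features and matching them across images with nearest neighbors. The sketch below illustrates that idea only; it is not the authors' implementation. It assumes per-patch features have already been extracted into arrays of shape (num_patches, dim), and it assumes one plausible fusion: L2-normalize each feature set, then concatenate with a weight. The function names, the `alpha` parameter, and its default value are hypothetical and do not come from this page.

```python
import numpy as np


def l2_normalize(x, eps=1e-8):
    """Scale each row of x to (approximately) unit L2 norm."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def fuse_features(sd_feats, dino_feats, alpha=0.5):
    """Assumed fusion: normalize each feature set, then concatenate.

    sd_feats:   (num_patches, d_sd) array of Stable Diffusion features.
    dino_feats: (num_patches, d_dino) array of DINOv2 features.
    alpha:      hypothetical weight balancing the two feature types.
    """
    sd = l2_normalize(sd_feats)
    dino = l2_normalize(dino_feats)
    return np.concatenate([alpha * sd, (1.0 - alpha) * dino], axis=-1)


def nearest_neighbor_matches(src_fused, tgt_fused):
    """For each source patch, return the index of the most similar
    target patch under cosine similarity."""
    src = l2_normalize(src_fused)
    tgt = l2_normalize(tgt_fused)
    similarity = src @ tgt.T  # (num_src, num_tgt) cosine similarities
    return similarity.argmax(axis=1)
```

Given fused features for a source and a target image, `nearest_neighbor_matches(src, tgt)[i]` is the target patch matched to source patch `i`. No training is involved, which is what makes an evaluation of this kind zero-shot.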

Comments:	Accepted by NeurIPS 23, project page: this https URL
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2305.15347 [cs.CV]
	(or arXiv:2305.15347v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2305.15347

Submission history

From: Junyi Zhang
[v1] Wed, 24 May 2023 16:59:26 UTC (5,207 KB)
[v2] Tue, 28 Nov 2023 17:47:46 UTC (6,163 KB)
