Zero-to-Hero: Enhancing Zero-Shot Novel View Synthesis via Attention Map Filtering

Sobol, Ido; Xu, Chenfeng; Litany, Or

Computer Science > Computer Vision and Pattern Recognition

arXiv:2405.18677 (cs)

[Submitted on 29 May 2024 (v1), last revised 24 Oct 2024 (this version, v2)]

Abstract: Generating realistic images from arbitrary views based on a single source image remains a significant challenge in computer vision, with broad applications ranging from e-commerce to immersive virtual experiences. Recent advancements in diffusion models, particularly the Zero-1-to-3 model, have been widely adopted for generating plausible views, videos, and 3D models. However, these models still struggle with inconsistency and implausibility in novel view generation, especially for challenging changes in viewpoint. In this work, we propose Zero-to-Hero, a novel test-time approach that enhances view synthesis by manipulating attention maps during the denoising process of Zero-1-to-3. By drawing an analogy between the denoising process and stochastic gradient descent (SGD), we implement a filtering mechanism that aggregates attention maps, enhancing generation reliability and authenticity. This process improves geometric consistency without requiring retraining or significant computational resources. Additionally, we modify the self-attention mechanism to integrate information from the source view, reducing shape distortions. These processes are further supported by a specialized sampling schedule. Experimental results demonstrate substantial improvements in fidelity and consistency, validated on a diverse set of out-of-distribution objects. We also demonstrate the general applicability and effectiveness of Zero-to-Hero in multi-view generation and in image generation conditioned on semantic maps and pose.
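
The abstract names two attention-level operations without giving their exact form: aggregating ("filtering") attention maps, and letting self-attention draw on the source view. The sketch below is only an illustration of those two ideas under stated assumptions, not the authors' implementation. It assumes the filter is a plain element-wise mean over several noisy attention maps (by loose analogy with averaging noisy gradients in SGD), and that source information is integrated by concatenating source keys and values to the target's. All function names (`attention_map`, `filter_maps`, `source_aware_attention`) and the toy data are hypothetical.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_map(queries, keys):
    """Row-stochastic map softmax(Q K^T / sqrt(d)), one row per query."""
    d = len(queries[0])
    return [
        softmax([sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys])
        for q in queries
    ]


def filter_maps(maps):
    """ASSUMED filter: element-wise mean of several attention maps."""
    n = len(maps)
    rows, cols = len(maps[0]), len(maps[0][0])
    return [
        [sum(m[r][c] for m in maps) / n for c in range(cols)]
        for r in range(rows)
    ]


def apply_attention(attn, values):
    """Weighted sum of value vectors, one output vector per attention row."""
    dim = len(values[0])
    return [
        [sum(w * v[j] for w, v in zip(row, values)) for j in range(dim)]
        for row in attn
    ]


def source_aware_attention(q_tgt, k_tgt, v_tgt, k_src, v_src):
    """ASSUMED variant: target queries attend over target and source tokens."""
    attn = attention_map(q_tgt, k_tgt + k_src)
    return apply_attention(attn, v_tgt + v_src)


# Toy data: 4 tokens with 3-dimensional features, fixed seed for repeatability.
rng = random.Random(0)
n_tokens, dim = 4, 3


def rand_matrix(rows, cols):
    return [[rng.gauss(0, 1) for _ in range(cols)] for _ in range(rows)]


def noisy(mat, scale=0.5):
    """Perturb queries to mimic the noise of individual denoising steps."""
    return [[x + rng.gauss(0, scale) for x in row] for row in mat]


keys, values, clean_q = (rand_matrix(n_tokens, dim) for _ in range(3))
src_keys, src_values = rand_matrix(n_tokens, dim), rand_matrix(n_tokens, dim)

# Aggregate eight noisy maps into a single filtered map.
noisy_maps = [attention_map(noisy(clean_q), keys) for _ in range(8)]
filtered = filter_maps(noisy_maps)

# Self-attention extended with source-view keys and values.
out = source_aware_attention(clean_q, keys, values, src_keys, src_values)
```

Because a mean of row-stochastic matrices is itself row-stochastic, the filtered map can be used in place of any single map without renormalization. In a real diffusion pipeline these operations would act on the attention layers of the denoising network; the choice of layers, timesteps, and aggregation weights is left unspecified here because the abstract does not state it.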

Comments:	NeurIPS 2024. Project page: this https URL
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2405.18677 [cs.CV]
	(or arXiv:2405.18677v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2405.18677

Submission history

From: Ido Sobol
[v1] Wed, 29 May 2024 00:58:22 UTC (3,762 KB)
[v2] Thu, 24 Oct 2024 12:51:54 UTC (11,223 KB)
