Multi-View Unsupervised Image Generation with Cross Attention Guidance

Cerkezi, Llukman; Davtyan, Aram; Sameni, Sepehr; Favaro, Paolo

arXiv:2312.04337 (cs)

[Submitted on 7 Dec 2023]

Abstract: The growing interest in novel view synthesis, driven by Neural Radiance Field (NeRF) models, is hindered by scalability issues due to their reliance on precisely annotated multi-view images. Recent models address this by fine-tuning large text-to-image diffusion models on synthetic multi-view data. Despite robust zero-shot generalization, they may need post-processing and can face quality issues due to the synthetic-real domain gap. This paper introduces a novel pipeline for unsupervised training of a pose-conditioned diffusion model on single-category datasets. With the help of pretrained self-supervised Vision Transformers (DINOv2), we identify object poses by clustering the dataset based on the visibility and locations of specific object parts. The pose-conditioned diffusion model, trained on these pose labels and equipped with cross-frame attention at inference time, ensures cross-view consistency, which is further aided by our novel hard-attention guidance. Our model, MIRAGE, surpasses prior work in novel view synthesis on real images. Furthermore, MIRAGE is robust to diverse textures and geometries, as demonstrated by our experiments on synthetic images generated with pretrained Stable Diffusion.
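
As a rough illustration of the cross-frame attention idea mentioned in the abstract, the sketch below lets queries from the view being generated attend to keys and values taken from a reference view. It is a minimal, generic example and not the MIRAGE implementation: the function name `cross_frame_attention`, its parameters, and the `hard` flag are hypothetical. In particular, the `hard=True` branch (each query copies the value of its single best-matching reference key) is only one plausible reading of "hard attention"; the abstract does not specify how the paper's hard-attention guidance is actually computed.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_frame_attention(queries, ref_keys, ref_values, hard=False):
    """Attend from target-view queries to reference-view keys and values.

    queries:    list of d-dimensional vectors from the view being generated
    ref_keys:   list of d-dimensional vectors from the reference view
    ref_values: list of vectors, one per reference key
    hard:       ASSUMPTION for illustration only. If True, each query takes
                the value of its best-matching key (argmax) instead of a
                softmax-weighted average.
    """
    d = len(queries[0])
    dim_v = len(ref_values[0])
    outputs = []
    for q in queries:
        # Scaled dot-product similarity between this query and every reference key.
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
            for k in ref_keys
        ]
        if hard:
            best = max(range(len(scores)), key=scores.__getitem__)
            weights = [1.0 if j == best else 0.0 for j in range(len(scores))]
        else:
            weights = softmax(scores)
        # Weighted combination of the reference values.
        outputs.append(
            [sum(w * v[c] for w, v in zip(weights, ref_values)) for c in range(dim_v)]
        )
    return outputs


# Toy usage: one query, two reference tokens.
queries = [[1.0, 0.0]]
ref_keys = [[1.0, 0.0], [0.0, 1.0]]
ref_values = [[10.0], [20.0]]
soft_out = cross_frame_attention(queries, ref_keys, ref_values)
hard_out = cross_frame_attention(queries, ref_keys, ref_values, hard=True)
```

In the toy usage, the soft variant blends both reference values, weighted toward the better-matching key, while the hard variant returns the value of the best-matching key unchanged. Sharing keys and values across views in this way is a common means of encouraging consistent appearance between generated frames.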

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2312.04337 [cs.CV]
	(or arXiv:2312.04337v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2312.04337

Submission history

From: Llukman Cerkezi
[v1] Thu, 7 Dec 2023 14:55:13 UTC (12,380 KB)
