Showing 1–50 of 170 results for author: Vedaldi, A

Searching in archive cs.
  1. arXiv:2406.04343  [pdf, other]

    cs.CV

    Flash3D: Feed-Forward Generalisable 3D Scene Reconstruction from a Single Image

    Authors: Stanislaw Szymanowicz, Eldar Insafutdinov, Chuanxia Zheng, Dylan Campbell, João F. Henriques, Christian Rupprecht, Andrea Vedaldi

    Abstract: In this paper, we propose Flash3D, a method for scene reconstruction and novel view synthesis from a single image which is both very generalisable and efficient. For generalisability, we start from a "foundation" model for monocular depth estimation and extend it to a full 3D shape and appearance reconstructor. For efficiency, we base this extension on feed-forward Gaussian Splatting. Specifically…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: Project page: https://www.robots.ox.ac.uk/~vgg/research/flash3d/

  2. arXiv:2404.19760  [pdf, other]

    cs.CV cs.GR

    Lightplane: Highly-Scalable Components for Neural 3D Fields

    Authors: Ang Cao, Justin Johnson, Andrea Vedaldi, David Novotny

    Abstract: Contemporary 3D research, particularly in reconstruction and generation, heavily relies on 2D images for inputs or supervision. However, current designs for these 2D-3D mappings are memory-intensive, posing a significant bottleneck for existing methods and hindering new applications. In response, we propose a pair of highly scalable components for 3D neural fields: Lightplane Render and Splatter, w…

    Submitted 30 April, 2024; originally announced April 2024.

    Comments: Project Page: https://lightplane.github.io/ Code: https://github.com/facebookresearch/lightplane

  3. arXiv:2404.19758  [pdf, other]

    cs.CV

    Invisible Stitch: Generating Smooth 3D Scenes with Depth Inpainting

    Authors: Paul Engstler, Andrea Vedaldi, Iro Laina, Christian Rupprecht

    Abstract: 3D scene generation has quickly become a challenging new research direction, fueled by consistent improvements of 2D generative diffusion models. Most prior work in this area generates scenes by iteratively stitching newly generated frames with existing geometry. These works often depend on pre-trained monocular depth estimators to lift the generated images into 3D, fusing them with the existing s…

    Submitted 30 April, 2024; originally announced April 2024.

    Comments: Project page: https://research.paulengstler.com/invisible-stitch/

  4. arXiv:2404.18929  [pdf, other]

    cs.CV

    DGE: Direct Gaussian 3D Editing by Consistent Multi-view Editing

    Authors: Minghao Chen, Iro Laina, Andrea Vedaldi

    Abstract: We consider the problem of editing 3D objects and scenes based on open-ended language instructions. The established paradigm to solve this problem is to use a 2D image generator or editor to guide the 3D editing process. However, this is often slow as it requires updating a computationally expensive 3D representation such as a neural radiance field, and doing so by using contradictory guidance f…

    Submitted 29 April, 2024; originally announced April 2024.

    Comments: Project Page: https://silent-chen.github.io/DGE/

  5. arXiv:2403.15382  [pdf, other]

    cs.CV

    DragAPart: Learning a Part-Level Motion Prior for Articulated Objects

    Authors: Ruining Li, Chuanxia Zheng, Christian Rupprecht, Andrea Vedaldi

    Abstract: We introduce DragAPart, a method that, given an image and a set of drags as input, can generate a new image of the same object in a new state, compatible with the action of the drags. Differently from prior works that focused on repositioning objects, DragAPart predicts part-level interactions, such as opening and closing a drawer. We study this problem as a proxy for learning a generalist motion…

    Submitted 22 March, 2024; originally announced March 2024.

    Comments: Project page: https://dragapart.github.io/

  6. arXiv:2403.10997  [pdf, other]

    cs.CV cs.AI cs.GR cs.LG

    N2F2: Hierarchical Scene Understanding with Nested Neural Feature Fields

    Authors: Yash Bhalgat, Iro Laina, João F. Henriques, Andrew Zisserman, Andrea Vedaldi

    Abstract: Understanding complex scenes at multiple levels of abstraction remains a formidable challenge in computer vision. To address this, we introduce Nested Neural Feature Fields (N2F2), a novel approach that employs hierarchical supervision to learn a single feature field, wherein different dimensions within the same high-dimensional feature encode scene properties at varying granularities. Our method…

    Submitted 16 March, 2024; originally announced March 2024.

  7. arXiv:2402.10128  [pdf, other]

    cs.CV cs.GR cs.LG

    GES: Generalized Exponential Splatting for Efficient Radiance Field Rendering

    Authors: Abdullah Hamdi, Luke Melas-Kyriazi, Jinjie Mai, Guocheng Qian, Ruoshi Liu, Carl Vondrick, Bernard Ghanem, Andrea Vedaldi

    Abstract: Advancements in 3D Gaussian Splatting have significantly accelerated 3D reconstruction and generation. However, it may require a large number of Gaussians, which creates a substantial memory footprint. This paper introduces GES (Generalized Exponential Splatting), a novel representation that employs Generalized Exponential Function (GEF) to model 3D scenes, requiring far fewer particles to represe…

    Submitted 24 May, 2024; v1 submitted 15 February, 2024; originally announced February 2024.

    Comments: CVPR 2024 paper. project website https://abdullahamdi.com/ges

  8. arXiv:2402.08682  [pdf, other]

    cs.CV cs.AI cs.LG

    IM-3D: Iterative Multiview Diffusion and Reconstruction for High-Quality 3D Generation

    Authors: Luke Melas-Kyriazi, Iro Laina, Christian Rupprecht, Natalia Neverova, Andrea Vedaldi, Oran Gafni, Filippos Kokkinos

    Abstract: Most text-to-3D generators build upon off-the-shelf text-to-image models trained on billions of images. They use variants of Score Distillation Sampling (SDS), which is slow, somewhat unstable, and prone to artifacts. A mitigation is to fine-tune the 2D generator to be multi-view aware, which can help distillation or can be combined with reconstruction networks to output 3D objects directly. In th…

    Submitted 13 February, 2024; originally announced February 2024.

  9. arXiv:2401.02400  [pdf, other]

    cs.CV

    Learning the 3D Fauna of the Web

    Authors: Zizhang Li, Dor Litvak, Ruining Li, Yunzhi Zhang, Tomas Jakab, Christian Rupprecht, Shangzhe Wu, Andrea Vedaldi, Jiajun Wu

    Abstract: Learning 3D models of all animals on the Earth requires massively scaling up existing solutions. With this ultimate goal in mind, we develop 3D-Fauna, an approach that learns a pan-category deformable 3D animal model for more than 100 animal species jointly. One crucial bottleneck of modeling animals is the limited availability of training data, which we overcome by simply learning from 2D Interne…

    Submitted 1 April, 2024; v1 submitted 4 January, 2024; originally announced January 2024.

    Comments: The first two authors contributed equally to this work. The last three authors contributed equally. Project page: https://kyleleey.github.io/3DFauna/

  10. arXiv:2312.13150  [pdf, other]

    cs.CV

    Splatter Image: Ultra-Fast Single-View 3D Reconstruction

    Authors: Stanislaw Szymanowicz, Christian Rupprecht, Andrea Vedaldi

    Abstract: We introduce the Splatter Image, an ultra-efficient approach for monocular 3D object reconstruction. Splatter Image is based on Gaussian Splatting, which allows fast and high-quality reconstruction of 3D scenes from multiple images. We apply Gaussian Splatting to monocular reconstruction by learning a neural network that, at test time, performs reconstruction in a feed-forward manner, at 38 FPS. Our main…

    Submitted 16 April, 2024; v1 submitted 20 December, 2023; originally announced December 2023.

    Comments: CVPR 2024. Project page: https://szymanowiczs.github.io/splatter-image.html . Code: https://github.com/szymanowiczs/splatter-image , Demo: https://huggingface.co/spaces/szymanowiczs/splatter_image

  11. arXiv:2312.09246  [pdf, other]

    cs.CV

    SHAP-EDITOR: Instruction-guided Latent 3D Editing in Seconds

    Authors: Minghao Chen, Junyu Xie, Iro Laina, Andrea Vedaldi

    Abstract: We propose a novel feed-forward 3D editing framework called Shap-Editor. Prior research on editing 3D objects primarily concentrated on editing individual objects by leveraging off-the-shelf 2D image editing networks. This is achieved via a process called distillation, which transfers knowledge from the 2D network to 3D assets. Distillation necessitates at least tens of minutes per asset to attain…

    Submitted 14 December, 2023; originally announced December 2023.

    Comments: Project Page: https://silent-chen.github.io/Shap-Editor/

  12. arXiv:2312.08744  [pdf, other]

    cs.CV cs.GR

    GOEnFusion: Gradient Origin Encodings for 3D Forward Diffusion Models

    Authors: Animesh Karnewar, Andrea Vedaldi, Niloy J. Mitra, David Novotny

    Abstract: The recently introduced Forward-Diffusion method allows training a 3D diffusion model using only 2D images for supervision. However, it does not easily generalise to different 3D representations and requires a computationally expensive auto-regressive sampling process to generate the underlying 3D scenes. In this paper, we propose GOEn: Gradient Origin Encoding (pronounced "gone"). GOEn can encode…

    Submitted 14 December, 2023; originally announced December 2023.

    Comments: project page at: https://holodiffusion.github.io/goenfusion

  13. arXiv:2312.04551  [pdf, other]

    cs.CV

    Free3D: Consistent Novel View Synthesis without 3D Representation

    Authors: Chuanxia Zheng, Andrea Vedaldi

    Abstract: We introduce Free3D, a simple accurate method for monocular open-set novel view synthesis (NVS). Similar to Zero-1-to-3, we start from a pre-trained 2D image generator for generalization, and fine-tune it for NVS. Compared to other works that took a similar approach, we obtain significant improvements without resorting to an explicit 3D representation, which is slow and memory-consuming, and witho…

    Submitted 30 March, 2024; v1 submitted 7 December, 2023; originally announced December 2023.

    Comments: webpage: https://chuanxiaz.com/free3d/, code: https://github.com/lyndonzheng/Free3D

  14. arXiv:2312.02350  [pdf, other]

    cs.CV

    Instant Uncertainty Calibration of NeRFs Using a Meta-calibrator

    Authors: Niki Amini-Naieni, Tomas Jakab, Andrea Vedaldi, Ronald Clark

    Abstract: Although Neural Radiance Fields (NeRFs) have markedly improved novel view synthesis, accurate uncertainty quantification in their image predictions remains an open problem. The prevailing methods for estimating uncertainty, including the state-of-the-art Density-aware NeRF Ensembles (DANE) [29], quantify uncertainty without calibration. This frequently leads to over- or under-confidence in image p…

    Submitted 19 March, 2024; v1 submitted 4 December, 2023; originally announced December 2023.

  15. arXiv:2311.17055  [pdf, other]

    cs.CV cs.AI cs.IT cs.LG

    No Representation Rules Them All in Category Discovery

    Authors: Sagar Vaze, Andrea Vedaldi, Andrew Zisserman

    Abstract: In this paper we tackle the problem of Generalized Category Discovery (GCD). Specifically, given a dataset with labelled and unlabelled images, the task is to cluster all images in the unlabelled subset, whether or not they belong to the labelled categories. Our first contribution is to recognize that most existing GCD benchmarks only contain labels for a single clustering of the data, making it d…

    Submitted 28 November, 2023; originally announced November 2023.

    Comments: NeurIPS 2023

  16. arXiv:2308.14244  [pdf, other]

    cs.CV cs.GR

    HoloFusion: Towards Photo-realistic 3D Generative Modeling

    Authors: Animesh Karnewar, Niloy J. Mitra, Andrea Vedaldi, David Novotny

    Abstract: Diffusion-based image generators can now produce high-quality and diverse samples, but their success has yet to fully translate to 3D generation: existing diffusion methods can either generate low-resolution but 3D consistent outputs, or detailed 2D views of 3D objects but with potential structural defects and lacking view consistency or realism. We present HoloFusion, a method that combines the b…

    Submitted 27 August, 2023; originally announced August 2023.

    Comments: ICCV 2023 conference; project page at: https://holodiffusion.github.io/holofusion

  17. arXiv:2307.15139  [pdf, other]

    cs.CV

    Online Clustered Codebook

    Authors: Chuanxia Zheng, Andrea Vedaldi

    Abstract: Vector Quantisation (VQ) is experiencing a comeback in machine learning, where it is increasingly used in representation learning. However, optimizing the codevectors in existing VQ-VAE is not entirely trivial. A problem is codebook collapse, where only a small subset of codevectors receive gradients useful for their optimisation, whereas a majority of them simply "die off" and are never updated…

    Submitted 27 July, 2023; originally announced July 2023.

    Comments: The project page: https://chuanxiaz.com/cvq/

  18. arXiv:2307.12067  [pdf, other]

    cs.CV

    Replay: Multi-modal Multi-view Acted Videos for Casual Holography

    Authors: Roman Shapovalov, Yanir Kleiman, Ignacio Rocco, David Novotny, Andrea Vedaldi, Changan Chen, Filippos Kokkinos, Ben Graham, Natalia Neverova

    Abstract: We introduce Replay, a collection of multi-view, multi-modal videos of humans interacting socially. Each scene is filmed in high production quality, from different viewpoints with several static cameras, as well as wearable action cameras, and recorded with a large array of microphones at different positions in the room. Overall, the dataset contains over 4000 minutes of footage and over 7 million…

    Submitted 22 July, 2023; originally announced July 2023.

    Comments: Accepted for ICCV 2023. Roman, Yanir, and Ignacio contributed equally

  19. arXiv:2307.07635  [pdf, other]

    cs.CV

    CoTracker: It is Better to Track Together

    Authors: Nikita Karaev, Ignacio Rocco, Benjamin Graham, Natalia Neverova, Andrea Vedaldi, Christian Rupprecht

    Abstract: We introduce CoTracker, a transformer-based model that tracks dense points in a frame jointly across a video sequence. This differs from most existing state-of-the-art approaches that track points independently, ignoring their correlation. We show that joint tracking results in a significantly higher tracking accuracy and robustness. We also provide several technical innovations, including the con…

    Submitted 26 December, 2023; v1 submitted 14 July, 2023; originally announced July 2023.

    Comments: Code and model weights are available at: https://co-tracker.github.io/

  20. arXiv:2306.09316  [pdf, other]

    cs.CV

    Diffusion Models for Zero-Shot Open-Vocabulary Segmentation

    Authors: Laurynas Karazija, Iro Laina, Andrea Vedaldi, Christian Rupprecht

    Abstract: The variety of objects in the real world is nearly unlimited and is thus impossible to capture using models trained on a fixed set of categories. As a result, in recent years, open-vocabulary methods have attracted the interest of the community. This paper proposes a new method for zero-shot open-vocabulary segmentation. Prior work largely relies on contrastive training using image-text pairs, lev…

    Submitted 15 June, 2023; originally announced June 2023.

    Comments: Project page https://www.robots.ox.ac.uk/~vgg/research/ovdiff

  21. arXiv:2306.08731  [pdf, other]

    cs.CV

    EPIC Fields: Marrying 3D Geometry and Video Understanding

    Authors: Vadim Tschernezki, Ahmad Darkhalil, Zhifan Zhu, David Fouhey, Iro Laina, Diane Larlus, Dima Damen, Andrea Vedaldi

    Abstract: Neural rendering is fuelling a unification of learning, 3D geometry and video understanding that has been waiting for more than two decades. Progress, however, is still hampered by a lack of suitable datasets and benchmarks. To address this gap, we introduce EPIC Fields, an augmentation of EPIC-KITCHENS with 3D camera information. Like other datasets for neural rendering, EPIC Fields removes the c…

    Submitted 1 February, 2024; v1 submitted 14 June, 2023; originally announced June 2023.

    Comments: Published at NeurIPS 2023. 24 pages, 15 figures. Project Webpage: http://epic-kitchens.github.io/epic-fields

  22. arXiv:2306.07881  [pdf, other]

    cs.CV

    Viewset Diffusion: (0-)Image-Conditioned 3D Generative Models from 2D Data

    Authors: Stanislaw Szymanowicz, Christian Rupprecht, Andrea Vedaldi

    Abstract: We present Viewset Diffusion, a diffusion-based generator that outputs 3D objects while only using multi-view 2D data for supervision. We note that there exists a one-to-one mapping between viewsets, i.e., collections of several 2D views of an object, and 3D models. Hence, we train a diffusion model to generate viewsets, but design the neural network generator to reconstruct internally correspondi…

    Submitted 1 September, 2023; v1 submitted 13 June, 2023; originally announced June 2023.

    Comments: International Conference on Computer Vision 2023

  23. arXiv:2306.04633  [pdf, other]

    cs.CV cs.AI cs.LG

    Contrastive Lift: 3D Object Instance Segmentation by Slow-Fast Contrastive Fusion

    Authors: Yash Bhalgat, Iro Laina, João F. Henriques, Andrew Zisserman, Andrea Vedaldi

    Abstract: Instance segmentation in 3D is a challenging task due to the lack of large-scale annotated datasets. In this paper, we show that this task can be addressed effectively by leveraging instead 2D pre-trained models for instance segmentation. We propose a novel approach to lift 2D segments to 3D and fuse them by means of a neural field representation, which encourages multi-view consistency across fra…

    Submitted 1 December, 2023; v1 submitted 7 June, 2023; originally announced June 2023.

    Comments: NeurIPS 2023 (Spotlight). Code: https://github.com/yashbhalgat/Contrastive-Lift

  24. arXiv:2305.02296  [pdf, other]

    cs.CV cs.AI

    DynamicStereo: Consistent Dynamic Depth from Stereo Videos

    Authors: Nikita Karaev, Ignacio Rocco, Benjamin Graham, Natalia Neverova, Andrea Vedaldi, Christian Rupprecht

    Abstract: We consider the problem of reconstructing a dynamic scene observed from a stereo camera. Most existing methods for depth from stereo treat different stereo frames independently, leading to temporally inconsistent depth predictions. Temporal consistency is especially important for immersive AR or VR scenarios, where flickering greatly diminishes the user experience. We propose DynamicStereo, a nove…

    Submitted 3 May, 2023; originally announced May 2023.

    Comments: CVPR 2023; project page available at https://dynamic-stereo.github.io/

  25. arXiv:2304.10535  [pdf, other]

    cs.CV

    Farm3D: Learning Articulated 3D Animals by Distilling 2D Diffusion

    Authors: Tomas Jakab, Ruining Li, Shangzhe Wu, Christian Rupprecht, Andrea Vedaldi

    Abstract: We present Farm3D, a method for learning category-specific 3D reconstructors for articulated objects, relying solely on "free" virtual supervision from a pre-trained 2D diffusion-based image generator. Recent approaches can learn a monocular network that predicts the 3D shape, albedo, illumination, and viewpoint of any object occurrence, given a collection of single-view images of an object catego…

    Submitted 14 May, 2024; v1 submitted 20 April, 2023; originally announced April 2023.

    Comments: In 3DV 2024, Project page: http://farm3d.github.io

  26. arXiv:2304.06712  [pdf, other]

    cs.CV

    What does CLIP know about a red circle? Visual prompt engineering for VLMs

    Authors: Aleksandar Shtedritski, Christian Rupprecht, Andrea Vedaldi

    Abstract: Large-scale Vision-Language Models, such as CLIP, learn powerful image-text representations that have found numerous applications, from zero-shot classification to text-to-image generation. Despite that, their capabilities for solving novel discriminative tasks via prompting fall behind those of large language models, such as GPT-3. Here we explore the idea of visual prompt engineering for solving…

    Submitted 18 August, 2023; v1 submitted 13 April, 2023; originally announced April 2023.

    Comments: ICCV 2023 Oral

  27. arXiv:2304.03373  [pdf, other]

    cs.CV

    Training-Free Layout Control with Cross-Attention Guidance

    Authors: Minghao Chen, Iro Laina, Andrea Vedaldi

    Abstract: Recent diffusion-based generators can produce high-quality images from textual prompts. However, they often disregard textual instructions that specify the spatial layout of the composition. We propose a simple approach that achieves robust layout control without the need for training or fine-tuning of the image generator. Our technique manipulates the cross-attention layers that the model uses to…

    Submitted 29 November, 2023; v1 submitted 6 April, 2023; originally announced April 2023.

    Comments: WACV 2024, Project Page: https://silent-chen.github.io/layout-guidance/

  28. arXiv:2304.03110  [pdf, other]

    cs.CV

    Continual Detection Transformer for Incremental Object Detection

    Authors: Yaoyao Liu, Bernt Schiele, Andrea Vedaldi, Christian Rupprecht

    Abstract: Incremental object detection (IOD) aims to train an object detector in phases, each with annotations for new object categories. As in other incremental settings, IOD is subject to catastrophic forgetting, which is often addressed by techniques such as knowledge distillation (KD) and exemplar replay (ER). However, KD and ER do not work well if applied directly to state-of-the-art transformer-based obj…

    Submitted 6 April, 2023; originally announced April 2023.

    Comments: Accepted to CVPR 2023

  29. arXiv:2303.16509  [pdf, other]

    cs.CV cs.GR

    HoloDiffusion: Training a 3D Diffusion Model using 2D Images

    Authors: Animesh Karnewar, Andrea Vedaldi, David Novotny, Niloy Mitra

    Abstract: Diffusion models have emerged as the best approach for generative modeling of 2D images. Part of their success is due to the possibility of training them on millions if not billions of images with a stable learning objective. However, extending these models to 3D remains difficult for two reasons. First, finding a large quantity of 3D training data is much more complex than for 2D images. Second,…

    Submitted 21 May, 2023; v1 submitted 29 March, 2023; originally announced March 2023.

    Comments: CVPR 2023 conference; project page at: https://holodiffusion.github.io/

  30. arXiv:2303.11898  [pdf, other]

    cs.CV cs.GR

    Real-time volumetric rendering of dynamic humans

    Authors: Ignacio Rocco, Iurii Makarov, Filippos Kokkinos, David Novotny, Benjamin Graham, Natalia Neverova, Andrea Vedaldi

    Abstract: We present a method for fast 3D reconstruction and real-time rendering of dynamic humans from monocular videos with accompanying parametric body fits. Our method can reconstruct a dynamic human in less than 3h using a single GPU, compared to recent state-of-the-art alternatives that take up to 72h. These speedups are obtained by using a lightweight deformation model solely based on linear blend sk…

    Submitted 21 March, 2023; originally announced March 2023.

    Comments: Project page: https://real-time-humans.github.io/

  31. arXiv:2302.10668  [pdf, other]

    cs.CV cs.AI cs.LG

    $PC^2$: Projection-Conditioned Point Cloud Diffusion for Single-Image 3D Reconstruction

    Authors: Luke Melas-Kyriazi, Christian Rupprecht, Andrea Vedaldi

    Abstract: Reconstructing the 3D shape of an object from a single RGB image is a long-standing and highly challenging problem in computer vision. In this paper, we propose a novel method for single-image 3D reconstruction which generates a sparse point cloud via a conditional denoising diffusion process. Our method takes as input a single RGB image along with its camera pose and gradually denoises a set of 3…

    Submitted 23 February, 2023; v1 submitted 21 February, 2023; originally announced February 2023.

    Comments: Project page: https://lukemelas.github.io/projection-conditioned-point-cloud-diffusion

  32. arXiv:2302.10663  [pdf, other]

    cs.CV cs.AI cs.LG

    RealFusion: 360° Reconstruction of Any Object from a Single Image

    Authors: Luke Melas-Kyriazi, Christian Rupprecht, Iro Laina, Andrea Vedaldi

    Abstract: We consider the problem of reconstructing a full 360° photographic model of an object from a single image of it. We do so by fitting a neural radiance field to the image, but find this problem to be severely ill-posed. We thus take an off-the-shelf conditional image generator based on diffusion and engineer a prompt that encourages it to "dream up" novel views of the object. Using an approach inspi…

    Submitted 23 February, 2023; v1 submitted 21 February, 2023; originally announced February 2023.

    Comments: Project page: https://lukemelas.github.io/realfusion

  33. arXiv:2301.11280  [pdf, other]

    cs.CV cs.AI cs.LG

    Text-To-4D Dynamic Scene Generation

    Authors: Uriel Singer, Shelly Sheynin, Adam Polyak, Oron Ashual, Iurii Makarov, Filippos Kokkinos, Naman Goyal, Andrea Vedaldi, Devi Parikh, Justin Johnson, Yaniv Taigman

    Abstract: We present MAV3D (Make-A-Video3D), a method for generating three-dimensional dynamic scenes from text descriptions. Our approach uses a 4D dynamic Neural Radiance Field (NeRF), which is optimized for scene appearance, density, and motion consistency by querying a Text-to-Video (T2V) diffusion-based model. The dynamic video output generated from the provided text can be viewed from any camera locat…

    Submitted 26 January, 2023; originally announced January 2023.

  34. arXiv:2301.08730  [pdf, other]

    cs.CV cs.SD eess.AS

    Novel-View Acoustic Synthesis

    Authors: Changan Chen, Alexander Richard, Roman Shapovalov, Vamsi Krishna Ithapu, Natalia Neverova, Kristen Grauman, Andrea Vedaldi

    Abstract: We introduce the novel-view acoustic synthesis (NVAS) task: given the sight and sound observed at a source viewpoint, can we synthesize the sound of that scene from an unseen target viewpoint? We propose a neural rendering approach: Visually-Guided Acoustic Synthesis (ViGAS) network that learns to synthesize the sound of an arbitrary point in space by analyzing the input audio-visual cues. To benc…

    Submitted 24 October, 2023; v1 submitted 20 January, 2023; originally announced January 2023.

    Comments: Accepted at CVPR 2023. Project page: https://vision.cs.utexas.edu/projects/nvas

  35. arXiv:2212.03236  [pdf, other]

    cs.CV

    Self-Supervised Correspondence Estimation via Multiview Registration

    Authors: Mohamed El Banani, Ignacio Rocco, David Novotny, Andrea Vedaldi, Natalia Neverova, Justin Johnson, Benjamin Graham

    Abstract: Video provides us with the spatio-temporal consistency needed for visual learning. Recent approaches have utilized this signal to learn correspondence estimation from close-by frame pairs. However, by only relying on close-by frame pairs, those approaches miss out on the richer long-range consistency between distant overlapping frames. To address this, we propose a self-supervised approach for cor…

    Submitted 6 December, 2022; originally announced December 2022.

    Comments: Accepted to WACV 2023. Project page: https://mbanani.github.io/syncmatch/

  36. arXiv:2211.12497  [pdf, other]

    cs.CV

    MagicPony: Learning Articulated 3D Animals in the Wild

    Authors: Shangzhe Wu, Ruining Li, Tomas Jakab, Christian Rupprecht, Andrea Vedaldi

    Abstract: We consider the problem of predicting the 3D shape, articulation, viewpoint, texture, and lighting of an articulated animal like a horse given a single test image as input. We present a new method, dubbed MagicPony, that learns this predictor purely from in-the-wild single-view images of the object category, with minimal assumptions about the topology of deformation. At its core is an implicit-exp…

    Submitted 3 April, 2023; v1 submitted 22 November, 2022; originally announced November 2022.

    Comments: CVPR 2023. Project Page: https://3dmagicpony.github.io/

  37. arXiv:2211.03889  [pdf, other]

    cs.CV

    Common Pets in 3D: Dynamic New-View Synthesis of Real-Life Deformable Categories

    Authors: Samarth Sinha, Roman Shapovalov, Jeremy Reizenstein, Ignacio Rocco, Natalia Neverova, Andrea Vedaldi, David Novotny

    Abstract: Obtaining photorealistic reconstructions of objects from sparse views is inherently ambiguous and can only be achieved by learning suitable reconstruction priors. Earlier works on sparse rigid object reconstruction successfully learned such priors from large datasets such as CO3D. In this paper, we extend this approach to dynamic objects. We use cats and dogs as a representative example and introd…

    Submitted 7 November, 2022; originally announced November 2022.

  38. arXiv:2210.12148  [pdf, other]

    cs.CV cs.AI cs.LG

    Unsupervised Multi-object Segmentation by Predicting Probable Motion Patterns

    Authors: Laurynas Karazija, Subhabrata Choudhury, Iro Laina, Christian Rupprecht, Andrea Vedaldi

    Abstract: We propose a new approach to learn to segment multiple image objects without manual supervision. The method can extract objects from still images, but uses videos for supervision. While prior works have considered motion for segmentation, a key insight is that, while motion can be used to identify objects, not all objects are necessarily in motion: the absence of motion does not imply the absence…

    Submitted 21 October, 2022; originally announced October 2022.

  39. arXiv:2209.03494  [pdf, other]

    cs.CV cs.GR

    Neural Feature Fusion Fields: 3D Distillation of Self-Supervised 2D Image Representations

    Authors: Vadim Tschernezki, Iro Laina, Diane Larlus, Andrea Vedaldi

    Abstract: We present Neural Feature Fusion Fields (N3F), a method that improves dense 2D image feature extractors when the latter are applied to the analysis of multiple images reconstructible as a 3D scene. Given an image feature extractor, for example pre-trained using self-supervision, N3F uses it as a teacher to learn a student network defined in 3D space. The 3D student network is similar to a neural r…

    Submitted 7 September, 2022; originally announced September 2022.

    Comments: 3DV2022, Oral. Project page: https://www.robots.ox.ac.uk/~vadim/n3f/

  40. arXiv:2209.03268  [pdf, other]

    cs.CV

    Measuring the Interpretability of Unsupervised Representations via Quantized Reverse Probing

    Authors: Iro Laina, Yuki M. Asano, Andrea Vedaldi

    Abstract: Self-supervised visual representation learning has recently attracted significant research interest. While a common way to evaluate self-supervised representations is through transfer to various downstream tasks, we instead investigate the problem of measuring their interpretability, i.e. understanding the semantics encoded in raw representations. We formulate the latter as estimating the mutual i…

    Submitted 7 September, 2022; originally announced September 2022.

    Comments: Published at ICLR 2022. Appendix included, 26 pages

  41. arXiv:2206.06340  [pdf, other]

    cs.CV

    SNeS: Learning Probably Symmetric Neural Surfaces from Incomplete Data

    Authors: Eldar Insafutdinov, Dylan Campbell, João F. Henriques, Andrea Vedaldi

    Abstract: We present a method for the accurate 3D reconstruction of partly-symmetric objects. We build on the strengths of recent advances in neural reconstruction and rendering such as Neural Radiance Fields (NeRF). A major shortcoming of such approaches is that they fail to reconstruct any part of the object which is not clearly visible in the training image, which is often the case for in-the-wild images…

    Submitted 13 June, 2022; originally announced June 2022.

    Comments: First two authors contributed equally

  42. arXiv:2205.07844  [pdf, other]

    cs.CV

    Guess What Moves: Unsupervised Video and Image Segmentation by Anticipating Motion

    Authors: Subhabrata Choudhury, Laurynas Karazija, Iro Laina, Andrea Vedaldi, Christian Rupprecht

    Abstract: Motion, measured via optical flow, provides a powerful cue to discover and learn objects in images and videos. However, compared to using appearance, it has some blind spots, such as the fact that objects become invisible if they do not move. In this work, we propose an approach that combines the strengths of motion-based and appearance-based segmentation. We propose to supervise an image segmenta…

    Submitted 13 October, 2022; v1 submitted 16 May, 2022; originally announced May 2022.

    Comments: BMVC 2022

  43. arXiv:2205.07839  [pdf, other]

    cs.CV cs.AI

    Deep Spectral Methods: A Surprisingly Strong Baseline for Unsupervised Semantic Segmentation and Localization

    Authors: Luke Melas-Kyriazi, Christian Rupprecht, Iro Laina, Andrea Vedaldi

    Abstract: Unsupervised localization and segmentation are long-standing computer vision challenges that involve decomposing an image into semantically-meaningful segments without any labeled data. These tasks are particularly interesting in an unsupervised setting due to the difficulty and cost of obtaining dense image annotations, but existing unsupervised approaches struggle with complex scenes containing…

    Submitted 16 May, 2022; originally announced May 2022.

    Comments: Published at CVPR 2022. Project Page: https://lukemelas.github.io/deep-spectral-segmentation

  44. arXiv:2205.01668  [pdf, other]

    cs.CV

    End-to-End Visual Editing with a Generatively Pre-Trained Artist

    Authors: Andrew Brown, Cheng-Yang Fu, Omkar Parkhi, Tamara L. Berg, Andrea Vedaldi

    Abstract: We consider the targeted image editing problem: blending a region in a source image with a driver image that specifies the desired change. Differently from prior works, we solve this problem by learning a conditional probability distribution of the edits, end-to-end. Training such a model requires addressing a fundamental technical challenge: the lack of example edits for training. To this end, we…

    Submitted 3 May, 2022; originally announced May 2022.

  45. arXiv:2201.02609  [pdf, other]

    cs.CV cs.LG

    Generalized Category Discovery

    Authors: Sagar Vaze, Kai Han, Andrea Vedaldi, Andrew Zisserman

    Abstract: In this paper, we consider a highly general image recognition setting wherein, given a labelled and unlabelled set of images, the task is to categorize all images in the unlabelled set. Here, the unlabelled images may come from labelled classes or from novel ones. Existing recognition methods are not able to deal with this setting, because they make several restrictive assumptions, such as the unl…

    Submitted 18 June, 2022; v1 submitted 7 January, 2022; originally announced January 2022.

    Comments: CVPR 22. Changes from pre-print highlighted in GitHub repo

  46. arXiv:2112.12761  [pdf, other]

    cs.CV cs.GR

    BANMo: Building Animatable 3D Neural Models from Many Casual Videos

    Authors: Gengshan Yang, Minh Vo, Natalia Neverova, Deva Ramanan, Andrea Vedaldi, Hanbyul Joo

    Abstract: Prior work for articulated 3D shape reconstruction often relies on specialized sensors (e.g., synchronized multi-camera systems), or pre-built 3D deformable models (e.g., SMAL or SMPL). Such methods are not able to scale to diverse sets of objects in the wild. We present BANMo, a method that requires neither a specialized sensor nor a pre-defined template shape. BANMo builds high-fidelity, articul…

    Submitted 3 April, 2023; v1 submitted 23 December, 2021; originally announced December 2021.

    Comments: CVPR 2022 camera-ready version (last update: May 2022)

  47. arXiv:2112.04432  [pdf, other]

    cs.CV eess.AS

    Audio-Visual Synchronisation in the wild

    Authors: Honglie Chen, Weidi Xie, Triantafyllos Afouras, Arsha Nagrani, Andrea Vedaldi, Andrew Zisserman

    Abstract: In this paper, we consider the problem of audio-visual synchronisation applied to videos 'in-the-wild' (i.e., of general classes beyond speech). As a new task, we identify and curate a test set with high audio-visual correlation, namely VGG-Sound Sync. We compare a number of transformer-based architectural variants specifically designed to model audio and visual signals of arbitrary length, while sig…

    Submitted 8 December, 2021; originally announced December 2021.

  48. arXiv:2111.06349  [pdf, other]

    cs.CV cs.LG

    Unsupervised Part Discovery from Contrastive Reconstruction

    Authors: Subhabrata Choudhury, Iro Laina, Christian Rupprecht, Andrea Vedaldi

    Abstract: The goal of self-supervised visual representation learning is to learn strong, transferable image representations, with the majority of research focusing on object or scene level. On the other hand, representation learning at part level has received significantly less attention. In this paper, we propose an unsupervised approach to object part discovery and segmentation and make three contribution…

    Submitted 21 March, 2022; v1 submitted 11 November, 2021; originally announced November 2021.

    Comments: NeurIPS 2021. Project page: https://www.robots.ox.ac.uk/~vgg/research/unsup-parts/

  49. arXiv:2111.03651  [pdf, other]

    cs.CV cs.CL

    The Curious Layperson: Fine-Grained Image Recognition without Expert Labels

    Authors: Subhabrata Choudhury, Iro Laina, Christian Rupprecht, Andrea Vedaldi

    Abstract: Most of us are not experts in specific fields, such as ornithology. Nonetheless, we do have general image and language understanding capabilities that we use to match what we see to expert resources. This allows us to expand our knowledge and perform novel tasks without ad-hoc external supervision. On the contrary, machines have a much harder time consulting expert-curated knowledge bases unless t…

    Submitted 5 November, 2021; originally announced November 2021.

    Comments: To appear in BMVC 2021 (Oral). Project page: https://www.robots.ox.ac.uk/~vgg/research/clever/

  50. arXiv:2110.09936  [pdf, other]

    cs.CV cs.GR

    NeuralDiff: Segmenting 3D objects that move in egocentric videos

    Authors: Vadim Tschernezki, Diane Larlus, Andrea Vedaldi

    Abstract: Given a raw video sequence taken from a freely-moving camera, we study the problem of decomposing the observed 3D scene into a static background and a dynamic foreground containing the objects that move in the video sequence. This task is reminiscent of the classic background subtraction problem, but is significantly harder because all parts of the scene, static and dynamic, generate a large appar…

    Submitted 19 October, 2021; originally announced October 2021.

    Comments: 3DV 2021. Project page: https://www.robots.ox.ac.uk/~vadim/neuraldiff/