Showing 1–32 of 32 results for author: Laina, I

  1. arXiv:2405.10255  [pdf, other]

    cs.CV cs.RO

    When LLMs step into the 3D World: A Survey and Meta-Analysis of 3D Tasks via Multi-modal Large Language Models

    Authors: Xianzheng Ma, Yash Bhalgat, Brandon Smart, Shuai Chen, Xinghui Li, Jian Ding, Jindong Gu, Dave Zhenyu Chen, Songyou Peng, Jia-Wang Bian, Philip H Torr, Marc Pollefeys, Matthias Nießner, Ian D Reid, Angel X. Chang, Iro Laina, Victor Adrian Prisacariu

    Abstract: As large language models (LLMs) evolve, their integration with 3D spatial data (3D-LLMs) has seen rapid progress, offering unprecedented capabilities for understanding and interacting with physical spaces. This survey provides a comprehensive overview of the methodologies enabling LLMs to process, understand, and generate 3D data. Highlighting the unique advantages of LLMs, such as in-context lear…

    Submitted 16 May, 2024; originally announced May 2024.

  2. arXiv:2404.19758  [pdf, other]

    cs.CV

    Invisible Stitch: Generating Smooth 3D Scenes with Depth Inpainting

    Authors: Paul Engstler, Andrea Vedaldi, Iro Laina, Christian Rupprecht

    Abstract: 3D scene generation has quickly become a challenging new research direction, fueled by consistent improvements of 2D generative diffusion models. Most prior work in this area generates scenes by iteratively stitching newly generated frames with existing geometry. These works often depend on pre-trained monocular depth estimators to lift the generated images into 3D, fusing them with the existing s…

    Submitted 30 April, 2024; originally announced April 2024.

    Comments: Project page: https://research.paulengstler.com/invisible-stitch/

  3. arXiv:2404.18929  [pdf, other]

    cs.CV

    DGE: Direct Gaussian 3D Editing by Consistent Multi-view Editing

    Authors: Minghao Chen, Iro Laina, Andrea Vedaldi

    Abstract: We consider the problem of editing 3D objects and scenes based on open-ended language instructions. A common approach to this problem is to use a 2D image generator or editor to guide the 3D editing process, obviating the need for 3D data. However, this process is often inefficient due to the need for iterative updates of costly 3D representations, such as neural radiance fields, either through in…

    Submitted 22 July, 2024; v1 submitted 29 April, 2024; originally announced April 2024.

    Comments: ECCV 2024. Project Page: https://silent-chen.github.io/DGE/

  4. arXiv:2403.10997  [pdf, other]

    cs.CV cs.AI cs.GR cs.LG

    N2F2: Hierarchical Scene Understanding with Nested Neural Feature Fields

    Authors: Yash Bhalgat, Iro Laina, João F. Henriques, Andrew Zisserman, Andrea Vedaldi

    Abstract: Understanding complex scenes at multiple levels of abstraction remains a formidable challenge in computer vision. To address this, we introduce Nested Neural Feature Fields (N2F2), a novel approach that employs hierarchical supervision to learn a single feature field, wherein different dimensions within the same high-dimensional feature encode scene properties at varying granularities. Our method…

    Submitted 28 July, 2024; v1 submitted 16 March, 2024; originally announced March 2024.

    Comments: ECCV 2024

  5. arXiv:2402.08682  [pdf, other]

    cs.CV cs.AI cs.LG

    IM-3D: Iterative Multiview Diffusion and Reconstruction for High-Quality 3D Generation

    Authors: Luke Melas-Kyriazi, Iro Laina, Christian Rupprecht, Natalia Neverova, Andrea Vedaldi, Oran Gafni, Filippos Kokkinos

    Abstract: Most text-to-3D generators build upon off-the-shelf text-to-image models trained on billions of images. They use variants of Score Distillation Sampling (SDS), which is slow, somewhat unstable, and prone to artifacts. A mitigation is to fine-tune the 2D generator to be multi-view aware, which can help distillation or can be combined with reconstruction networks to output 3D objects directly. In th…

    Submitted 13 February, 2024; originally announced February 2024.

  6. arXiv:2312.09246  [pdf, other]

    cs.CV

    SHAP-EDITOR: Instruction-guided Latent 3D Editing in Seconds

    Authors: Minghao Chen, Junyu Xie, Iro Laina, Andrea Vedaldi

    Abstract: We propose a novel feed-forward 3D editing framework called Shap-Editor. Prior research on editing 3D objects primarily concentrated on editing individual objects by leveraging off-the-shelf 2D image editing networks. This is achieved via a process called distillation, which transfers knowledge from the 2D network to 3D assets. Distillation necessitates at least tens of minutes per asset to attain…

    Submitted 14 December, 2023; originally announced December 2023.

    Comments: Project Page: https://silent-chen.github.io/Shap-Editor/

  7. arXiv:2311.14665  [pdf, other]

    cs.CV

    Understanding Self-Supervised Features for Learning Unsupervised Instance Segmentation

    Authors: Paul Engstler, Luke Melas-Kyriazi, Christian Rupprecht, Iro Laina

    Abstract: Self-supervised learning (SSL) can be used to solve complex visual tasks without human labels. Self-supervised representations encode useful semantic information about images, and as a result, they have already been used for tasks such as unsupervised semantic segmentation. In this paper, we investigate self-supervised representations for instance segmentation without any manual annotations. We fi…

    Submitted 24 November, 2023; originally announced November 2023.

  8. arXiv:2306.09316  [pdf, other]

    cs.CV

    Diffusion Models for Zero-Shot Open-Vocabulary Segmentation

    Authors: Laurynas Karazija, Iro Laina, Andrea Vedaldi, Christian Rupprecht

    Abstract: The variety of objects in the real world is nearly unlimited and is thus impossible to capture using models trained on a fixed set of categories. As a result, in recent years, open-vocabulary methods have attracted the interest of the community. This paper proposes a new method for zero-shot open-vocabulary segmentation. Prior work largely relies on contrastive training using image-text pairs, lev…

    Submitted 15 June, 2023; originally announced June 2023.

    Comments: Project page https://www.robots.ox.ac.uk/~vgg/research/ovdiff

  9. arXiv:2306.08731  [pdf, other]

    cs.CV

    EPIC Fields: Marrying 3D Geometry and Video Understanding

    Authors: Vadim Tschernezki, Ahmad Darkhalil, Zhifan Zhu, David Fouhey, Iro Laina, Diane Larlus, Dima Damen, Andrea Vedaldi

    Abstract: Neural rendering is fuelling a unification of learning, 3D geometry and video understanding that has been waiting for more than two decades. Progress, however, is still hampered by a lack of suitable datasets and benchmarks. To address this gap, we introduce EPIC Fields, an augmentation of EPIC-KITCHENS with 3D camera information. Like other datasets for neural rendering, EPIC Fields removes the c…

    Submitted 1 February, 2024; v1 submitted 14 June, 2023; originally announced June 2023.

    Comments: Published at NeurIPS 2023. 24 pages, 15 figures. Project Webpage: http://epic-kitchens.github.io/epic-fields

  10. arXiv:2306.04633  [pdf, other]

    cs.CV cs.AI cs.LG

    Contrastive Lift: 3D Object Instance Segmentation by Slow-Fast Contrastive Fusion

    Authors: Yash Bhalgat, Iro Laina, João F. Henriques, Andrew Zisserman, Andrea Vedaldi

    Abstract: Instance segmentation in 3D is a challenging task due to the lack of large-scale annotated datasets. In this paper, we show that this task can be addressed effectively by leveraging instead 2D pre-trained models for instance segmentation. We propose a novel approach to lift 2D segments to 3D and fuse them by means of a neural field representation, which encourages multi-view consistency across fra…

    Submitted 1 December, 2023; v1 submitted 7 June, 2023; originally announced June 2023.

    Comments: NeurIPS 2023 (Spotlight). Code: https://github.com/yashbhalgat/Contrastive-Lift

  11. arXiv:2304.03373  [pdf, other]

    cs.CV

    Training-Free Layout Control with Cross-Attention Guidance

    Authors: Minghao Chen, Iro Laina, Andrea Vedaldi

    Abstract: Recent diffusion-based generators can produce high-quality images from textual prompts. However, they often disregard textual instructions that specify the spatial layout of the composition. We propose a simple approach that achieves robust layout control without the need for training or fine-tuning of the image generator. Our technique manipulates the cross-attention layers that the model uses to…

    Submitted 29 November, 2023; v1 submitted 6 April, 2023; originally announced April 2023.

    Comments: WACV 2024, Project Page: https://silent-chen.github.io/layout-guidance/

  12. arXiv:2302.10663  [pdf, other]

    cs.CV cs.AI cs.LG

    RealFusion: 360° Reconstruction of Any Object from a Single Image

    Authors: Luke Melas-Kyriazi, Christian Rupprecht, Iro Laina, Andrea Vedaldi

    Abstract: We consider the problem of reconstructing a full 360° photographic model of an object from a single image of it. We do so by fitting a neural radiance field to the image, but find this problem to be severely ill-posed. We thus take an off-the-shelf conditional image generator based on diffusion and engineer a prompt that encourages it to "dream up" novel views of the object. Using an approach inspi…

    Submitted 23 February, 2023; v1 submitted 21 February, 2023; originally announced February 2023.

    Comments: Project page: https://lukemelas.github.io/realfusion

  13. arXiv:2210.12148  [pdf, other]

    cs.CV cs.AI cs.LG

    Unsupervised Multi-object Segmentation by Predicting Probable Motion Patterns

    Authors: Laurynas Karazija, Subhabrata Choudhury, Iro Laina, Christian Rupprecht, Andrea Vedaldi

    Abstract: We propose a new approach to learn to segment multiple image objects without manual supervision. The method can extract objects from still images, but uses videos for supervision. While prior works have considered motion for segmentation, a key insight is that, while motion can be used to identify objects, not all objects are necessarily in motion: the absence of motion does not imply the absence…

    Submitted 21 October, 2022; originally announced October 2022.

  14. arXiv:2209.03494  [pdf, other]

    cs.CV cs.GR

    Neural Feature Fusion Fields: 3D Distillation of Self-Supervised 2D Image Representations

    Authors: Vadim Tschernezki, Iro Laina, Diane Larlus, Andrea Vedaldi

    Abstract: We present Neural Feature Fusion Fields (N3F), a method that improves dense 2D image feature extractors when the latter are applied to the analysis of multiple images reconstructible as a 3D scene. Given an image feature extractor, for example pre-trained using self-supervision, N3F uses it as a teacher to learn a student network defined in 3D space. The 3D student network is similar to a neural r…

    Submitted 7 September, 2022; originally announced September 2022.

    Comments: 3DV2022, Oral. Project page: https://www.robots.ox.ac.uk/~vadim/n3f/

  15. arXiv:2209.03268  [pdf, other]

    cs.CV

    Measuring the Interpretability of Unsupervised Representations via Quantized Reverse Probing

    Authors: Iro Laina, Yuki M. Asano, Andrea Vedaldi

    Abstract: Self-supervised visual representation learning has recently attracted significant research interest. While a common way to evaluate self-supervised representations is through transfer to various downstream tasks, we instead investigate the problem of measuring their interpretability, i.e. understanding the semantics encoded in raw representations. We formulate the latter as estimating the mutual i…

    Submitted 7 September, 2022; originally announced September 2022.

    Comments: Published at ICLR 2022. Appendix included, 26 pages

  16. arXiv:2205.07844  [pdf, other]

    cs.CV

    Guess What Moves: Unsupervised Video and Image Segmentation by Anticipating Motion

    Authors: Subhabrata Choudhury, Laurynas Karazija, Iro Laina, Andrea Vedaldi, Christian Rupprecht

    Abstract: Motion, measured via optical flow, provides a powerful cue to discover and learn objects in images and videos. However, compared to using appearance, it has some blind spots, such as the fact that objects become invisible if they do not move. In this work, we propose an approach that combines the strengths of motion-based and appearance-based segmentation. We propose to supervise an image segmenta…

    Submitted 13 October, 2022; v1 submitted 16 May, 2022; originally announced May 2022.

    Comments: BMVC 2022

  17. arXiv:2205.07839  [pdf, other]

    cs.CV cs.AI

    Deep Spectral Methods: A Surprisingly Strong Baseline for Unsupervised Semantic Segmentation and Localization

    Authors: Luke Melas-Kyriazi, Christian Rupprecht, Iro Laina, Andrea Vedaldi

    Abstract: Unsupervised localization and segmentation are long-standing computer vision challenges that involve decomposing an image into semantically-meaningful segments without any labeled data. These tasks are particularly interesting in an unsupervised setting due to the difficulty and cost of obtaining dense image annotations, but existing unsupervised approaches struggle with complex scenes containing…

    Submitted 16 May, 2022; originally announced May 2022.

    Comments: Published at CVPR 2022. Project Page: https://lukemelas.github.io/deep-spectral-segmentation

  18. arXiv:2111.10265  [pdf, other]

    cs.CV cs.LG

    ClevrTex: A Texture-Rich Benchmark for Unsupervised Multi-Object Segmentation

    Authors: Laurynas Karazija, Iro Laina, Christian Rupprecht

    Abstract: There has been a recent surge in methods that aim to decompose and segment scenes into multiple objects in an unsupervised manner, i.e., unsupervised multi-object segmentation. Performing such a task is a long-standing goal of computer vision, offering to unlock object-level reasoning without requiring dense annotations to train segmentation models. Despite significant progress, current models are…

    Submitted 19 November, 2021; originally announced November 2021.

    Comments: NeurIPS 2021 Datasets and Benchmarks

  19. arXiv:2111.06349  [pdf, other]

    cs.CV cs.LG

    Unsupervised Part Discovery from Contrastive Reconstruction

    Authors: Subhabrata Choudhury, Iro Laina, Christian Rupprecht, Andrea Vedaldi

    Abstract: The goal of self-supervised visual representation learning is to learn strong, transferable image representations, with the majority of research focusing on object or scene level. On the other hand, representation learning at part level has received significantly less attention. In this paper, we propose an unsupervised approach to object part discovery and segmentation and make three contribution…

    Submitted 21 March, 2022; v1 submitted 11 November, 2021; originally announced November 2021.

    Comments: NeurIPS 2021. Project page: https://www.robots.ox.ac.uk/~vgg/research/unsup-parts/

  20. arXiv:2111.03651  [pdf, other]

    cs.CV cs.CL

    The Curious Layperson: Fine-Grained Image Recognition without Expert Labels

    Authors: Subhabrata Choudhury, Iro Laina, Christian Rupprecht, Andrea Vedaldi

    Abstract: Most of us are not experts in specific fields, such as ornithology. Nonetheless, we do have general image and language understanding capabilities that we use to match what we see to expert resources. This allows us to expand our knowledge and perform novel tasks without ad-hoc external supervision. On the contrary, machines have a much harder time consulting expert-curated knowledge bases unless t…

    Submitted 5 November, 2021; originally announced November 2021.

    Comments: To appear in BMVC 2021 (Oral). Project page: https://www.robots.ox.ac.uk/~vgg/research/clever/

  21. arXiv:2105.08127  [pdf, other]

    cs.CV cs.AI

    Finding an Unsupervised Image Segmenter in Each of Your Deep Generative Models

    Authors: Luke Melas-Kyriazi, Christian Rupprecht, Iro Laina, Andrea Vedaldi

    Abstract: Recent research has shown that numerous human-interpretable directions exist in the latent space of GANs. In this paper, we develop an automatic procedure for finding directions that lead to foreground-background image separation, and we use these directions to train an image segmentation model without human supervision. Our method is generator-agnostic, producing strong segmentation results with…

    Submitted 17 May, 2021; originally announced May 2021.

    Comments: Project page and GitHub link: https://lukemelas.github.io/unsupervised-image-segmentation & https://github.com/lukemelas/unsupervised-image-segmentation

  22. arXiv:2010.14551  [pdf, other]

    cs.CV

    Quantifying Learnability and Describability of Visual Concepts Emerging in Representation Learning

    Authors: Iro Laina, Ruth C. Fong, Andrea Vedaldi

    Abstract: The increasing impact of black box models, and particularly of unsupervised ones, comes with an increasing interest in tools to understand and interpret them. In this paper, we consider in particular how to characterise visual groupings discovered automatically by deep neural networks, starting with state-of-the-art clustering methods. In some cases, clusters readily correspond to an existing labe…

    Submitted 27 October, 2020; originally announced October 2020.

    Comments: NeurIPS 2020

  23. arXiv:2004.03677  [pdf, other]

    cs.CV

    Semantic Image Manipulation Using Scene Graphs

    Authors: Helisa Dhamo, Azade Farshad, Iro Laina, Nassir Navab, Gregory D. Hager, Federico Tombari, Christian Rupprecht

    Abstract: Image manipulation can be considered a special case of image generation where the image to be produced is a modification of an existing image. Image generation and manipulation have been, for the most part, tasks that operate on raw pixels. However, the remarkable progress in learning rich image and object representations has opened the way for tasks such as text-to-image or layout-to-image genera…

    Submitted 7 April, 2020; originally announced April 2020.

    Comments: CVPR 2020

  24. arXiv:1908.09317  [pdf, other]

    cs.CV

    Towards Unsupervised Image Captioning with Shared Multimodal Embeddings

    Authors: Iro Laina, Christian Rupprecht, Nassir Navab

    Abstract: Understanding images without explicit supervision has become an important problem in computer vision. In this paper, we address image captioning by generating language descriptions of scenes without learning from annotated pairs of images and their captions. The core component of our approach is a shared latent space that is structured by visual concepts. In this space, the two modalities should b…

    Submitted 25 August, 2019; originally announced August 2019.

    Comments: ICCV 2019

  25. arXiv:1902.06426  [pdf, other]

    cs.CV

    2017 Robotic Instrument Segmentation Challenge

    Authors: Max Allan, Alex Shvets, Thomas Kurmann, Zichen Zhang, Rahul Duggal, Yun-Hsuan Su, Nicola Rieke, Iro Laina, Niveditha Kalavakonda, Sebastian Bodenstedt, Luis Herrera, Wenqi Li, Vladimir Iglovikov, Huoling Luo, Jian Yang, Danail Stoyanov, Lena Maier-Hein, Stefanie Speidel, Mahdi Azizian

    Abstract: In mainstream computer vision and machine learning, public datasets such as ImageNet, COCO and KITTI have helped drive enormous improvements by enabling researchers to understand the strengths and limitations of different algorithms via performance comparison. However, this type of approach has had limited translation to problems in robotic assisted surgery as this field has never established the…

    Submitted 21 February, 2019; v1 submitted 18 February, 2019; originally announced February 2019.

  26. arXiv:1811.00793  [pdf, other]

    cs.CV

    Dealing with Ambiguity in Robotic Grasping via Multiple Predictions

    Authors: Ghazal Ghazaei, Iro Laina, Christian Rupprecht, Federico Tombari, Nassir Navab, Kianoush Nazarpour

    Abstract: Humans excel in grasping and manipulating objects because of their life-long experience and knowledge about the 3D shape and weight distribution of objects. However, the lack of such intuition in robots makes robotic grasping an exceptionally challenging task. There are often several equally viable options of grasping an object. However, this ambiguity is not modeled in conventional systems that e…

    Submitted 2 November, 2018; originally announced November 2018.

    Comments: ACCV 2018

  27. Peeking Behind Objects: Layered Depth Prediction from a Single Image

    Authors: Helisa Dhamo, Keisuke Tateno, Iro Laina, Nassir Navab, Federico Tombari

    Abstract: While conventional depth estimation can infer the geometry of a scene from a single RGB image, it fails to estimate scene regions that are occluded by foreground objects. This limits the use of depth prediction in augmented and virtual reality applications, that aim at scene exploration by synthesizing the scene from a different vantage point, or at diminished reality. To address this issue, we sh…

    Submitted 23 July, 2018; originally announced July 2018.

  28. arXiv:1803.11544  [pdf, other]

    cs.CV

    Guide Me: Interacting with Deep Networks

    Authors: Christian Rupprecht, Iro Laina, Nassir Navab, Gregory D. Hager, Federico Tombari

    Abstract: Interaction and collaboration between humans and intelligent machines has become increasingly important as machine learning methods move into real-world applications that involve end users. While much prior work lies at the intersection of natural language and vision, such as image captioning or image generation from text descriptions, less focus has been placed on the use of language to guide or…

    Submitted 30 March, 2018; originally announced March 2018.

    Comments: CVPR 2018

  29. arXiv:1704.03489  [pdf, other]

    cs.CV

    CNN-SLAM: Real-time dense monocular SLAM with learned depth prediction

    Authors: Keisuke Tateno, Federico Tombari, Iro Laina, Nassir Navab

    Abstract: Given the recent advances in depth prediction from Convolutional Neural Networks (CNNs), this paper investigates how predicted depth maps from a deep neural network can be deployed for accurate and dense monocular reconstruction. We propose a method where CNN-predicted dense depth maps are naturally fused together with depth measurements obtained from direct monocular SLAM. Our fusion scheme privi…

    Submitted 11 April, 2017; originally announced April 2017.

    Comments: 10 pages, 6 figures, IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), Hawaii, USA, June, 2017. The first two authors contribute equally to this paper

  30. arXiv:1703.10701  [pdf, other]

    cs.CV

    Concurrent Segmentation and Localization for Tracking of Surgical Instruments

    Authors: Iro Laina, Nicola Rieke, Christian Rupprecht, Josué Page Vizcaíno, Abouzar Eslami, Federico Tombari, Nassir Navab

    Abstract: Real-time instrument tracking is a crucial requirement for various computer-assisted interventions. In order to overcome problems such as specular reflections and motion blur, we propose a novel method that takes advantage of the interdependency between localization and segmentation of the surgical tool. In particular, we reformulate the 2D instrument pose estimation as heatmap regression and ther…

    Submitted 1 August, 2017; v1 submitted 30 March, 2017; originally announced March 2017.

    Comments: I. Laina and N. Rieke contributed equally to this work. Accepted to MICCAI 2017

  31. arXiv:1612.00197  [pdf, other]

    cs.CV

    Learning in an Uncertain World: Representing Ambiguity Through Multiple Hypotheses

    Authors: Christian Rupprecht, Iro Laina, Robert DiPietro, Maximilian Baust, Federico Tombari, Nassir Navab, Gregory D. Hager

    Abstract: Many prediction tasks contain uncertainty. In some cases, uncertainty is inherent in the task itself. In future prediction, for example, many distinct outcomes are equally valid. In other cases, uncertainty arises from the way data is labeled. For example, in object detection, many objects of interest often go unlabeled, and in human pose estimation, occluded joints are often labeled with ambiguou…

    Submitted 22 August, 2017; v1 submitted 1 December, 2016; originally announced December 2016.

    Comments: ICCV 2017

  32. arXiv:1606.00373  [pdf, other]

    cs.CV

    Deeper Depth Prediction with Fully Convolutional Residual Networks

    Authors: Iro Laina, Christian Rupprecht, Vasileios Belagiannis, Federico Tombari, Nassir Navab

    Abstract: This paper addresses the problem of estimating the depth map of a scene given a single RGB image. We propose a fully convolutional architecture, encompassing residual learning, to model the ambiguous mapping between monocular images and depth maps. In order to improve the output resolution, we present a novel way to efficiently learn feature map up-sampling within the network. For optimization, we…

    Submitted 19 September, 2016; v1 submitted 1 June, 2016; originally announced June 2016.

    Comments: Published at IEEE International Conference on 3D Vision (3DV) 2016