Showing 1–50 of 60 results for author: Savva, M

  1. arXiv:2408.02211  [pdf, other]

    cs.GR

    SceneMotifCoder: Example-driven Visual Program Learning for Generating 3D Object Arrangements

    Authors: Hou In Ivan Tam, Hou In Derek Pun, Austin T. Wang, Angel X. Chang, Manolis Savva

    Abstract: Despite advances in text-to-3D generation methods, generation of multi-object arrangements remains challenging. Current methods exhibit failures in generating physically plausible arrangements that respect the provided text description. We present SceneMotifCoder (SMC), an example-driven framework for generating 3D object arrangements through visual program learning. SMC leverages large language m…

    Submitted 4 August, 2024; originally announced August 2024.

  2. arXiv:2403.14937  [pdf, other]

    cs.CV

    Survey on Modeling of Articulated Objects

    Authors: Jiayi Liu, Manolis Savva, Ali Mahdavi-Amiri

    Abstract: 3D modeling of articulated objects is a research problem within computer vision, graphics, and robotics. Its objective is to understand the shape and motion of the articulated components, represent the geometry and mobility of object parts, and create realistic models that reflect articulated objects in the real world. This survey provides a comprehensive overview of the current state-of-the-art i…

    Submitted 21 March, 2024; originally announced March 2024.

  3. arXiv:2403.13289  [pdf, other]

    cs.CV

    Text-to-3D Shape Generation

    Authors: Han-Hung Lee, Manolis Savva, Angel X. Chang

    Abstract: Recent years have seen an explosion of work and interest in text-to-3D shape generation. Much of the progress is driven by advances in 3D representations, large-scale pretraining and representation learning for text and image data enabling generative AI models, and differentiable rendering. Computational systems that can perform text-to-3D shape generation have captivated the popular imagination a…

    Submitted 20 March, 2024; originally announced March 2024.

  4. arXiv:2403.12301  [pdf, other]

    cs.CV

    R3DS: Reality-linked 3D Scenes for Panoramic Scene Understanding

    Authors: Qirui Wu, Sonia Raychaudhuri, Daniel Ritchie, Manolis Savva, Angel X Chang

    Abstract: We introduce the Reality-linked 3D Scenes (R3DS) dataset of synthetic 3D scenes mirroring the real-world scene arrangements from Matterport3D panoramas. Compared to prior work, R3DS has more complete and densely populated scenes with objects linked to real-world observations in panoramas. R3DS also provides an object support hierarchy, and matching object sets (e.g., same chairs around a dining ta…

    Submitted 18 March, 2024; originally announced March 2024.

  5. arXiv:2401.00405  [pdf, other]

    cs.CV

    Generalizing Single-View 3D Shape Retrieval to Occlusions and Unseen Objects

    Authors: Qirui Wu, Daniel Ritchie, Manolis Savva, Angel X. Chang

    Abstract: Single-view 3D shape retrieval is a challenging task that is increasingly important with the growth of available 3D data. Prior work that has studied this task has not focused on evaluating how realistic occlusions impact performance, and how shape retrieval methods generalize to scenarios where either the target 3D shape database contains unseen shapes, or the input image contains unseen objects.…

    Submitted 31 December, 2023; originally announced January 2024.

  6. arXiv:2312.09570  [pdf, other]

    cs.CV

    CAGE: Controllable Articulation GEneration

    Authors: Jiayi Liu, Hou In Ivan Tam, Ali Mahdavi-Amiri, Manolis Savva

    Abstract: We address the challenge of generating 3D articulated objects in a controllable fashion. Currently, modeling articulated 3D objects is either achieved through laborious manual authoring, or using methods from prior work that are hard to scale and control directly. We leverage the interplay between part shape, connectivity, and motion using a denoising diffusion-based method with attention modules…

    Submitted 20 March, 2024; v1 submitted 15 December, 2023; originally announced December 2023.

    Comments: CVPR 2024. Project page: https://3dlg-hcvc.github.io/cage/

  7. arXiv:2311.18259  [pdf, other]

    cs.CV cs.AI

    Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives

    Authors: Kristen Grauman, Andrew Westbury, Lorenzo Torresani, Kris Kitani, Jitendra Malik, Triantafyllos Afouras, Kumar Ashutosh, Vijay Baiyya, Siddhant Bansal, Bikram Boote, Eugene Byrne, Zach Chavis, Joya Chen, Feng Cheng, Fu-Jen Chu, Sean Crane, Avijit Dasgupta, Jing Dong, Maria Escobar, Cristhian Forigua, Abrham Gebreselasie, Sanjay Haresh, Jing Huang, Md Mohaiminul Islam, Suyog Jain, et al. (76 additional authors not shown)

    Abstract: We present Ego-Exo4D, a diverse, large-scale multimodal multiview video dataset and benchmark challenge. Ego-Exo4D centers around simultaneously-captured egocentric and exocentric video of skilled human activities (e.g., sports, music, dance, bike repair). 740 participants from 13 cities worldwide performed these activities in 123 different natural scene contexts, yielding long-form captures from…

    Submitted 29 April, 2024; v1 submitted 30 November, 2023; originally announced November 2023.

    Comments: updated baseline results and dataset statistics to match the released v2 data; added table to appendix comparing stats of Ego-Exo4D alongside other datasets

  8. arXiv:2310.13135  [pdf, other]

    cs.CV

    LeTFuser: Light-weight End-to-end Transformer-Based Sensor Fusion for Autonomous Driving with Multi-Task Learning

    Authors: Pedram Agand, Mohammad Mahdavian, Manolis Savva, Mo Chen

    Abstract: In end-to-end autonomous driving, the utilization of existing sensor fusion techniques and navigational control methods for imitation learning proves inadequate in challenging situations that involve numerous dynamic agents. To address this issue, we introduce LeTFuser, a lightweight transformer-based algorithm for fusing multiple RGB-D camera representations. To perform perception and control tas…

    Submitted 1 December, 2023; v1 submitted 19 October, 2023; originally announced October 2023.

    Comments: 11 pages, 2 figures, 3 tables. CVPR Workshops (VCAD). 2023

  9. arXiv:2308.07391  [pdf, other]

    cs.CV cs.AI cs.GR

    PARIS: Part-level Reconstruction and Motion Analysis for Articulated Objects

    Authors: Jiayi Liu, Ali Mahdavi-Amiri, Manolis Savva

    Abstract: We address the task of simultaneous part-level reconstruction and motion parameter estimation for articulated objects. Given two sets of multi-view images of an object in two static articulation states, we decouple the movable part from the static part and reconstruct shape and appearance while predicting the motion parameters. To tackle this problem, we present PARIS: a self-supervised, end-to-en…

    Submitted 14 August, 2023; originally announced August 2023.

    Comments: Presented at ICCV 2023. Project website: https://3dlg-hcvc.github.io/paris/

  10. arXiv:2306.11565  [pdf, other]

    cs.RO cs.AI cs.CV

    HomeRobot: Open-Vocabulary Mobile Manipulation

    Authors: Sriram Yenamandra, Arun Ramachandran, Karmesh Yadav, Austin Wang, Mukul Khanna, Theophile Gervet, Tsung-Yen Yang, Vidhi Jain, Alexander William Clegg, John Turner, Zsolt Kira, Manolis Savva, Angel Chang, Devendra Singh Chaplot, Dhruv Batra, Roozbeh Mottaghi, Yonatan Bisk, Chris Paxton

    Abstract: HomeRobot (noun): An affordable compliant robot that navigates homes and manipulates a wide range of objects in order to complete everyday tasks. Open-Vocabulary Mobile Manipulation (OVMM) is the problem of picking any object in any unseen environment, and placing it in a commanded location. This is a foundational challenge for robots to be useful assistants in human environments, because it invol…

    Submitted 10 January, 2024; v1 submitted 20 June, 2023; originally announced June 2023.

    Comments: 37 pages, 22 figures, 8 tables

  11. arXiv:2306.11290  [pdf, other]

    cs.CV

    Habitat Synthetic Scenes Dataset (HSSD-200): An Analysis of 3D Scene Scale and Realism Tradeoffs for ObjectGoal Navigation

    Authors: Mukul Khanna, Yongsen Mao, Hanxiao Jiang, Sanjay Haresh, Brennan Shacklett, Dhruv Batra, Alexander Clegg, Eric Undersander, Angel X. Chang, Manolis Savva

    Abstract: We contribute the Habitat Synthetic Scene Dataset, a dataset of 211 high-quality 3D scenes, and use it to test navigation agent generalization to realistic 3D environments. Our dataset represents real interiors and contains a diverse set of 18,656 models of real-world objects. We investigate the impact of synthetic 3D scene dataset scale and realism on the task of training embodied agents to find…

    Submitted 7 December, 2023; v1 submitted 20 June, 2023; originally announced June 2023.

  12. arXiv:2305.18557  [pdf, other]

    cs.CV

    Evaluating 3D Shape Analysis Methods for Robustness to Rotation Invariance

    Authors: Supriya Gadi Patil, Angel X. Chang, Manolis Savva

    Abstract: This paper analyzes the robustness of recent 3D shape descriptors to SO(3) rotations, something that is fundamental to shape modeling. Specifically, we formulate the task of rotated 3D object instance detection. To do so, we consider a database of 3D indoor scenes, where objects occur in different orientations. We benchmark different methods for feature extraction and classification in the context…

    Submitted 29 May, 2023; originally announced May 2023.

    Comments: 20th Conference on Robots and Vision (CRV) 2023

  13. arXiv:2304.03696  [pdf, other]

    cs.RO cs.CV

    MOPA: Modular Object Navigation with PointGoal Agents

    Authors: Sonia Raychaudhuri, Tommaso Campari, Unnat Jain, Manolis Savva, Angel X. Chang

    Abstract: We propose a simple but effective modular approach MOPA (Modular ObjectNav with PointGoal agents) to systematically investigate the inherent modularity of the object navigation task in Embodied AI. MOPA consists of four modules: (a) an object detection module trained to identify objects from RGB images, (b) a map building module to build a semantic map of the observed objects, (c) an exploration m…

    Submitted 27 January, 2024; v1 submitted 7 April, 2023; originally announced April 2023.

  14. arXiv:2304.03188  [pdf, other]

    cs.GR

    Advances in Data-Driven Analysis and Synthesis of 3D Indoor Scenes

    Authors: Akshay Gadi Patil, Supriya Gadi Patil, Manyi Li, Matthew Fisher, Manolis Savva, Hao Zhang

    Abstract: This report surveys advances in deep learning-based modeling techniques that address four different 3D indoor scene analysis tasks, as well as synthesis of 3D indoor scenes. We describe different kinds of representations for indoor scenes, various indoor scene datasets available for research in the aforementioned areas, and discuss notable works employing machine learning models for such scene mod…

    Submitted 21 August, 2023; v1 submitted 6 April, 2023; originally announced April 2023.

    Comments: Published in Computer Graphics Forum, Aug 2023

  15. arXiv:2303.14087  [pdf, other]

    cs.CV

    OPDMulti: Openable Part Detection for Multiple Objects

    Authors: Xiaohao Sun, Hanxiao Jiang, Manolis Savva, Angel Xuan Chang

    Abstract: Openable part detection is the task of detecting the openable parts of an object in a single-view image, and predicting corresponding motion parameters. Prior work investigated the unrealistic setting where all input images only contain a single openable object. We generalize this task to scenes with multiple objects each potentially possessing openable parts, and create a corresponding dataset ba…

    Submitted 24 March, 2023; originally announced March 2023.

  16. arXiv:2301.13261  [pdf, other]

    cs.AI cs.CV cs.LG cs.RO

    Emergence of Maps in the Memories of Blind Navigation Agents

    Authors: Erik Wijmans, Manolis Savva, Irfan Essa, Stefan Lee, Ari S. Morcos, Dhruv Batra

    Abstract: Animal navigation research posits that organisms build and maintain internal spatial representations, or maps, of their environment. We ask if machines -- specifically, artificial intelligence (AI) navigation agents -- also build implicit (or 'mental') maps. A positive answer to this question would (a) explain the surprising phenomenon in recent literature of ostensibly map-free neural-networks ac…

    Submitted 30 January, 2023; originally announced January 2023.

    Comments: Accepted to ICLR 2023

  17. arXiv:2210.06849  [pdf, other]

    cs.CV

    Retrospectives on the Embodied AI Workshop

    Authors: Matt Deitke, Dhruv Batra, Yonatan Bisk, Tommaso Campari, Angel X. Chang, Devendra Singh Chaplot, Changan Chen, Claudia Pérez D'Arpino, Kiana Ehsani, Ali Farhadi, Li Fei-Fei, Anthony Francis, Chuang Gan, Kristen Grauman, David Hall, Winson Han, Unnat Jain, Aniruddha Kembhavi, Jacob Krantz, Stefan Lee, Chengshu Li, Sagnik Majumder, Oleksandr Maksymets, Roberto Martín-Martín, Roozbeh Mottaghi, et al. (14 additional authors not shown)

    Abstract: We present a retrospective on the state of Embodied AI research. Our analysis focuses on 13 challenges presented at the Embodied AI Workshop at CVPR. These challenges are grouped into three themes: (1) visual navigation, (2) rearrangement, and (3) embodied vision-and-language. We discuss the dominant datasets within each theme, evaluation metrics for the challenges, and the performance of state-of…

    Submitted 4 December, 2022; v1 submitted 13 October, 2022; originally announced October 2022.

  18. arXiv:2210.05633  [pdf, other]

    cs.CV

    Habitat-Matterport 3D Semantics Dataset

    Authors: Karmesh Yadav, Ram Ramrakhya, Santhosh Kumar Ramakrishnan, Theo Gervet, John Turner, Aaron Gokaslan, Noah Maestre, Angel Xuan Chang, Dhruv Batra, Manolis Savva, Alexander William Clegg, Devendra Singh Chaplot

    Abstract: We present the Habitat-Matterport 3D Semantics (HM3DSEM) dataset. HM3DSEM is the largest dataset of 3D real-world spaces with densely annotated semantics that is currently available to the academic community. It consists of 142,646 object instance annotations across 216 3D spaces and 3,100 rooms within those spaces. The scale, quality, and diversity of object annotations far exceed those of prior…

    Submitted 12 October, 2023; v1 submitted 11 October, 2022; originally announced October 2022.

    Comments: 15 Pages, 11 Figures, 6 Tables

  19. arXiv:2209.05612  [pdf, other]

    cs.CV

    Articulated 3D Human-Object Interactions from RGB Videos: An Empirical Analysis of Approaches and Challenges

    Authors: Sanjay Haresh, Xiaohao Sun, Hanxiao Jiang, Angel X. Chang, Manolis Savva

    Abstract: Human-object interactions with articulated objects are common in everyday life. Despite much progress in single-view 3D reconstruction, it is still challenging to infer an articulated 3D object model from an RGB video showing a person manipulating the object. We canonicalize the task of articulated 3D human-object interaction reconstruction from RGB video, and carry out a systematic benchmark of f…

    Submitted 12 September, 2022; originally announced September 2022.

    Comments: 3DV 2022

  20. arXiv:2205.03797  [pdf, other]

    cs.NI

    Fuzzy-Logic Based IDS for Detecting Jamming Attacks in Wireless Mesh IoT Networks

    Authors: Michael Savva, Iacovos Ioannou, Vasos Vassiliou

    Abstract: The investigation in this paper targets the design and the evaluation of jamming intrusion detection based on Fuzzy Logic in wireless mesh IoT Networks in a distributed manner. Our approach uses information collected at local nodes and from the sink as input to the fuzzy logic controller. In order to find the best set of inputs, distributed or centralized, we made a comparison between five differe…

    Submitted 8 May, 2022; originally announced May 2022.

  21. arXiv:2203.16421  [pdf, other]

    cs.CV

    OPD: Single-view 3D Openable Part Detection

    Authors: Hanxiao Jiang, Yongsen Mao, Manolis Savva, Angel X. Chang

    Abstract: We address the task of predicting what parts of an object can open and how they move when they do so. The input is a single image of an object, and as output we detect what parts of the object can open, and the motion parameters describing the articulation of each openable part. To tackle this task, we create two datasets of 3D objects: OPDSynth based on existing synthetic objects, and OPDReal bas…

    Submitted 30 March, 2022; originally announced March 2022.

  22. arXiv:2112.07022  [pdf, other]

    cs.GR cs.CV cs.LG

    Learning Body-Aware 3D Shape Generative Models

    Authors: Bryce Blinn, Alexander Ding, R. Kenny Jones, Manolis Savva, Srinath Sridhar, Daniel Ritchie

    Abstract: The shape of many objects in the built environment is dictated by their relationships to the human body: how will a person interact with this object? Existing data-driven generative models of 3D shapes produce plausible objects but do not reason about the relationship of those objects to the human body. In this paper, we learn body-aware generative models of 3D shapes. Specifically, we train gener…

    Submitted 20 January, 2022; v1 submitted 13 December, 2021; originally announced December 2021.

    Comments: 11 pages, 8 figures

  23. Roominoes: Generating Novel 3D Floor Plans From Existing 3D Rooms

    Authors: Kai Wang, Xianghao Xu, Leon Lei, Selena Ling, Natalie Lindsay, Angel X. Chang, Manolis Savva, Daniel Ritchie

    Abstract: Realistic 3D indoor scene datasets have enabled significant recent progress in computer vision, scene understanding, autonomous navigation, and 3D reconstruction. But the scale, diversity, and customizability of existing datasets is limited, and it is time-consuming and expensive to scan and annotate more. Fortunately, combinatorics is on our side: there are enough individual rooms in existing 3D…

    Submitted 10 December, 2021; originally announced December 2021.

    Comments: Symposium on Geometry Processing (SGP) 2021

    Journal ref: Computer Graphics Forum, 40: 57-69 (2021)

  24. arXiv:2110.05769  [pdf, other]

    cs.CV cs.AI cs.LG cs.MA

    Interpretation of Emergent Communication in Heterogeneous Collaborative Embodied Agents

    Authors: Shivansh Patel, Saim Wani, Unnat Jain, Alexander Schwing, Svetlana Lazebnik, Manolis Savva, Angel X. Chang

    Abstract: Communication between embodied AI agents has received increasing attention in recent years. Despite its use, it is still unclear whether the learned communication is interpretable and grounded in perception. To study the grounding of emergent forms of communication, we first introduce the collaborative multi-object navigation task CoMON. In this task, an oracle agent has detailed environment infor…

    Submitted 12 October, 2021; originally announced October 2021.

    Comments: Project page: https://shivanshpatel35.github.io/comon/ ; the first three authors contributed equally

  25. arXiv:2109.08238  [pdf, other]

    cs.CV cs.AI

    Habitat-Matterport 3D Dataset (HM3D): 1000 Large-scale 3D Environments for Embodied AI

    Authors: Santhosh K. Ramakrishnan, Aaron Gokaslan, Erik Wijmans, Oleksandr Maksymets, Alex Clegg, John Turner, Eric Undersander, Wojciech Galuba, Andrew Westbury, Angel X. Chang, Manolis Savva, Yili Zhao, Dhruv Batra

    Abstract: We present the Habitat-Matterport 3D (HM3D) dataset. HM3D is a large-scale dataset of 1,000 building-scale 3D reconstructions from a diverse set of real-world locations. Each scene in the dataset consists of a textured 3D mesh reconstruction of interiors such as multi-floor residences, stores, and other private indoor spaces. HM3D surpasses existing datasets available for academic research in te…

    Submitted 16 September, 2021; originally announced September 2021.

    Comments: 21 pages, 14 figures

  26. arXiv:2108.08420  [pdf, other]

    cs.CV

    D3D-HOI: Dynamic 3D Human-Object Interactions from Videos

    Authors: Xiang Xu, Hanbyul Joo, Greg Mori, Manolis Savva

    Abstract: We introduce D3D-HOI: a dataset of monocular videos with ground truth annotations of 3D object pose, shape and part motion during human-object interactions. Our dataset consists of several common articulated objects captured from diverse real-world scenes and camera viewpoints. Each manipulated object (e.g., microwave oven) is represented with a matching 3D parametric model. This data allows us to…

    Submitted 18 August, 2021; originally announced August 2021.

  27. arXiv:2106.14405  [pdf, other]

    cs.LG cs.RO

    Habitat 2.0: Training Home Assistants to Rearrange their Habitat

    Authors: Andrew Szot, Alex Clegg, Eric Undersander, Erik Wijmans, Yili Zhao, John Turner, Noah Maestre, Mustafa Mukadam, Devendra Chaplot, Oleksandr Maksymets, Aaron Gokaslan, Vladimir Vondrus, Sameer Dharur, Franziska Meier, Wojciech Galuba, Angel Chang, Zsolt Kira, Vladlen Koltun, Jitendra Malik, Manolis Savva, Dhruv Batra

    Abstract: We introduce Habitat 2.0 (H2.0), a simulation platform for training virtual robots in interactive 3D environments and complex physics-enabled scenarios. We make comprehensive contributions to all levels of the embodied AI stack - data, simulation, and benchmark tasks. Specifically, we present: (i) ReplicaCAD: an artist-authored, annotated, reconfigurable 3D dataset of apartments (matching real spa…

    Submitted 1 July, 2022; v1 submitted 28 June, 2021; originally announced June 2021.

  28. arXiv:2106.06629  [pdf, other]

    cs.CV

    Mirror3D: Depth Refinement for Mirror Surfaces

    Authors: Jiaqi Tan, Weijie Lin, Angel X. Chang, Manolis Savva

    Abstract: Despite recent progress in depth sensing and 3D reconstruction, mirror surfaces are a significant source of errors. To address this problem, we create the Mirror3D dataset: a 3D mirror plane dataset based on three RGBD datasets (Matterport3D, NYUv2 and ScanNet) containing 7,011 mirror instance masks and 3D planes. We then develop Mirror3DNet: a module that refines raw sensor depth or estimated dep…

    Submitted 11 June, 2021; originally announced June 2021.

    Comments: Paper presented at CVPR 2021. For code, data and pretrained models, see https://3dlg-hcvc.github.io/mirror3d/

  29. arXiv:2106.05375  [pdf, other]

    cs.CV cs.GR

    Plan2Scene: Converting Floorplans to 3D Scenes

    Authors: Madhawa Vidanapathirana, Qirui Wu, Yasutaka Furukawa, Angel X. Chang, Manolis Savva

    Abstract: We address the task of converting a floorplan and a set of associated photos of a residence into a textured 3D mesh model, a task which we call Plan2Scene. Our system 1) lifts a floorplan image to a 3D mesh model; 2) synthesizes surface textures based on the input photos; and 3) infers textures for unobserved surfaces using a graph neural network architecture. To train and evaluate our system we c…

    Submitted 9 June, 2021; originally announced June 2021.

    Comments: This paper is accepted to CVPR 2021. For code, data and pretrained models, see https://3dlg-hcvc.github.io/plan2scene/

  30. arXiv:2103.07013  [pdf, other]

    cs.LG cs.AI cs.CV cs.GR

    Large Batch Simulation for Deep Reinforcement Learning

    Authors: Brennan Shacklett, Erik Wijmans, Aleksei Petrenko, Manolis Savva, Dhruv Batra, Vladlen Koltun, Kayvon Fatahalian

    Abstract: We accelerate deep reinforcement learning-based training in visually complex 3D environments by two orders of magnitude over prior work, realizing end-to-end training speeds of over 19,000 frames of experience per second on a single GPU and up to 72,000 frames per second on a single eight-GPU machine. The key idea of our approach is to design a 3D renderer and embodied navigation simulator around…

    Submitted 11 March, 2021; originally announced March 2021.

    Comments: Published as a conference paper at ICLR 2021

  31. arXiv:2012.06547  [pdf, other]

    cs.CV cs.IR

    LayoutGMN: Neural Graph Matching for Structural Layout Similarity

    Authors: Akshay Gadi Patil, Manyi Li, Matthew Fisher, Manolis Savva, Hao Zhang

    Abstract: We present a deep neural network to predict structural similarity between 2D layouts by leveraging Graph Matching Networks (GMN). Our network, coined LayoutGMN, learns the layout metric via neural graph matching, using an attention-based GMN designed under a triplet network setting. To train our network, we utilize weak labels obtained by pixel-wise Intersection-over-Union (IoUs) to define the tri…

    Submitted 5 April, 2021; v1 submitted 11 December, 2020; originally announced December 2020.

  32. arXiv:2012.03912  [pdf, other]

    cs.CV cs.AI cs.LG cs.RO

    MultiON: Benchmarking Semantic Map Memory using Multi-Object Navigation

    Authors: Saim Wani, Shivansh Patel, Unnat Jain, Angel X. Chang, Manolis Savva

    Abstract: Navigation tasks in photorealistic 3D environments are challenging because they require perception and effective planning under partial observability. Recent work shows that map-like memory is useful for long-horizon navigation tasks. However, a focused investigation of the impact of maps on navigation tasks of varying complexity has not yet been performed. We propose the multiON task, which requi…

    Submitted 7 December, 2020; originally announced December 2020.

    Comments: Project page: https://shivanshpatel35.github.io/multi-ON/ ; the first three authors contributed equally

  33. arXiv:2011.01975  [pdf, other]

    cs.AI cs.CV cs.LG cs.RO

    Rearrangement: A Challenge for Embodied AI

    Authors: Dhruv Batra, Angel X. Chang, Sonia Chernova, Andrew J. Davison, Jia Deng, Vladlen Koltun, Sergey Levine, Jitendra Malik, Igor Mordatch, Roozbeh Mottaghi, Manolis Savva, Hao Su

    Abstract: We describe a framework for research and evaluation in Embodied AI. Our proposal is based on a canonical task: Rearrangement. A standard task can focus the development of new techniques and serve as a source of trained models that can be transferred to other settings. In the rearrangement task, the goal is to bring a given physical environment into a specified state. The goal state can be specifie…

    Submitted 3 November, 2020; originally announced November 2020.

    Comments: Authors are listed in alphabetical order

  34. arXiv:2008.02203  [pdf, ps, other]

    physics.flu-dyn physics.ao-ph

    Inertia-gravity-wave scattering by geostrophic turbulence

    Authors: Miles A. C. Savva, Hossein A. Kafiabad, Jacques Vanneste

    Abstract: In rotating stratified flows including in the atmosphere and ocean, inertia-gravity waves (IGWs) often coexist with a geostrophically balanced turbulent flow. Advection and refraction by this flow lead to wave scattering, redistributing IGW energy in the position--wavenumber phase space. We give a detailed description of this process by deriving a kinetic equation governing the evolution of the IG…

    Submitted 5 August, 2020; originally announced August 2020.

  35. arXiv:2007.02919  [pdf, other]

    cs.CV

    MCMI: Multi-Cycle Image Translation with Mutual Information Constraints

    Authors: Xiang Xu, Megha Nawhal, Greg Mori, Manolis Savva

    Abstract: We present a mutual information-based framework for unsupervised image-to-image translation. Our MCMI approach treats single-cycle image translation models as modules that can be used recurrently in a multi-cycle translation setting where the translation process is bounded by mutual information constraints between the input and output images. The proposed mutual information constraints can improve…

    Submitted 6 July, 2020; originally announced July 2020.

  36. arXiv:2006.13171  [pdf, other]

    cs.CV cs.RO

    ObjectNav Revisited: On Evaluation of Embodied Agents Navigating to Objects

    Authors: Dhruv Batra, Aaron Gokaslan, Aniruddha Kembhavi, Oleksandr Maksymets, Roozbeh Mottaghi, Manolis Savva, Alexander Toshev, Erik Wijmans

    Abstract: We revisit the problem of Object-Goal Navigation (ObjectNav). In its simplest form, ObjectNav is defined as the task of navigating to an object, specified by its label, in an unexplored environment. In particular, the agent is initialized at a random location and pose in an environment and asked to find an instance of an object category, e.g., find a chair, by navigating to it. As the community…

    Submitted 30 August, 2020; v1 submitted 23 June, 2020; originally announced June 2020.

  37. arXiv:1912.06321  [pdf, other]

    cs.CV cs.AI cs.LG cs.RO

    Sim2Real Predictivity: Does Evaluation in Simulation Predict Real-World Performance?

    Authors: Abhishek Kadian, Joanne Truong, Aaron Gokaslan, Alexander Clegg, Erik Wijmans, Stefan Lee, Manolis Savva, Sonia Chernova, Dhruv Batra

    Abstract: Does progress in simulation translate to progress on robots? If one method outperforms another in simulation, how likely is that trend to hold in reality on a robot? We examine this question for embodied PointGoal navigation, developing engineering tools and a research paradigm for evaluating a simulator by its sim2real predictivity. First, we develop Habitat-PyRobot Bridge (HaPy), a library for s…

    Submitted 16 August, 2020; v1 submitted 12 December, 2019; originally announced December 2019.

    Journal ref: IEEE Robotics and Automation Letters (RA-L) 2020

  38. arXiv:1911.00357  [pdf, other]

    cs.CV cs.AI cs.LG

    DD-PPO: Learning Near-Perfect PointGoal Navigators from 2.5 Billion Frames

    Authors: Erik Wijmans, Abhishek Kadian, Ari Morcos, Stefan Lee, Irfan Essa, Devi Parikh, Manolis Savva, Dhruv Batra

    Abstract: We present Decentralized Distributed Proximal Policy Optimization (DD-PPO), a method for distributed reinforcement learning in resource-intensive simulated environments. DD-PPO is distributed (uses multiple machines), decentralized (lacks a centralized server), and synchronous (no computation is ever stale), making it conceptually simple and easy to implement. In our experiments on training virtua…

    Submitted 19 January, 2020; v1 submitted 1 November, 2019; originally announced November 2019.

  39. arXiv:1909.13165  [pdf, other]

    cs.RO cs.AI cs.LG

    Relational Graph Learning for Crowd Navigation

    Authors: Changan Chen, Sha Hu, Payam Nikdel, Greg Mori, Manolis Savva

    Abstract: We present a relational graph learning approach for robotic crowd navigation using model-based deep reinforcement learning that plans actions by looking into the future. Our approach reasons about the relations between all agents based on their latent features and uses a Graph Convolutional Network to encode higher-order interactions in each agent's state representation, which is subsequently leve…

    Submitted 3 August, 2020; v1 submitted 28 September, 2019; originally announced September 2019.

    Comments: Accepted to IROS 2020. Added links to codes and video demo

  40. arXiv:1906.05797  [pdf, other]

    cs.CV cs.GR eess.IV

    The Replica Dataset: A Digital Replica of Indoor Spaces

    Authors: Julian Straub, Thomas Whelan, Lingni Ma, Yufan Chen, Erik Wijmans, Simon Green, Jakob J. Engel, Raul Mur-Artal, Carl Ren, Shobhit Verma, Anton Clarkson, Mingfei Yan, Brian Budge, Yajie Yan, Xiaqing Pan, June Yon, Yuyang Zou, Kimberly Leon, Nigel Carter, Jesus Briales, Tyler Gillingham, Elias Mueggler, Luis Pesqueira, Manolis Savva, Dhruv Batra, et al. (5 additional authors not shown)

    Abstract: We introduce Replica, a dataset of 18 highly photo-realistic 3D indoor scene reconstructions at room and building scale. Each scene consists of a dense mesh, high-resolution high-dynamic-range (HDR) textures, per-primitive semantic class and instance information, and planar mirror and glass reflectors. The goal of Replica is to enable machine learning (ML) research that relies on visually, geometr…

    Submitted 13 June, 2019; originally announced June 2019.

  41. arXiv:1904.01201  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG cs.RO

    Habitat: A Platform for Embodied AI Research

    Authors: Manolis Savva, Abhishek Kadian, Oleksandr Maksymets, Yili Zhao, Erik Wijmans, Bhavana Jain, Julian Straub, Jia Liu, Vladlen Koltun, Jitendra Malik, Devi Parikh, Dhruv Batra

    Abstract: We present Habitat, a platform for research in embodied artificial intelligence (AI). Habitat enables training embodied agents (virtual robots) in highly efficient photorealistic 3D simulation. Specifically, Habitat consists of: (i) Habitat-Sim: a flexible, high-performance 3D simulator with configurable agents, sensors, and generic 3D dataset handling. Habitat-Sim is fast -- when rendering a scen…

    Submitted 24 November, 2019; v1 submitted 1 April, 2019; originally announced April 2019.

    Comments: ICCV 2019

  42. arXiv:1903.03757  [pdf, other]

    cs.CV

    Hierarchy Denoising Recursive Autoencoders for 3D Scene Layout Prediction

    Authors: Yifei Shi, Angel Xuan Chang, Zhelun Wu, Manolis Savva, Kai Xu

    Abstract: Indoor scenes exhibit rich hierarchical structure in 3D object layouts. Many tasks in 3D scene understanding can benefit from reasoning jointly about the hierarchical context of a scene, and the identities of objects. We present a variational denoising recursive autoencoder (VDRAE) that generates and iteratively refines a hierarchical representation of 3D object layouts, interleaving bottom-up enc…

    Submitted 10 April, 2019; v1 submitted 9 March, 2019; originally announced March 2019.

    Comments: CVPR 2019

  43. arXiv:1902.03997  [pdf, ps, other]

    physics.ao-ph physics.flu-dyn

    Diffusion of inertia-gravity waves by geostrophic turbulence

    Authors: Hossein Kafiabad, Miles A. C. Savva, Jacques Vanneste

    Abstract: The scattering of inertia-gravity waves by large-scale geostrophic turbulence in a rapidly rotating, strongly stratified fluid leads to the diffusion of wave energy on the constant-frequency cone in wavenumber space. We derive the corresponding diffusion equation and relate its diffusivity to the wave characteristics and the energy spectrum of the turbulent flow. We check the predictions of this e…

    Submitted 9 April, 2019; v1 submitted 11 February, 2019; originally announced February 2019.

  44. arXiv:1811.11187  [pdf, other]

    cs.CV

    Scan2CAD: Learning CAD Model Alignment in RGB-D Scans

    Authors: Armen Avetisyan, Manuel Dahnert, Angela Dai, Manolis Savva, Angel X. Chang, Matthias Nießner

    Abstract: We present Scan2CAD, a novel data-driven method that learns to align clean 3D CAD models from a shape database to the noisy and incomplete geometry of a commodity RGB-D scan. For a 3D reconstruction of an indoor scene, our method takes as input a set of CAD models, and predicts a 9DoF pose that aligns each model to the underlying scan geometry. To tackle this problem, we create a new scan-to-CAD a…

    Submitted 27 November, 2018; originally announced November 2018.

    Comments: Video: https://youtu.be/PiHSYpgLTfA

  45. arXiv:1807.06757  [pdf, other]

    cs.AI cs.CV cs.LG cs.RO

    On Evaluation of Embodied Navigation Agents

    Authors: Peter Anderson, Angel Chang, Devendra Singh Chaplot, Alexey Dosovitskiy, Saurabh Gupta, Vladlen Koltun, Jana Kosecka, Jitendra Malik, Roozbeh Mottaghi, Manolis Savva, Amir R. Zamir

    Abstract: Skillful mobile operation in three-dimensional environments is a primary topic of study in Artificial Intelligence. The past two years have seen a surge of creative work on navigation. This creative output has produced a plethora of sometimes incompatible task definitions and evaluation protocols. To coordinate ongoing and future research in this area, we have convened a working group to study emp…

    Submitted 17 July, 2018; originally announced July 2018.

    Comments: Report of a working group on empirical methodology in navigation research. Authors are listed in alphabetical order

  46. arXiv:1804.06347  [pdf, other]

    physics.ao-ph physics.flu-dyn

    Scattering of internal tides by barotropic quasigeostrophic flows

    Authors: Miles A. C. Savva, Jacques Vanneste

    Abstract: Oceanic internal tides and other inertia-gravity waves propagate in an energetic turbulent flow whose lengthscales are similar to the wavelengths. Advection and refraction by this flow cause the scattering of the waves, redistributing their energy in wavevector space. As a result, initially plane waves radiated from a source such as a topographic ridge become spatially incoherent away from the sou…

    Submitted 20 August, 2018; v1 submitted 17 April, 2018; originally announced April 2018.

    Comments: 26 pages, 7 figures

  47. arXiv:1803.08495  [pdf, other]

    cs.CV cs.AI cs.GR cs.LG

    Text2Shape: Generating Shapes from Natural Language by Learning Joint Embeddings

    Authors: Kevin Chen, Christopher B. Choy, Manolis Savva, Angel X. Chang, Thomas Funkhouser, Silvio Savarese

    Abstract: We present a method for generating colored 3D shapes from natural language. To this end, we first learn joint embeddings of freeform text descriptions and colored 3D shapes. Our model combines and extends learning by association and metric learning approaches to learn implicit cross-modal connections, and produces a joint representation that captures the many-to-many relations between language and…

    Submitted 22 March, 2018; originally announced March 2018.

  48. arXiv:1712.04569  [pdf, other]

    cs.CV

    Im2Pano3D: Extrapolating 360 Structure and Semantics Beyond the Field of View

    Authors: Shuran Song, Andy Zeng, Angel X. Chang, Manolis Savva, Silvio Savarese, Thomas Funkhouser

    Abstract: We present Im2Pano3D, a convolutional neural network that generates a dense prediction of 3D structure and a probability distribution of semantic labels for a full 360 panoramic view of an indoor scene when given only a partial observation (<= 50%) in the form of an RGB-D image. To make this possible, Im2Pano3D leverages strong contextual priors learned from large-scale synthetic and real-world in…

    Submitted 12 December, 2017; originally announced December 2017.

    Comments: Video summary: https://youtu.be/Au3GmktK-So

  49. arXiv:1712.03931  [pdf, other]

    cs.LG cs.AI cs.CV cs.GR cs.RO

    MINOS: Multimodal Indoor Simulator for Navigation in Complex Environments

    Authors: Manolis Savva, Angel X. Chang, Alexey Dosovitskiy, Thomas Funkhouser, Vladlen Koltun

    Abstract: We present MINOS, a simulator designed to support the development of multisensory models for goal-directed navigation in complex indoor environments. The simulator leverages large datasets of complex 3D environments and supports flexible configuration of multimodal sensor suites. We use MINOS to benchmark deep-learning-based navigation methods, to analyze the influence of environmental complexity…

    Submitted 11 December, 2017; originally announced December 2017.

    Comments: MINOS is a simulator designed to support research on end-to-end navigation

  50. arXiv:1710.06104  [pdf, other]

    cs.CV

    Large-Scale 3D Shape Reconstruction and Segmentation from ShapeNet Core55

    Authors: Li Yi, Lin Shao, Manolis Savva, Haibin Huang, Yang Zhou, Qirui Wang, Benjamin Graham, Martin Engelcke, Roman Klokov, Victor Lempitsky, Yuan Gan, Pengyu Wang, Kun Liu, Fenggen Yu, Panpan Shui, Bingyang Hu, Yan Zhang, Yangyan Li, Rui Bu, Mingchao Sun, Wei Wu, Minki Jeong, Jaehoon Choi, Changick Kim, Angom Geetchandra, et al. (25 additional authors not shown)

    Abstract: We introduce a large-scale 3D shape understanding benchmark using data and annotation from ShapeNet 3D object database. The benchmark consists of two tasks: part-level segmentation of 3D shapes and 3D reconstruction from single view images. Ten teams have participated in the challenge and the best performing teams have outperformed state-of-the-art approaches on both tasks. A few novel deep learni…

    Submitted 27 October, 2017; v1 submitted 17 October, 2017; originally announced October 2017.