Showing 1–50 of 113 results for author: Schwing, A

Searching in archive cs.

  1. arXiv:2406.10543  [pdf, other]

    cs.CV cs.AI

    NeRFDeformer: NeRF Transformation from a Single View via 3D Scene Flows

    Authors: Zhenggang Tang, Zhongzheng Ren, Xiaoming Zhao, Bowen Wen, Jonathan Tremblay, Stan Birchfield, Alexander Schwing

    Abstract: We present a method for automatically modifying a NeRF representation based on a single observation of a non-rigid transformed version of the original scene. Our method defines the transformation as a 3D flow, specifically as a weighted linear blending of rigid transformations of 3D anchor points that are defined on the surface of the scene. In order to identify anchor points, we introduce a novel…

    Submitted 15 June, 2024; originally announced June 2024.

    Comments: 8 pages of main paper, CVPR 2024. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2024

  2. arXiv:2404.07991  [pdf, other]

    cs.CV

    GoMAvatar: Efficient Animatable Human Modeling from Monocular Video Using Gaussians-on-Mesh

    Authors: Jing Wen, Xiaoming Zhao, Zhongzheng Ren, Alexander G. Schwing, Shenlong Wang

    Abstract: We introduce GoMAvatar, a novel approach for real-time, memory-efficient, high-quality animatable human modeling. GoMAvatar takes as input a single monocular video to create a digital avatar capable of re-articulation in new poses and real-time rendering from novel viewpoints, while seamlessly integrating with rasterization-based graphics pipelines. Central to our method is the Gaussians-on-Mesh r…

    Submitted 11 April, 2024; originally announced April 2024.

    Comments: CVPR 2024; project page: https://wenj.github.io/GoMAvatar/

  3. arXiv:2404.03657  [pdf, other]

    cs.CV cs.AI

    OW-VISCap: Open-World Video Instance Segmentation and Captioning

    Authors: Anwesa Choudhuri, Girish Chowdhary, Alexander G. Schwing

    Abstract: Open-world video instance segmentation is an important video understanding task. Yet most methods either operate in a closed-world setting, require an additional user-input, or use classic region-based proposals to identify never before seen objects. Further, these methods only assign a one-word label to detected objects, and don't generate rich object-centric descriptions. They also often suffer…

    Submitted 4 April, 2024; originally announced April 2024.

    Comments: Project page: https://anwesachoudhuri.github.io/OpenWorldVISCap/

  4. arXiv:2312.14154  [pdf, other]

    cs.CV

    Virtual Pets: Animatable Animal Generation in 3D Scenes

    Authors: Yen-Chi Cheng, Chieh Hubert Lin, Chaoyang Wang, Yash Kant, Sergey Tulyakov, Alexander Schwing, Liangyan Gui, Hsin-Ying Lee

    Abstract: Toward unlocking the potential of generative models in immersive 4D experiences, we introduce Virtual Pet, a novel pipeline to model realistic and diverse motions for target animal species within a 3D environment. To circumvent the limited availability of 3D motion data aligned with environmental geometry, we leverage monocular internet videos and extract deformable NeRF representations for the fo…

    Submitted 21 December, 2023; originally announced December 2023.

    Comments: Preprint. Project page: https://yccyenchicheng.github.io/VirtualPets/

  5. arXiv:2312.02189  [pdf, other]

    cs.CV cs.AI

    StableDreamer: Taming Noisy Score Distillation Sampling for Text-to-3D

    Authors: Pengsheng Guo, Hans Hao, Adam Caccavale, Zhongzheng Ren, Edward Zhang, Qi Shan, Aditya Sankar, Alexander G. Schwing, Alex Colburn, Fangchang Ma

    Abstract: In the realm of text-to-3D generation, utilizing 2D diffusion models through score distillation sampling (SDS) frequently leads to issues such as blurred appearances and multi-faced geometry, primarily due to the intrinsically noisy nature of the SDS loss. Our analysis identifies the core of these challenges as the interaction among noise levels in the 2D diffusion process, the architecture of the…

    Submitted 1 December, 2023; originally announced December 2023.

  6. arXiv:2311.01331  [pdf, other]

    cs.LG cs.AI

    Offline Imitation from Observation via Primal Wasserstein State Occupancy Matching

    Authors: Kai Yan, Alexander G. Schwing, Yu-Xiong Wang

    Abstract: In real-world scenarios, arbitrary interactions with the environment can often be costly, and actions of expert demonstrations are not always available. To reduce the need for both, offline Learning from Observations (LfO) is extensively studied: the agent learns to solve a task given only expert states and task-agnostic non-expert state-action pairs. The state-of-the-art DIstribution Correction E…

    Submitted 9 June, 2024; v1 submitted 2 November, 2023; originally announced November 2023.

    Comments: 25 pages. Accepted to ICML 2024

  7. arXiv:2311.01329  [pdf, other]

    cs.LG cs.AI

    A Simple Solution for Offline Imitation from Observations and Examples with Possibly Incomplete Trajectories

    Authors: Kai Yan, Alexander G. Schwing, Yu-Xiong Wang

    Abstract: Offline imitation from observations aims to solve MDPs where only task-specific expert states and task-agnostic non-expert state-action pairs are available. Offline imitation is useful in real-world scenarios where arbitrary interactions are costly and expert actions are unavailable. The state-of-the-art "DIstribution Correction Estimation" (DICE) methods minimize divergence of state occupancy bet…

    Submitted 2 November, 2023; originally announced November 2023.

    Comments: 35 pages; Accepted as a poster for NeurIPS 2023

  8. arXiv:2310.12982  [pdf, other]

    cs.CV

    Putting the Object Back into Video Object Segmentation

    Authors: Ho Kei Cheng, Seoung Wug Oh, Brian Price, Joon-Young Lee, Alexander Schwing

    Abstract: We present Cutie, a video object segmentation (VOS) network with object-level memory reading, which puts the object representation from memory back into the video object segmentation result. Recent works on VOS employ bottom-up pixel-level memory reading which struggles due to matching noise, especially in the presence of distractors, resulting in lower performance in more challenging data. In con…

    Submitted 11 April, 2024; v1 submitted 19 October, 2023; originally announced October 2023.

    Comments: CVPR 2024 Highlight. Project page: https://hkchengrex.github.io/Cutie

  9. arXiv:2310.08587  [pdf, other]

    cs.CV

    Pseudo-Generalized Dynamic View Synthesis from a Video

    Authors: Xiaoming Zhao, Alex Colburn, Fangchang Ma, Miguel Angel Bautista, Joshua M. Susskind, Alexander G. Schwing

    Abstract: Rendering scenes observed in a monocular video from novel viewpoints is a challenging problem. For static scenes the community has studied both scene-specific optimization techniques, which optimize on every test scene, and generalized techniques, which only run a deep net forward pass on a test scene. In contrast, for dynamic scenes, scene-specific optimization techniques exist, but, to our best…

    Submitted 19 February, 2024; v1 submitted 12 October, 2023; originally announced October 2023.

    Comments: ICLR 2024; Originally titled as "Is Generalized Dynamic Novel View Synthesis from Monocular Videos Possible Today?"; Project page: https://xiaoming-zhao.github.io/projects/pgdvs

  10. arXiv:2309.03903  [pdf, other]

    cs.CV

    Tracking Anything with Decoupled Video Segmentation

    Authors: Ho Kei Cheng, Seoung Wug Oh, Brian Price, Alexander Schwing, Joon-Young Lee

    Abstract: Training data for video segmentation are expensive to annotate. This impedes extensions of end-to-end algorithms to new video segmentation tasks, especially in large-vocabulary settings. To 'track anything' without training on video data for every individual task, we develop a decoupled video segmentation approach (DEVA), composed of task-specific image-level segmentation and class/task-agnostic b…

    Submitted 7 September, 2023; originally announced September 2023.

    Comments: Accepted to ICCV 2023. Project page: https://hkchengrex.github.io/Tracking-Anything-with-DEVA

  11. arXiv:2305.13650  [pdf, other]

    cs.LG cs.AI

    Robust Model-Based Optimization for Challenging Fitness Landscapes

    Authors: Saba Ghaffari, Ehsan Saleh, Alexander G. Schwing, Yu-Xiong Wang, Martin D. Burke, Saurabh Sinha

    Abstract: Protein design, a grand challenge of the day, involves optimization on a fitness landscape, and leading methods adopt a model-based approach where a model is trained on a training set (protein sequences and fitness) and proposes candidates to explore next. These methods are challenged by sparsity of high-fitness samples in the training set, a problem that has been in the literature. A less recogni…

    Submitted 3 October, 2023; v1 submitted 22 May, 2023; originally announced May 2023.

  12. arXiv:2305.12393  [pdf, other]

    cs.LG cs.NE

    Layer Collaboration in the Forward-Forward Algorithm

    Authors: Guy Lorberbom, Itai Gat, Yossi Adi, Alex Schwing, Tamir Hazan

    Abstract: Backpropagation, which uses the chain rule, is the de-facto standard algorithm for optimizing neural networks nowadays. Recently, Hinton (2022) proposed the forward-forward algorithm, a promising alternative that optimizes neural nets layer-by-layer, without propagating gradients throughout the network. Although such an approach has several advantages over back-propagation and shows promising resu…

    Submitted 21 May, 2023; originally announced May 2023.

  13. arXiv:2304.12406  [pdf, other]

    cs.CV

    AutoFocusFormer: Image Segmentation off the Grid

    Authors: Chen Ziwen, Kaushik Patnaik, Shuangfei Zhai, Alvin Wan, Zhile Ren, Alex Schwing, Alex Colburn, Li Fuxin

    Abstract: Real world images often have highly imbalanced content density. Some areas are very uniform, e.g., large patches of blue sky, while other areas are scattered with many small objects. Yet, the commonly used successive grid downsampling strategy in convolutional deep networks treats all areas equally. Hence, small objects are represented in very few spatial locations, leading to worse results in tas…

    Submitted 25 October, 2023; v1 submitted 24 April, 2023; originally announced April 2023.

    Comments: CVPR 2023

    ACM Class: I.4.6; I.4.8

  14. arXiv:2303.00165  [pdf, other]

    cs.CV cs.AI

    Diffusion Probabilistic Fields

    Authors: Peiye Zhuang, Samira Abnar, Jiatao Gu, Alex Schwing, Joshua M. Susskind, Miguel Ángel Bautista

    Abstract: Diffusion probabilistic models have quickly become a major approach for generative modeling of images, 3D geometry, video and other domains. However, to adapt diffusion generative modeling to these domains the denoising network needs to be carefully designed for each domain independently, oftentimes under the assumption that data lives in a Euclidean grid. In this paper we introduce Diffusion Prob…

    Submitted 28 February, 2023; originally announced March 2023.

    Comments: Accepted to ICLR 2023. 20 pages, 17 figures

  15. arXiv:2212.04493  [pdf, other]

    cs.CV cs.LG

    SDFusion: Multimodal 3D Shape Completion, Reconstruction, and Generation

    Authors: Yen-Chi Cheng, Hsin-Ying Lee, Sergey Tulyakov, Alexander Schwing, Liangyan Gui

    Abstract: In this work, we present a novel framework built to simplify 3D asset generation for amateur users. To enable interactive generation, our method supports a variety of input modalities that can be easily provided by a human, including images, text, partially observed shapes and combinations of these, further allowing to adjust the strength of each input. At the core of our approach is an encoder-de…

    Submitted 21 March, 2023; v1 submitted 8 December, 2022; originally announced December 2022.

    Comments: In CVPR 2023. Project page and code are available at: https://yccyenchicheng.github.io/SDFusion/. Fix some typos

  16. arXiv:2211.14694  [pdf, other]

    cs.LG cs.CV

    DigGAN: Discriminator gradIent Gap Regularization for GAN Training with Limited Data

    Authors: Tiantian Fang, Ruoyu Sun, Alex Schwing

    Abstract: Generative adversarial nets (GANs) have been remarkably successful at learning to sample from distributions specified by a given dataset, particularly if the given dataset is reasonably large compared to its dimensionality. However, given limited data, classical GANs have struggled, and strategies like output-regularization, data-augmentation, use of pre-trained models and pruning have been shown…

    Submitted 26 November, 2022; originally announced November 2022.

    Comments: Accepted to NeurIPS 2022

  17. arXiv:2210.11668  [pdf, other]

    cs.RO cs.CV

    RGB-Only Reconstruction of Tabletop Scenes for Collision-Free Manipulator Control

    Authors: Zhenggang Tang, Balakumar Sundaralingam, Jonathan Tremblay, Bowen Wen, Ye Yuan, Stephen Tyree, Charles Loop, Alexander Schwing, Stan Birchfield

    Abstract: We present a system for collision-free control of a robot manipulator that uses only RGB views of the world. Perceptual input of a tabletop scene is provided by multiple images of an RGB camera (without depth) that is either handheld or mounted on the robot end effector. A NeRF-like process is used to reconstruct the 3D geometry of the scene, from which the Euclidean full signed distance function…

    Submitted 10 March, 2023; v1 submitted 20 October, 2022; originally announced October 2022.

    Comments: ICRA 2023. Project page at https://ngp-mpc.github.io/

  18. arXiv:2210.09496  [pdf, other]

    cs.LG cs.AI

    CEIP: Combining Explicit and Implicit Priors for Reinforcement Learning with Demonstrations

    Authors: Kai Yan, Alexander G. Schwing, Yu-Xiong Wang

    Abstract: Although reinforcement learning has found widespread use in dense reward settings, training autonomous agents with sparse rewards remains challenging. To address this difficulty, prior work has shown promising results when using not only task-specific demonstrations but also task-agnostic albeit somewhat related demonstrations. In most cases, the available demonstrations are distilled into an impl…

    Submitted 21 October, 2022; v1 submitted 17 October, 2022; originally announced October 2022.

    Comments: 27 pages; published as NeurIPS 2022 poster paper

  19. arXiv:2210.08974  [pdf]

    cs.CY

    Coordinated Science Laboratory 70th Anniversary Symposium: The Future of Computing

    Authors: Klara Nahrstedt, Naresh Shanbhag, Vikram Adve, Nancy Amato, Romit Roy Choudhury, Carl Gunter, Nam Sung Kim, Olgica Milenkovic, Sayan Mitra, Lav Varshney, Yurii Vlasov, Sarita Adve, Rashid Bashir, Andreas Cangellaris, James DiCarlo, Katie Driggs-Campbell, Nick Feamster, Mattia Gazzola, Karrie Karahalios, Sanmi Koyejo, Paul Kwiat, Bo Li, Negar Mehr, Ravish Mehra, Andrew Miller, et al. (3 additional authors not shown)

    Abstract: In 2021, the Coordinated Science Laboratory (CSL), an Interdisciplinary Research Unit at the University of Illinois Urbana-Champaign, hosted the Future of Computing Symposium to celebrate its 70th anniversary. CSL's research covers the full computing stack, computing's impact on society and the resulting need for social responsibility. In this white paper, we summarize the major technological points…

    Submitted 4 October, 2022; originally announced October 2022.

  20. arXiv:2210.08001  [pdf, other]

    cs.CV cs.AI cs.LG

    Learnable Polyphase Sampling for Shift Invariant and Equivariant Convolutional Networks

    Authors: Renan A. Rojas-Gomez, Teck-Yian Lim, Alexander G. Schwing, Minh N. Do, Raymond A. Yeh

    Abstract: We propose learnable polyphase sampling (LPS), a pair of learnable down/upsampling layers that enable truly shift-invariant and equivariant convolutional networks. LPS can be trained end-to-end from data and generalizes existing handcrafted downsampling layers. It is widely applicable as it can be integrated into any convolutional network by replacing down/upsampling layers. We evaluate LPS on ima…

    Submitted 14 October, 2022; originally announced October 2022.

    Comments: Accepted at the Thirty-sixth Conference on Neural Information Processing Systems (NeurIPS 2022)

  21. arXiv:2210.06143  [pdf, ps, other]

    cs.LG stat.ML

    On the Importance of Gradient Norm in PAC-Bayesian Bounds

    Authors: Itai Gat, Yossi Adi, Alexander Schwing, Tamir Hazan

    Abstract: Generalization bounds which assess the difference between the true risk and the empirical risk, have been studied extensively. However, to obtain bounds, current techniques use strict assumptions such as a uniformly bounded or a Lipschitz loss function. To avoid these assumptions, in this paper, we follow an alternative approach: we relax uniform bounds assumptions by using on-average bounded loss…

    Submitted 2 November, 2022; v1 submitted 12 October, 2022; originally announced October 2022.

    Comments: NeurIPS 22. arXiv admin note: text overlap with arXiv:2002.09866

  22. arXiv:2210.05825  [pdf, other]

    cs.CV cs.AI

    Controllable Radiance Fields for Dynamic Face Synthesis

    Authors: Peiye Zhuang, Liqian Ma, Oluwasanmi Koyejo, Alexander G. Schwing

    Abstract: Recent work on 3D-aware image synthesis has achieved compelling results using advances in neural rendering. However, 3D-aware synthesis of face dynamics hasn't received much attention. Here, we study how to explicitly control generative model synthesis of face dynamics exhibiting non-rigid motion (e.g., facial expression change), while simultaneously ensuring 3D-awareness. For this we propose a Co…

    Submitted 11 October, 2022; originally announced October 2022.

    Comments: Accepted to 3DV 2022. 13 pages, 15 figures

  23. arXiv:2210.04287  [pdf, other]

    cs.CV

    Learning to Decompose Visual Features with Latent Textual Prompts

    Authors: Feng Wang, Manling Li, Xudong Lin, Hairong Lv, Alexander G. Schwing, Heng Ji

    Abstract: Recent advances in pre-training vision-language models like CLIP have shown great potential in learning transferable visual representations. Nonetheless, for downstream inference, CLIP-like models suffer from either 1) degraded accuracy and robustness in the case of inaccurate text descriptions during retrieval-based inference (the challenge for zero-shot protocol); or 2) breaking the well-establi…

    Submitted 9 October, 2022; originally announced October 2022.

  24. arXiv:2208.02817  [pdf, other]

    cs.CV cs.AI

    Occupancy Planes for Single-view RGB-D Human Reconstruction

    Authors: Xiaoming Zhao, Yuan-Ting Hu, Zhongzheng Ren, Alexander G. Schwing

    Abstract: Single-view RGB-D human reconstruction with implicit functions is often formulated as per-point classification. Specifically, a set of 3D locations within the view-frustum of the camera are first projected independently onto the image and a corresponding feature is subsequently extracted for each 3D location. The feature of each 3D location is then used to classify independently whether the corres…

    Submitted 1 December, 2022; v1 submitted 4 August, 2022; originally announced August 2022.

    Comments: AAAI 2023; Code: https://github.com/Xiaoming-Zhao/oplanes

  25. arXiv:2207.14289  [pdf, other]

    cs.CV cs.AI

    Initialization and Alignment for Adversarial Texture Optimization

    Authors: Xiaoming Zhao, Zhizhen Zhao, Alexander G. Schwing

    Abstract: While recovery of geometry from image and video data has received a lot of attention in computer vision, methods to capture the texture for a given geometry are less mature. Specifically, classical methods for texture generation often assume clean geometry and reasonably well-aligned image data. While very recent methods, e.g., adversarial texture optimization, better handle lower-quality data obt…

    Submitted 28 July, 2022; originally announced July 2022.

    Comments: ECCV 2022; Project Page: https://xiaoming-zhao.github.io/projects/advtex_init_align/

  26. arXiv:2207.10642  [pdf, other]

    cs.CV cs.AI

    Generative Multiplane Images: Making a 2D GAN 3D-Aware

    Authors: Xiaoming Zhao, Fangchang Ma, David Güera, Zhile Ren, Alexander G. Schwing, Alex Colburn

    Abstract: What is really needed to make an existing 2D GAN 3D-aware? To answer this question, we modify a classical GAN, i.e., StyleGANv2, as little as possible. We find that only two modifications are absolutely necessary: 1) a multiplane image style generator branch which produces a set of alpha maps conditioned on their depth; 2) a pose-conditioned discriminator. We refer to the generated output as a 'ge…

    Submitted 21 July, 2022; originally announced July 2022.

    Comments: ECCV 2022; Project Page: https://xiaoming-zhao.github.io/projects/gmpi/

  27. arXiv:2207.07115  [pdf, other]

    cs.CV

    XMem: Long-Term Video Object Segmentation with an Atkinson-Shiffrin Memory Model

    Authors: Ho Kei Cheng, Alexander G. Schwing

    Abstract: We present XMem, a video object segmentation architecture for long videos with unified feature memory stores inspired by the Atkinson-Shiffrin memory model. Prior work on video object segmentation typically only uses one type of feature memory. For videos longer than a minute, a single feature memory model tightly links memory consumption and accuracy. In contrast, following the Atkinson-Shiffrin…

    Submitted 18 July, 2022; v1 submitted 14 July, 2022; originally announced July 2022.

    Comments: Accepted to ECCV 2022. Project page: https://hkchengrex.github.io/XMem

  28. arXiv:2205.14929  [pdf, other]

    cs.CV cs.AI cs.GR cs.LG

    Neural Volumetric Object Selection

    Authors: Zhongzheng Ren, Aseem Agarwala, Bryan Russell, Alexander G. Schwing, Oliver Wang

    Abstract: We introduce an approach for selecting objects in neural volumetric 3D representations, such as multi-plane images (MPI) and neural radiance fields (NeRF). Our approach takes a set of foreground and background 2D user scribbles in one view and automatically estimates a 3D segmentation of the desired object, which can be rendered into novel views. To achieve this result, we propose a novel voxel fe…

    Submitted 30 May, 2022; originally announced May 2022.

    Comments: CVPR 2022 camera ready

  29. arXiv:2205.06111  [pdf, other]

    cs.AI cs.CL

    Asking for Knowledge: Training RL Agents to Query External Knowledge Using Language

    Authors: Iou-Jen Liu, Xingdi Yuan, Marc-Alexandre Côté, Pierre-Yves Oudeyer, Alexander G. Schwing

    Abstract: To solve difficult tasks, humans ask questions to acquire knowledge from external sources. In contrast, classical reinforcement learning agents lack such an ability and often resort to exploratory behavior. This is exacerbated as few present-day environments support querying for knowledge. In order to study how agents can be taught to query external knowledge via language, we first introduce two n…

    Submitted 3 July, 2022; v1 submitted 12 May, 2022; originally announced May 2022.

    Comments: ICML 2022; Project page: https://ioujenliu.github.io/AFK/

  30. arXiv:2204.07157  [pdf, other]

    cs.CV

    Joint Forecasting of Panoptic Segmentations with Difference Attention

    Authors: Colin Graber, Cyril Jazra, Wenjie Luo, Liangyan Gui, Alexander Schwing

    Abstract: Forecasting of a representation is important for safe and effective autonomy. For this, panoptic segmentations have been studied as a compelling representation in recent work. However, recent state-of-the-art on panoptic segmentation forecasting suffers from two issues: first, individual object instances are treated independently of each other; second, individual object instance forecasts are merg…

    Submitted 14 April, 2022; originally announced April 2022.

    Comments: Accepted by CVPR 2022 (Oral)

  31. arXiv:2204.03643  [pdf, other]

    cs.CV

    Total Variation Optimization Layers for Computer Vision

    Authors: Raymond A. Yeh, Yuan-Ting Hu, Zhongzheng Ren, Alexander G. Schwing

    Abstract: Optimization within a layer of a deep-net has emerged as a new direction for deep-net layer design. However, there are two main challenges when applying these layers to computer vision tasks: (a) which optimization problem within a layer is useful?; (b) how to ensure that computation within a layer remains efficient? To study question (a), in this work, we propose total variation (TV) minimization…

    Submitted 7 April, 2022; originally announced April 2022.

    Comments: CVPR 2022

  32. arXiv:2204.03640  [pdf, other]

    cs.LG cs.CV

    Equivariance Discovery by Learned Parameter-Sharing

    Authors: Raymond A. Yeh, Yuan-Ting Hu, Mark Hasegawa-Johnson, Alexander G. Schwing

    Abstract: Designing equivariance as an inductive bias into deep-nets has been a prominent approach to build effective models, e.g., a convolutional neural network incorporates translation equivariance. However, incorporating these inductive biases requires knowledge about the equivariance properties of the data, which may not be available, e.g., when encountering a new domain. To address this, we study how…

    Submitted 7 April, 2022; originally announced April 2022.

    Comments: AISTATS 2022

  33. arXiv:2112.10764  [pdf, other]

    cs.CV cs.AI cs.LG

    Mask2Former for Video Instance Segmentation

    Authors: Bowen Cheng, Anwesa Choudhuri, Ishan Misra, Alexander Kirillov, Rohit Girdhar, Alexander G. Schwing

    Abstract: We find Mask2Former also achieves state-of-the-art performance on video instance segmentation without modifying the architecture, the loss or even the training pipeline. In this report, we show universal image segmentation architectures trivially generalize to video segmentation by directly predicting 3D segmentation volumes. Specifically, Mask2Former sets a new state-of-the-art of 60.4 AP on YouT…

    Submitted 20 December, 2021; originally announced December 2021.

    Comments: Code and models: https://github.com/facebookresearch/Mask2Former

  34. arXiv:2112.10728  [pdf, other]

    cs.CL cs.CV

    MuMuQA: Multimedia Multi-Hop News Question Answering via Cross-Media Knowledge Extraction and Grounding

    Authors: Revanth Gangi Reddy, Xilin Rui, Manling Li, Xudong Lin, Haoyang Wen, Jaemin Cho, Lifu Huang, Mohit Bansal, Avirup Sil, Shih-Fu Chang, Alexander Schwing, Heng Ji

    Abstract: Recently, there has been an increasing interest in building question answering (QA) models that reason across multiple modalities, such as text and images. However, QA using images is often limited to just picking the answer from a pre-defined set of options. In addition, images in the real world, especially in news, have objects that are co-referential to the text, with complementary information…

    Submitted 4 May, 2022; v1 submitted 20 December, 2021; originally announced December 2021.

    Comments: Accepted at AAAI 2022

  35. arXiv:2112.02091  [pdf, other]

    cs.CV cs.AI cs.GR cs.LG

    Class-agnostic Reconstruction of Dynamic Objects from Videos

    Authors: Zhongzheng Ren, Xiaoming Zhao, Alexander G. Schwing

    Abstract: We introduce REDO, a class-agnostic framework to REconstruct the Dynamic Objects from RGBD or calibrated videos. Compared to prior work, our problem setting is more realistic yet more challenging for three reasons: 1) due to occlusion or camera settings an object of interest may never be entirely visible, but we aim to reconstruct the complete shape; 2) we aim to handle different object dynamics i…

    Submitted 3 December, 2021; originally announced December 2021.

    Comments: NeurIPS 2021

  36. arXiv:2112.01527  [pdf, other]

    cs.CV cs.AI cs.LG

    Masked-attention Mask Transformer for Universal Image Segmentation

    Authors: Bowen Cheng, Ishan Misra, Alexander G. Schwing, Alexander Kirillov, Rohit Girdhar

    Abstract: Image segmentation is about grouping pixels with different semantics, e.g., category or instance membership, where each choice of semantics defines a task. While only the semantics of each task differ, current research focuses on designing specialized architectures for each task. We present Masked-attention Mask Transformer (Mask2Former), a new architecture capable of addressing any image segmenta…

    Submitted 15 June, 2022; v1 submitted 2 December, 2021; originally announced December 2021.

    Comments: CVPR 2022. Project page/code/models: https://bowenc0221.github.io/mask2former

  37. arXiv:2110.14375  [pdf, other]

    cs.LG cs.CV cs.MM

    Perceptual Score: What Data Modalities Does Your Model Perceive?

    Authors: Itai Gat, Idan Schwartz, Alexander Schwing

    Abstract: Machine learning advances in the last decade have relied significantly on large-scale datasets that continue to grow in size. Increasingly, those datasets also contain different data modalities. However, large multi-modal datasets are hard to annotate, and annotations may contain biases that we are often unaware of. Deep-net-based classifiers, in turn, are prone to exploit those biases and to find…

    Submitted 27 October, 2021; originally announced October 2021.

    Comments: Accepted to NeurIPS 2021

  38. arXiv:2110.12320  [pdf, other]

    cs.CV cs.AI cs.CL cs.HC cs.IR

    CoVA: Context-aware Visual Attention for Webpage Information Extraction

    Authors: Anurendra Kumar, Keval Morabia, Jingjin Wang, Kevin Chen-Chuan Chang, Alexander Schwing

    Abstract: Webpage information extraction (WIE) is an important step to create knowledge bases. For this, classical WIE methods leverage the Document Object Model (DOM) tree of a website. However, use of the DOM tree poses significant challenges as context and appearance are encoded in an abstract manner. To address this challenge we propose to reformulate WIE as a context-aware Webpage Object Detection task…

    Submitted 23 October, 2021; originally announced October 2021.

    Comments: 11 Pages, 6 Figures, 3 Tables

  39. arXiv:2110.05769  [pdf, other]

    cs.CV cs.AI cs.LG cs.MA

    Interpretation of Emergent Communication in Heterogeneous Collaborative Embodied Agents

    Authors: Shivansh Patel, Saim Wani, Unnat Jain, Alexander Schwing, Svetlana Lazebnik, Manolis Savva, Angel X. Chang

    Abstract: Communication between embodied AI agents has received increasing attention in recent years. Despite its use, it is still unclear whether the learned communication is interpretable and grounded in perception. To study the grounding of emergent forms of communication, we first introduce the collaborative multi-object navigation task CoMON. In this task, an oracle agent has detailed environment infor…

    Submitted 12 October, 2021; originally announced October 2021.

    Comments: Project page: https://shivanshpatel35.github.io/comon/ ; the first three authors contributed equally

  40. arXiv:2108.11550  [pdf, other]

    cs.CV cs.AI cs.LG

    The Surprising Effectiveness of Visual Odometry Techniques for Embodied PointGoal Navigation

    Authors: Xiaoming Zhao, Harsh Agrawal, Dhruv Batra, Alexander Schwing

    Abstract: It is fundamental for personal robots to reliably navigate to a specified goal. To study this task, PointGoal navigation has been introduced in simulated Embodied AI environments. Recent advances solve this PointGoal navigation task with near-perfect accuracy (99.6% success) in photo-realistically simulated environments, assuming noiseless egocentric vision, noiseless actuation, and most important…

    Submitted 25 August, 2021; originally announced August 2021.

    Comments: ICCV 2021

  41. arXiv:2108.03319  [pdf, other]

    cs.AI

    Semantic Tracklets: An Object-Centric Representation for Visual Multi-Agent Reinforcement Learning

    Authors: Iou-Jen Liu, Zhongzheng Ren, Raymond A. Yeh, Alexander G. Schwing

    Abstract: Solving complex real-world tasks, e.g., autonomous fleet control, often involves a coordinated team of multiple agents which learn strategies from visual inputs via reinforcement learning. Many existing multi-agent reinforcement learning (MARL) algorithms however don't scale to environments where agents operate on visual inputs. To address this issue, algorithmically, recent works have focused on…

    Submitted 6 August, 2021; originally announced August 2021.

    Comments: IROS 2021; Project page: https://ioujenliu.github.io/SemanticTracklets/

  42. Ordered Attention for Coherent Visual Storytelling

    Authors: Tom Braude, Idan Schwartz, Alexander Schwing, Ariel Shamir

    Abstract: We address the problem of visual storytelling, i.e., generating a story for a given sequence of images. While each sentence of the story should describe a corresponding image, a coherent story also needs to be consistent and relate to both future and past images. To achieve this we develop ordered image attention (OIA). OIA models interactions between the sentence-corresponding image and important…

    Submitted 11 October, 2022; v1 submitted 4 August, 2021; originally announced August 2021.

    Comments: 9 pages, 7 figures

  43. arXiv:2107.11444  [pdf, other]

    cs.AI

    Cooperative Exploration for Multi-Agent Deep Reinforcement Learning

    Authors: Iou-Jen Liu, Unnat Jain, Raymond A. Yeh, Alexander G. Schwing

    Abstract: Exploration is critical for good results in deep reinforcement learning and has attracted much attention. However, existing multi-agent deep reinforcement learning algorithms still use mostly noise-based techniques. Very recently, exploration methods that consider cooperation among multiple agents have been developed. However, existing methods suffer from a common challenge: agents struggle to ide…

    Submitted 23 July, 2021; originally announced July 2021.

    Comments: ICML 2021; Project Page: https://ioujenliu.github.io/CMAE/

  44. arXiv:2107.06278  [pdf, other]

    cs.CV

    Per-Pixel Classification is Not All You Need for Semantic Segmentation

    Authors: Bowen Cheng, Alexander G. Schwing, Alexander Kirillov

    Abstract: Modern approaches typically formulate semantic segmentation as a per-pixel classification task, while instance-level segmentation is handled with an alternative mask classification. Our key insight: mask classification is sufficiently general to solve both semantic- and instance-level segmentation tasks in a unified manner using the exact same model, loss, and training procedure. Following this ob…

    Submitted 31 October, 2021; v1 submitted 13 July, 2021; originally announced July 2021.

    Comments: NeurIPS 2021, Spotlight. Project page: https://bowenc0221.github.io/maskformer

  45. arXiv:2105.14710  [pdf, other]

    cs.LG stat.ML

    Robustifying $\ell_\infty$ Adversarial Training to the Union of Perturbation Models

    Authors: Ameya D. Patil, Michael Tuttle, Alexander G. Schwing, Naresh R. Shanbhag

    Abstract: Classical adversarial training (AT) frameworks are designed to achieve high adversarial accuracy against a single attack type, typically $\ell_\infty$ norm-bounded perturbations. Recent extensions in AT have focused on defending against the union of multiple perturbations but this benefit is obtained at the expense of a significant (up to $10\times$) increase in training complexity over single-att…

    Submitted 11 June, 2021; v1 submitted 31 May, 2021; originally announced May 2021.

  46. arXiv:2105.08612  [pdf, other]

    cs.CV cs.GR cs.LG

    SAIL-VOS 3D: A Synthetic Dataset and Baselines for Object Detection and 3D Mesh Reconstruction from Video Data

    Authors: Yuan-Ting Hu, Jiahong Wang, Raymond A. Yeh, Alexander G. Schwing

    Abstract: Extracting detailed 3D information of objects from video data is an important goal for holistic scene understanding. While recent methods have shown impressive results when reconstructing meshes of objects from a single image, results often remain ambiguous as part of the object is unobserved. Moreover, existing image-based datasets for mesh reconstruction don't permit to study models which integr…

    Submitted 18 May, 2021; originally announced May 2021.

    Comments: CVPR 2021 Oral

  47. arXiv:2105.06461  [pdf, other]

    cs.CV cs.AI cs.LG cs.MM

    3D Spatial Recognition without Spatially Labeled 3D

    Authors: Zhongzheng Ren, Ishan Misra, Alexander G. Schwing, Rohit Girdhar

    Abstract: We introduce WyPR, a Weakly-supervised framework for Point cloud Recognition, requiring only scene-level class tags as supervision. WyPR jointly addresses three core 3D recognition tasks: point-level semantic segmentation, 3D proposal generation, and 3D object detection, coupling their predictions through self and cross-task consistency losses. We show that in conjunction with standard multiple-in…

    Submitted 13 May, 2021; originally announced May 2021.

    Comments: CVPR 2021

  48. arXiv:2105.06441  [pdf, other]

    cs.CV cs.AI cs.IR

    DeepQAMVS: Query-Aware Hierarchical Pointer Networks for Multi-Video Summarization

    Authors: Safa Messaoud, Ismini Lourentzou, Assma Boughoula, Mona Zehni, Zhizhen Zhao, Chengxiang Zhai, Alexander G. Schwing

    Abstract: The recent growth of web video sharing platforms has increased the demand for systems that can efficiently browse, retrieve and summarize video content. Query-aware multi-video summarization is a promising technique that caters to this demand. In this work, we introduce a novel Query-Aware Hierarchical Pointer Network for Multi-Video Summarization, termed DeepQAMVS, that jointly optimizes multiple…

    Submitted 13 May, 2021; originally announced May 2021.

  49. arXiv:2105.00931  [pdf, other]

    cs.CV cs.AI cs.LG cs.MA

    GridToPix: Training Embodied Agents with Minimal Supervision

    Authors: Unnat Jain, Iou-Jen Liu, Svetlana Lazebnik, Aniruddha Kembhavi, Luca Weihs, Alexander Schwing

    Abstract: While deep reinforcement learning (RL) promises freedom from hand-labeled data, great successes, especially for Embodied AI, require significant work to create supervision via carefully shaped rewards. Indeed, without shaped rewards, i.e., with only terminal rewards, present-day Embodied AI results degrade significantly across Embodied AI problems from single-agent Habitat-based PointGoal Navigati…

    Submitted 13 October, 2021; v1 submitted 14 April, 2021; originally announced May 2021.

    Comments: Project page: https://unnat.github.io/gridtopix/ ; last two authors contributed equally

  50. arXiv:2104.03962  [pdf, other]

    cs.CV

    Panoptic Segmentation Forecasting

    Authors: Colin Graber, Grace Tsai, Michael Firman, Gabriel Brostow, Alexander Schwing

    Abstract: Our goal is to forecast the near future given a set of recent observations. We think this ability to forecast, i.e., to anticipate, is integral for the success of autonomous agents which need not only passively analyze an observation but also must react to it in real-time. Importantly, accurate forecasting hinges upon the chosen scene decomposition. We think that superior forecasting can be achiev…

    Submitted 8 April, 2021; originally announced April 2021.

    Comments: CVPR 2021