
Showing 1–50 of 80 results for author: Kanazawa, A

  1. arXiv:2406.09417

    cs.CV cs.GR cs.LG

    Rethinking Score Distillation as a Bridge Between Image Distributions

    Authors: David McAllister, Songwei Ge, Jia-Bin Huang, David W. Jacobs, Alexei A. Efros, Aleksander Holynski, Angjoo Kanazawa

    Abstract: Score distillation sampling (SDS) has proven to be an important tool, enabling the use of large-scale diffusion priors for tasks operating in data-poor domains. Unfortunately, SDS has a number of characteristic artifacts that limit its usefulness in general-purpose applications. In this paper, we make progress toward understanding the behavior of SDS and its variants by viewing them as solving an…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Project webpage: https://sds-bridge.github.io/

  2. arXiv:2405.10320

    cs.CV

    Toon3D: Seeing Cartoons from a New Perspective

    Authors: Ethan Weber, Riley Peterlinz, Rohan Mathur, Frederik Warburg, Alexei A. Efros, Angjoo Kanazawa

    Abstract: In this work, we recover the underlying 3D structure of non-geometrically consistent scenes. We focus our analysis on hand-drawn images from cartoons and anime. Many cartoons are created by artists without a 3D rendering engine, which means that any new image of a scene is hand-drawn. The hand-drawn images are usually faithful representations of the world, but only in a qualitative sense, since it…

    Submitted 17 May, 2024; v1 submitted 16 May, 2024; originally announced May 2024.

    Comments: Please see our project page: https://toon3d.studio

  3. arXiv:2405.05530

    cs.CV

    NurtureNet: A Multi-task Video-based Approach for Newborn Anthropometry

    Authors: Yash Khandelwal, Mayur Arvind, Sriram Kumar, Ashish Gupta, Sachin Kumar Danisetty, Piyush Bagad, Anish Madan, Mayank Lunayach, Aditya Annavajjala, Abhishek Maiti, Sansiddh Jain, Aman Dalmia, Namrata Deka, Jerome White, Jigar Doshi, Angjoo Kanazawa, Rahul Panicker, Alpan Raval, Srinivas Rana, Makarand Tapaswi

    Abstract: Malnutrition among newborns is a top public health concern in developing countries. Identification and subsequent growth monitoring are key to successful interventions. However, this is challenging in rural communities where health systems tend to be inaccessible and under-equipped, with poor adherence to protocol. Our goal is to equip health workers and public health systems with a solution for c…

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: Accepted at CVPM Workshop at CVPR 2024

  4. arXiv:2404.16221

    cs.CV cs.DC cs.GR

    NeRF-XL: Scaling NeRFs with Multiple GPUs

    Authors: Ruilong Li, Sanja Fidler, Angjoo Kanazawa, Francis Williams

    Abstract: We present NeRF-XL, a principled method for distributing Neural Radiance Fields (NeRFs) across multiple GPUs, thus enabling the training and rendering of NeRFs with an arbitrarily large capacity. We begin by revisiting existing multi-GPU approaches, which decompose large scenes into multiple independently trained NeRFs, and identify several fundamental issues with these methods that hinder improve…

    Submitted 24 April, 2024; originally announced April 2024.

    Comments: Webpage: https://research.nvidia.com/labs/toronto-ai/nerfxl/

  5. arXiv:2404.05072

    cs.CV

    Spatial Cognition from Egocentric Video: Out of Sight, Not Out of Mind

    Authors: Chiara Plizzari, Shubham Goel, Toby Perrett, Jacob Chalk, Angjoo Kanazawa, Dima Damen

    Abstract: As humans move around, performing their daily tasks, they are able to recall where they have positioned objects in their environment, even if these objects are currently out of sight. In this paper, we aim to mimic this spatial cognition ability. We thus formulate the task of Out of Sight, Not Out of Mind: 3D tracking active objects using observations captured through an egocentric camera. We int…

    Submitted 7 April, 2024; originally announced April 2024.

    Comments: 21 pages including references and appendix. Project Webpage: http://dimadamen.github.io/OSNOM/

  6. arXiv:2404.03652

    cs.CV

    The More You See in 2D, the More You Perceive in 3D

    Authors: Xinyang Han, Zelin Gao, Angjoo Kanazawa, Shubham Goel, Yossi Gandelsman

    Abstract: Humans can infer 3D structure from 2D images of an object based on past experience and improve their 3D understanding as they see more images. Inspired by this behavior, we introduce SAP3D, a system for 3D reconstruction and novel view synthesis from an arbitrary number of unposed images. Given a few unposed images of an object, we adapt a pre-trained view-conditioned diffusion model together with…

    Submitted 4 April, 2024; originally announced April 2024.

    Comments: Project page: https://sap3d.github.io/

  7. arXiv:2401.09419

    cs.CV cs.GR

    GARField: Group Anything with Radiance Fields

    Authors: Chung Min Kim, Mingxuan Wu, Justin Kerr, Ken Goldberg, Matthew Tancik, Angjoo Kanazawa

    Abstract: Grouping is inherently ambiguous due to the multiple levels of granularity in which one can decompose a scene -- should the wheels of an excavator be considered separate or part of the whole? We present Group Anything with Radiance Fields (GARField), an approach for decomposing 3D scenes into a hierarchy of semantically meaningful groups from posed image inputs. To do this we embrace group ambigui…

    Submitted 17 January, 2024; originally announced January 2024.

    Comments: Project site: https://www.garfield.studio/; first three authors contributed equally

  8. arXiv:2401.01885

    cs.CV

    From Audio to Photoreal Embodiment: Synthesizing Humans in Conversations

    Authors: Evonne Ng, Javier Romero, Timur Bagautdinov, Shaojie Bai, Trevor Darrell, Angjoo Kanazawa, Alexander Richard

    Abstract: We present a framework for generating full-bodied photorealistic avatars that gesture according to the conversational dynamics of a dyadic interaction. Given speech audio, we output multiple possibilities of gestural motion for an individual, including face, body, and hands. The key behind our method is in combining the benefits of sample diversity from vector quantization with the high-frequency…

    Submitted 3 January, 2024; originally announced January 2024.

  9. arXiv:2312.05251

    cs.CV

    Reconstructing Hands in 3D with Transformers

    Authors: Georgios Pavlakos, Dandan Shan, Ilija Radosavovic, Angjoo Kanazawa, David Fouhey, Jitendra Malik

    Abstract: We present an approach that can reconstruct hands in 3D from monocular input. Our approach for Hand Mesh Recovery, HaMeR, follows a fully transformer-based architecture and can analyze hands with significantly increased accuracy and robustness compared to previous work. The key to HaMeR's success lies in scaling up both the data used for training and the capacity of the deep network for hand recon…

    Submitted 8 December, 2023; originally announced December 2023.

  10. arXiv:2312.04560

    cs.CV cs.AI cs.GR

    NeRFiller: Completing Scenes via Generative 3D Inpainting

    Authors: Ethan Weber, Aleksander Hołyński, Varun Jampani, Saurabh Saxena, Noah Snavely, Abhishek Kar, Angjoo Kanazawa

    Abstract: We propose NeRFiller, an approach that completes missing portions of a 3D capture via generative 3D inpainting using off-the-shelf 2D visual generative models. Often parts of a captured 3D scene or object are missing due to mesh reconstruction failures or a lack of observations (e.g., contact regions, such as the bottom of objects, or hard-to-reach areas). We approach this challenging 3D inpaintin…

    Submitted 7 December, 2023; originally announced December 2023.

    Comments: Project page: https://ethanweber.me/nerfiller

  11. arXiv:2312.02121

    cs.MS cs.CV cs.GR math.NA

    Mathematical Supplement for the gsplat Library

    Authors: Vickie Ye, Angjoo Kanazawa

    Abstract: This report provides the mathematical details of the gsplat library, a modular toolbox for efficient differentiable Gaussian splatting, as proposed by Kerbl et al. It provides a self-contained reference for the computations involved in the forward and backward passes of differentiable Gaussian splatting. To facilitate practical usage and development, we provide a user-friendly Python API that expo…

    Submitted 4 December, 2023; originally announced December 2023.

    Comments: Find the library at: https://docs.gsplat.studio/

  12. arXiv:2310.07204

    cs.AI cs.CV cs.GR cs.LG

    State of the Art on Diffusion Models for Visual Computing

    Authors: Ryan Po, Wang Yifan, Vladislav Golyanik, Kfir Aberman, Jonathan T. Barron, Amit H. Bermano, Eric Ryan Chan, Tali Dekel, Aleksander Holynski, Angjoo Kanazawa, C. Karen Liu, Lingjie Liu, Ben Mildenhall, Matthias Nießner, Björn Ommer, Christian Theobalt, Peter Wonka, Gordon Wetzstein

    Abstract: The field of visual computing is rapidly advancing due to the emergence of generative artificial intelligence (AI), which unlocks unprecedented capabilities for the generation, editing, and reconstruction of images, videos, and 3D scenes. In these domains, diffusion models are the generative AI architecture of choice. Within the last year alone, the literature on diffusion-based tools and applicat…

    Submitted 11 October, 2023; originally announced October 2023.

  13. arXiv:2309.07970

    cs.RO cs.CV

    Language Embedded Radiance Fields for Zero-Shot Task-Oriented Grasping

    Authors: Adam Rashid, Satvik Sharma, Chung Min Kim, Justin Kerr, Lawrence Chen, Angjoo Kanazawa, Ken Goldberg

    Abstract: Grasping objects by a specific part is often crucial for safety and for executing downstream tasks. Yet, learning-based grasp planners lack this behavior unless they are trained on specific object part data, making it a significant challenge to scale object diversity. Instead, we propose LERF-TOGO, Language Embedded Radiance Fields for Task-Oriented Grasping of Objects, which uses vision-language…

    Submitted 18 September, 2023; v1 submitted 14 September, 2023; originally announced September 2023.

    Comments: See the project website at: lerftogo.github.io

  14. arXiv:2308.10897

    cs.CV

    Can Language Models Learn to Listen?

    Authors: Evonne Ng, Sanjay Subramanian, Dan Klein, Angjoo Kanazawa, Trevor Darrell, Shiry Ginosar

    Abstract: We present a framework for generating appropriate facial responses from a listener in dyadic social interactions based on the speaker's words. Given an input transcription of the speaker's words with their timestamps, our approach autoregressively predicts a response of a listener: a sequence of listener facial gestures, quantized using a VQ-VAE. Since gesture is a language component, we propose t…

    Submitted 21 August, 2023; originally announced August 2023.

    Comments: ICCV 2023; Project page: https://people.eecs.berkeley.edu/~evonne_ng/projects/text2listen/

  15. arXiv:2307.05473

    cs.CV

    Differentiable Blocks World: Qualitative 3D Decomposition by Rendering Primitives

    Authors: Tom Monnier, Jake Austin, Angjoo Kanazawa, Alexei A. Efros, Mathieu Aubry

    Abstract: Given a set of calibrated images of a scene, we present an approach that produces a simple, compact, and actionable 3D world representation by means of 3D primitives. While many approaches focus on recovering high-fidelity 3D scenes, we focus on parsing a scene into mid-level 3D representations made of a small set of textured primitives. Such representations are interpretable, easy to manipulate a…

    Submitted 26 December, 2023; v1 submitted 11 July, 2023; originally announced July 2023.

    Comments: Project webpage with code and videos: https://www.tmonnier.com/DBW. V2 update includes comparisons based on NeuS, hyperparameter analysis and failure cases

  16. arXiv:2306.09337

    cs.CV

    Generative Proxemics: A Prior for 3D Social Interaction from Images

    Authors: Lea Müller, Vickie Ye, Georgios Pavlakos, Michael Black, Angjoo Kanazawa

    Abstract: Social interaction is a fundamental aspect of human behavior and communication. The way individuals position themselves in relation to others, also known as proxemics, conveys social cues and affects the dynamics of social interaction. Reconstructing such interaction from images presents challenges because of mutual occlusion and the limited availability of large training datasets. To address this…

    Submitted 12 December, 2023; v1 submitted 15 June, 2023; originally announced June 2023.

    Comments: Project website: muelea.github.io/buddi

  17. arXiv:2305.20091

    cs.CV

    Humans in 4D: Reconstructing and Tracking Humans with Transformers

    Authors: Shubham Goel, Georgios Pavlakos, Jathushan Rajasegaran, Angjoo Kanazawa, Jitendra Malik

    Abstract: We present an approach to reconstruct humans and track them over time. At the core of our approach, we propose a fully "transformerized" version of a network for human mesh recovery. This network, HMR 2.0, advances the state of the art and shows the capability to analyze unusual poses that have in the past been difficult to reconstruct from single images. To analyze video, we use 3D reconstruction…

    Submitted 31 August, 2023; v1 submitted 31 May, 2023; originally announced May 2023.

    Comments: In ICCV 2023. Project Webpage: https://shubham-goel.github.io/4dhumans/

  18. arXiv:2305.04966

    cs.CV

    NerfAcc: Efficient Sampling Accelerates NeRFs

    Authors: Ruilong Li, Hang Gao, Matthew Tancik, Angjoo Kanazawa

    Abstract: Optimizing and rendering Neural Radiance Fields is computationally expensive due to the vast number of samples required by volume rendering. Recent works have included alternative sampling approaches to help accelerate their methods, however, they are often not the focus of the work. In this paper, we investigate and compare multiple sampling approaches and demonstrate that improved sampling is ge…

    Submitted 24 October, 2023; v1 submitted 8 May, 2023; originally announced May 2023.

    Comments: Website: https://www.nerfacc.com

    Journal ref: ICCV 2023

  19. arXiv:2304.10532

    cs.CV cs.AI cs.GR

    Nerfbusters: Removing Ghostly Artifacts from Casually Captured NeRFs

    Authors: Frederik Warburg, Ethan Weber, Matthew Tancik, Aleksander Holynski, Angjoo Kanazawa

    Abstract: Casually captured Neural Radiance Fields (NeRFs) suffer from artifacts such as floaters or flawed geometry when rendered outside the camera trajectory. Existing evaluation protocols often do not capture these effects, since they usually only assess image quality at every 8th frame of the training capture. To push forward progress in novel-view synthesis, we propose a new dataset and evaluation pro…

    Submitted 17 October, 2023; v1 submitted 20 April, 2023; originally announced April 2023.

    Comments: ICCV 2023, project page: https://ethanweber.me/nerfbusters

  20. arXiv:2304.02061

    cs.CV

    Generating Continual Human Motion in Diverse 3D Scenes

    Authors: Aymen Mir, Xavier Puig, Angjoo Kanazawa, Gerard Pons-Moll

    Abstract: We introduce a method to synthesize animator-guided human motion across 3D scenes. Given a set of sparse (3 or 4) joint locations (such as the location of a person's hand and two feet) and a seed motion sequence in a 3D scene, our method generates a plausible motion sequence starting from the seed motion while satisfying the constraints imposed by the provided keypoints. We decompose the continual…

    Submitted 30 October, 2023; v1 submitted 4 April, 2023; originally announced April 2023.

  21. arXiv:2304.01199

    cs.CV

    On the Benefits of 3D Pose and Tracking for Human Action Recognition

    Authors: Jathushan Rajasegaran, Georgios Pavlakos, Angjoo Kanazawa, Christoph Feichtenhofer, Jitendra Malik

    Abstract: In this work we study the benefits of using tracking and 3D poses for action recognition. To achieve this, we take the Lagrangian view on analysing actions over a trajectory of human motion rather than at a fixed point in space. Taking this stand allows us to use the tracklets of people to predict their actions. In this spirit, first we show the benefits of using 3D pose to infer actions, and stud…

    Submitted 7 August, 2023; v1 submitted 3 April, 2023; originally announced April 2023.

    Comments: CVPR 2023 (project page: https://brjathu.github.io/LART)

  22. arXiv:2303.12789

    cs.CV cs.GR

    Instruct-NeRF2NeRF: Editing 3D Scenes with Instructions

    Authors: Ayaan Haque, Matthew Tancik, Alexei A. Efros, Aleksander Holynski, Angjoo Kanazawa

    Abstract: We propose a method for editing NeRF scenes with text instructions. Given a NeRF of a scene and the collection of images used to reconstruct it, our method uses an image-conditioned diffusion model (InstructPix2Pix) to iteratively edit the input images while optimizing the underlying scene, resulting in an optimized 3D scene that respects the edit instruction. We demonstrate that our proposed meth…

    Submitted 1 June, 2023; v1 submitted 22 March, 2023; originally announced March 2023.

    Comments: Project website: https://instruct-nerf2nerf.github.io; v1. Revisions to related work and discussion

  23. arXiv:2303.09553

    cs.CV cs.GR

    LERF: Language Embedded Radiance Fields

    Authors: Justin Kerr, Chung Min Kim, Ken Goldberg, Angjoo Kanazawa, Matthew Tancik

    Abstract: Humans describe the physical world using natural language to refer to specific 3D locations based on a vast range of properties: visual appearance, semantics, abstract associations, or actionable affordances. In this work we propose Language Embedded Radiance Fields (LERFs), a method for grounding language embeddings from off-the-shelf models like CLIP into NeRF, which enable these types of open-e…

    Submitted 16 March, 2023; originally announced March 2023.

    Comments: Project website can be found at https://lerf.io

  24. arXiv:2303.04383

    math.AG

    BCOV cusp forms of lattice polarized K3 surfaces

    Authors: Shinobu Hosono, Atsushi Kanazawa

    Abstract: We introduce the BCOV formula for the lattice polarized K3 surfaces. We find that it yields cusp forms expressed by certain eta products for many families of rank 19 lattice polarized K3 surfaces over $\mathbb{P}^{1}$. Moreover, for Clingher-Doran's family of $U\oplus E_{8}(-1)\oplus E_{7}(-1)$-polarized K3 surfaces, we obtain the Igusa cusp forms $\chi_{10}$ and $\chi_{12}$ from the formula. Inspired b…

    Submitted 8 March, 2023; originally announced March 2023.

    Comments: 36 pages + 10 pages

  25. arXiv:2302.12827

    cs.CV

    Decoupling Human and Camera Motion from Videos in the Wild

    Authors: Vickie Ye, Georgios Pavlakos, Jitendra Malik, Angjoo Kanazawa

    Abstract: We propose a method to reconstruct global human trajectories from videos in the wild. Our optimization method decouples the camera and human motion, which allows us to place people in the same world coordinate frame. Most existing methods do not model the camera motion; methods that rely on the background pixels to infer 3D human motion usually require a full scene reconstruction, which is often n…

    Submitted 20 March, 2023; v1 submitted 24 February, 2023; originally announced February 2023.

    Comments: Project site: https://vye16.github.io/slahmr. CVPR 2023

  26. Nerfstudio: A Modular Framework for Neural Radiance Field Development

    Authors: Matthew Tancik, Ethan Weber, Evonne Ng, Ruilong Li, Brent Yi, Justin Kerr, Terrance Wang, Alexander Kristoffersen, Jake Austin, Kamyar Salahi, Abhik Ahuja, David McAllister, Angjoo Kanazawa

    Abstract: Neural Radiance Fields (NeRF) are a rapidly growing area of research with wide-ranging applications in computer vision, graphics, robotics, and more. In order to streamline the development and deployment of NeRF research, we propose a modular PyTorch framework, Nerfstudio. Our framework includes plug-and-play components for implementing NeRF-based methods, which make it easy for researchers and pr…

    Submitted 16 October, 2023; v1 submitted 8 February, 2023; originally announced February 2023.

    Comments: Project page at https://nerf.studio

  27. arXiv:2301.10241

    cs.CV

    K-Planes: Explicit Radiance Fields in Space, Time, and Appearance

    Authors: Sara Fridovich-Keil, Giacomo Meanti, Frederik Warburg, Benjamin Recht, Angjoo Kanazawa

    Abstract: We introduce k-planes, a white-box model for radiance fields in arbitrary dimensions. Our model uses d choose 2 planes to represent a d-dimensional scene, providing a seamless way to go from static (d=3) to dynamic (d=4) scenes. This planar factorization makes adding dimension-specific priors easy, e.g. temporal smoothness and multi-resolution spatial structure, and induces a natural decomposition…

    Submitted 24 March, 2023; v1 submitted 24 January, 2023; originally announced January 2023.

    Comments: Project page: https://sarafridov.github.io/K-Planes/

  28. arXiv:2210.13445

    cs.CV

    Monocular Dynamic View Synthesis: A Reality Check

    Authors: Hang Gao, Ruilong Li, Shubham Tulsiani, Bryan Russell, Angjoo Kanazawa

    Abstract: We study the recent progress on dynamic view synthesis (DVS) from monocular video. Though existing approaches have demonstrated impressive results, we show a discrepancy between the practical capture process and the existing experimental protocols, which effectively leaks in multi-view signals during training. We define effective multi-view factors (EMFs) to quantify the amount of multi-view signa…

    Submitted 24 October, 2022; originally announced October 2022.

    Comments: NeurIPS 2022. Project page: https://hangg7.com/dycheck. Code: https://github.com/KAIR-BAIR/dycheck

  29. arXiv:2210.04847

    cs.CV cs.GR

    NerfAcc: A General NeRF Acceleration Toolbox

    Authors: Ruilong Li, Matthew Tancik, Angjoo Kanazawa

    Abstract: We propose NerfAcc, a toolbox for efficient volumetric rendering of radiance fields. We build on the techniques proposed in Instant-NGP and extend them to support not only bounded static scenes but also dynamic and unbounded scenes. NerfAcc comes with a user-friendly Python API, and is ready for plug-and-play acceleration of most NeRFs. Various examples are provided to sho…

    Submitted 10 May, 2023; v1 submitted 10 October, 2022; originally announced October 2022.

    Comments: Webpage: https://www.nerfacc.com/; Updated Write-up: arXiv:2305.04966

  30. arXiv:2209.02836

    cs.CV cs.LG

    Studying Bias in GANs through the Lens of Race

    Authors: Vongani H. Maluleke, Neerja Thakkar, Tim Brooks, Ethan Weber, Trevor Darrell, Alexei A. Efros, Angjoo Kanazawa, Devin Guillory

    Abstract: In this work, we study how the performance and evaluation of generative image models are impacted by the racial composition of their training datasets. By examining and controlling the racial distributions in various training datasets, we are able to observe the impacts of different training distributions on generated image quality and the racial distributions of the generated images. Our results…

    Submitted 14 September, 2022; v1 submitted 6 September, 2022; originally announced September 2022.

    Comments: ECCV 2022. Project Page: https://neerja.me/bias-gans/

    ACM Class: I.4

  31. arXiv:2207.14279

    cs.CV

    The One Where They Reconstructed 3D Humans and Environments in TV Shows

    Authors: Georgios Pavlakos, Ethan Weber, Matthew Tancik, Angjoo Kanazawa

    Abstract: TV shows depict a wide variety of human behaviors and have been studied extensively for their potential to be a rich source of data for many applications. However, the majority of the existing work focuses on 2D recognition tasks. In this paper, we make the observation that there is a certain persistence in TV shows, i.e., repetition of the environments and the humans, which makes possible the 3D…

    Submitted 28 July, 2022; originally announced July 2022.

    Comments: ECCV 2022. Project page: http://ethanweber.me/sitcoms3D/

  32. arXiv:2207.11148

    cs.CV

    InfiniteNature-Zero: Learning Perpetual View Generation of Natural Scenes from Single Images

    Authors: Zhengqi Li, Qianqian Wang, Noah Snavely, Angjoo Kanazawa

    Abstract: We present a method for learning to generate unbounded flythrough videos of natural scenes starting from a single view, where this capability is learned from a collection of single photographs, without requiring camera poses or even multiple views of each scene. To achieve this, we propose a novel self-supervised view generation training paradigm, where we sample and render virtual camera traje…

    Submitted 22 July, 2022; originally announced July 2022.

    Comments: ECCV 2022 (Oral Presentation)

  33. arXiv:2206.10457

    cs.CV

    Domain Adaptive 3D Pose Augmentation for In-the-wild Human Mesh Recovery

    Authors: Zhenzhen Weng, Kuan-Chieh Wang, Angjoo Kanazawa, Serena Yeung

    Abstract: The ability to perceive 3D human bodies from a single image has a multitude of applications ranging from entertainment and robotics to neuroscience and healthcare. A fundamental challenge in human mesh recovery is in collecting the ground truth 3D mesh targets required for training, which requires burdensome motion capturing systems and is often limited to indoor laboratories. As a result, while p…

    Submitted 13 September, 2022; v1 submitted 21 June, 2022; originally announced June 2022.

  34. arXiv:2206.08929

    cs.CV cs.AI

    TAVA: Template-free Animatable Volumetric Actors

    Authors: Ruilong Li, Julian Tanke, Minh Vo, Michael Zollhofer, Jurgen Gall, Angjoo Kanazawa, Christoph Lassner

    Abstract: Coordinate-based volumetric representations have the potential to generate photo-realistic virtual avatars from images. However, virtual avatars also need to be controllable even to a novel pose that may not have been observed. Traditional techniques, such as LBS, provide such a function; yet they usually require a hand-designed body template, 3D scan data, and limited appearance models. On the oth…

    Submitted 20 June, 2022; v1 submitted 17 June, 2022; originally announced June 2022.

    Comments: Code: https://github.com/facebookresearch/tava; Project Website: https://www.liruilong.cn/projects/tava/

  35. arXiv:2204.08451

    cs.CV

    Learning to Listen: Modeling Non-Deterministic Dyadic Facial Motion

    Authors: Evonne Ng, Hanbyul Joo, Liwen Hu, Hao Li, Trevor Darrell, Angjoo Kanazawa, Shiry Ginosar

    Abstract: We present a framework for modeling interactional communication in dyadic conversations: given multimodal inputs of a speaker, we autoregressively output multiple possibilities of corresponding listener motion. We combine the motion and speech audio of the speaker using a motion-audio cross attention transformer. Furthermore, we enable non-deterministic prediction by learning a discrete latent rep…

    Submitted 18 April, 2022; originally announced April 2022.

  36. arXiv:2204.07151

    cs.CV

    Deformable Sprites for Unsupervised Video Decomposition

    Authors: Vickie Ye, Zhengqi Li, Richard Tucker, Angjoo Kanazawa, Noah Snavely

    Abstract: We describe a method to extract persistent elements of a dynamic scene from an input video. We represent each scene element as a Deformable Sprite consisting of three components: 1) a 2D texture image for the entire video, 2) per-frame masks for the element, and 3) non-rigid deformations that map the texture image into each video frame. The resulting decomposition allows for applications su…

    Submitted 14 April, 2022; originally announced April 2022.

    Comments: CVPR 2022 Oral. Project Site: https://deformable-sprites.github.io

  37. arXiv:2112.05131

    cs.CV cs.GR

    Plenoxels: Radiance Fields without Neural Networks

    Authors: Alex Yu, Sara Fridovich-Keil, Matthew Tancik, Qinhong Chen, Benjamin Recht, Angjoo Kanazawa

    Abstract: We introduce Plenoxels (plenoptic voxels), a system for photorealistic view synthesis. Plenoxels represent a scene as a sparse 3D grid with spherical harmonics. This representation can be optimized from calibrated images via gradient methods and regularization without any neural components. On standard, benchmark tasks, Plenoxels are optimized two orders of magnitude faster than Neural Radiance Fi…

    Submitted 9 December, 2021; originally announced December 2021.

    Comments: For video and code, please see https://alexyu.net/plenoxels

  38. arXiv:2112.04477

    cs.CV

    Tracking People by Predicting 3D Appearance, Location & Pose

    Authors: Jathushan Rajasegaran, Georgios Pavlakos, Angjoo Kanazawa, Jitendra Malik

    Abstract: In this paper, we present an approach for tracking people in monocular videos, by predicting their future 3D representations. To achieve this, we first lift people to 3D from a single frame in a robust way. This lifting includes information about the 3D pose of the person, his or her location in the 3D space, and the 3D appearance. As we track a person, we collect 3D observations over time in a tr…

    Submitted 8 December, 2021; originally announced December 2021.

    Comments: Project Page: https://brjathu.github.io/PHALP/

  39. arXiv:2111.07868

    cs.CV

    Tracking People with 3D Representations

    Authors: Jathushan Rajasegaran, Georgios Pavlakos, Angjoo Kanazawa, Jitendra Malik

    Abstract: We present a novel approach for tracking multiple people in video. Unlike past approaches which employ 2D representations, we focus on using 3D representations of people, located in three-dimensional space. To this end, we develop a method, Human Mesh and Appearance Recovery (HMAR) which in addition to extracting the 3D geometry of the person as a SMPL mesh, also extracts appearance as a texture m…

    Submitted 15 November, 2021; originally announced November 2021.

  40. arXiv:2108.07262

    math.AG math-ph math.SG

    Attractor mechanisms of moduli spaces of Calabi-Yau 3-folds

    Authors: Yu-Wei Fan, Atsushi Kanazawa

    Abstract: We investigate the complex and Kähler attractor mechanisms of moduli spaces of Calabi-Yau 3-folds. The complex attractor mechanism was previously studied by Ferrara-Kallosh-Strominger, Moore and others in string theory. It is concerned with the minimizing problems of the normalized central charges of 3-cycles and defines a new interesting class of Calabi-Yau 3-folds called the complex attractor v…

    Submitted 16 August, 2021; originally announced August 2021.

    MSC Class: 14J33; 14J32; 14J28; 53D37; 32Q15; 32Q25

  41. arXiv:2108.05197

    math.AG math-ph math.SG

    Mirror symmetry and rigid structures of generalized K3 surfaces

    Authors: Atsushi Kanazawa

    Abstract: The present article is concerned with mirror symmetry for generalized K3 surfaces, with particular emphasis on complex and Kähler rigid structures. Inspired by the works of Dolgachev, Aspinwall-Morrison and Huybrechts, we introduce a formulation of mirror symmetry for generalized K3 surfaces by Mukai lattice polarizations, fixing the problems in the conventional formulation of mirror symmetry for…

    Submitted 18 January, 2024; v1 submitted 11 August, 2021; originally announced August 2021.

    Comments: exposition improved

    MSC Class: 14J33; 14J32; 14J28; 53D37

  42. arXiv:2104.11224  [pdf, other]

    cs.CV cs.GR

    KeypointDeformer: Unsupervised 3D Keypoint Discovery for Shape Control

    Authors: Tomas Jakab, Richard Tucker, Ameesh Makadia, Jiajun Wu, Noah Snavely, Angjoo Kanazawa

    Abstract: We introduce KeypointDeformer, a novel unsupervised method for shape control through automatically discovered 3D keypoints. We cast this as the problem of aligning a source 3D object to a target 3D object from the same object category. Our method analyzes the difference between the shapes of the two objects by comparing their latent representations. This latent representation is in the form of 3D…

    Submitted 22 April, 2021; originally announced April 2021.

    Comments: CVPR 2021 (oral). Project page: http://tomasjakab.github.io/KeypointDeformer

  43. arXiv:2104.03954  [pdf, other]

    cs.CV cs.GR

    De-rendering the World's Revolutionary Artefacts

    Authors: Shangzhe Wu, Ameesh Makadia, Jiajun Wu, Noah Snavely, Richard Tucker, Angjoo Kanazawa

    Abstract: Recent works have shown exciting results in unsupervised image de-rendering -- learning to decompose 3D shape, appearance, and lighting from single-image collections without explicit supervision. However, many of these assume simplistic material and lighting models. We propose a method, termed RADAR, that can recover environment illumination and surface materials from real single-image collections…

    Submitted 31 August, 2021; v1 submitted 8 April, 2021; originally announced April 2021.

    Comments: CVPR 2021. Project page: https://sorderender.github.io/

  44. AMP: Adversarial Motion Priors for Stylized Physics-Based Character Control

    Authors: Xue Bin Peng, Ze Ma, Pieter Abbeel, Sergey Levine, Angjoo Kanazawa

    Abstract: Synthesizing graceful and life-like behaviors for physically simulated characters has been a fundamental challenge in computer animation. Data-driven methods that leverage motion tracking are a prominent class of techniques for producing high fidelity motions for a wide range of behaviors. However, the effectiveness of these tracking-based methods often hinges on carefully designed objective funct…

    Submitted 12 May, 2022; v1 submitted 5 April, 2021; originally announced April 2021.

  45. arXiv:2103.14024  [pdf, other]

    cs.CV cs.GR

    PlenOctrees for Real-time Rendering of Neural Radiance Fields

    Authors: Alex Yu, Ruilong Li, Matthew Tancik, Hao Li, Ren Ng, Angjoo Kanazawa

    Abstract: We introduce a method to render Neural Radiance Fields (NeRFs) in real time using PlenOctrees, an octree-based 3D representation which supports view-dependent effects. Our method can render 800x800 images at more than 150 FPS, which is over 3000 times faster than conventional NeRFs. We do so without sacrificing quality while preserving the ability of NeRFs to perform free-viewpoint rendering of sc…

    Submitted 17 August, 2021; v1 submitted 25 March, 2021; originally announced March 2021.

    Comments: ICCV 2021 (Oral)

  46. arXiv:2101.08779  [pdf, other]

    cs.CV cs.GR cs.MM

    AI Choreographer: Music Conditioned 3D Dance Generation with AIST++

    Authors: Ruilong Li, Shan Yang, David A. Ross, Angjoo Kanazawa

    Abstract: We present AIST++, a new multi-modal dataset of 3D dance motion and music, along with FACT, a Full-Attention Cross-modal Transformer network for generating 3D dance motion conditioned on music. The proposed AIST++ dataset contains 5.2 hours of 3D dance motion in 1408 sequences, covering 10 dance genres with multi-view videos with known camera poses -- the largest dataset of this kind to our knowle…

    Submitted 30 July, 2021; v1 submitted 21 January, 2021; originally announced January 2021.

    Comments: Project page: https://google.github.io/aichoreographer/; Dataset page: https://google.github.io/aistplusplus_dataset/

  47. arXiv:2012.09856  [pdf, other]

    cs.CV

    Reconstructing Hand-Object Interactions in the Wild

    Authors: Zhe Cao, Ilija Radosavovic, Angjoo Kanazawa, Jitendra Malik

    Abstract: In this work we explore reconstructing hand-object interactions in the wild. The core challenge of this problem is the lack of appropriate 3D labeled data. To overcome this issue, we propose an optimization-based procedure which does not require direct 3D supervision. The general strategy we adopt is to exploit all available related data (2D bounding boxes, 2D hand keypoints, 2D instance masks, 3D…

    Submitted 30 December, 2021; v1 submitted 17 December, 2020; originally announced December 2020.

    Comments: Project page: https://people.eecs.berkeley.edu/~zhecao/rhoi/

  48. arXiv:2012.09855  [pdf, other]

    cs.CV cs.GR

    Infinite Nature: Perpetual View Generation of Natural Scenes from a Single Image

    Authors: Andrew Liu, Richard Tucker, Varun Jampani, Ameesh Makadia, Noah Snavely, Angjoo Kanazawa

    Abstract: We introduce the problem of perpetual view generation - long-range generation of novel views corresponding to an arbitrarily long camera trajectory given a single image. This is a challenging problem that goes far beyond the capabilities of current view synthesis methods, which quickly degenerate when presented with large camera motions. Methods for video generation also have limited ability to pr…

    Submitted 30 November, 2021; v1 submitted 17 December, 2020; originally announced December 2020.

    Comments: ICCV 2021 (oral); Project page: https://infinite-nature.github.io/; Video: https://www.youtube.com/watch?v=oXUf6anNAtc

  49. arXiv:2012.09843  [pdf, other]

    cs.CV

    Human Mesh Recovery from Multiple Shots

    Authors: Georgios Pavlakos, Jitendra Malik, Angjoo Kanazawa

    Abstract: Videos from edited media like movies are a useful, yet under-explored source of information. The rich variety of appearance and interactions between humans depicted over a large temporal context in these films could be a valuable source of data. However, the richness of data comes at the expense of fundamental challenges such as abrupt shot changes and close-up shots of actors with heavy truncatio…

    Submitted 17 December, 2020; originally announced December 2020.

  50. arXiv:2012.02190  [pdf, other]

    cs.CV cs.GR cs.LG

    pixelNeRF: Neural Radiance Fields from One or Few Images

    Authors: Alex Yu, Vickie Ye, Matthew Tancik, Angjoo Kanazawa

    Abstract: We propose pixelNeRF, a learning framework that predicts a continuous neural scene representation conditioned on one or few input images. The existing approach for constructing neural radiance fields involves optimizing the representation to every scene independently, requiring many calibrated views and significant compute time. We take a step towards resolving these shortcomings by introducing an…

    Submitted 30 May, 2021; v1 submitted 3 December, 2020; originally announced December 2020.

    Comments: CVPR 2021