Showing 1–50 of 144 results for author: Black, M J

Searching in archive cs.
  1. arXiv:2406.08472  [pdf, other]

    cs.LG cs.AI

    RILe: Reinforced Imitation Learning

    Authors: Mert Albaba, Sammy Christen, Christoph Gebhardt, Thomas Langarek, Michael J. Black, Otmar Hilliges

    Abstract: Reinforcement Learning has achieved significant success in generating complex behavior but often requires extensive reward function engineering. Adversarial variants of Imitation Learning and Inverse Reinforcement Learning offer an alternative by learning policies from expert demonstrations via a discriminator. Employing discriminators increases their data- and computational efficiency over the st…

    Submitted 12 June, 2024; originally announced June 2024.

  2. arXiv:2405.14869  [pdf, other]

    cs.CV cs.AI cs.GR

    PuzzleAvatar: Assembling 3D Avatars from Personal Albums

    Authors: Yuliang Xiu, Yufei Ye, Zhen Liu, Dimitrios Tzionas, Michael J. Black

    Abstract: Generating personalized 3D avatars is crucial for AR/VR. However, recent text-to-3D methods that generate avatars for celebrities or fictional characters, struggle with everyday people. Methods for faithful reconstruction typically require full-body images in controlled settings. What if a user could just upload their personal "OOTD" (Outfit Of The Day) photo collection and get a faithful avatar i…

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: video: https://youtu.be/0hpXH2tVPk4

  3. ContourCraft: Learning to Resolve Intersections in Neural Multi-Garment Simulations

    Authors: Artur Grigorev, Giorgio Becherini, Michael J. Black, Otmar Hilliges, Bernhard Thomaszewski

    Abstract: Learning-based approaches to cloth simulation have started to show their potential in recent years. However, handling collisions and intersections in neural simulations remains a largely unsolved problem. In this work, we present ContourCraft, a learning-based solution for handling intersections in neural cloth simulations. Unlike conventional approaches that critically rely on intersection-free inp…

    Submitted 24 May, 2024; v1 submitted 15 May, 2024; originally announced May 2024.

    Comments: Accepted for publication by SIGGRAPH 2024, conference track

  4. arXiv:2405.04533  [pdf, other]

    cs.CV cs.LG

    ChatHuman: Language-driven 3D Human Understanding with Retrieval-Augmented Tool Reasoning

    Authors: Jing Lin, Yao Feng, Weiyang Liu, Michael J. Black

    Abstract: Numerous methods have been proposed to detect, estimate, and analyze properties of people in images, including the estimation of 3D pose, shape, contact, human-object interaction, emotion, and more. Each of these methods works in isolation instead of synergistically. Here we address this problem and build a language-driven human understanding system -- ChatHuman, which combines and integrates the…

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: Project page: https://chathuman.github.io

  5. arXiv:2404.16752  [pdf, other]

    cs.CV

    TokenHMR: Advancing Human Mesh Recovery with a Tokenized Pose Representation

    Authors: Sai Kumar Dwivedi, Yu Sun, Priyanka Patel, Yao Feng, Michael J. Black

    Abstract: We address the problem of regressing 3D human pose and shape from a single image, with a focus on 3D accuracy. The current best methods leverage large datasets of 3D pseudo-ground-truth (p-GT) and 2D keypoints, leading to robust performance. With such methods, we observe a paradoxical decline in 3D pose accuracy with increasing 2D accuracy. This is caused by biases in the p-GT and the use of an ap…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: CVPR 2024

  6. arXiv:2404.15383  [pdf, other]

    cs.CV cs.AI

    WANDR: Intention-guided Human Motion Generation

    Authors: Markos Diomataris, Nikos Athanasiou, Omid Taheri, Xi Wang, Otmar Hilliges, Michael J. Black

    Abstract: Synthesizing natural human motions that enable a 3D human avatar to walk and reach for arbitrary goals in 3D space remains an unsolved problem with many applications. Existing methods (data-driven or using reinforcement learning) are limited in terms of generalization and motion naturalness. A primary obstacle is the scarcity of training data that combines locomotion with goal reaching. To address…

    Submitted 23 April, 2024; originally announced April 2024.

  7. arXiv:2404.15228  [pdf, other]

    cs.CV cs.CL

    Re-Thinking Inverse Graphics With Large Language Models

    Authors: Peter Kulits, Haiwen Feng, Weiyang Liu, Victoria Abrevaya, Michael J. Black

    Abstract: Inverse graphics -- the task of inverting an image into physical variables that, when rendered, enable reproduction of the observed scene -- is a fundamental challenge in computer vision and graphics. Disentangling an image into its constituent elements, such as the shape, color, and material properties of the objects of the 3D scene that produced it, requires a comprehensive understanding of the…

    Submitted 23 April, 2024; originally announced April 2024.

    Comments: 31 pages; project page: https://ig-llm.is.tue.mpg.de/

  8. arXiv:2404.10685  [pdf, other]

    cs.CV cs.GR

    Generating Human Interaction Motions in Scenes with Text Control

    Authors: Hongwei Yi, Justus Thies, Michael J. Black, Xue Bin Peng, Davis Rempe

    Abstract: We present TeSMo, a method for text-controlled scene-aware motion generation based on denoising diffusion models. Previous text-to-motion methods focus on characters in isolation without considering scenes due to the limited availability of datasets that include motion, text descriptions, and interactive scenes. Our approach begins with pre-training a scene-agnostic text-to-motion diffusion model,…

    Submitted 16 April, 2024; originally announced April 2024.

    Comments: Project Page: https://research.nvidia.com/labs/toronto-ai/tesmo/

  9. arXiv:2404.03042  [pdf, other]

    cs.CV

    AWOL: Analysis WithOut synthesis using Language

    Authors: Silvia Zuffi, Michael J. Black

    Abstract: Many classical parametric 3D shape models exist, but creating novel shapes with such models requires expert knowledge of their parameters. For example, imagine creating a specific type of tree using procedural graphics or a new kind of animal from a statistical shape model. Our key idea is to leverage language to control such existing models to produce novel shapes. This involves learning a mappin…

    Submitted 3 April, 2024; originally announced April 2024.

  10. arXiv:2403.14611  [pdf, other]

    cs.CV

    Explorative Inbetweening of Time and Space

    Authors: Haiwen Feng, Zheng Ding, Zhihao Xia, Simon Niklaus, Victoria Abrevaya, Michael J. Black, Xuaner Zhang

    Abstract: We introduce bounded generation as a generalized task to control video generation to synthesize arbitrary camera and subject motion based only on a given start and end frame. Our objective is to fully leverage the inherent generalization capability of an image-to-video model without additional training or fine-tuning of the original model. This is achieved through the proposed new sampling strateg…

    Submitted 21 March, 2024; originally announced March 2024.

    Comments: project page at https://time-reversal.github.io

  11. arXiv:2401.08559  [pdf, other]

    cs.CV cs.GR cs.LG

    Multi-Track Timeline Control for Text-Driven 3D Human Motion Generation

    Authors: Mathis Petrovich, Or Litany, Umar Iqbal, Michael J. Black, Gül Varol, Xue Bin Peng, Davis Rempe

    Abstract: Recent advances in generative modeling have led to promising progress on synthesizing 3D human motion from text, with methods that can generate character animations from short prompts and specified durations. However, using a single text prompt as input lacks the fine-grained control needed by animators, such as composing multiple actions and defining precise durations for parts of the motion. To…

    Submitted 24 May, 2024; v1 submitted 16 January, 2024; originally announced January 2024.

    Comments: CVPR 2024, HuMoGen Workshop

  12. arXiv:2401.00374  [pdf, other]

    cs.CV

    EMAGE: Towards Unified Holistic Co-Speech Gesture Generation via Expressive Masked Audio Gesture Modeling

    Authors: Haiyang Liu, Zihao Zhu, Giorgio Becherini, Yichen Peng, Mingyang Su, You Zhou, Xuefei Zhe, Naoya Iwamoto, Bo Zheng, Michael J. Black

    Abstract: We propose EMAGE, a framework to generate full-body human gestures from audio and masked gestures, encompassing facial, local body, hands, and global movements. To achieve this, we first introduce BEAT2 (BEAT-SMPLX-FLAME), a new mesh-level holistic co-speech dataset. BEAT2 combines a MoShed SMPL-X body with FLAME head parameters and further refines the modeling of head, neck, and finger movements,…

    Submitted 30 March, 2024; v1 submitted 30 December, 2023; originally announced January 2024.

    Comments: Fix typos; Conflict of Interest Disclosure; CVPR Camera Ready; Project Page: https://pantomatrix.github.io/EMAGE/

  13. arXiv:2312.16737  [pdf, other]

    cs.CV

    HMP: Hand Motion Priors for Pose and Shape Estimation from Video

    Authors: Enes Duran, Muhammed Kocabas, Vasileios Choutas, Zicong Fan, Michael J. Black

    Abstract: Understanding how humans interact with the world necessitates accurate 3D hand pose estimation, a task complicated by the hand's high degree of articulation, frequent occlusions, self-occlusions, and rapid motions. While most existing methods rely on single-image inputs, videos have useful cues to address the aforementioned issues. However, existing video-based 3D hand datasets are insufficient for tr…

    Submitted 27 December, 2023; originally announced December 2023.

    Journal ref: WACV 2024

  14. arXiv:2312.14579  [pdf, other]

    cs.CV

    Environment-Specific People

    Authors: Mirela Ostrek, Soubhik Sanyal, Carol O'Sullivan, Michael J. Black, Justus Thies

    Abstract: Despite significant progress in generative image synthesis and full-body generation in particular, state-of-the-art methods are either context-independent, overly reliant on text prompts, or bound to the curated training datasets, such as fashion images with monotonous backgrounds. Here, our goal is to generate people in clothing that is semantically appropriate for a given scene. To this end, we…

    Submitted 22 December, 2023; originally announced December 2023.

  15. arXiv:2312.11666  [pdf, other]

    cs.CV cs.GR

    HAAR: Text-Conditioned Generative Model of 3D Strand-based Human Hairstyles

    Authors: Vanessa Sklyarova, Egor Zakharov, Otmar Hilliges, Michael J. Black, Justus Thies

    Abstract: We present HAAR, a new strand-based generative model for 3D human hairstyles. Specifically, based on textual inputs, HAAR produces 3D hairstyles that could be used as production-level assets in modern computer graphics engines. Current AI-based generative models take advantage of powerful 2D priors to reconstruct 3D content in the form of point clouds, meshes, or volumetric functions. However, by…

    Submitted 18 December, 2023; originally announced December 2023.

    Comments: For more results please refer to the project page https://haar.is.tue.mpg.de/

  16. arXiv:2312.07531  [pdf, other]

    cs.CV

    WHAM: Reconstructing World-grounded Humans with Accurate 3D Motion

    Authors: Soyong Shin, Juyong Kim, Eni Halilaj, Michael J. Black

    Abstract: The estimation of 3D human motion from video has progressed rapidly but current methods still have several key limitations. First, most methods estimate the human in camera coordinates. Second, prior work on estimating humans in global coordinates often assumes a flat ground plane and produces foot sliding. Third, the most accurate methods rely on computationally expensive optimization pipelines,…

    Submitted 18 April, 2024; v1 submitted 12 December, 2023; originally announced December 2023.

  17. arXiv:2312.04466  [pdf, other]

    cs.CV

    Emotional Speech-driven 3D Body Animation via Disentangled Latent Diffusion

    Authors: Kiran Chhatre, Radek Daněček, Nikos Athanasiou, Giorgio Becherini, Christopher Peters, Michael J. Black, Timo Bolkart

    Abstract: Existing methods for synthesizing 3D human gestures from speech have shown promising results, but they do not explicitly model the impact of emotions on the generated gestures. Instead, these methods directly output animations from speech without control over the expressed emotion. To address this limitation, we present AMUSE, an emotional speech-driven body animation model based on latent diffusi…

    Submitted 1 April, 2024; v1 submitted 7 December, 2023; originally announced December 2023.

    Comments: Conference on Computer Vision and Pattern Recognition (CVPR) 2024. Webpage: https://amuse.is.tue.mpg.de/

  18. arXiv:2311.18836  [pdf, other]

    cs.CV

    ChatPose: Chatting about 3D Human Pose

    Authors: Yao Feng, Jing Lin, Sai Kumar Dwivedi, Yu Sun, Priyanka Patel, Michael J. Black

    Abstract: We introduce ChatPose, a framework employing Large Language Models (LLMs) to understand and reason about 3D human poses from images or textual descriptions. Our work is motivated by the human ability to intuitively understand postures from a single image or a brief description, a process that intertwines image interpretation, world knowledge, and an understanding of body language. Traditional huma…

    Submitted 23 April, 2024; v1 submitted 30 November, 2023; originally announced November 2023.

    Comments: Home page: https://yfeng95.github.io/ChatPose/

  19. arXiv:2311.18448  [pdf, other]

    cs.CV

    HOLD: Category-agnostic 3D Reconstruction of Interacting Hands and Objects from Video

    Authors: Zicong Fan, Maria Parelli, Maria Eleni Kadoglou, Muhammed Kocabas, Xu Chen, Michael J. Black, Otmar Hilliges

    Abstract: Since humans interact with diverse objects every day, the holistic 3D capture of these interactions is important to understand and model human behaviour. However, most existing methods for hand-object reconstruction from RGB either assume pre-scanned object templates or heavily rely on limited 3D hand-object data, restricting their ability to scale and generalize to more unconstrained interaction…

    Submitted 30 November, 2023; originally announced November 2023.

  20. arXiv:2311.06243  [pdf, other]

    cs.LG cs.AI cs.CL cs.CV

    Parameter-Efficient Orthogonal Finetuning via Butterfly Factorization

    Authors: Weiyang Liu, Zeju Qiu, Yao Feng, Yuliang Xiu, Yuxuan Xue, Longhui Yu, Haiwen Feng, Zhen Liu, Juyeon Heo, Songyou Peng, Yandong Wen, Michael J. Black, Adrian Weller, Bernhard Schölkopf

    Abstract: Large foundation models are becoming ubiquitous, but training them from scratch is prohibitively expensive. Thus, efficiently adapting these powerful models to downstream tasks is increasingly important. In this paper, we study a principled finetuning paradigm -- Orthogonal Finetuning (OFT) -- for downstream task adaptation. Despite demonstrating good generalizability, OFT still uses a fairly larg…

    Submitted 28 April, 2024; v1 submitted 10 November, 2023; originally announced November 2023.

    Comments: ICLR 2024 (v2: 34 pages, 19 figures)

  21. FLARE: Fast Learning of Animatable and Relightable Mesh Avatars

    Authors: Shrisha Bharadwaj, Yufeng Zheng, Otmar Hilliges, Michael J. Black, Victoria Fernandez-Abrevaya

    Abstract: Our goal is to efficiently learn personalized animatable 3D head avatars from videos that are geometrically accurate, realistic, relightable, and compatible with current rendering systems. While 3D meshes enable efficient processing and are highly portable, they lack realism in terms of shape and appearance. Neural representations, on the other hand, are realistic but lack compatibility and are sl…

    Submitted 27 October, 2023; v1 submitted 26 October, 2023; originally announced October 2023.

    Comments: 15 pages, Accepted: ACM Transactions on Graphics (Proceedings of SIGGRAPH Asia), 2023

    Journal ref: Volume 42, article number 204, year 2023

  22. arXiv:2310.15168  [pdf, other]

    cs.CV cs.GR cs.LG

    Ghost on the Shell: An Expressive Representation of General 3D Shapes

    Authors: Zhen Liu, Yao Feng, Yuliang Xiu, Weiyang Liu, Liam Paull, Michael J. Black, Bernhard Schölkopf

    Abstract: The creation of photorealistic virtual worlds requires the accurate modeling of 3D surface geometry for a wide range of objects. For this, meshes are appealing since they 1) enable fast physics-based rendering with realistic material and lighting, 2) support physical simulation, and 3) are memory-efficient for modern graphics pipelines. Recent work on reconstructing and statistically modeling 3D s…

    Submitted 24 March, 2024; v1 submitted 23 October, 2023; originally announced October 2023.

    Comments: ICLR 2024 Oral (v3: 30 pages, 19 figures, Project Page: https://gshell3d.github.io/)

  23. arXiv:2310.13768  [pdf, other]

    cs.CV

    PACE: Human and Camera Motion Estimation from in-the-wild Videos

    Authors: Muhammed Kocabas, Ye Yuan, Pavlo Molchanov, Yunrong Guo, Michael J. Black, Otmar Hilliges, Jan Kautz, Umar Iqbal

    Abstract: We present a method to estimate human motion in a global scene from moving cameras. This is a highly challenging task due to the coupling of human and camera motions in the video. To address this problem, we propose a joint optimization framework that disentangles human and camera motions using both foreground human motion priors and background scene features. Unlike existing methods that use SLAM…

    Submitted 20 October, 2023; originally announced October 2023.

    Comments: 3DV 2024. Project page: https://nvlabs.github.io/PACE/

  24. arXiv:2310.09449  [pdf, other]

    cs.CV cs.LG

    Pairwise Similarity Learning is SimPLE

    Authors: Yandong Wen, Weiyang Liu, Yao Feng, Bhiksha Raj, Rita Singh, Adrian Weller, Michael J. Black, Bernhard Schölkopf

    Abstract: In this paper, we focus on a general yet important learning problem, pairwise similarity learning (PSL). PSL subsumes a wide range of important applications, such as open-set face recognition, speaker verification, image retrieval and person re-identification. The goal of PSL is to learn a pairwise similarity function assigning a higher similarity score to positive pairs (i.e., a pair of samples w…

    Submitted 13 October, 2023; originally announced October 2023.

    Comments: Published in ICCV 2023 (Project page: https://simple.is.tue.mpg.de/)

  25. arXiv:2309.15273  [pdf, other]

    cs.CV

    DECO: Dense Estimation of 3D Human-Scene Contact In The Wild

    Authors: Shashank Tripathi, Agniv Chatterjee, Jean-Claude Passy, Hongwei Yi, Dimitrios Tzionas, Michael J. Black

    Abstract: Understanding how humans use physical contact to interact with the world is key to enabling human-centric artificial intelligence. While inferring 3D contact is crucial for modeling realistic and physically-plausible human-object interactions, existing methods either focus on 2D, consider body joints rather than the surface, use coarse 3D body regions, or do not generalize to in-the-wild images. I…

    Submitted 26 September, 2023; originally announced September 2023.

    Comments: Accepted as Oral in ICCV'23. Project page: https://deco.is.tue.mpg.de

  26. arXiv:2309.07125  [pdf, other]

    cs.CV

    Text-Guided Generation and Editing of Compositional 3D Avatars

    Authors: Hao Zhang, Yao Feng, Peter Kulits, Yandong Wen, Justus Thies, Michael J. Black

    Abstract: Our goal is to create a realistic 3D facial avatar with hair and accessories using only a text description. While this challenge has attracted significant recent interest, existing methods either lack realism, produce unrealistic shapes, or do not support editing, such as modifications to the hairstyle. We argue that existing methods are limited because they employ a monolithic modeling approach,…

    Submitted 13 September, 2023; originally announced September 2023.

    Comments: Home page: https://yfeng95.github.io/teca

  27. arXiv:2309.06441  [pdf, other]

    cs.CV cs.AI cs.GR

    Learning Disentangled Avatars with Hybrid 3D Representations

    Authors: Yao Feng, Weiyang Liu, Timo Bolkart, Jinlong Yang, Marc Pollefeys, Michael J. Black

    Abstract: Tremendous efforts have been made to learn animatable and photorealistic human avatars. Towards this end, both explicit and implicit 3D representations are heavily studied for a holistic modeling and capture of the whole human (e.g., body, clothing, face and hair), but neither representation is an optimal choice in terms of representation efficacy since different parts of the human avatar have dif…

    Submitted 12 September, 2023; originally announced September 2023.

    Comments: home page: https://yfeng95.github.io/delta. arXiv admin note: text overlap with arXiv:2210.01868

  28. arXiv:2308.12965  [pdf, other]

    cs.CV

    POCO: 3D Pose and Shape Estimation with Confidence

    Authors: Sai Kumar Dwivedi, Cordelia Schmid, Hongwei Yi, Michael J. Black, Dimitrios Tzionas

    Abstract: The regression of 3D Human Pose and Shape (HPS) from an image is becoming increasingly accurate. This makes the results useful for downstream tasks like human action recognition or 3D graphics. Yet, no regressor is perfect, and accuracy can be affected by ambiguous image evidence or by poses and appearance that are unseen during training. Most current HPS regressors, however, do not report the con…

    Submitted 24 August, 2023; originally announced August 2023.

  29. arXiv:2308.11617  [pdf, other]

    cs.CV

    GRIP: Generating Interaction Poses Using Latent Consistency and Spatial Cues

    Authors: Omid Taheri, Yi Zhou, Dimitrios Tzionas, Yang Zhou, Duygu Ceylan, Soren Pirk, Michael J. Black

    Abstract: Hands are dexterous and highly versatile manipulators that are central to how humans interact with objects and their environment. Consequently, modeling realistic hand-object interactions, including the subtle motion of individual fingers, is critical for applications in computer graphics, computer vision, and mixed reality. Prior work on capturing and modeling humans interacting with objects in 3…

    Submitted 22 August, 2023; originally announced August 2023.

    Comments: The project has been started during Omid Taheri's internship at Adobe and as a collaboration with the Max Planck Institute for Intelligent Systems

  30. arXiv:2308.10899  [pdf, other]

    cs.AI

    TADA! Text to Animatable Digital Avatars

    Authors: Tingting Liao, Hongwei Yi, Yuliang Xiu, Jiaxiang Tang, Yangyi Huang, Justus Thies, Michael J. Black

    Abstract: We introduce TADA, a simple-yet-effective approach that takes textual descriptions and produces expressive 3D avatars with high-quality geometry and lifelike textures, that can be animated and rendered with traditional graphics pipelines. Existing text-based character generation methods are limited in terms of geometry and texture quality, and cannot be realistically animated due to inconsistent a…

    Submitted 21 August, 2023; originally announced August 2023.

  31. arXiv:2308.10638  [pdf, other]

    cs.CV cs.AI cs.GR cs.LG

    SCULPT: Shape-Conditioned Unpaired Learning of Pose-dependent Clothed and Textured Human Meshes

    Authors: Soubhik Sanyal, Partha Ghosh, Jinlong Yang, Michael J. Black, Justus Thies, Timo Bolkart

    Abstract: We present SCULPT, a novel 3D generative model for clothed and textured 3D meshes of humans. Specifically, we devise a deep neural network that learns to represent the geometry and appearance distribution of clothed human bodies. Training such a model is challenging, as datasets of textured 3D meshes for humans are limited in size and accessibility. Our key observation is that there exist medium-s…

    Submitted 6 May, 2024; v1 submitted 21 August, 2023; originally announced August 2023.

    Comments: Updated to camera ready version of CVPR 2024

  32. arXiv:2307.09882  [pdf, other]

    cs.LG cs.AI

    Adversarial Likelihood Estimation With One-Way Flows

    Authors: Omri Ben-Dov, Pravir Singh Gupta, Victoria Abrevaya, Michael J. Black, Partha Ghosh

    Abstract: Generative Adversarial Networks (GANs) can produce high-quality samples, but do not provide an estimate of the probability density around the samples. However, it has been noted that maximizing the log-likelihood within an energy-based setting can lead to an adversarial framework where the discriminator provides unnormalized density (often called energy). We further develop this perspective, incor…

    Submitted 2 October, 2023; v1 submitted 19 July, 2023; originally announced July 2023.

  33. arXiv:2306.16940  [pdf, other]

    cs.CV

    BEDLAM: A Synthetic Dataset of Bodies Exhibiting Detailed Lifelike Animated Motion

    Authors: Michael J. Black, Priyanka Patel, Joachim Tesch, Jinlong Yang

    Abstract: We show, for the first time, that neural networks trained only on synthetic data achieve state-of-the-art accuracy on the problem of 3D human pose and shape (HPS) estimation from real images. Previous synthetic datasets have been small, unrealistic, or lacked realistic clothing. Achieving sufficient realism is non-trivial and we show how to do this for full bodies in motion. Specifically, our BEDL…

    Submitted 29 June, 2023; originally announced June 2023.

    Journal ref: CVPR 2023

  34. Emotional Speech-Driven Animation with Content-Emotion Disentanglement

    Authors: Radek Daněček, Kiran Chhatre, Shashank Tripathi, Yandong Wen, Michael J. Black, Timo Bolkart

    Abstract: To be widely adopted, 3D facial avatars must be animated easily, realistically, and directly from speech signals. While the best recent methods generate 3D animations that are synchronized with the input audio, they largely ignore the impact of emotions on facial expressions. Realistic facial animation requires lip-sync together with the natural expression of emotion. To that end, we propose EMOTE…

    Submitted 26 September, 2023; v1 submitted 15 June, 2023; originally announced June 2023.

    Comments: SIGGRAPH Asia 2023 Conference Paper

  35. arXiv:2306.07437  [pdf, other]

    cs.CV

    Instant Multi-View Head Capture through Learnable Registration

    Authors: Timo Bolkart, Tianye Li, Michael J. Black

    Abstract: Existing methods for capturing datasets of 3D heads in dense semantic correspondence are slow, and commonly address the problem in two separate steps; multi-view stereo (MVS) reconstruction followed by non-rigid registration. To simplify this process, we introduce TEMPEH (Towards Estimation of 3D Meshes from Performances of Expressive Heads) to directly infer 3D heads in dense correspondence from…

    Submitted 12 June, 2023; originally announced June 2023.

    Comments: Conference on Computer Vision and Pattern Recognition (CVPR) 2023

  36. arXiv:2306.02850  [pdf, other]

    cs.CV

    TRACE: 5D Temporal Regression of Avatars with Dynamic Cameras in 3D Environments

    Authors: Yu Sun, Qian Bao, Wu Liu, Tao Mei, Michael J. Black

    Abstract: Although the estimation of 3D human pose and shape (HPS) is rapidly progressing, current methods still cannot reliably estimate moving humans in global coordinates, which is critical for many applications. This is particularly challenging when the camera is also moving, entangling human and camera motion. To address these issues, we adopt a novel 5D representation (space, time, and identity) that…

    Submitted 20 November, 2023; v1 submitted 5 June, 2023; originally announced June 2023.

    Comments: Project page: https://www.yusun.work/TRACE/TRACE.html

  37. arXiv:2305.02312  [pdf, other]

    cs.CV

    AG3D: Learning to Generate 3D Avatars from 2D Image Collections

    Authors: Zijian Dong, Xu Chen, Jinlong Yang, Michael J. Black, Otmar Hilliges, Andreas Geiger

    Abstract: While progress in 2D generative models of human appearance has been rapid, many applications require 3D avatars that can be animated and rendered. Unfortunately, most existing methods for learning generative models of 3D humans with diverse shape and appearance require 3D training data, which is limited and expensive to acquire. The key to progress is hence to learn generative models of 3D avatars…

    Submitted 3 May, 2023; originally announced May 2023.

    Comments: Project Page: https://zj-dong.github.io/AG3D/

  38. arXiv:2305.00976  [pdf, other]

    cs.CV cs.CL

    TMR: Text-to-Motion Retrieval Using Contrastive 3D Human Motion Synthesis

    Authors: Mathis Petrovich, Michael J. Black, Gül Varol

    Abstract: In this paper, we present TMR, a simple yet effective approach for text to 3D human motion retrieval. While previous work has only treated retrieval as a proxy evaluation metric, we tackle it as a standalone task. Our method extends the state-of-the-art text-to-motion synthesis model TEMOS, and incorporates a contrastive loss to better structure the cross-modal latent space. We show that maintaini…

    Submitted 25 August, 2023; v1 submitted 2 May, 2023; originally announced May 2023.

    Comments: ICCV 2023 Camera Ready, project page: https://mathis.petrovich.fr/tmr/

  39. arXiv:2304.10528  [pdf, other]

    cs.CV

    Generalizing Neural Human Fitting to Unseen Poses With Articulated SE(3) Equivariance

    Authors: Haiwen Feng, Peter Kulits, Shichen Liu, Michael J. Black, Victoria Abrevaya

    Abstract: We address the problem of fitting a parametric human body model (SMPL) to point cloud data. Optimization-based methods require careful initialization and are prone to becoming trapped in local optima. Learning-based methods address this but do not generalize well when the input pose is far from those seen during training. For rigid point clouds, remarkable generalization has been achieved by lever…

    Submitted 19 September, 2023; v1 submitted 20 April, 2023; originally announced April 2023.

    Comments: Accepted at ICCV 2023 as an oral presentation. Project page: https://arteq.is.tue.mpg.de ; Update V2: Camera-Ready version, fix metric issues and numeric bug of ID performance

  40. arXiv:2304.10482  [pdf, other]

    cs.CV cs.GR

    Reconstructing Signing Avatars From Video Using Linguistic Priors

    Authors: Maria-Paola Forte, Peter Kulits, Chun-Hao Huang, Vasileios Choutas, Dimitrios Tzionas, Katherine J. Kuchenbecker, Michael J. Black

    Abstract: Sign language (SL) is the primary method of communication for the 70 million Deaf people around the world. Video dictionaries of isolated signs are a core SL learning tool. Replacing these with 3D avatars can aid learning and enable AR/VR applications, improving access to technology and online media. However, little work has attempted to estimate expressive 3D avatars from SL video; occlusion, noi…

    Submitted 20 April, 2023; originally announced April 2023.

  41. arXiv:2304.10417  [pdf, other]

    cs.CV

    SINC: Spatial Composition of 3D Human Motions for Simultaneous Action Generation

    Authors: Nikos Athanasiou, Mathis Petrovich, Michael J. Black, Gül Varol

    Abstract: Our goal is to synthesize 3D human motions given textual inputs describing simultaneous actions, for example 'waving hand' while 'walking' at the same time. We refer to generating such simultaneous movements as performing 'spatial compositions'. In contrast to temporal compositions that seek to transition from one action to another, spatial compositing requires understanding which body parts are i…

    Submitted 26 March, 2024; v1 submitted 20 April, 2023; originally announced April 2023.

    Comments: Teaser Fixed

  42. arXiv:2303.18246  [pdf, other]

    cs.CV cs.AI cs.GR

    3D Human Pose Estimation via Intuitive Physics

    Authors: Shashank Tripathi, Lea Müller, Chun-Hao P. Huang, Omid Taheri, Michael J. Black, Dimitrios Tzionas

    Abstract: Estimating 3D humans from images often produces implausible bodies that lean, float, or penetrate the floor. Such methods ignore the fact that bodies are typically supported by the scene. A physics engine can be used to enforce physical plausibility, but these are not differentiable, rely on unrealistic proxy bodies, and are difficult to integrate into existing optimization and learning frameworks…

    Submitted 24 July, 2023; v1 submitted 31 March, 2023; originally announced March 2023.

    Comments: Accepted in CVPR'23. Project page: https://ipman.is.tue.mpg.de

  43. arXiv:2303.08133  [pdf, other]

    cs.GR cs.AI cs.CV cs.LG

    MeshDiffusion: Score-based Generative 3D Mesh Modeling

    Authors: Zhen Liu, Yao Feng, Michael J. Black, Derek Nowrouzezahrai, Liam Paull, Weiyang Liu

    Abstract: We consider the task of generating realistic 3D shapes, which is useful for a variety of applications such as automatic scene generation and physical simulation. Compared to other 3D representations like voxels and point clouds, meshes are more desirable in practice, because (1) they enable easy and arbitrary manipulation of shapes for relighting and simulation, and (2) they can fully leverage the…

    Submitted 15 April, 2023; v1 submitted 14 March, 2023; originally announced March 2023.

    Comments: ICLR 2023 (Spotlight, Notable-top-25%)

  44. arXiv:2303.03373  [pdf, other]

    cs.CV

    Detecting Human-Object Contact in Images

    Authors: Yixin Chen, Sai Kumar Dwivedi, Michael J. Black, Dimitrios Tzionas

    Abstract: Humans constantly contact objects to move and perform tasks. Thus, detecting human-object contact is important for building human-centered artificial intelligence. However, there exists no robust method to detect contact between the body and the scene from an image, and there exists no dataset to learn such a detector. We fill this gap with HOT ("Human-Object conTact"), a new dataset of human-obje…

    Submitted 4 April, 2023; v1 submitted 6 March, 2023; originally announced March 2023.

    Comments: Accepted at CVPR 2023

  45. arXiv:2212.08377  [pdf, other]

    cs.CV cs.GR

    PointAvatar: Deformable Point-based Head Avatars from Videos

    Authors: Yufeng Zheng, Wang Yifan, Gordon Wetzstein, Michael J. Black, Otmar Hilliges

    Abstract: The ability to create realistic, animatable and relightable head avatars from casual video sequences would open up wide ranging applications in communication and entertainment. Current methods either build on explicit 3D morphable meshes (3DMM) or exploit neural implicit representations. The former are limited by fixed topology, while the latter are non-trivial to deform and inefficient to render.…

    Submitted 28 February, 2023; v1 submitted 16 December, 2022; originally announced December 2022.

    Comments: Project page: https://zhengyuf.github.io/PointAvatar/ Code base: https://github.com/zhengyuf/pointavatar

  46. arXiv:2212.07422  [pdf, other]

    cs.CV cs.AI cs.GR

    ECON: Explicit Clothed humans Optimized via Normal integration

    Authors: Yuliang Xiu, Jinlong Yang, Xu Cao, Dimitrios Tzionas, Michael J. Black

    Abstract: The combination of deep learning, artist-curated scans, and Implicit Functions (IF), is enabling the creation of detailed, clothed, 3D humans from images. However, existing methods are far from perfect. IF-based methods recover free-form geometry, but produce disembodied limbs or degenerate shapes for novel poses or clothes. To increase robustness for these cases, existing work uses an explicit pa…

    Submitted 23 March, 2023; v1 submitted 14 December, 2022; originally announced December 2022.

    Comments: Homepage: https://xiuyuliang.cn/econ Code: https://github.com/YuliangXiu/ECON

  47. arXiv:2212.07242  [pdf, other]

    cs.CV

    HOOD: Hierarchical Graphs for Generalized Modelling of Clothing Dynamics

    Authors: Artur Grigorev, Bernhard Thomaszewski, Michael J. Black, Otmar Hilliges

    Abstract: We propose a method that leverages graph neural networks, multi-level message passing, and unsupervised training to enable real-time prediction of realistic clothing dynamics. Whereas existing methods based on linear blend skinning must be trained for specific garments, our method is agnostic to body shape and applies to tight-fitting garments as well as loose, free-flowing clothing. Our method fu…

    Submitted 16 June, 2023; v1 submitted 14 December, 2022; originally announced December 2022.

    Journal ref: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023, pp. 16965-16974

  48. arXiv:2212.04420  [pdf, other]

    cs.CV cs.GR

    Generating Holistic 3D Human Motion from Speech

    Authors: Hongwei Yi, Hualin Liang, Yifei Liu, Qiong Cao, Yandong Wen, Timo Bolkart, Dacheng Tao, Michael J. Black

    Abstract: This work addresses the problem of generating 3D holistic body motions from human speech. Given a speech recording, we synthesize sequences of 3D body poses, hand gestures, and facial expressions that are realistic and diverse. To achieve this, we first build a high-quality dataset of 3D holistic body meshes with synchronous speech. We then define a novel speech-to-motion generation framework in w…

    Submitted 17 June, 2023; v1 submitted 8 December, 2022; originally announced December 2022.

    Comments: Project Webpage: https://talkshow.is.tue.mpg.de; CVPR2023

  49. arXiv:2212.04360  [pdf, other]

    cs.CV cs.GR

    MIME: Human-Aware 3D Scene Generation

    Authors: Hongwei Yi, Chun-Hao P. Huang, Shashank Tripathi, Lea Hering, Justus Thies, Michael J. Black

    Abstract: Generating realistic 3D worlds occupied by moving humans has many applications in games, architecture, and synthetic data creation. But generating such scenes is expensive and labor intensive. Recent work generates human poses and motions given a 3D scene. Here, we take the opposite approach and generate 3D indoor scenes given 3D human motion. Such motions can come from archival motion capture or…

    Submitted 8 December, 2022; originally announced December 2022.

    Comments: Project Page: https://mime.is.tue.mpg.de

  50. arXiv:2211.15601  [pdf, other]

    cs.CV

    Fast-SNARF: A Fast Deformer for Articulated Neural Fields

    Authors: Xu Chen, Tianjian Jiang, Jie Song, Max Rietmann, Andreas Geiger, Michael J. Black, Otmar Hilliges

    Abstract: Neural fields have revolutionized the area of 3D reconstruction and novel view synthesis of rigid scenes. A key challenge in making such methods applicable to articulated objects, such as the human body, is to model the deformation of 3D locations between the rest pose (a canonical space) and the deformed space. We propose a new articulation module for neural fields, Fast-SNARF, which finds accura…

    Submitted 1 December, 2022; v1 submitted 28 November, 2022; originally announced November 2022.

    Comments: github page: https://github.com/xuchen-ethz/fast-snarf