PaintHuman: Towards High-Fidelity Text-to-3D Human Texturing via Denoised Score Distillation

Authors

  • Jianhui Yu, University of Sydney
  • Hao Zhu, Shanghai AI Laboratory
  • Liming Jiang, Nanyang Technological University
  • Chen Change Loy, Nanyang Technological University
  • Weidong Cai, University of Sydney
  • Wayne Wu, Shanghai AI Laboratory

DOI:

https://doi.org/10.1609/aaai.v38i7.28504

Keywords:

CV: 3D Computer Vision, CV: Language and Vision, ML: Deep Generative Models & Autoencoders

Abstract

Recent advances in zero-shot text-to-3D human generation, which employ a human model prior (e.g., SMPL) or Score Distillation Sampling (SDS) with pre-trained text-to-image diffusion models, have been groundbreaking. However, SDS may provide inaccurate gradient directions under weak diffusion guidance, as it tends to produce over-smoothed results and generate body textures that are inconsistent with the detailed mesh geometry. Therefore, directly leveraging existing strategies for high-fidelity text-to-3D human texturing is challenging. In this work, we propose a model called PaintHuman to address these challenges from two perspectives. We first propose a novel score function, Denoised Score Distillation (DSD), which directly modifies SDS by introducing negative gradient components to iteratively correct the gradient direction and generate high-quality textures. In addition, we use the depth map as a geometric guide to ensure that the texture is semantically aligned to human mesh surfaces. To guarantee the quality of rendered results, we employ geometry-aware networks to predict surface materials and render realistic human textures. Extensive experiments, benchmarked against state-of-the-art (SoTA) methods, validate the efficacy of our approach. Project page: https://painthuman.github.io/.

Published

2024-03-24

How to Cite

Yu, J., Zhu, H., Jiang, L., Loy, C. C., Cai, W., & Wu, W. (2024). PaintHuman: Towards High-Fidelity Text-to-3D Human Texturing via Denoised Score Distillation. Proceedings of the AAAI Conference on Artificial Intelligence, 38(7), 6800-6807. https://doi.org/10.1609/aaai.v38i7.28504

Issue

Section

AAAI Technical Track on Computer Vision VI