research-article

Multi-view Shape Generation for a 3D Human-like Body

Authors:

Xiangyang Xue

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 19, Issue 1

Article No.: 11, Pages 1–22

https://doi.org/10.1145/3514248

Published: 05 January 2023

Abstract

Three-dimensional (3D) human-like body reconstruction from a single RGB image has recently attracted significant research attention. Most existing methods rely on the Skinned Multi-Person Linear (SMPL) model and can therefore predict only bodies that conform to its unified template. Moreover, meshes reconstructed by current methods sometimes look correct from a canonical view but not from other views, because the reconstruction process is commonly supervised by a single view only. To address these limitations, this article proposes a multi-view shape generation network for 3D human-like bodies. Specifically, we propose a coarse-to-fine learning model that gradually deforms a template body toward the ground-truth body. The model is supervised by multi-view renderings together with the corresponding 3D vertex transformations, which helps it generate 3D bodies that are well aligned with all views. To perform mesh deformation accurately, we introduce a graph convolutional network (GCN) that generates shapes from a 3D vertex representation. In addition, we design a graph up-pooling operation over the intermediate representations of the GCN, which enables the model to generate higher-resolution 3D shapes. Novel loss functions are employed to optimize the whole multi-view generation model and produce smoother surfaces. We also build two multi-view human body datasets and contribute them to the community. Extensive experiments on benchmark datasets demonstrate that our model outperforms competing methods.
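
The coarse-to-fine idea in the abstract (graph convolutions over mesh vertices deform a template, and graph up-pooling raises the resolution between stages) can be illustrated with a short NumPy sketch. It assumes a Kipf and Welling style propagation rule for the graph convolution and a Pixel2Mesh-style edge-midpoint up-pooling; the article's actual layer design, feature inputs, and pooling scheme may differ. The functions `normalized_adjacency`, `graph_conv`, `deform_step`, and `up_pool`, and the toy tetrahedron standing in for a body template, are hypothetical.

```python
import numpy as np


def normalized_adjacency(num_vertices, faces):
    """Build D^-1/2 (A + I) D^-1/2 from triangle faces (Kipf & Welling rule)."""
    adj = np.eye(num_vertices)  # self-loops keep each vertex's own feature
    for face in faces:
        a, b, c = map(int, face)
        for i, j in ((a, b), (b, c), (c, a)):
            adj[i, j] = adj[j, i] = 1.0
    inv_sqrt = 1.0 / np.sqrt(adj.sum(axis=1))
    return adj * inv_sqrt[:, None] * inv_sqrt[None, :]


def graph_conv(features, adj_norm, weight):
    """One graph convolution: average over mesh neighbours, then mix channels."""
    return adj_norm @ features @ weight


def deform_step(vertices, faces, weights):
    """Hypothetical deformation block: stacked graph convolutions predict a
    per-vertex 3D offset that is added to the current vertex positions."""
    adj_norm = normalized_adjacency(len(vertices), faces)
    h = vertices
    for w in weights[:-1]:
        h = np.maximum(graph_conv(h, adj_norm, w), 0.0)  # ReLU
    offsets = graph_conv(h, adj_norm, weights[-1])  # last layer outputs 3 channels
    return vertices + offsets


def up_pool(features, faces):
    """Edge-midpoint up-pooling: add one vertex per unique edge (its features are
    the mean of the two endpoints) and split every triangle into four."""
    midpoint_of = {}
    rows = list(features)

    def midpoint(i, j):
        key = (min(i, j), max(i, j))
        if key not in midpoint_of:
            midpoint_of[key] = len(rows)
            rows.append((features[i] + features[j]) / 2.0)
        return midpoint_of[key]

    new_faces = []
    for face in faces:
        a, b, c = map(int, face)
        ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
        new_faces += [(a, ab, ca), (ab, b, bc), (ca, bc, c), (ab, bc, ca)]
    return np.array(rows), np.array(new_faces)


# Usage: a tetrahedron stands in for the template body mesh (hypothetical data).
rng = np.random.default_rng(0)
verts = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
faces = np.array([[0, 1, 2], [0, 1, 3], [0, 2, 3], [1, 2, 3]])
# The same random weights are reused for both stages only to keep the sketch short.
weights = [rng.normal(scale=0.1, size=(3, 16)), rng.normal(scale=0.1, size=(16, 3))]

coarse = deform_step(verts, faces, weights)          # coarse stage on 4 vertices
fine_verts, fine_faces = up_pool(coarse, faces)      # 4 + 6 edge midpoints = 10 vertices
fine = deform_step(fine_verts, fine_faces, weights)  # finer stage
print(fine.shape, fine_faces.shape)  # (10, 3) (16, 3)
```

Splitting each triangle into four keeps the mesh connectivity regular, so the same adjacency-building code serves every stage; a trained model would learn separate weights per stage rather than sharing the random ones used here.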

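On the supervision side, the abstract states that multi-view renderings and the corresponding 3D vertex transformations guide training, and that the losses favor smoother surfaces. The sketch below is a stand-in under explicit assumptions: instead of rendering images, it orthographically projects predicted and ground-truth vertices from several yaw angles and compares them with a 2D Chamfer distance, and it uses a standard uniform Laplacian term as the smoothness regularizer. These are not the article's novel loss functions; `multi_view_loss`, `laplacian_loss`, `total_loss`, `num_views`, and `smooth_weight` are hypothetical names and values.

```python
import numpy as np


def rotation_y(angle):
    """Rotation about the vertical (y) axis; each angle emulates one camera view."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])


def chamfer_2d(p, q):
    """Symmetric Chamfer distance (mean squared nearest-neighbour distance)."""
    d = ((p[:, None, :] - q[None, :, :]) ** 2).sum(axis=-1)
    return d.min(axis=1).mean() + d.min(axis=0).mean()


def multi_view_loss(pred, target, num_views=4):
    """Hypothetical multi-view term: rotate both vertex sets into each view,
    project orthographically (drop the depth axis z), compare the 2D sets."""
    total = 0.0
    for k in range(num_views):
        r = rotation_y(2.0 * np.pi * k / num_views)
        total += chamfer_2d((pred @ r.T)[:, :2], (target @ r.T)[:, :2])
    return total / num_views


def laplacian_loss(vertices, faces):
    """Uniform Laplacian smoothness: squared offset of each vertex from the
    mean of its one-ring neighbours (assumes every vertex lies on a face)."""
    neighbours = [set() for _ in range(len(vertices))]
    for face in faces:
        a, b, c = map(int, face)
        neighbours[a] |= {b, c}
        neighbours[b] |= {a, c}
        neighbours[c] |= {a, b}
    lap = np.array([vertices[i] - vertices[sorted(n)].mean(axis=0)
                    for i, n in enumerate(neighbours)])
    return (lap ** 2).sum(axis=1).mean()


def total_loss(pred, target, faces, smooth_weight=0.1):
    """Hypothetical combination; smooth_weight is an illustrative value."""
    return multi_view_loss(pred, target) + smooth_weight * laplacian_loss(pred, faces)


# Usage on toy data: a slightly perturbed prediction of a tetrahedron.
target = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
faces = np.array([[0, 1, 2], [0, 1, 3], [0, 2, 3], [1, 2, 3]])
pred = target + np.random.default_rng(0).normal(scale=0.05, size=target.shape)
print(total_loss(pred, target, faces))
```

Averaging over views penalizes a prediction that matches the ground truth from one viewpoint but drifts from the others, which is the single-view failure mode the abstract describes.
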


Index Terms

Computing methodologies
  Artificial intelligence
    Computer vision
      Image and video acquisition
        3D imaging

Published In

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 19, Issue 1

January 2023

505 pages

ISSN: 1551-6857

EISSN: 1551-6865

DOI: 10.1145/3572858

Editor:
Abdulmotaleb El Saddik
Mohamed Bin Zayed University of Artificial Intelligence, UAE and University of Ottawa, Canada

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 05 January 2023

Online AM: 18 February 2022

Accepted: 28 January 2022

Revised: 26 January 2022

Received: 21 July 2021

Published in TOMM Volume 19, Issue 1

Qualifiers

Research-article
Refereed

Funding Sources

Shanghai Municipal Science and Technology Major Project
STCSM Project

Bibliometrics & Citations

Article Metrics

Total Citations: 7
Total Downloads: 1,204
Downloads (last 12 months): 371
Downloads (last 6 weeks): 24

Reflects downloads up to 15 Oct 2024.

Citations

Cited By

Cheng Y, Yan Y, Zhu W, Pan Y, Pan B, Yang X. 2024. Head3D: Complete 3D Head Generation via Tri-plane Feature Distillation. ACM Transactions on Multimedia Computing, Communications, and Applications 20, 6, 1–20. Online publication date: 8-Mar-2024.
https://dl.acm.org/doi/10.1145/3635717

Tian F, Kim S. 2024. LEAPSE: Learning Environment Affordances for 3D Human Pose and Shape Estimation. IEEE Transactions on Image Processing 33, 3285–3300. Online publication date: 6-May-2024.
https://dl.acm.org/doi/10.1109/TIP.2024.3393716

Mu X, Zhang H, Shi J, Hou J, Ma J, Yang Y. 2024. Fashion intelligence in the Metaverse: promise and future prospects. Artificial Intelligence Review 57, 3. Online publication date: 20-Feb-2024.
https://doi.org/10.1007/s10462-024-10703-8

Pesavento M, Volino M, Hilton A. 2024. COSMU: Complete 3D Human Shape from Monocular Unconstrained Images. In Computer Vision – ECCV 2024, 201–219. Online publication date: 3-Oct-2024.
https://doi.org/10.1007/978-3-031-72933-1_12

Zhu J, Peng B, Li W, Shen H, Huang Q, Lei J. 2023. Modeling Long-range Dependencies and Epipolar Geometry for Multi-view Stereo. ACM Transactions on Multimedia Computing, Communications, and Applications 19, 6, 1–17. Online publication date: 12-Jul-2023.
https://dl.acm.org/doi/10.1145/3596445

Zeng R, Su M, Yu R, Wang X. 2023. CD2: Fine-grained 3D Mesh Reconstruction with Twice Chamfer Distance. ACM Transactions on Multimedia Computing, Communications, and Applications 19, 6, 1–25. Online publication date: 31-May-2023.
https://dl.acm.org/doi/10.1145/3582694

Pereira A, Carvalho P, Pereira N, Viana P, Côrte-Real L. 2023. From a Visual Scene to a Virtual Representation: A Cross-Domain Review. IEEE Access 11, 57916–57933. Online publication date: 2023.
https://doi.org/10.1109/ACCESS.2023.3283495

Dong Y, Yuan Q, Peng R, Wang S, Sun J. 2023. An iterative 3D human body reconstruction method driven by personalized dimensional prior knowledge. Applied Intelligence 54, 1, 738–748. Online publication date: 19-Dec-2023.
https://dl.acm.org/doi/10.1007/s10489-023-05214-y
