research-article

Multi-view Shape Generation for a 3D Human-like Body

Published: 05 January 2023

Abstract

Three-dimensional (3D) human-like body reconstruction from a single RGB image has attracted significant research attention recently. Most existing methods rely on the Skinned Multi-Person Linear (SMPL) model and thus can only predict unified human bodies. Moreover, meshes reconstructed by current methods sometimes look accurate from a canonical view but not from other views, because the reconstruction process is commonly supervised by only a single view. To address these limitations, this article proposes a multi-view shape generation network for a 3D human-like body. In particular, we propose a coarse-to-fine learning model that gradually deforms a template body toward the ground-truth body. Our model uses multi-view renderings and the corresponding 3D vertex transformations as supervision, which helps generate 3D bodies that are well aligned with all views. To perform mesh deformation accurately, a graph convolutional network structure is introduced to support shape generation from the 3D vertex representation. Additionally, a graph up-pooling operation is designed over the intermediate representations of the graph convolutional network, so that our model can generate 3D shapes with higher resolution. Novel loss functions are employed to help optimize the whole multi-view generation model, resulting in smoother surfaces. In addition, two multi-view human body datasets are produced and contributed to the community. Extensive experiments conducted on benchmark datasets demonstrate the efficacy of our model over competing methods.
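
The abstract names two building blocks, graph convolution over mesh vertices and graph up-pooling, without giving their exact form. The following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a Kipf-and-Welling-style normalized adjacency for the convolution and an edge-midpoint rule for up-pooling. Both are assumptions, and all function names, the tetrahedron template, and the random weights are hypothetical.

```python
import numpy as np

def normalized_adjacency(num_vertices, edges):
    """Build D^-1/2 (A + I) D^-1/2 for an undirected mesh graph (assumed normalization)."""
    a = np.eye(num_vertices)
    for i, j in edges:
        a[i, j] = 1.0
        a[j, i] = 1.0
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def graph_conv(features, adj_norm, weight):
    """One graph convolution: mix each vertex with its neighbors, then apply a linear map."""
    return adj_norm @ features @ weight

def edge_midpoint_unpool(vertices, edges):
    """Hypothetical up-pooling rule: append one new vertex at the midpoint of every edge."""
    midpoints = np.array([(vertices[i] + vertices[j]) / 2.0 for i, j in edges])
    return np.vstack([vertices, midpoints])

# Toy "template body": a tetrahedron with 4 vertices and 6 edges.
template = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=float)
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]

adj = normalized_adjacency(len(template), edges)
rng = np.random.default_rng(0)
weight = rng.normal(scale=0.1, size=(3, 3))  # stands in for learned parameters

# Coarse step: predict per-vertex offsets and deform the template.
deformed = template + graph_conv(template, adj, weight)
# Refinement step: raise the mesh resolution before further deformation.
refined = edge_midpoint_unpool(deformed, edges)
print(deformed.shape, refined.shape)  # (4, 3) (10, 3)
```

In a trained model the weights would be learned from the multi-view supervision described above, and the deformation and up-pooling steps would be repeated from coarse to fine. Face connectivity, which a full up-pooling step would also need to update, is omitted here.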

    Published In

    ACM Transactions on Multimedia Computing, Communications, and Applications  Volume 19, Issue 1
    January 2023
    505 pages
    ISSN:1551-6857
    EISSN:1551-6865
    DOI:10.1145/3572858
    Editor: Abdulmotaleb El Saddik

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 05 January 2023
    Online AM: 18 February 2022
    Accepted: 28 January 2022
    Revised: 26 January 2022
    Received: 21 July 2021
    Published in TOMM Volume 19, Issue 1

    Author Tags

    1. 3D reconstruction
    2. human body reconstruction
    3. multi-view stereo

    Qualifiers

    • Research-article
    • Refereed

    Funding Sources

    • Shanghai Municipal Science and Technology Major Project
    • STCSM Project

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months): 371
    • Downloads (Last 6 weeks): 24
    Reflects downloads up to 15 Oct 2024

    Cited By

    • (2024) Head3D: Complete 3D Head Generation via Tri-plane Feature Distillation. ACM Transactions on Multimedia Computing, Communications, and Applications 20:6 (1-20). DOI: 10.1145/3635717. Online publication date: 8-Mar-2024
    • (2024) LEAPSE: Learning Environment Affordances for 3D Human Pose and Shape Estimation. IEEE Transactions on Image Processing 33 (3285-3300). DOI: 10.1109/TIP.2024.3393716. Online publication date: 6-May-2024
    • (2024) Fashion intelligence in the Metaverse: promise and future prospects. Artificial Intelligence Review 57:3. DOI: 10.1007/s10462-024-10703-8. Online publication date: 20-Feb-2024
    • (2024) COSMU: Complete 3D Human Shape from Monocular Unconstrained Images. Computer Vision – ECCV 2024 (201-219). DOI: 10.1007/978-3-031-72933-1_12. Online publication date: 3-Oct-2024
    • (2023) Modeling Long-range Dependencies and Epipolar Geometry for Multi-view Stereo. ACM Transactions on Multimedia Computing, Communications, and Applications 19:6 (1-17). DOI: 10.1145/3596445. Online publication date: 12-Jul-2023
    • (2023) CD2: Fine-grained 3D Mesh Reconstruction with Twice Chamfer Distance. ACM Transactions on Multimedia Computing, Communications, and Applications 19:6 (1-25). DOI: 10.1145/3582694. Online publication date: 31-May-2023
    • (2023) From a Visual Scene to a Virtual Representation: A Cross-Domain Review. IEEE Access 11 (57916-57933). DOI: 10.1109/ACCESS.2023.3283495. Online publication date: 2023
    • (2023) An iterative 3D human body reconstruction method driven by personalized dimensional prior knowledge. Applied Intelligence 54:1 (738-748). DOI: 10.1007/s10489-023-05214-y. Online publication date: 19-Dec-2023
