Recovering 3D Human Mesh From Monocular Images: A Survey

Published: 26 July 2023

Abstract

Estimating human pose and shape from monocular images is a long-standing problem in computer vision. Since the release of statistical body models, 3D human mesh recovery has been drawing broader attention. With the common goal of obtaining well-aligned and physically plausible mesh results, two paradigms have been developed to overcome challenges in the 2D-to-3D lifting process: i) an optimization-based paradigm, where different data terms and regularization terms are exploited as optimization objectives; and ii) a regression-based paradigm, where deep learning techniques are embraced to solve the problem in an end-to-end fashion. Meanwhile, continuous efforts have been devoted to improving the quality of 3D mesh labels for a wide range of datasets. Though remarkable progress has been achieved in the past decade, the task is still challenging due to flexible body motions, diverse appearances, complex environments, and insufficient in-the-wild annotations. To the best of our knowledge, this is the first survey that focuses on the task of monocular 3D human mesh recovery. We start with an introduction to body models and then elaborate on recovery frameworks and training objectives, providing in-depth analyses of their strengths and weaknesses. We also summarize datasets, evaluation metrics, and benchmark results. Open issues and future directions are discussed at the end, in the hope of motivating researchers and facilitating their research in this area.
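
To make the two paradigms concrete, the sketch below illustrates the optimization-based one: pose and shape parameters of a body model are fitted to detected 2D keypoints by minimizing a weighted reprojection data term plus simple regularization terms, using gradient descent. This is only a minimal illustration, not code from the survey or from any cited method; the body_model callable, the pinhole-camera intrinsics, the parameter dimensions, and the term weights are all illustrative assumptions (a practical system would plug in a statistical model such as SMPL, stronger learned priors, and a careful camera model).

    import torch

    def project(joints_3d, focal=1000.0, center=(112.0, 112.0)):
        # Pinhole projection of 3D joints (N, 3) onto the image plane (N, 2).
        z = joints_3d[..., 2].clamp(min=1e-6)
        u = focal * joints_3d[..., 0] / z + center[0]
        v = focal * joints_3d[..., 1] / z + center[1]
        return torch.stack([u, v], dim=-1)

    def fit(body_model, keypoints_2d, conf, steps=200, lr=1e-2,
            w_pose=1e-3, w_shape=1e-4):
        # body_model is an assumed callable mapping (pose, shape) -> 3D joints.
        pose = torch.zeros(72, requires_grad=True)    # per-joint axis-angle rotations
        shape = torch.zeros(10, requires_grad=True)   # low-dimensional shape coefficients
        optimizer = torch.optim.Adam([pose, shape], lr=lr)
        for _ in range(steps):
            optimizer.zero_grad()
            joints_3d = body_model(pose, shape)
            residual = project(joints_3d) - keypoints_2d
            e_data = (conf[:, None] * residual.pow(2)).sum()   # confidence-weighted 2D reprojection term
            e_reg = w_pose * pose.pow(2).sum() + w_shape * shape.pow(2).sum()  # simple quadratic priors
            loss = e_data + e_reg
            loss.backward()
            optimizer.step()
        return pose.detach(), shape.detach()

The regression-based paradigm instead trains a network to predict (pose, shape) directly from the image, typically supervised with the same kinds of reprojection and parameter losses.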


Published In

IEEE Transactions on Pattern Analysis and Machine Intelligence, Volume 45, Issue 12, December 2023, 1966 pages.

Publisher: IEEE Computer Society, United States.

    Qualifiers

    • Research-article

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)0
    • Downloads (Last 6 weeks)0
    Reflects downloads up to 10 Nov 2024

    Other Metrics

    Citations

    Cited By

    View all
    • (2024)IF-Garments: Reconstructing Your Intersection-Free Multi-Layered Garments from Monocular VideosProceedings of the 32nd ACM International Conference on Multimedia10.1145/3664647.3681222(6588-6597)Online publication date: 28-Oct-2024
    • (2024)ARTS: Semi-Analytical Regressor using Disentangled Skeletal Representations for Human Mesh Recovery from VideosProceedings of the 32nd ACM International Conference on Multimedia10.1145/3664647.3680881(1514-1523)Online publication date: 28-Oct-2024
    • (2024)MagicCartoon: 3D Pose and Shape Estimation for Bipedal Cartoon CharactersProceedings of the 32nd ACM International Conference on Multimedia10.1145/3664647.3680844(8219-8227)Online publication date: 28-Oct-2024
    • (2024)VITA: ViT Acceleration for Efficient 3D Human Mesh Recovery via Hardware-Algorithm Co-DesignProceedings of the 61st ACM/IEEE Design Automation Conference10.1145/3649329.3656518(1-6)Online publication date: 23-Jun-2024
    • (2024)An Embeddable Implicit IUVD Representation for Part-Based 3D Human Surface ReconstructionIEEE Transactions on Image Processing10.1109/TIP.2024.343007333(4334-4347)Online publication date: 1-Jan-2024
    • (2024)3DPMeshComputers and Graphics10.1016/j.cag.2024.103894119:COnline publication date: 1-Apr-2024
    • (2024)SS-MVMETRO: Semi-supervised multi-view human mesh recovery transformerApplied Intelligence10.1007/s10489-024-05435-954:6(5027-5043)Online publication date: 1-Mar-2024

    View Options

    View options

    Get Access

    Login options

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media