End-to-End Human Pose and Mesh Reconstruction with Transformers

Lin, Kevin; Wang, Lijuan; Liu, Zicheng

arXiv:2012.09760 (cs)

[Submitted on 17 Dec 2020 (v1), last revised 15 Jun 2021 (this version, v3)]

Abstract: We present a new method, called MEsh TRansfOrmer (METRO), to reconstruct 3D human pose and mesh vertices from a single image. Our method uses a transformer encoder to jointly model vertex-vertex and vertex-joint interactions, and outputs 3D joint coordinates and mesh vertices simultaneously. Unlike existing techniques that regress pose and shape parameters, METRO does not rely on any parametric mesh model such as SMPL, so it can be easily extended to other objects such as hands. We further relax the mesh topology and allow the transformer self-attention mechanism to attend freely between any two vertices, making it possible to learn non-local relationships among mesh vertices and joints. With the proposed masked vertex modeling, our method is more robust and effective in handling challenging situations such as partial occlusions. METRO achieves new state-of-the-art results for human mesh reconstruction on the public Human3.6M and 3DPW datasets. Moreover, we demonstrate the generalizability of METRO to 3D hand reconstruction in the wild, outperforming existing state-of-the-art methods on the FreiHAND dataset. Code and pre-trained models are available at this https URL.
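
The two mechanisms named in the abstract, unrestricted self-attention over joint and vertex tokens and masked vertex modeling, can be illustrated with a small sketch. This is not the authors' implementation, and the abstract does not give the details. The sketch assumes that masking replaces a random fraction of the joint/vertex query tokens with a fixed placeholder vector, and it uses a single attention head with identity projections (no learned weights). The names `mask_queries`, `self_attention`, `mask_ratio`, and `mask_token` are hypothetical.

```python
import math
import random


def mask_queries(queries, mask_ratio, mask_token, rng):
    """Replace a random subset of query vectors with mask_token.

    Assumption: this stands in for "masked vertex modeling"; the real
    method may choose and encode masked queries differently.
    """
    n = len(queries)
    k = int(round(mask_ratio * n))
    masked_idx = set(rng.sample(range(n), k))
    out = [list(mask_token) if i in masked_idx else list(q)
           for i, q in enumerate(queries)]
    return out, sorted(masked_idx)


def self_attention(tokens):
    """Single-head scaled dot-product self-attention, identity projections.

    Every token attends to every other token, so a vertex can draw on
    any joint or vertex regardless of mesh connectivity.
    """
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d)
                  for k in tokens]
        m = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        weights = [w / z for w in weights]
        out.append([sum(weights[j] * tokens[j][c] for j in range(len(tokens)))
                    for c in range(d)])
    return out


# Toy input: 2 joint queries followed by 4 vertex queries, 3 features each.
joint_queries = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
vertex_queries = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0],
                  [0.0, 0.0, 1.0], [1.0, 1.0, 1.0]]
tokens = joint_queries + vertex_queries

rng = random.Random(0)
mask_token = [0.0, 0.0, 0.0]
masked_tokens, masked_idx = mask_queries(tokens, 0.5, mask_token, rng)
attended = self_attention(masked_tokens)
```

Because each output row is a softmax-weighted average of all input tokens, a masked query still receives information from the unmasked joints and vertices. That is the intuition behind the robustness to partial occlusion that the abstract describes.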

Comments:	CVPR 2021
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2012.09760 [cs.CV]
	(or arXiv:2012.09760v3 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2012.09760

Submission history

From: Kevin Lin
[v1] Thu, 17 Dec 2020 17:17:29 UTC (4,868 KB)
[v2] Sun, 28 Mar 2021 01:20:42 UTC (5,220 KB)
[v3] Tue, 15 Jun 2021 15:56:07 UTC (5,220 KB)
