Clip fusion with bi-level optimization for human mesh reconstruction from monocular videos
Proceedings of the 31st ACM International Conference on Multimedia, 2023 • dl.acm.org
Human mesh reconstruction (HMR) from monocular video is a key step in many mixed reality and robotic applications. Although existing methods show promising results by capturing temporal information across frames, they predict the human mesh with implicit temporal learning modules in a sequence-to-frame manner. To mine more temporal information from the video, we present a bi-level clip inference network for HMR, which explicitly leverages both local motion and global context for dense 3D reconstruction. Specifically, we propose a novel bi-level temporal fusion strategy that takes both neighboring and long-range relations into consideration. In addition, unlike traditional frame-wise operation, we investigate an alternative perspective by treating video-based HMR as clip-wise inference. We evaluate the proposed method quantitatively and qualitatively on multiple datasets (3DPW, Human3.6M, and MPI-INF-3DHP), demonstrating a significant improvement over existing methods in terms of PA-MPJPE, ACC-Error, and other metrics. Furthermore, we extend the proposed method to the more challenging Multiple Shots HMR task to demonstrate its generalizability. Visual demos are available at https://github.com/bicf0/bicf_demo.
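The abstract names ACC-Error as an evaluation metric but does not define it. The following is a minimal sketch, assuming the definition commonly used in video-based HMR evaluation: the mean Euclidean distance between predicted and ground-truth joint accelerations, with acceleration approximated by a second-order finite difference across frames. The function names and toy data are hypothetical and are not taken from the paper; units follow the input coordinates, and any frame-rate scaling is omitted.

```python
import math


def acceleration(joints):
    """Second-order finite difference over time.

    joints: list of T frames, each a list of J (x, y, z) tuples.
    Returns T-2 frames of per-joint acceleration vectors.
    """
    accel = []
    for t in range(1, len(joints) - 1):
        frame = []
        for prev, cur, nxt in zip(joints[t - 1], joints[t], joints[t + 1]):
            # a_t = x_{t-1} - 2 * x_t + x_{t+1}, applied per coordinate
            frame.append(tuple(p - 2 * c + n for p, c, n in zip(prev, cur, nxt)))
        accel.append(frame)
    return accel


def accel_error(pred, gt):
    """Mean distance between predicted and ground-truth joint accelerations.

    Assumed definition (not confirmed by the abstract); see the lead-in.
    """
    if len(pred) != len(gt) or len(pred) < 3:
        raise ValueError("need two sequences of equal length with at least 3 frames")
    a_pred, a_gt = acceleration(pred), acceleration(gt)
    dists = [
        math.dist(ap, ag)
        for frame_pred, frame_gt in zip(a_pred, a_gt)
        for ap, ag in zip(frame_pred, frame_gt)
    ]
    return sum(dists) / len(dists)


# Toy data: a single joint moving at constant velocity (ground truth),
# and a prediction that jitters sideways on the middle frame.
gt = [[(float(t), 0.0, 0.0)] for t in range(5)]
pred = [[(float(t), 0.0, 0.0)] for t in range(5)]
pred[2] = [(2.0, 0.3, 0.0)]

print(accel_error(pred, gt))
```

A single jittery frame affects three consecutive acceleration estimates, which is why a metric of this kind is sensitive to temporal smoothness rather than to per-frame pose accuracy alone.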