Abstract
3D pose transfer over unorganized point clouds is a challenging generation task that transfers a source shape’s pose to a target shape while preserving the target’s identity. Recent deep models learn deformations and use the target’s identity as a style to modulate either the combined features of the two shapes or the aligned vertices of the source shape. However, all operations in these models are point-wise and independent, ignoring the geometric information carried by the surface and structure of the input shapes. This limitation severely restricts their generation and generalization capabilities. In this study, we propose a geometry-aware method based on a novel transformer autoencoder to solve this problem. An efficient self-attention mechanism, namely cross-covariance attention, is used throughout our framework to capture correlations between points at different distances. Specifically, the transformer encoder extracts the target shape’s local geometric details as identity attributes and the source shape’s global geometric structure as pose information. Our transformer decoder efficiently learns deformations and recovers identity properties by fusing and decoding the extracted features in a geometry-attentional manner, requiring neither correspondence information nor modulation steps. Experiments demonstrate that the proposed geometry-aware method achieves state-of-the-art performance on the 3D pose transfer task. The implementation code and data are available at https://github.com/SEULSH/Geometry-Aware-3D-Pose-Transfer-Using-Transformer-Autoencoder.
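The efficiency claim rests on cross-covariance attention, which attends over a d × d channel covariance matrix instead of the N × N token matrix, so the cost grows linearly with the number of points. The following is a minimal single-head NumPy sketch of that mechanism, not the authors’ implementation; the projection matrices `Wq`, `Wk`, `Wv`, the single-head layout, and the temperature `tau` are illustrative assumptions.

```python
import numpy as np

def xca(X, Wq, Wk, Wv, tau=1.0):
    """Cross-covariance attention sketch: softmax over a (d, d)
    channel-covariance map rather than an (N, N) token map, so the
    cost is O(N * d^2), i.e., linear in the number of points N."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv              # each (N, d)
    # L2-normalize each channel (column) across the N points
    Qh = Q / (np.linalg.norm(Q, axis=0, keepdims=True) + 1e-8)
    Kh = K / (np.linalg.norm(K, axis=0, keepdims=True) + 1e-8)
    A = Kh.T @ Qh / tau                           # (d, d) cross-covariance
    A = np.exp(A - A.max(axis=0, keepdims=True))  # stable column softmax
    A = A / A.sum(axis=0, keepdims=True)
    return V @ A                                  # (N, d) attended output

# Usage: attention over 1024 points with 64 feature channels
rng = np.random.default_rng(0)
N, d = 1024, 64
X = rng.standard_normal((N, d))
Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
Y = xca(X, Wq, Wk, Wv)
assert Y.shape == (N, d)
```

Because the softmax is taken over channels, doubling the number of input points only doubles the cost, which is what makes this attention practical on dense, unorganized point clouds.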
Acknowledgements
This work was supported by the Special Project on Basic Research of Frontier Leading Technology of Jiangsu Province, China (Grant No. BK20192004C).
Ethics declarations
The authors have no competing interests to declare that are relevant to the content of this article.
Additional information
Shanghuan Liu received his B.E. and M.E. degrees from the College of Internet of Things Engineering, Hohai University, Nanjing, China, in 2015 and 2018, respectively. He is currently pursuing a Ph.D. degree at Southeast University, Nanjing, China, focusing on 3D sensing and deep learning in computer vision.
Shaoyan Gai received his Ph.D. degree from Southeast University in 2008. He is currently an associate professor and a Ph.D. advisor at Southeast University. His main research interests include 3D measurement and 3D face recognition.
Feipeng Da received his Ph.D. degree from the School of Automation, Southeast University, in 1998. He is currently a professor with the School of Automation, Southeast University. He has published an academic monograph and authored or coauthored over 150 high-quality articles, more than 100 of which are indexed by SCI, EI, and ISTP. He holds 40 authorized invention patents, one authorized utility-model patent, four software copyrights, and three international invention patents (PCT applications). He also serves as a reviewer for journals in various areas, such as Optics Express, Optics Letters, Optics and Lasers in Engineering, IEEE Transactions on Neural Networks, IEEE Transactions on Circuits and Systems-I: Regular Papers, IEEE Transactions on Circuits and Systems-II: Express Briefs, Physics Letters A, Neural Networks, and Pattern Recognition.
Fazal Waris received his B.Sc. degree from NWFP University of Engineering and Technology, Peshawar, Pakistan, and his M.S. degree from the University of Lahore, Pakistan. He is currently a Ph.D. candidate at Southeast University, Nanjing, China. His research interests include machine learning, computer vision, and deep learning.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.
The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Liu, S., Gai, S., Da, F. et al. Geometry-aware 3D pose transfer using transformer autoencoder. Comp. Visual Media (2024). https://doi.org/10.1007/s41095-023-0379-8