View-Invariant Probabilistic Embedding for Human Pose

Sun, Jennifer J.; Zhao, Jiaping; Chen, Liang-Chieh; Schroff, Florian; Adam, Hartwig; Liu, Ting

doi:10.1007/978-3-030-58558-7_4

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12350)

Included in the following conference series:

European Conference on Computer Vision

Abstract

Depictions of similar human body configurations can vary with changing viewpoints. Using only 2D information, we would like to enable vision algorithms to recognize similarity in human body poses across multiple views. This ability is useful for analyzing body movements and human behaviors in images and videos. In this paper, we propose an approach for learning a compact view-invariant embedding space from 2D joint keypoints alone, without explicitly predicting 3D poses. Since 2D poses are projected from 3D space, they have an inherent ambiguity, which is difficult to represent through a deterministic mapping. Hence, we use probabilistic embeddings to model this input uncertainty. Experimental results show that our embedding model achieves higher accuracy when retrieving similar poses across different camera views, in comparison with 2D-to-3D pose lifting models. We also demonstrate the effectiveness of applying our embeddings to view-invariant action recognition and video alignment. Our code is available at https://github.com/google-research/google-research/tree/master/poem.
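To make the idea of a probabilistic embedding concrete, the following is a minimal sketch, not the chapter's implementation. It assumes each 2D pose has already been mapped to a diagonal Gaussian (a mean and a standard deviation per dimension) and estimates how likely two such embeddings are to match by sampling. The Gaussian form, the sigmoid-based score, and every name and parameter here (`sample_embedding`, `match_probability`, `scale`, `offset`) are illustrative assumptions.

```python
import math
import random


def sample_embedding(mean, std, num_samples, rng):
    """Draw samples from a diagonal Gaussian N(mean, diag(std^2))."""
    return [
        [rng.gauss(m, s) for m, s in zip(mean, std)]
        for _ in range(num_samples)
    ]


def match_probability(emb_a, emb_b, num_samples=20, scale=1.0, offset=0.0, seed=0):
    """Monte Carlo estimate of how likely two Gaussian embeddings match.

    Each embedding is a (mean, std) pair of equal-length sequences.
    Every sample pair contributes sigmoid(-scale * distance + offset),
    so pairs of nearby samples score higher than distant ones.
    """
    rng = random.Random(seed)
    mean_a, std_a = emb_a
    mean_b, std_b = emb_b
    samples_a = sample_embedding(mean_a, std_a, num_samples, rng)
    samples_b = sample_embedding(mean_b, std_b, num_samples, rng)

    total = 0.0
    for za in samples_a:
        for zb in samples_b:
            dist = math.dist(za, zb)  # Euclidean distance between samples
            total += 1.0 / (1.0 + math.exp(scale * dist - offset))
    return total / (num_samples * num_samples)


# Hypothetical 2-D embeddings: (mean, std) per pose.
pose_front = ([0.0, 0.0], [0.1, 0.1])
pose_side = ([0.1, 0.0], [0.1, 0.1])      # same pose, another view
pose_other = ([3.0, 3.0], [0.1, 0.1])     # a different pose
pose_ambiguous = ([0.0, 0.0], [2.0, 2.0])  # same mean, high variance

p_same = match_probability(pose_front, pose_side)
p_diff = match_probability(pose_front, pose_other)
p_ambiguous = match_probability(pose_front, pose_ambiguous)
```

In this sketch a larger standard deviation spreads the samples out, which lowers the matching score even when the means coincide; that is one simple way input ambiguity can be reflected in a similarity measure, which a single deterministic point cannot express.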

J.J. Sun: This work was done during the author’s internship at Google.

Acknowledgment

We thank Yuxiao Wang, Debidatta Dwibedi, and Liangzhe Yuan from Google Research, Long Zhao from Rutgers University, and Xiao Zhang from University of Chicago for helpful discussions. We appreciate the support of Pietro Perona, Yisong Yue, and the Computational Vision Lab at Caltech for making this collaboration possible. The author Jennifer J. Sun is supported by NSERC (funding number PGSD3-532647-2019) and Caltech.

Author information

Authors and Affiliations

California Institute of Technology, Pasadena, USA
Jennifer J. Sun
Google Research, Los Angeles, USA
Jiaping Zhao, Liang-Chieh Chen, Florian Schroff, Hartwig Adam & Ting Liu

Corresponding author

Correspondence to Jennifer J. Sun.

Editor information

Editors and Affiliations

University of Oxford, Oxford, UK
Andrea Vedaldi
Graz University of Technology, Graz, Austria
Horst Bischof
University of Freiburg, Freiburg im Breisgau, Germany
Thomas Brox
University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
Jan-Michael Frahm

About this paper

Cite this paper

Sun, J.J., Zhao, J., Chen, LC., Schroff, F., Adam, H., Liu, T. (2020). View-Invariant Probabilistic Embedding for Human Pose. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, JM. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science, vol 12350. Springer, Cham. https://doi.org/10.1007/978-3-030-58558-7_4

DOI: https://doi.org/10.1007/978-3-030-58558-7_4
Published: 29 October 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-58557-0
Online ISBN: 978-3-030-58558-7
eBook Packages: Computer Science, Computer Science (R0)
