DOI: 10.1145/3595916.3626384
research-article

Learning Snippet-to-Motion Progression for Skeleton-based Human Motion Prediction

Published: 01 January 2024

    Abstract

    Existing Graph Convolutional Networks for human motion prediction largely adopt a one-step scheme that outputs the prediction directly from the history input and therefore fails to exploit human motion patterns. We observe that human motions have transitional patterns and can be split into snippets, each representative of one transition. Each snippet can be reconstructed from its starting and ending poses, referred to as the transitional poses. We propose a snippet-to-motion multi-stage framework that breaks motion prediction into sub-tasks that are easier to accomplish. Each sub-task integrates three modules: transitional pose prediction, snippet reconstruction, and snippet-to-motion prediction. Specifically, we first predict only the transitional poses. We then use them to reconstruct the corresponding snippets, obtaining a close approximation to the true motion sequence. Finally, we refine the reconstructed snippets to produce the final prediction. To implement the network, we propose a novel unified graph modeling, which allows for more direct and effective feature propagation than existing approaches that rely on separate space-time modeling. Extensive experiments on the Human3.6M, CMU Mocap, and 3DPW datasets verify the effectiveness of our method, which achieves state-of-the-art performance.
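
    The snippet reconstruction step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how a snippet is rebuilt from its transitional poses, so the linear interpolation used here is an assumption chosen for simplicity, not the authors' method. The function name `reconstruct_snippets`, the parameter `steps_per_snippet`, and the array layout (frames, joints, 3) are likewise hypothetical.

```python
import numpy as np


def reconstruct_snippets(transitional_poses, steps_per_snippet):
    """Rebuild a coarse motion sequence from transitional poses.

    ASSUMPTION: each snippet is filled in by linear interpolation between
    its starting and ending pose. The paper's abstract does not specify
    the reconstruction; this is only an illustrative stand-in.

    transitional_poses: array of shape (K + 1, J, 3), the start/end poses
        of K consecutive snippets (adjacent snippets share one pose).
    steps_per_snippet: number of frame intervals inside each snippet.
    Returns an array of shape (K * steps_per_snippet + 1, J, 3).
    """
    poses = np.asarray(transitional_poses, dtype=float)
    frames = [poses[0]]
    for start, end in zip(poses[:-1], poses[1:]):
        for s in range(1, steps_per_snippet + 1):
            w = s / steps_per_snippet  # interpolation weight in (0, 1]
            frames.append((1.0 - w) * start + w * end)
    return np.stack(frames)


# Usage: take every 5th frame of a synthetic motion as the transitional
# poses, rebuild the full sequence, and measure the approximation error
# that a later refinement stage would have to remove.
rng = np.random.default_rng(0)
true_motion = np.cumsum(rng.normal(scale=0.01, size=(21, 22, 3)), axis=0)
steps = 5
coarse = reconstruct_snippets(true_motion[::steps], steps)
mean_joint_error = np.linalg.norm(coarse - true_motion, axis=-1).mean()
```

    In this sketch the reconstruction matches the true motion exactly at the transitional frames and only approximates it in between, which mirrors the abstract's point that a final refinement stage is still needed.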



      Published In

      MMAsia '23: Proceedings of the 5th ACM International Conference on Multimedia in Asia
      December 2023
      745 pages
      ISBN:9798400702051
      DOI:10.1145/3595916

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Author Tags

      1. graph convolution
      2. human motion prediction
      3. neural networks

      Qualifiers

      • Research-article
      • Research
      • Refereed limited

      Conference

      MMAsia '23: ACM Multimedia Asia
      December 6 - 8, 2023
      Tainan, Taiwan

      Acceptance Rates

      Overall Acceptance Rate 59 of 204 submissions, 29%



