
Graph-Temporal LSTM Networks for Skeleton-Based Action Recognition

Authors:

Peiyi Shen

Pattern Recognition and Computer Vision: Third Chinese Conference, PRCV 2020, Nanjing, China, October 16–18, 2020, Proceedings, Part II

Pages 480 - 491

https://doi.org/10.1007/978-3-030-60639-8_40

Published: 16 October 2020

Abstract

Human action recognition is a challenging and active research field. Recently, spatio-temporal graph convolutions for skeleton-based action recognition have attracted much attention. Several strategies, such as temporal downsampling, convolution striding, and temporal pooling, are used to handle long action sequences. Recurrent neural networks are typically used to process sequential data. In this paper, we propose a deep architecture that combines spatio-temporal graph convolution with graph-temporal long short-term memory (GT-LSTM) for skeleton-based human action recognition. First, topology-learnable spatio-temporal graph convolutions learn the local spatio-temporal features of graph nodes and adaptively evolve the graph topologies. GT-LSTM then fuses the spatio-temporal features successively along the node sequence and the temporal dimension for the final recognition. Experimental results on the NTU RGB+D and Kinetics-Skeleton datasets demonstrate that the proposed architecture can effectively perform graph node information aggregation, graph topology evolution, and spatio-temporal graph feature fusion.
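The abstract does not spell out how the graph topology is made learnable. As a rough illustration only, the sketch below assumes one common formulation from adaptive graph convolution work: a learnable offset matrix B is added to a fixed, row-normalized skeleton adjacency A, and node features H are aggregated as (A + B) H W. The toy 3-joint chain, the matrix names, and the helper functions are all hypothetical and are not taken from the paper.

```python
# Minimal sketch under stated assumptions; this is NOT the paper's implementation.
# It shows one graph-convolution step with a learnable topology offset.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def row_normalize(adj):
    """Scale each row to sum to 1 so aggregation is a weighted average."""
    return [[v / sum(row) for v in row] for row in adj]

def graph_conv(adj, offset, feats, weight):
    """Compute (A + B) H W: aggregate over neighbors, then project channels."""
    topology = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(adj, offset)]
    return matmul(matmul(topology, feats), weight)

# Hypothetical toy skeleton: joints 0-1-2 connected in a chain, with self-loops.
A = row_normalize([[1, 1, 0],
                   [1, 1, 1],
                   [0, 1, 1]])

# Assumed learnable offset B: zero at the start, so only skeleton edges are used.
B = [[0.0] * 3 for _ in range(3)]

H = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 joints x 2 feature channels
W = [[1.0, 0.0], [0.0, 1.0]]              # identity projection, for readability

out_fixed = graph_conv(A, B, H, W)

# Training could adjust B; here an edge between joints 0 and 2 is added by hand.
B[0][2] = 0.5
out_learned = graph_conv(A, B, H, W)

print(out_fixed[0], out_learned[0])
```

With B at zero, joint 0 only averages itself and joint 1. After the offset gains a weight between joints 0 and 2, joint 0 also receives features from joint 2, which is the sense in which a topology can evolve away from the fixed skeleton.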



Published In

Pattern Recognition and Computer Vision: Third Chinese Conference, PRCV 2020, Nanjing, China, October 16–18, 2020, Proceedings, Part II

Oct 2020, 706 pages

ISBN: 978-3-030-60638-1

DOI: 10.1007/978-3-030-60639-8

Editors:

Yuxin Peng, Peking University, Beijing, China
Qingshan Liu, Nanjing University of Information Science and Technology, Nanjing, China
Huchuan Lu, Dalian University of Technology, Dalian, China
Zhenan Sun, Chinese Academy of Sciences, Beijing, China
Chenglin Liu, Chinese Academy of Sciences, Beijing, China
Xilin Chen, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
Hongbin Zha, Peking University, Beijing, China
Jian Yang, Nanjing University of Science and Technology, Nanjing, China

© Springer Nature Switzerland AG 2020.

Publisher: Springer-Verlag, Berlin, Heidelberg
