Article, DOI: 10.1007/978-3-030-60639-8_40

Graph-Temporal LSTM Networks for Skeleton-Based Action Recognition

Published: 16 October 2020

Abstract

Human action recognition is a challenging and active research field. Recently, spatio-temporal graph convolutions for skeleton-based action recognition have attracted much attention. Several strategies, such as temporal downsampling, convolution striding, and temporal pooling, are used to handle long action sequences. Recurrent neural networks are typically used to process sequential data. In this paper, we propose a deep architecture that combines spatio-temporal graph convolution and graph-temporal long short-term memory (GT-LSTM) for skeleton-based human action recognition. First, topology-learnable spatio-temporal graph convolutions are applied to learn the local spatio-temporal features of graph nodes and to adaptively evolve graph topologies. Then, GT-LSTM successively fuses the spatio-temporal features along the node sequence and the temporal dimension for the final recognition. Experimental results on the NTU RGB+D and Kinetics-Skeleton datasets demonstrate that the proposed architecture can effectively perform graph node information aggregation, graph topology evolution, and spatio-temporal graph feature fusion.
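To make the building blocks named in the abstract more concrete, the following is a minimal NumPy sketch of a generic spatial graph convolution over skeleton joints, with an optional additive offset standing in for a learnable topology, followed by plain temporal downsampling. It is an illustration under stated assumptions, not the paper's implementation: the function names (`normalize_adjacency`, `graph_conv`, `temporal_downsample`), the symmetric normalization, the additive `topology_offset`, and the toy 3-joint chain are all hypothetical choices, and the GT-LSTM stage is not modeled at all.

```python
import numpy as np


def normalize_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2} for a symmetric adjacency matrix A.

    Assumption: symmetric normalization with self-loops, a common choice
    for graph convolutions; the paper may use a different scheme.
    """
    a_hat = adj + np.eye(adj.shape[0])
    deg = a_hat.sum(axis=1)          # degree of each joint, always >= 1
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def graph_conv(x, adj_norm, weight, topology_offset=None):
    """Aggregate joint features over the graph, then project channels.

    x:               (T, V, C_in) features for T frames and V joints
    adj_norm:        (V, V) normalized adjacency
    weight:          (C_in, C_out) channel projection
    topology_offset: optional (V, V) matrix added to adj_norm; a
                     hypothetical stand-in for a learned topology term
    """
    a = adj_norm if topology_offset is None else adj_norm + topology_offset
    aggregated = np.einsum("uv,tvc->tuc", a, x)  # sum over neighbor joints v
    return aggregated @ weight


def temporal_downsample(x, stride=2):
    """Keep every `stride`-th frame along the time axis."""
    return x[::stride]


# Toy skeleton: a chain of 3 joints (0 - 1 - 2), 4 frames, 2 input channels.
rng = np.random.default_rng(0)
adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
x = rng.normal(size=(4, 3, 2))
w = rng.normal(size=(2, 3))

y = graph_conv(x, normalize_adjacency(adj), w)
print(y.shape, temporal_downsample(y).shape)  # (4, 3, 3) (2, 3, 3)
```

In a trained model the offset and the projection weights would be learned parameters; here they are fixed arrays so that the shapes and the aggregation step are easy to inspect.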


Published In

Pattern Recognition and Computer Vision: Third Chinese Conference, PRCV 2020, Nanjing, China, October 16–18, 2020, Proceedings, Part II
Oct 2020
706 pages
ISBN: 978-3-030-60638-1
DOI: 10.1007/978-3-030-60639-8
Editors: Yuxin Peng, Qingshan Liu, Huchuan Lu, Zhenan Sun, Chenglin Liu, Xilin Chen, Hongbin Zha, Jian Yang

Publisher

Springer-Verlag

Berlin, Heidelberg

Author Tags

  1. Human action recognition
  2. Graph convolution
  3. LSTM
