research-article

Skeleton-based action recognition with temporal action graph and temporal adaptive graph convolution structure

Authors:

Yongjian Sheng,

Yongjian Ju

Multimedia Tools and Applications, Volume 80, Issue 19

Pages 29139–29162

https://doi.org/10.1007/s11042-021-11136-z

Published: 01 August 2021

Abstract

Skeleton-based action recognition has recently attracted much attention because skeleton data can robustly convey action information. Many recent studies have shown that graph convolutional networks (GCNs), which generalize CNNs to more generic non-Euclidean structures, can extract spatial features more accurately. Nevertheless, effectively extracting global temporal features remains a challenge. In this work, we first design a unique feature named the temporal action graph, which is the first attempt to express temporal relationships in the form of a graph. Second, we propose a temporal adaptive graph convolution structure (T-AGCN). By generating a global adjacency matrix for the temporal action graph, T-AGCN can flexibly extract global temporal features from the temporal dynamics. Third, we propose a novel model named the spatial-temporal adaptive graph convolutional network (ST-AGCN) for skeleton-based action recognition, which extracts spatial-temporal features to improve recognition accuracy. ST-AGCN combines T-AGCN with spatial graph convolution to compensate for T-AGCN's weakness in modeling spatial structure. In addition, ST-AGCN uses dual features to form a two-stream network, which further improves recognition accuracy on hard-to-recognize samples. Finally, comparative experiments on two skeleton-based action recognition datasets, NTU-RGBD and SBU, demonstrate that T-AGCN and the temporal action graph can effectively explore global temporal information and that ST-AGCN improves recognition accuracy on both datasets.
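The abstract describes T-AGCN only at a high level. The following is a minimal, hypothetical NumPy sketch of one way a temporal adaptive graph convolution could work, not the authors' implementation. It assumes that the nodes of the temporal action graph are frames, that each frame is summarized by a single feature vector, and that the global adjacency matrix is the sum of a fixed temporal-chain prior, a learned matrix shared across samples, and a data-dependent frame-affinity matrix, in the style of the adaptive graph convolution of Shi et al. (2019). The names `TemporalAdaptiveGraphConv`, `temporal_chain_adjacency` and `softmax`, the embedding size, and the forward-only design (no training) are all assumptions.

```python
import numpy as np


def temporal_chain_adjacency(num_frames: int) -> np.ndarray:
    """Assumed fixed prior: each frame is linked to itself and its neighbouring frames."""
    a = np.eye(num_frames)
    idx = np.arange(num_frames - 1)
    a[idx, idx + 1] = 1.0
    a[idx + 1, idx] = 1.0
    # Row-normalise so each frame averages over its temporal neighbours.
    return a / a.sum(axis=1, keepdims=True)


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along one axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


class TemporalAdaptiveGraphConv:
    """Hypothetical layer: frames are graph nodes and the adjacency is a global T x T matrix."""

    def __init__(self, in_channels: int, out_channels: int, num_frames: int,
                 embed_dim: int = 8, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.a_fixed = temporal_chain_adjacency(num_frames)   # fixed temporal prior
        # Shared matrix that would be trained in a real framework; zero here, so
        # the sketch starts from the fixed prior plus the data-dependent term.
        self.b_learned = np.zeros((num_frames, num_frames))
        self.w_theta = rng.normal(scale=0.1, size=(in_channels, embed_dim))
        self.w_phi = rng.normal(scale=0.1, size=(in_channels, embed_dim))
        self.w_out = rng.normal(scale=0.1, size=(in_channels, out_channels))

    def forward(self, x: np.ndarray) -> np.ndarray:
        """x: (T, C_in) per-frame features. Returns (T, C_out)."""
        theta = x @ self.w_theta                      # (T, E) frame embeddings
        phi = x @ self.w_phi                          # (T, E) frame embeddings
        c_data = softmax(theta @ phi.T, axis=-1)      # (T, T) sample-specific affinities
        adj = self.a_fixed + self.b_learned + c_data  # dense: any frame can reach any other
        return adj @ x @ self.w_out                   # aggregate over frames, then project


# Usage: 10 frames, each described by a 6-dimensional feature vector.
rng = np.random.default_rng(1)
layer = TemporalAdaptiveGraphConv(in_channels=6, out_channels=4, num_frames=10)
x = rng.normal(size=(10, 6))
y = layer.forward(x)
print(y.shape)  # (10, 4)
```

Because the data-dependent term is a dense T x T matrix, every frame aggregates features from every other frame in a single layer, which is the kind of global temporal dependency the abstract attributes to T-AGCN; a temporal convolution with a small kernel would reach only nearby frames. In ST-AGCN such a layer would be paired with a spatial graph convolution over joints, which this sketch does not model.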




Published In


Multimedia Tools and Applications Volume 80, Issue 19

Aug 2021

1365 pages

ISSN: 1380-7501

© The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2021.

Publisher

Kluwer Academic Publishers

United States

Publication History

Published: 01 August 2021

Accepted: 03 June 2021

Revision received: 02 January 2021

Received: 15 August 2020

Qualifiers

Research-article

Funding Sources

National Natural Science Foundation of China
Six Talent Peaks Project in Jiangsu Province
Postgraduate Research & Practice Innovation Program of Jiangsu Province

Bibliometrics

Total Citations: 5
Total Downloads: 0
Downloads (Last 12 months): 0
Downloads (Last 6 weeks): 0

Reflects downloads up to 28 Feb 2025

Citations

Cited By

Yang P, Wang Q, Chen H, Wu Z (2023) Position‐aware spatio‐temporal graph convolutional networks for skeleton‐based action recognition. IET Computer Vision 17(7):844–854. DOI: 10.1049/cvi2.12223. Online publication date: 13 Jul 2023.
https://dl.acm.org/doi/10.1049/cvi2.12223

Ashrafi S, Shokouhi S, Ayatollahi A (2023) Still image action recognition based on interactions between joints and objects. Multimedia Tools and Applications 82(17):25945–25971. DOI: 10.1007/s11042-023-14350-z. Online publication date: 10 Jan 2023.
https://dl.acm.org/doi/10.1007/s11042-023-14350-z

Li T, Ren L, Yang F, Dang Z (2022) Analysis of Human Information Recognition Model in Sports Based on Radial Basis Fuzzy Neural Network. Computational Intelligence and Neuroscience 2022. DOI: 10.1155/2022/5625006. Online publication date: 1 Jan 2022.
https://dl.acm.org/doi/10.1155/2022/5625006

Tsai M, Chen C (2022) Enhancing the accuracy of a human emotion recognition method using spatial temporal graph convolutional networks. Multimedia Tools and Applications 82(8):11285–11303. DOI: 10.1007/s11042-022-13653-x. Online publication date: 12 Aug 2022.
https://dl.acm.org/doi/10.1007/s11042-022-13653-x

Qi Y, Hu J, Zhuang L, Pei X (2022) Semantic-guided multi-scale human skeleton action recognition. Applied Intelligence 53(9):9763–9778. DOI: 10.1007/s10489-022-03968-5. Online publication date: 12 Aug 2022.
https://dl.acm.org/doi/10.1007/s10489-022-03968-5
