research-article

Skeleton-based action recognition with temporal action graph and temporal adaptive graph convolution structure

Published: 01 August 2021

Abstract

Skeleton-based action recognition has recently attracted much attention because skeleton data can robustly convey action information. Many recent studies have shown that graph convolutional networks (GCNs), which generalize CNNs to more generic non-Euclidean structures, extract spatial features more accurately. Nevertheless, how to effectively extract global temporal features remains a challenge. In this work, we first design a unique feature named the temporal action graph. It is a first attempt to express timing relationships in the form of a graph. Second, we propose a temporal adaptive graph convolution structure (T-AGCN). By generating a global adjacency matrix for the temporal action graph, it can flexibly extract global temporal features from temporal dynamics. Third, we propose a novel model named the spatial-temporal adaptive graph convolutional network (ST-AGCN) for skeleton-based action recognition, which extracts spatial-temporal features and improves recognition accuracy. ST-AGCN combines T-AGCN with spatial graph convolution to compensate for T-AGCN's limited modeling of spatial structure. In addition, ST-AGCN uses dual features to form a two-stream network, which further improves recognition accuracy on hard-to-recognize samples. Finally, comparative experiments on two skeleton-based action recognition datasets, NTU-RGBD and SBU, demonstrate that T-AGCN and the temporal action graph can effectively explore global temporal information, and that ST-AGCN improves recognition accuracy on both datasets.
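
The abstract does not spell out how the global adjacency matrix is built, so the following is only a minimal NumPy sketch of one plausible reading: frames are treated as graph nodes, a frame-to-frame adjacency matrix is formed from a data-dependent similarity term plus a learnable term, and features are aggregated across all frames at once. The function names, the embedding matrices `w_theta` and `w_phi`, the softmax normalization, and all shapes are assumptions made for illustration, not the authors' formulation.

```python
import numpy as np


def row_softmax(scores):
    """Normalize each row to sum to 1 (shifted by the row max for stability)."""
    shifted = scores - scores.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def temporal_graph_conv(x, w_theta, w_phi, w_out, a_learned):
    """Hypothetical temporal graph convolution over frames.

    x:         (T, C) one feature vector per frame (frames are graph nodes)
    w_theta:   (C, D) embedding used for the "query" side of the similarity
    w_phi:     (C, D) embedding used for the "key" side of the similarity
    w_out:     (C, C_out) feature transform applied after aggregation
    a_learned: (T, T) freely learnable frame-to-frame adjacency (assumed)
    """
    theta = x @ w_theta                       # (T, D)
    phi = x @ w_phi                           # (T, D)
    a_data = row_softmax(theta @ phi.T)       # (T, T) data-dependent adjacency
    adjacency = a_data + a_learned            # every frame can attend to every frame
    return adjacency @ x @ w_out              # (T, C_out)


# Toy example with random values standing in for learned parameters.
rng = np.random.default_rng(0)
T, C, D, C_OUT = 8, 6, 4, 5
x = rng.standard_normal((T, C))
w_theta = rng.standard_normal((C, D))
w_phi = rng.standard_normal((C, D))
w_out = rng.standard_normal((C, C_OUT))
a_learned = np.zeros((T, T))                  # starts at zero in this sketch

out = temporal_graph_conv(x, w_theta, w_phi, w_out, a_learned)
print(out.shape)
```

Because the adjacency matrix spans all pairs of frames, a single layer of this kind mixes information across the whole sequence, which is one way to read "global temporal features"; a fixed-width temporal convolution would instead see only a local window per layer.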

Information & Contributors

Information

Published In

Multimedia Tools and Applications, Volume 80, Issue 19
Aug 2021
1365 pages

Publisher

Kluwer Academic Publishers

United States

Publication History

Published: 01 August 2021
Accepted: 03 June 2021
Revision received: 02 January 2021
Received: 15 August 2020

Author Tags

  1. Skeleton-based action recognition
  2. GCN
  3. Temporal action graph
  4. Temporal adaptive graph convolution structure
  5. Spatial-temporal adaptive graph convolution network

Qualifiers

  • Research-article

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 0
  • Downloads (Last 6 weeks): 0
Reflects downloads up to 28 Feb 2025

Citations

Cited By

  • (2023) Position‐aware spatio‐temporal graph convolutional networks for skeleton‐based action recognition. IET Computer Vision 17:7 (844-854). DOI: 10.1049/cvi2.12223. Online publication date: 13-Jul-2023
  • (2023) Still image action recognition based on interactions between joints and objects. Multimedia Tools and Applications 82:17 (25945-25971). DOI: 10.1007/s11042-023-14350-z. Online publication date: 10-Jan-2023
  • (2022) Analysis of Human Information Recognition Model in Sports Based on Radial Basis Fuzzy Neural Network. Computational Intelligence and Neuroscience 2022. DOI: 10.1155/2022/5625006. Online publication date: 1-Jan-2022
  • (2022) Enhancing the accuracy of a human emotion recognition method using spatial temporal graph convolutional networks. Multimedia Tools and Applications 82:8 (11285-11303). DOI: 10.1007/s11042-022-13653-x. Online publication date: 12-Aug-2022
  • (2022) Semantic-guided multi-scale human skeleton action recognition. Applied Intelligence 53:9 (9763-9778). DOI: 10.1007/s10489-022-03968-5. Online publication date: 12-Aug-2022
