Research article
DOI: 10.1145/3412841.3441974

Graph convolutional networks for skeleton-based action recognition with LSTM using tool-information

Published: 22 April 2021

Abstract

    Skeleton-based action recognition with Graph Convolutional Networks (GCNs) has achieved remarkable results by representing a person's skeleton as a graph. However, existing GCN-based models have a fundamental limitation: human actions are strongly conditioned by the tools being used, yet traditional GCN models recognize actions without exploiting tool information. For example, a person holding a pen is most likely writing, and a person holding a ball is most likely throwing or catching it. Judging an action from skeletal motion alone, as existing methods do, is therefore unreliable. We construct a graph that reflects tool information, identify the tool with an LSTM classifier, and propose GCNs for skeleton-based action recognition with LSTM using tool information. We additionally introduce a new graph construction and a learnable adjacency matrix. We applied the proposed method to existing models and compared each baseline with and without it; the evaluations showed consistent performance improvements, and the baselines augmented with our method achieved state-of-the-art performance.
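    The page carries no code, but the abstract names two concrete mechanisms: a learnable adjacency matrix inside the GCN, and an LSTM classifier that identifies the tool. Below is a minimal sketch of how such pieces might look, assuming PyTorch and the ST-GCN-style (batch, channels, frames, joints) tensor layout common in this literature; every class name, shape, and design choice is a hypothetical illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LearnableAdjGCNLayer(nn.Module):
    """Graph convolution with a learnable adjacency matrix (hypothetical sketch).

    A_init is the fixed physical-skeleton adjacency over V joints; the layer
    learns an additive residual on top of it, so edges beyond the natural
    bones (e.g. a hand-to-tool connection) can emerge during training.
    """

    def __init__(self, in_channels, out_channels, A_init):
        super().__init__()
        self.register_buffer("A_fixed", A_init.clone())        # skeleton edges, not trained
        self.A_learn = nn.Parameter(torch.zeros_like(A_init))  # learned residual edges
        self.proj = nn.Conv2d(in_channels, out_channels, 1)    # 1x1 feature transform

    def forward(self, x):
        # x: (N, C, T, V) = batch, channels, frames, joints
        A = self.A_fixed + self.A_learn
        x = torch.einsum("nctv,vw->nctw", x, A)  # aggregate features over joints
        return F.relu(self.proj(x))


class ToolLSTM(nn.Module):
    """LSTM classifier over a sequence of per-frame tool features (hypothetical)."""

    def __init__(self, feat_dim, hidden_dim, num_tools):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, num_tools)

    def forward(self, feats):
        # feats: (N, T, feat_dim), e.g. detector features cropped around the hands
        _, (h_n, _) = self.lstm(feats)
        return self.head(h_n[-1])  # (N, num_tools) tool logits


# Toy shapes: 25 joints, 64 frames, 3 input channels, 10 tool classes.
gcn = LearnableAdjGCNLayer(3, 64, torch.eye(25))
tool = ToolLSTM(128, 64, 10)
out = gcn(torch.randn(8, 3, 64, 25))    # -> (8, 64, 64, 25)
logits = tool(torch.randn(8, 64, 128))  # -> (8, 10)
```

    How the two outputs are fused is not specified on this page; the tool logits could, for instance, condition the action head or be attached to the graph as an extra node, so treat the sketch above as illustrative only.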


Cited By

    • (2023) A State-of-the-Art Computer Vision Adopting Non-Euclidean Deep-Learning Models. International Journal of Intelligent Systems. DOI: 10.1155/2023/8674641. Online publication date: 1-Jan-2023.
    • (2022) Semantic-guided multi-scale human skeleton action recognition. Applied Intelligence. DOI: 10.1007/s10489-022-03968-5. Online publication date: 12-Aug-2022.


        Published In

        SAC '21: Proceedings of the 36th Annual ACM Symposium on Applied Computing
        March 2021
        2075 pages
        ISBN:9781450381048
        DOI:10.1145/3412841

Publisher

Association for Computing Machinery, New York, NY, United States


        Author Tags

        1. computer vision
        2. graph convolutional networks
        3. long short-term memory models
        4. object detection
        5. skeleton-based action recognition

        Qualifiers

        • Research-article

        Funding Sources

        • Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT)
        • National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT)

        Conference

        SAC '21: The 36th ACM/SIGAPP Symposium on Applied Computing
        March 22 - 26, 2021
        Virtual Event, Republic of Korea

        Acceptance Rates

        Overall Acceptance Rate 1,650 of 6,669 submissions, 25%

