Research article
DOI: 10.1145/3412841.3441974

Graph convolutional networks for skeleton-based action recognition with LSTM using tool-information

Published: 22 April 2021

Abstract

    Skeleton-based action recognition with Graph Convolutional Networks (GCNs) has achieved remarkable results by representing a person's skeleton as a graph. However, existing GCN-based models have a fundamental limitation: human actions are strongly conditioned by the tools being used, yet traditional GCN models recognize actions without exploiting tool information. For example, a person holding a pen is most likely writing, and a person holding a ball is most likely throwing or catching it. Judging an action from skeletal motion alone, as existing methods do, is therefore unreliable. We construct a graph that reflects tool information, identify the tool with an LSTM classifier, and propose GCNs for skeleton-based action recognition with LSTM using tool information. We additionally introduce a new graph construction and a learnable adjacency matrix. We applied the proposed method to existing models and compared each baseline with and without it; the evaluations showed consistent performance improvements, and the baselines augmented with our method achieved state-of-the-art performance.
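    The page carries no code, but the abstract names two concrete mechanisms: a learnable adjacency matrix inside the GCN, and an LSTM classifier that identifies the tool. Below is a minimal sketch of how such pieces might look, assuming PyTorch and the ST-GCN-style (batch, channels, frames, joints) tensor layout common in this literature; every class name, shape, and design choice is a hypothetical illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LearnableAdjGCNLayer(nn.Module):
    """Graph convolution with a learnable adjacency matrix (hypothetical sketch).

    A_init is the fixed physical-skeleton adjacency over V joints; the layer
    learns an additive residual on top of it, so edges beyond the natural
    bones (e.g. a hand-to-tool connection) can emerge during training.
    """

    def __init__(self, in_channels, out_channels, A_init):
        super().__init__()
        self.register_buffer("A_fixed", A_init.clone())        # skeleton edges, not trained
        self.A_learn = nn.Parameter(torch.zeros_like(A_init))  # learned residual edges
        self.proj = nn.Conv2d(in_channels, out_channels, 1)    # 1x1 feature transform

    def forward(self, x):
        # x: (N, C, T, V) = batch, channels, frames, joints
        A = self.A_fixed + self.A_learn
        x = torch.einsum("nctv,vw->nctw", x, A)  # aggregate features over joints
        return F.relu(self.proj(x))


class ToolLSTM(nn.Module):
    """LSTM classifier over a sequence of per-frame tool features (hypothetical)."""

    def __init__(self, feat_dim, hidden_dim, num_tools):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, num_tools)

    def forward(self, feats):
        # feats: (N, T, feat_dim), e.g. detector features cropped around the hands
        _, (h_n, _) = self.lstm(feats)
        return self.head(h_n[-1])  # (N, num_tools) tool logits


# Toy shapes: 25 joints, 64 frames, 3 input channels, 10 tool classes.
gcn = LearnableAdjGCNLayer(3, 64, torch.eye(25))
tool = ToolLSTM(128, 64, 10)
out = gcn(torch.randn(8, 3, 64, 25))    # -> (8, 64, 64, 25)
logits = tool(torch.randn(8, 64, 128))  # -> (8, 10)
```

    How the two outputs are fused is not specified on this page; the tool logits could, for instance, condition the action head or be attached to the graph as an extra node, so treat the sketch above as illustrative only.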


Cited By

    • (2023) A State-of-the-Art Computer Vision Adopting Non-Euclidean Deep-Learning Models. International Journal of Intelligent Systems. DOI: 10.1155/2023/8674641. Online publication date: 1-Jan-2023.
    • (2022) Semantic-guided multi-scale human skeleton action recognition. Applied Intelligence. DOI: 10.1007/s10489-022-03968-5. Online publication date: 12-Aug-2022.


        Published In

        SAC '21: Proceedings of the 36th Annual ACM Symposium on Applied Computing
        March 2021
        2075 pages
        ISBN:9781450381048
        DOI:10.1145/3412841

Publisher

Association for Computing Machinery, New York, NY, United States


        Author Tags

        1. computer vision
        2. graph convolutional networks
        3. long short-term memory models
        4. object detection
        5. skeleton-based action recognition

        Qualifiers

        • Research-article

        Funding Sources

        • Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT)
        • National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT)

        Conference

        SAC '21: The 36th ACM/SIGAPP Symposium on Applied Computing
        March 22 - 26, 2021
        Virtual Event, Republic of Korea

        Acceptance Rates

        Overall Acceptance Rate 1,650 of 6,669 submissions, 25%

