
Skeleton Action Recognition Based on Spatial-Temporal Dynamic Topological Representation

Conference paper, Advanced Intelligent Computing Technology and Applications (ICIC 2024)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14866)


Abstract

Achieving spatial-temporal invariance has long been an active research topic in skeleton-based action recognition. However, most current methods address invariance only in the spatial dimension and do not consider the temporal and spatial dimensions jointly. To address this issue, we propose spatial-temporal dynamic topological representations (ST-DTR), which dynamically learn the features of spatial-temporal nodes and their topological relations. We also employ an aggregation module to combine spatial and temporal features effectively. In addition, we adopt an adaptive kernel-selection operation in the temporal dimension for effective spatial-temporal modeling. A spatial-temporal joint attention mechanism is introduced to enhance the feature representation and to identify the most informative joints in key frames across the whole skeleton sequence, which improves recognition performance. The proposed method is evaluated on three standard datasets: NW-UCLA, NTU RGB+D 60, and NTU RGB+D 120. Extensive experiments show that it outperforms several state-of-the-art methods.
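Two of the ideas in the abstract can be made concrete with small examples. The first is a graph convolution whose joint-to-joint topology is inferred from the input sample instead of being fixed. The second is an attention that scores every (frame, joint) position together. The code below is a minimal sketch under stated assumptions and is not the authors' ST-DTR implementation, because the abstract does not specify one. It assumes an embedded dot-product adjacency in the style of 2s-AGCN that is added to the static skeleton graph. It also assumes a single learned scoring vector for attention. All function and parameter names (`dynamic_adjacency`, `graph_conv`, `st_joint_attention`, `w_score`, `alpha`) are hypothetical. The temporal adaptive kernel selection and the aggregation module are not modelled.

```python
# Illustrative sketch only: NOT the ST-DTR implementation from the paper.
# All names below are hypothetical; plain Python keeps it dependency-free.
import math
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in row]
    s = sum(e)
    return [v / s for v in e]


def dynamic_adjacency(x: Matrix, w_theta: Matrix, w_phi: Matrix) -> Matrix:
    """Sample-dependent joint-to-joint affinity (V x V), rows softmax-normalised.

    x: V x C joint features; w_theta, w_phi: C x D embedding weights
    (assumed 2s-AGCN-style embedded dot product).
    """
    q = matmul(x, w_theta)             # V x D
    k = matmul(x, w_phi)               # V x D
    scores = matmul(q, transpose(k))   # V x V dot-product similarities
    return [softmax(r) for r in scores]


def graph_conv(x: Matrix, a_static: Matrix, w_theta: Matrix, w_phi: Matrix,
               w_out: Matrix, alpha: float = 1.0) -> Matrix:
    """Y = (A_static + alpha * A_dynamic) X W_out: the fixed skeleton graph
    refined by a topology inferred from the current sample."""
    a_dyn = dynamic_adjacency(x, w_theta, w_phi)
    v = len(x)
    a = [[a_static[i][j] + alpha * a_dyn[i][j] for j in range(v)] for i in range(v)]
    return matmul(matmul(a, x), w_out)


def st_joint_attention(seq: List[Matrix],
                       w_score: List[float]) -> Tuple[List[Matrix], List[float]]:
    """Joint spatial-temporal attention over a T x V x C sequence.

    Each (frame, joint) gets the score <x_tv, w_score>. A softmax over all
    T*V positions gives weights, and features are rescaled by (1 + weight),
    so informative joints in key frames are emphasised without zeroing the rest.
    """
    t_len, v_len = len(seq), len(seq[0])
    scores = [sum(c * w for c, w in zip(seq[t][v], w_score))
              for t in range(t_len) for v in range(v_len)]
    weights = softmax(scores)
    out = [[[c * (1.0 + weights[t * v_len + v]) for c in seq[t][v]]
            for v in range(v_len)] for t in range(t_len)]
    return out, weights


if __name__ == "__main__":
    # Toy 3-joint chain 0-1-2 with self-loops and C = 2 channels.
    a_skel = [[1, 1, 0], [1, 1, 1], [0, 1, 1]]
    x = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
    eye = [[1.0, 0.0], [0.0, 1.0]]  # identity placeholders for learned weights
    print(graph_conv(x, a_skel, eye, eye, eye, alpha=0.5))
    # Two frames, two joints: joint 0 in frame 1 carries the most signal.
    seq = [[[1.0, 1.0], [0.0, 0.0]], [[3.0, 3.0], [0.0, 0.0]]]
    _, w = st_joint_attention(seq, [1.0, 1.0])
    print([round(v, 3) for v in w])
```

Setting `alpha=0` reduces `graph_conv` to a plain graph convolution on the fixed skeleton, which isolates the contribution of the dynamic term. In a trained network, the embedding matrices, `w_out`, and `w_score` would be learned parameters rather than identity placeholders.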

Acknowledgments

This research was funded by the National Natural Science Foundation of China (62272096), and the Fund of Jilin Provincial Department of Education Project (JJKH20231083KJ, JLJY202301810566).

Author information

Correspondence to Wei Zhao.

Ethics declarations

The authors have no competing interests to declare that are relevant to the content of this article.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Qi, M., Liu, Z., Li, S., Zhao, W. (2024). Skeleton Action Recognition Based on Spatial-Temporal Dynamic Topological Representation. In: Huang, DS., Zhang, X., Guo, J. (eds) Advanced Intelligent Computing Technology and Applications. ICIC 2024. Lecture Notes in Computer Science, vol 14866. Springer, Singapore. https://doi.org/10.1007/978-981-97-5594-3_21

  • DOI: https://doi.org/10.1007/978-981-97-5594-3_21

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-97-5593-6

  • Online ISBN: 978-981-97-5594-3

  • eBook Packages: Computer Science, Computer Science (R0)
