
SCR-Graph: Spatial-Causal Relationships Based Graph Reasoning Network for Human Action Prediction

Published: 17 May 2021

Abstract

Technologies for predicting human actions are important for applications such as human-robot cooperation and autonomous driving. However, most existing algorithms focus on exploiting the visual features of videos and do not mine relationships, which include spatial relationships between humans and scene elements as well as causal relationships in temporal action sequences. Human beings, in fact, are good at using spatial and causal relational reasoning to predict the actions of others. Inspired by this idea, we propose a Spatial and Causal Relationship based Graph Reasoning Network (SCR-Graph), which predicts human actions by modeling the action-scene relationship in the spatial dimension and the causal relationship between actions in the temporal dimension. In the spatial dimension, a hierarchical graph attention module iteratively aggregates the features of different kinds of scene elements at different levels. In the temporal dimension, we design a knowledge-graph-based causal reasoning module and map past actions to temporal causal features through a Diffusion RNN. Finally, we integrate the causality features into the heterogeneous graph in the form of a shadow node and introduce a self-attention module to determine when the knowledge graph information should be activated. Extensive experimental results on the VIRAT datasets demonstrate the favorable performance of the proposed framework.
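
The hierarchical aggregation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no equations, so the sketch assumes plain dot-product attention with no learned weights, and the names `attend` and `hierarchical_aggregate`, the element types, and the two-dimensional feature vectors are all hypothetical. It pools scene elements in two levels, first within each element type and then across types.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a non-empty list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def attend(query, neighbors):
    """Attention-weighted sum of neighbor vectors.

    Assumption: scores are raw dot products with the query. A real graph
    attention layer would use learned projections instead.
    """
    weights = softmax([dot(query, n) for n in neighbors])
    pooled = [
        sum(w * n[k] for w, n in zip(weights, neighbors))
        for k in range(len(query))
    ]
    return pooled, weights


def hierarchical_aggregate(person, scene):
    """Two-level pooling; `scene` maps element type -> non-empty list of vectors."""
    type_names = sorted(scene)
    # Level 1: summarize the elements of each type separately.
    summaries = [attend(person, scene[t])[0] for t in type_names]
    # Level 2: weigh the per-type summaries against each other.
    fused, type_weights = attend(person, summaries)
    return fused, dict(zip(type_names, type_weights))


# Hypothetical toy features for one person and two kinds of scene elements.
person = [1.0, 0.0]
scene = {
    "vehicle": [[1.0, 0.0], [0.0, 1.0]],
    "building": [[0.0, 1.0]],
}
fused, type_weights = hierarchical_aggregate(person, scene)
print(fused, type_weights)
```

In this toy input the vehicle summary is more similar to the person feature than the building summary is, so the second level assigns the vehicle type the larger weight.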

Cited By

  • (2024) Empowering Predictive Modeling by GAN-based Causal Information Learning. ACM Transactions on Intelligent Systems and Technology 15:3 (1-19). DOI: 10.1145/3652610. Online publication date: 17-May-2024
  • (2022) Real-Time Human Fault Detection in Assembly Tasks, Based on Human Action Prediction Using a Spatio-Temporal Learning Model. Sustainability 14:15 (9027). DOI: 10.3390/su14159027. Online publication date: 23-Jul-2022

Published In

CONF-CDS 2021: The 2nd International Conference on Computing and Data Science
January 2021
1142 pages
ISBN: 9781450389570
DOI: 10.1145/3448734
Publisher

Association for Computing Machinery, New York, NY, United States

Author Tags

1. Graph neural network
2. action prediction
3. knowledge graph

Qualifiers

• Research-article
• Research
• Refereed limited

Conference

CONF-CDS 2021
