SCR-Graph: Spatial-Causal Relationships Based Graph Reasoning Network for Human Action Prediction

Authors:

Chunsheng Hua

CONF-CDS 2021: The 2nd International Conference on Computing and Data Science

Article No.: 131, Pages 1 - 9

https://doi.org/10.1145/3448734.3450861

Published: 17 May 2021

Abstract

Technologies for predicting human actions are extremely important for applications such as human-robot cooperation and autonomous driving. However, most existing algorithms focus on exploiting the visual features of videos and do not mine relationships, which include spatial relationships between humans and scene elements as well as causal relationships in temporal action sequences. In fact, human beings are good at using spatial and causal relational reasoning to predict the actions of others. Inspired by this idea, we propose a Spatial and Causal Relationship based Graph Reasoning Network (SCR-Graph), which predicts human actions by modeling the action-scene relationship in the spatial dimension and the causal relationship between actions in the temporal dimension. In the spatial dimension, a hierarchical graph attention module iteratively aggregates the features of different kinds of scene elements at different levels. In the temporal dimension, we design a knowledge graph based causal reasoning module that maps past actions to temporal causal features through a Diffusion RNN. Finally, we integrate the causality features into the heterogeneous graph in the form of a shadow node, and introduce a self-attention module to determine when the knowledge graph information should be activated. Extensive experimental results on the VIRAT datasets demonstrate the favorable performance of the proposed framework.
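The abstract does not specify how the graph attention module is implemented. As a rough illustration of the general idea of attention-based aggregation over scene elements, the following is a minimal sketch of a single GAT-style aggregation step in plain Python. It is not the authors' implementation: the function names, the projection matrix `W`, the attention vector `a`, and the toy feature vectors are all hypothetical, and the sketch covers only one level, not the hierarchical scheme the paper describes.

```python
import math


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def leaky_relu(z, slope=0.2):
    """Leaky ReLU, as commonly used for graph attention scores."""
    return z if z > 0 else slope * z


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_aggregate(h_target, h_neighbors, W, a):
    """Aggregate neighbor features into one target node, GAT-style.

    h_target:    feature vector of the target node (e.g. a person)
    h_neighbors: feature vectors of neighboring scene elements
    W:           shared projection matrix (hypothetical; normally learned)
    a:           attention vector of length 2 * len(W) (hypothetical; normally learned)
    Returns (attention weights, aggregated feature vector).
    """
    z_t = matvec(W, h_target)
    z_n = [matvec(W, h) for h in h_neighbors]
    # Score each neighbor from the concatenation [z_target || z_neighbor].
    scores = [
        leaky_relu(sum(ai * vi for ai, vi in zip(a, z_t + z)))
        for z in z_n
    ]
    alpha = softmax(scores)
    # Weighted sum of the projected neighbor features.
    out = [sum(al * z[k] for al, z in zip(alpha, z_n)) for k in range(len(z_t))]
    return alpha, out


if __name__ == "__main__":
    W = [[1.0, 0.0], [0.0, 1.0]]                  # toy identity projection
    a = [0.5, 0.5, 1.0, -1.0]                     # toy attention vector
    person = [1.0, 0.0]                           # toy target-node feature
    scene = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # toy scene-element features
    alpha, fused = attention_aggregate(person, scene, W, a)
    print("attention weights:", alpha)
    print("aggregated feature:", fused)
```

In a hierarchical variant, one would presumably apply a step like this once per element type and then again over the per-type summaries, but that arrangement is an assumption and is not taken from the abstract.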

Cited By

Zeng J, Zhang G, Yuan J, Li Y, Jin D (2024). Empowering Predictive Modeling by GAN-based Causal Information Learning. ACM Transactions on Intelligent Systems and Technology 15:3 (1-19). Online publication date: 17-May-2024.
https://dl.acm.org/doi/10.1145/3652610
Zhang Z, Peng G, Wang W, Chen Y (2022). Real-Time Human Fault Detection in Assembly Tasks, Based on Human Action Prediction Using a Spatio-Temporal Learning Model. Sustainability 14:15 (9027). Online publication date: 23-Jul-2022.
https://doi.org/10.3390/su14159027

Index Terms

SCR-Graph: Spatial-Causal Relationships Based Graph Reasoning Network for Human Action Prediction
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
2. Information systems
  1. Information systems applications

Index terms have been assigned to the content through auto-classification.

Published In

CONF-CDS 2021: The 2nd International Conference on Computing and Data Science

January 2021

1142 pages

ISBN:9781450389570

DOI:10.1145/3448734

Copyright © 2021 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

CONF-CDS 2021

CONF-CDS 2021: The 2nd International Conference on Computing and Data Science

January 28 - 30, 2021

Stanford, CA, USA
