Action-Scene Model for Recognizing Human Actions from Background in Realistic Videos

Qu, Wen; Zhang, Yifei; Feng, Shi; Wang, Daling; Yu, Ge

doi:10.1007/978-3-319-08010-9_62


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8485)

Included in the following conference series:

International Conference on Web-Age Information Management


Abstract

Information from the person region alone often fails to distinguish similar actions in realistic videos because of occlusions and variations of the person. In this paper, we explore the problem of modeling action-scene context from the background regions of realistic videos. The contextual cues of actions and scenes are formulated in a graphical model representation. A novel Action-Scene Model is proposed to mine the contextual cues with little prior knowledge. The proposed approach can infer actions directly from background regions and complements existing methods. To fuse the contextual cues effectively with other components, a context weight is introduced to measure the contribution of context based on the proposed model. We present experimental results on a realistic video dataset. The results validate the effectiveness of the Action-Scene Model in identifying actions from background regions, and the learned contextual cues achieve better performance than existing methods, especially for scene-dependent action categories.
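The abstract does not give the fusion formula, so the following is only a minimal sketch of how a per-action context weight could combine background-derived scores with person-region scores. The linear blend, the function names `fuse_scores` and `predict`, and all numeric values are illustrative assumptions, not the authors' method.

```python
from typing import Dict


def fuse_scores(
    person: Dict[str, float],
    context: Dict[str, float],
    weight: Dict[str, float],
) -> Dict[str, float]:
    """Blend person-region and background-context scores per action.

    ASSUMPTION: a linear blend (1 - w) * person + w * context, with a
    context weight w in [0, 1] per action. The paper's actual fusion
    rule is not stated in the abstract.
    """
    fused = {}
    for action, p in person.items():
        # Fall back to the person score when no context score exists.
        w = weight.get(action, 0.0) if action in context else 0.0
        if not 0.0 <= w <= 1.0:
            raise ValueError(f"context weight for {action!r} must be in [0, 1]")
        fused[action] = (1.0 - w) * p + w * context.get(action, 0.0)
    return fused


def predict(fused: Dict[str, float]) -> str:
    """Return the action with the highest fused score."""
    return max(fused, key=fused.get)


# Hypothetical scores: the person region slightly favours "diving",
# while the background (e.g. a pool scene) strongly favours "swimming".
person_scores = {"swimming": 0.40, "diving": 0.45}
context_scores = {"swimming": 0.90, "diving": 0.20}
context_weights = {"swimming": 0.5, "diving": 0.5}

fused_scores = fuse_scores(person_scores, context_scores, context_weights)
predicted_action = predict(fused_scores)
```

In this toy setting the person-region scores alone would pick "diving", while the blended scores pick "swimming", which illustrates why a context weight matters most for scene-dependent action categories.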

Project supported by the National Basic Research 973 Program of China under Grant No. 2011CB302200-G, the Key Program of National Natural Science Foundation of China under Grant No. 61033007, the National Natural Science Foundation of China under Grant Nos. 61370074 and 61100026, and the Fundamental Research Funds for the Central Universities of China under Grant No. N120404007.



Author information

Authors and Affiliations

School of Information Science and Engineering, Northeastern University, China
Wen Qu, Yifei Zhang, Shi Feng, Daling Wang & Ge Yu
Key Laboratory of Medical Image Computing, Northeastern University, Ministry of Education, Shenyang, 110819, China
Yifei Zhang, Shi Feng, Daling Wang & Ge Yu


Editor information

Editors and Affiliations

School of Computing, University of Utah, 50 S. Central Campus Drive, 84112, Salt Lake City, UT, USA
Feifei Li
Department of Computer Science, Tsinghua University, 100084, Beijing, China
Guoliang Li
POSTECH, Republic of Korea
Seung-won Hwang
Shanghai Key Laboratory of Scalable Computing and Systems, Department of Computer Science and Engineering, Shanghai Jiao Tong University, China
Bin Yao
Advanced Digital Sciences Center (ADSC), 138632, Singapore, Singapore
Zhenjie Zhang


Cite this paper

Qu, W., Zhang, Y., Feng, S., Wang, D., Yu, G. (2014). Action-Scene Model for Recognizing Human Actions from Background in Realistic Videos. In: Li, F., Li, G., Hwang, Sw., Yao, B., Zhang, Z. (eds) Web-Age Information Management. WAIM 2014. Lecture Notes in Computer Science, vol 8485. Springer, Cham. https://doi.org/10.1007/978-3-319-08010-9_62


DOI: https://doi.org/10.1007/978-3-319-08010-9_62
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-08009-3
Online ISBN: 978-3-319-08010-9
eBook Packages: Computer Science, Computer Science (R0)
