Abstract
Using only information from the person region fails to distinguish similar actions in realistic videos, due to occlusions and person variation. In this paper, we explore the problem of modeling action-scene context from the background regions of realistic videos. The contextual cues of actions and scenes are formulated in a graphical model representation. A novel Action-Scene Model is proposed to mine these contextual cues with little prior knowledge. The proposed approach can infer actions directly from background regions and complements existing methods. To fuse the contextual cues effectively with other components, a context weight is introduced to measure the contribution of context based on the proposed model. We present experimental results on a realistic video dataset. The results validate the effectiveness of the Action-Scene Model in identifying actions from background regions, and the learned contextual cues achieve better performance than existing methods, especially for scene-dependent action categories.
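The fusion step described above can be sketched as a convex combination of per-class scores, where the context weight controls how much the background-context prediction contributes relative to the person-region classifier. This is a minimal illustrative sketch, not the paper's actual formulation; the function name, the score representation, and the scalar-weight form are all assumptions.

```python
import numpy as np

def fuse_scores(person_scores, context_scores, context_weight):
    """Fuse per-class action scores from a person-region classifier
    with scores inferred from background context.

    context_weight in [0, 1]: larger values give the background
    context more influence (e.g., for scene-dependent actions).
    """
    person_scores = np.asarray(person_scores, dtype=float)
    context_scores = np.asarray(context_scores, dtype=float)
    # Convex combination of the two score vectors.
    return (1.0 - context_weight) * person_scores + context_weight * context_scores

# Hypothetical example: a scene-dependent action gets a high context weight,
# so the background-context scores dominate the fused prediction.
fused = fuse_scores([0.2, 0.5, 0.3], [0.6, 0.2, 0.2], context_weight=0.7)
print(int(fused.argmax()))  # prints 0
```

A per-class (rather than global) context weight would let scene-dependent categories such as diving rely more on context than scene-independent ones such as walking, which matches the abstract's observation that context helps most for scene-dependent actions.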
Project supported by the National Basic Research 973 Program of China under Grant No. 2011CB302200-G, the Key Program of the National Natural Science Foundation of China under Grant No. 61033007, the National Natural Science Foundation of China under Grant Nos. 61370074 and 61100026, and the Fundamental Research Funds for the Central Universities of China under Grant No. N120404007.
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Qu, W., Zhang, Y., Feng, S., Wang, D., Yu, G. (2014). Action-Scene Model for Recognizing Human Actions from Background in Realistic Videos. In: Li, F., Li, G., Hwang, Sw., Yao, B., Zhang, Z. (eds) Web-Age Information Management. WAIM 2014. Lecture Notes in Computer Science, vol 8485. Springer, Cham. https://doi.org/10.1007/978-3-319-08010-9_62
DOI: https://doi.org/10.1007/978-3-319-08010-9_62
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-08009-3
Online ISBN: 978-3-319-08010-9
eBook Packages: Computer Science (R0)