Abstract
In this paper, we propose a novel action recognition method based on a hierarchical dynamic Bayesian network (HDBN). The algorithm is divided into a system learning stage and an action recognition stage. In the learning stage, video features are first extracted with deep neural networks; then, using hierarchical clustering with manual assistance, a hierarchical action semantic dictionary (HASD) is built. Next, we construct the HDBN graph model to represent the video sequence. In the recognition stage, we first obtain the representative frames of an unknown video using deep neural networks. Their features are fed into the HDBN, and HDBN inference yields the recognition result. Experimental results show that the proposed method is promising.
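The dictionary-building step of the learning stage can be illustrated with a minimal sketch. The paper uses deep-network frame features and manual assistance; here, as a stand-in, we use synthetic feature vectors and SciPy's agglomerative clustering, cutting the dendrogram at two levels to form a coarse/fine two-layer dictionary. All data and parameter choices below are illustrative assumptions, not the authors' actual configuration.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical stand-in for deep-network frame features: two
# well-separated groups of 8-dimensional vectors.
rng = np.random.default_rng(0)
features = np.vstack([
    rng.normal(0.0, 0.1, size=(20, 8)),   # frames of action group A
    rng.normal(5.0, 0.1, size=(20, 8)),   # frames of action group B
])

# Agglomerative (hierarchical) clustering of the features.
Z = linkage(features, method="ward")

# Cut the dendrogram at two granularities: coarse action categories
# on the upper layer, finer sub-action "words" on the lower layer.
coarse = fcluster(Z, t=2, criterion="maxclust")
fine = fcluster(Z, t=4, criterion="maxclust")

# Two-layer dictionary: each coarse label maps to the fine labels
# nested under it (a toy analogue of the HASD).
hasd = {c: sorted(set(fine[coarse == c])) for c in sorted(set(coarse))}
print(hasd)
```

In the full method, the manual-assistance step would then attach semantic action names to these cluster labels before the HDBN is constructed over them.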
Acknowledgements
This research was supported by projects 61271362 and 61671362 of the National Natural Science Foundation of China.
Cite this article
Xiao, Q., Song, R. Action recognition based on hierarchical dynamic Bayesian network. Multimed Tools Appl 77, 6955–6968 (2018). https://doi.org/10.1007/s11042-017-4614-0