Abstract
Human behavior recognition is an important task in image processing and surveillance systems. A main challenge is how to model behaviors effectively in unconstrained videos, which exhibit tremendous variations in camera motion, background clutter, object appearance, and so on. In this paper, we propose two novel Multi-Feature Hierarchical Latent Dirichlet Allocation models for human behavior recognition by extending bag-of-words topic models such as the Latent Dirichlet Allocation model and the Multi-Modal Latent Dirichlet Allocation model. Both proposed models have three levels: low-level visual features, feature topics, and behavior topics. They can effectively fuse two different types of features (motion and static visual features), avoid detecting or tracking moving objects, and improve recognition performance even when the extracted features contain a great amount of noise. We adopt the variational EM algorithm to learn the parameters of these models. Experiments on the YouTube dataset demonstrate the effectiveness of the proposed models.
Cite this article
Li, H., Zhang, F. & Zhang, S. Multi-feature hierarchical topic models for human behavior recognition. Sci. China Inf. Sci. 57, 1–15 (2014). https://doi.org/10.1007/s11432-013-4794-9
DOI: https://doi.org/10.1007/s11432-013-4794-9