Abstract
Video activity analysis is used in various video applications such as human action recognition, video retrieval, and video archiving. In this paper, we propose applying 3D wavelet transform statistics to natural video signals and employing the resulting statistical attributes for video modeling and analysis. From the 3D wavelet transform, we investigate the marginal and joint statistics as well as the Mutual Information (MI) estimates. We show that marginal histograms are approximated quite well by Generalized Gaussian Density (GGD) functions, and that the MI between coefficients decreases as the activity level in the video increases. Joint statistics attributes are applied to scene activity grouping, yielding 87.3% grouping accuracy. In addition, marginal and joint statistics features extracted from the video are used for human action classification with Support Vector Machine (SVM) classifiers, and 93.4% of the human activities are correctly classified.
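
To make the GGD approximation of marginal histograms more concrete, the following is a minimal sketch of one common way to fit a GGD to a set of coefficients: moment matching on the ratio (E|x|)^2 / E[x^2]. The abstract does not state which fitting procedure the authors used, so this estimator is an assumption, as are the function names `ggd_moment_ratio` and `fit_ggd`. Synthetic samples stand in for the coefficients of a 3D wavelet subband.

```python
import math
import random


def ggd_moment_ratio(beta):
    """(E|x|)^2 / E[x^2] for a zero-mean GGD with shape parameter beta."""
    return math.gamma(2.0 / beta) ** 2 / (
        math.gamma(1.0 / beta) * math.gamma(3.0 / beta)
    )


def fit_ggd(coeffs, lo=0.1, hi=10.0, iters=60):
    """Return (alpha, beta) of a GGD fitted by moment matching.

    Hypothetical helper: the search interval [lo, hi] for beta is an
    assumption chosen for illustration, not a value from the paper.
    """
    n = len(coeffs)
    mean_abs = sum(abs(c) for c in coeffs) / n
    mean_sq = sum(c * c for c in coeffs) / n
    if mean_sq == 0.0:
        raise ValueError("all coefficients are zero; GGD fit is undefined")

    # Clamp the sample ratio into the range reachable on [lo, hi].
    target = mean_abs ** 2 / mean_sq
    target = min(max(target, ggd_moment_ratio(lo)), ggd_moment_ratio(hi))

    # The ratio increases with beta, so plain bisection suffices.
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if ggd_moment_ratio(mid) < target:
            lo = mid
        else:
            hi = mid
    beta = 0.5 * (lo + hi)

    # Scale follows from Var[x] = alpha^2 * Gamma(3/beta) / Gamma(1/beta).
    alpha = math.sqrt(mean_sq * math.gamma(1.0 / beta) / math.gamma(3.0 / beta))
    return alpha, beta


if __name__ == "__main__":
    rng = random.Random(0)
    # Stand-in data: Gaussian samples, for which beta should be close to 2.
    samples = [rng.gauss(0.0, 1.0) for _ in range(20000)]
    alpha, beta = fit_ggd(samples)
    print(f"alpha={alpha:.3f}, beta={beta:.3f}")
```

A shape parameter near 2 corresponds to a Gaussian and a value near 1 to a Laplacian; smaller values indicate a sharper peak and heavier tails, which is the typical form of wavelet subband histograms.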

Cite this article
Omidyeganeh, M., Ghaemmaghami, S. & Shirmohammadi, S. Application of 3D-wavelet statistics to video analysis. Multimed Tools Appl 65, 441–465 (2013). https://doi.org/10.1007/s11042-012-1012-5
DOI: https://doi.org/10.1007/s11042-012-1012-5