Dynamic texture representation using a deep multi-scale convolutional network

Published: 01 February 2017

Abstract

Highlights

  • A multilayer convolutional network for recognition of dynamic textures.
  • Extension of the proposed network to a multi-scale framework.
  • An analysis of the generalisation performance of the proposed approach.
  • Extensive evaluation of the proposed multi-scale PCANet-TOP approach.

This work addresses dynamic texture representation and recognition via a multilayer convolutional architecture. The proposed method treats an image sequence both as a stack of spatial images along the time axis and as stacks of spatio-temporal images along the horizontal and vertical axes, and applies multilayer convolutional operations to describe each plane. The filters are learned via principal component analysis (PCA) on each of the three orthogonal planes of an image sequence. A particular advantage of the technique is that the network is trained without supervision. An inter-database evaluation is performed to investigate the generalisation capability of the proposed approach. Moreover, a multi-scale extension of the architecture is presented to capture texture details at multiple resolutions. Extensive evaluations on different databases show that the proposed PCA-based network on three orthogonal planes (PCANet-TOP) yields highly discriminative features for dynamic texture classification.
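The first stage described in the abstract, slicing an image sequence along three orthogonal planes and learning PCA filters on each, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`plane_slices`, `extract_patches`, `learn_pca_filters`), the 3 x 3 patch size, the number of filters, and the per-patch mean removal (borrowed from the general PCANet recipe) are all assumptions, since the abstract does not specify them. Only a single filter-learning stage is shown; the later layers, the multi-scale extension, and the classifier are omitted.

```python
import numpy as np

def plane_slices(video, plane):
    """Return the 2-D slices of a (T, H, W) volume along one orthogonal plane."""
    if plane == "XY":  # spatial frames, one per time step
        return [video[t] for t in range(video.shape[0])]
    if plane == "XT":  # one (T, W) spatio-temporal slice per image row
        return [video[:, y, :] for y in range(video.shape[1])]
    if plane == "YT":  # one (T, H) spatio-temporal slice per image column
        return [video[:, :, x] for x in range(video.shape[2])]
    raise ValueError(f"unknown plane: {plane}")

def extract_patches(image, k):
    """Collect all overlapping k x k patches as rows, removing each patch's mean."""
    windows = np.lib.stride_tricks.sliding_window_view(image, (k, k))
    patches = windows.reshape(-1, k * k).astype(np.float64)
    return patches - patches.mean(axis=1, keepdims=True)

def learn_pca_filters(slices, k, n_filters):
    """Learn n_filters k x k filters as the leading eigenvectors of the patch covariance."""
    patches = np.concatenate([extract_patches(s, k) for s in slices], axis=0)
    cov = patches.T @ patches / patches.shape[0]
    _, eigvecs = np.linalg.eigh(cov)          # eigenvalues come back in ascending order
    leading = eigvecs[:, ::-1][:, :n_filters]  # reorder so the largest come first
    return leading.T.reshape(n_filters, k, k)

# Synthetic (T, H, W) clip standing in for a dynamic texture sequence.
rng = np.random.default_rng(0)
video = rng.random((8, 10, 12))

# One unsupervised filter bank per orthogonal plane; no labels are used.
filters = {
    plane: learn_pca_filters(plane_slices(video, plane), k=3, n_filters=4)
    for plane in ("XY", "XT", "YT")
}
```

Each entry of `filters` is a bank of orthonormal filters that could then be convolved with the slices of its own plane. Learning a separate bank per plane is what lets the spatial appearance (XY) and the two motion-related views (XT, YT) be described independently.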

Published In

Journal of Visual Communication and Image Representation, Volume 43, Issue C
February 2017
207 pages

Publisher

Academic Press, Inc.

United States

Author Tags

  1. Dynamic texture
  2. Multi-scale analysis
  3. Multilayer convolutional architectures
  4. PCA

Qualifiers

  • Research-article
