Abstract
Time warping is a popular technique for temporally aligning two sequences and has been successfully applied to temporal alignment tasks such as activity recognition. However, existing time warping methods have limited representation ability because the alignment is performed either on the raw sequences or on projected lower-dimensional features. In this paper, we propose a stacked time warping (STW) framework that learns layer-wise representations for temporal alignment in a stacked structure. This structure gives STW greater flexibility than existing methods while unifying them into a deep architecture. Based on the STW framework, we present a stacked marginal time warping (SMTW) method that uses the marginalized stacked denoising autoencoder (mSDA) as a regularization term, which enables SMTW to marginalize out noise and learn layer-wise non-linear representations with an effective closed-form solution. Benefiting from the incorporation of mSDA, SMTW achieves better alignment performance while keeping time efficiency comparable to that of regular time warping methods. Experiments on both synthetic data and practical human activity recognition datasets demonstrate that SMTW quantitatively outperforms state-of-the-art time warping methods.
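The abstract combines two ingredients: classic time warping (dynamic programming over a frame-to-frame cost matrix) and the closed-form mSDA layer. The sketch below illustrates both under stated assumptions. The mSDA layer follows the closed-form mapping published by Chen et al. for marginalized denoising autoencoders (feature dropout with probability `p`, a bias row that is never corrupted, a `tanh` non-linearity). The pipeline is a deliberately simplified two-stage toy: it learns the stacked features first and then runs plain DTW on the top layer. It is not SMTW itself, which uses mSDA as a regularization term inside the warping objective. The function names `msda_layer`, `stacked_msda` and `dtw`, the parameters `p` and `reg`, and the toy data are all hypothetical choices for illustration.

```python
import numpy as np


def msda_layer(X, p=0.5, reg=1e-5):
    """One marginalized denoising autoencoder layer, solved in closed form.

    X: (d, n) array with one column per frame; p: feature-dropout probability.
    Returns the (d, n) hidden representation tanh(W [X; 1]).
    """
    d, n = X.shape
    Xb = np.vstack([X, np.ones((1, n))])               # append a bias row
    q = np.concatenate([np.full(d, 1.0 - p), [1.0]])   # bias is never corrupted
    S = Xb @ Xb.T                                      # scatter matrix
    Q = S * np.outer(q, q)                             # expected corrupted scatter (off-diagonal)
    np.fill_diagonal(Q, q * np.diag(S))                # diagonal entries survive with prob q_i
    P = S * q[np.newaxis, :]                           # expected clean-vs-corrupted scatter
    # W = P[:d] Q^{-1}; Q is symmetric, so solve a linear system instead of inverting.
    W = np.linalg.solve(Q + reg * np.eye(d + 1), P[:d].T).T
    return np.tanh(W @ Xb)


def stacked_msda(X, layers=2, p=0.5):
    """Greedily stack mSDA layers; returns [X, h1, ..., h_layers]."""
    reps = [X]
    for _ in range(layers):
        reps.append(msda_layer(reps[-1], p))
    return reps


def dtw(A, B):
    """Classic DTW between column sequences A (d, na) and B (d, nb).

    Returns (total cost, warping path as a list of (i, j) frame pairs).
    """
    na, nb = A.shape[1], B.shape[1]
    # Pairwise squared Euclidean distances between frames.
    C = ((A[:, :, None] - B[:, None, :]) ** 2).sum(axis=0)
    D = np.full((na + 1, nb + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, na + 1):
        for j in range(1, nb + 1):
            D[i, j] = C[i - 1, j - 1] + min(D[i - 1, j - 1], D[i - 1, j], D[i, j - 1])
    # Backtrack from the end; min() keeps the first (diagonal) step on ties.
    i, j, path = na, nb, []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [(D[i - 1, j - 1], i - 1, j - 1), (D[i - 1, j], i - 1, j), (D[i, j - 1], i, j - 1)]
        _, i, j = min(steps, key=lambda s: s[0])
    return D[na, nb], path[::-1]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((3, 20))
    B = np.repeat(A, 2, axis=1)                # B plays A at half speed
    top = stacked_msda(np.hstack([A, B]), layers=2)[-1]
    cost, path = dtw(top[:, :20], top[:, 20:])
    print(cost, len(path))
```

Because `B` repeats every frame of `A` twice and the mSDA mapping acts on each frame independently, the recovered path pairs frame `i` of `A` with frames `2i` and `2i+1` of `B` at near-zero cost. The quadratic double loop in `dtw` is kept explicit for readability rather than speed.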
Acknowledgements
This work was supported by the National Key Research and Development Program of China [2016YFB0200401] and the National Natural Science Foundation of China [U1435222].
About this article
Cite this article
Zhang, X., Nie, L., Lan, L. et al. Stacked Marginal Time Warping for Temporal Alignment. Neural Process Lett 49, 711–735 (2019). https://doi.org/10.1007/s11063-018-9834-4
DOI: https://doi.org/10.1007/s11063-018-9834-4