Stacked Marginal Time Warping for Temporal Alignment

Zhang, Xiang; Nie, Liquan; Lan, Long; Huang, Xuhui; Luo, Zhigang

doi:10.1007/s11063-018-9834-4

Published: 14 May 2018

Volume 49, pages 711–735, (2019)

Neural Processing Letters

Abstract

Time warping is a popular technique for temporally aligning two sequences and has been successfully applied to temporal alignment tasks such as activity recognition. However, existing time warping methods suffer from limited representation ability because the alignment is performed on either the raw sequences or projected lower-dimensional features. In this paper, we propose a stacked time warping framework (STW) that learns layer-wise representations for temporal alignment in a stacked structure. With this structure, STW offers more flexibility than existing methods while unifying them into a deep architecture. Based on the proposed STW framework, we explore a stacked marginal time warping (SMTW) method that uses the marginalized stacked denoising autoencoder (mSDA) as the regularization term, which enables SMTW to marginalize out noise and learn layer-wise non-linear representations with an efficient closed-form solution. By incorporating mSDA, SMTW achieves better alignment performance while keeping time efficiency comparable to that of regular time warping methods. Experiments on both synthetic data and practical human activity recognition datasets demonstrate that SMTW quantitatively outperforms state-of-the-art time warping methods.
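For readers unfamiliar with time warping, the following is a minimal sketch of classic dynamic time warping (DTW), the baseline technique that the abstract builds on. It is not the STW or SMTW method proposed in the paper. The function name `dtw`, the absolute-difference distance, and the example sequences are illustrative assumptions.

```python
from math import inf


def dtw(x, y, dist=lambda a, b: abs(a - b)):
    """Classic DTW between two 1-D sequences (illustrative sketch only).

    Returns the minimal cumulative cost and the warping path as a list of
    (index_in_x, index_in_y) pairs.
    """
    n, m = len(x), len(y)
    # cost[i][j] holds the minimal cumulative cost of aligning x[:i] with y[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(x[i - 1], y[j - 1])
            # Allowed moves: diagonal match, or repeating a frame of x or of y.
            cost[i][j] = d + min(cost[i - 1][j - 1],
                                 cost[i - 1][j],
                                 cost[i][j - 1])

    # Backtrack from the end to recover one optimal warping path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
    path.reverse()
    return cost[n][m], path


# Hypothetical example: y repeats one frame of x, so the path must warp time.
total, path = dtw([0, 1, 2, 3], [0, 1, 1, 2, 3])
print(total, path)
```

This sketch aligns the raw sequences directly, which is the setting the abstract describes as limited in representation ability; methods such as SMTW instead perform the alignment on learned features.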

Notes

http://mocap.cs.cmu.edu.


Acknowledgements

This work was supported by the National Key Research and Development Program of China [2016YFB0200401] and the National Natural Science Foundation of China [U1435222].

Author information

Authors and Affiliations

State Key Laboratory of High Performance Computing, National University of Defense Technology, Changsha, 410073, China
Xiang Zhang & Long Lan
Science and Technology on Parallel and Distributed Processing, National University of Defense Technology, Changsha, 410073, China
Liquan Nie & Zhigang Luo
College of Computer, National University of Defense Technology, Changsha, 410073, China
Xiang Zhang, Liquan Nie, Long Lan, Xuhui Huang & Zhigang Luo
Department of Computer Science, National University of Defense Technology, Changsha, 410073, China
Xuhui Huang

Corresponding author

Correspondence to Zhigang Luo.

About this article

Cite this article

Zhang, X., Nie, L., Lan, L. et al. Stacked Marginal Time Warping for Temporal Alignment. Neural Process Lett 49, 711–735 (2019). https://doi.org/10.1007/s11063-018-9834-4

Published: 14 May 2018
Issue Date: 15 April 2019
DOI: https://doi.org/10.1007/s11063-018-9834-4
