Deep learning from temporal coherence in video

Authors:

Hossein Mobahi,

Ronan Collobert,

Jason Weston

ICML '09: Proceedings of the 26th Annual International Conference on Machine Learning

Pages 737 - 744

https://doi.org/10.1145/1553374.1553469

Published: 14 June 2009

Abstract

This work proposes a learning method for deep architectures that takes advantage of sequential data, in particular the temporal coherence that naturally exists in unlabeled video recordings: two successive frames are likely to contain the same object or objects. This coherence serves as a supervisory signal over the unlabeled data and is used to improve performance on a supervised task of interest. We demonstrate the effectiveness of this method on pose-invariant object and face recognition tasks.
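
The abstract does not spell out how temporal coherence becomes a training signal. One plausible reading, sketched below under stated assumptions, is a margin-based pairwise penalty: embeddings of consecutive frames are pulled together, and embeddings of non-consecutive frames are pushed at least a margin apart. The function names `l1_distance` and `coherence_loss`, the L1 distance, and the default `margin` of 1.0 are illustrative choices, not details taken from this page.

```python
from typing import Sequence


def l1_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of absolute coordinate differences between two embeddings."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b))


def coherence_loss(
    z1: Sequence[float],
    z2: Sequence[float],
    consecutive: bool,
    margin: float = 1.0,  # assumed value, for illustration only
) -> float:
    """Hypothetical temporal-coherence penalty for one pair of frame embeddings.

    Consecutive frames: the penalty is their distance, so minimizing it
    pulls the two representations together.
    Non-consecutive frames: the penalty is a hinge that is positive only
    while the pair is closer than `margin`, so minimizing it pushes them apart.
    """
    d = l1_distance(z1, z2)
    if consecutive:
        return d
    return max(0.0, margin - d)


if __name__ == "__main__":
    frame_t = [0.0, 1.0]
    frame_t_plus_1 = [0.5, 1.0]
    unrelated_frame = [0.25, 1.0]
    print(coherence_loss(frame_t, frame_t_plus_1, consecutive=True))
    print(coherence_loss(frame_t, unrelated_frame, consecutive=False))
```

In a semi-supervised setup of the kind the abstract describes, a penalty like this would be computed on unlabeled frame pairs and minimized alongside the ordinary supervised loss on the labeled examples; how the two terms are weighted or interleaved is left open here.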

Index Terms

Deep learning from temporal coherence in video
1. Computing methodologies
  1. Machine learning
  2. Modeling and simulation
    1. Model development and analysis
      1. Model verification and validation
      2. Modeling methodologies

Published In

ICML '09: Proceedings of the 26th Annual International Conference on Machine Learning

June 2009

1331 pages

ISBN:9781605585161

DOI:10.1145/1553374

General Chair: Andrea Danyluk (Williams College)
Program Chairs: Léon Bottou (NEC Laboratories America), Michael Littman (Rutgers University)

Copyright © 2009 by the author(s)/owner(s).

Sponsors

NSF
Microsoft Research
MITACS

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

ICML '09

Sponsor:

Microsoft Research

ICML '09: The 26th Annual International Conference on Machine Learning held in conjunction with the 2007 International Conference on Inductive Logic Programming

June 14 - 18, 2009

Montreal, Quebec, Canada

Acceptance Rates

Overall Acceptance Rate 140 of 548 submissions, 26%

Article Metrics

Total Citations: 198
Total Downloads: 1,901
Downloads (last 12 months): 51
Downloads (last 6 weeks): 2

Reflects downloads up to 04 Oct 2024
