DOI: 10.5555/2976456.2976476

Greedy layer-wise training of deep networks

Published: 04 December 2006

Abstract

Complexity theory of circuits strongly suggests that deep architectures can be much more efficient (sometimes exponentially) than shallow architectures, in terms of computational elements required to represent some functions. Deep multi-layer neural networks have many levels of non-linearities allowing them to compactly represent highly non-linear and highly-varying functions. However, until recently it was not clear how to train such deep networks, since gradient-based optimization starting from random initialization appears to often get stuck in poor solutions. Hinton et al. recently introduced a greedy layer-wise unsupervised learning algorithm for Deep Belief Networks (DBN), a generative model with many layers of hidden causal variables. In the context of the above optimization problem, we study this algorithm empirically and explore variants to better understand its success and extend it to cases where the inputs are continuous or where the structure of the input distribution is not revealing enough about the variable to be predicted in a supervised task. Our experiments also confirm the hypothesis that the greedy layer-wise unsupervised training strategy mostly helps the optimization, by initializing weights in a region near a good local minimum, giving rise to internal distributed representations that are high-level abstractions of the input, bringing better generalization.
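
The procedure the abstract describes can be summarized as: train each layer in turn as an unsupervised model of the previous layer's output, then use the stacked weights to initialize a deep network for supervised fine-tuning. Below is a minimal NumPy sketch of that idea, not the authors' code: each layer is a binary restricted Boltzmann machine trained with one step of contrastive divergence (CD-1), and the layer sizes, learning rate, and toy data are illustrative assumptions.

```python
# Minimal sketch of greedy layer-wise unsupervised pre-training with stacked RBMs.
# Hyperparameters and data below are placeholders, not values from the paper.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm(data, n_hidden, epochs=10, lr=0.05, batch=32):
    """Train one binary RBM layer with CD-1; return (W, visible bias, hidden bias)."""
    n_visible = data.shape[1]
    W = 0.01 * rng.standard_normal((n_visible, n_hidden))
    b = np.zeros(n_visible)   # visible bias
    c = np.zeros(n_hidden)    # hidden bias
    for _ in range(epochs):
        for i in range(0, len(data), batch):
            v0 = data[i:i + batch]
            # positive phase: hidden probabilities and a sample given the data
            ph0 = sigmoid(v0 @ W + c)
            h0 = (rng.random(ph0.shape) < ph0).astype(float)
            # negative phase: one Gibbs step back to the visibles and up again
            pv1 = sigmoid(h0 @ W.T + b)
            ph1 = sigmoid(pv1 @ W + c)
            # CD-1 updates: difference of positive and negative statistics
            W += lr * (v0.T @ ph0 - pv1.T @ ph1) / len(v0)
            b += lr * (v0 - pv1).mean(axis=0)
            c += lr * (ph0 - ph1).mean(axis=0)
    return W, b, c

def greedy_pretrain(data, layer_sizes):
    """Stack RBMs: each layer is trained on the previous layer's hidden activations."""
    layers, x = [], data
    for n_hidden in layer_sizes:
        W, _, c = train_rbm(x, n_hidden)
        layers.append((W, c))
        x = sigmoid(x @ W + c)  # deterministic up-pass feeding the next layer
    return layers

# toy binary data standing in for a real dataset
X = (rng.random((256, 64)) < 0.3).astype(float)
stack = greedy_pretrain(X, layer_sizes=[32, 16])
# The learned weights would then initialize a deep network that is fine-tuned
# with supervised gradient descent on the target task.
```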

References

[1] Allender, E. (1996). Circuit complexity before the dawn of the new millennium. In 16th Annual Conference on Foundations of Software Technology and Theoretical Computer Science, pp. 1-18. Lecture Notes in Computer Science 1180.
[2] Bengio, Y., Delalleau, O., & Le Roux, N. (2006). The curse of highly variable functions for local kernel machines. In Weiss, Y., Schölkopf, B., & Platt, J. (Eds.), Advances in Neural Information Processing Systems 18, pp. 107-114. MIT Press, Cambridge, MA.
[3] Bengio, Y., & Le Cun, Y. (2007). Scaling learning algorithms towards AI. In Bottou, L., Chapelle, O., DeCoste, D., & Weston, J. (Eds.), Large Scale Kernel Machines. MIT Press.
[4] Bengio, Y., Le Roux, N., Vincent, P., Delalleau, O., & Marcotte, P. (2006). Convex neural networks. In Weiss, Y., Schölkopf, B., & Platt, J. (Eds.), Advances in Neural Information Processing Systems 18, pp. 123-130. MIT Press, Cambridge, MA.
[5] Chen, H., & Murray, A. (2003). A continuous restricted Boltzmann machine with an implementable training algorithm. IEE Proceedings - Vision, Image and Signal Processing, 150(3), 153-158.
[6] Fahlman, S., & Lebiere, C. (1990). The cascade-correlation learning architecture. In Touretzky, D. (Ed.), Advances in Neural Information Processing Systems 2, pp. 524-532, Denver, CO. Morgan Kaufmann, San Mateo.
[7] Håstad, J. (1987). Computational Limitations for Small Depth Circuits. MIT Press, Cambridge, MA.
[8] Hinton, G. E., Osindero, S., & Teh, Y. (2006). A fast learning algorithm for deep belief nets. Neural Computation, 18, 1527-1554.
[9] Hinton, G. (2002). Training products of experts by minimizing contrastive divergence. Neural Computation, 14(8), 1771-1800.
[10] Hinton, G., Dayan, P., Frey, B., & Neal, R. (1995). The wake-sleep algorithm for unsupervised neural networks. Science, 268, 1158-1161.
[11] Hinton, G., & Salakhutdinov, R. (2006). Reducing the dimensionality of data with neural networks. Science, 313(5786), 504-507.
[12] Lengellé, R., & Denoeux, T. (1996). Training MLPs layer by layer using an objective function for internal representations. Neural Networks, 9, 83-97.
[13] Movellan, J., Mineiro, P., & Williams, R. (2002). A Monte Carlo EM approach for partially observable diffusion processes: theory and applications to neural networks. Neural Computation, 14, 1501-1544.
[14] Tesauro, G. (1992). Practical issues in temporal difference learning. Machine Learning, 8, 257-277.
[15] Utgoff, P., & Stracuzzi, D. (2002). Many-layered learning. Neural Computation, 14, 2497-2539.
[16] Welling, M., Rosen-Zvi, M., & Hinton, G. E. (2005). Exponential family harmoniums with an application to information retrieval. In Advances in Neural Information Processing Systems 17. MIT Press, Cambridge, MA.


Published In

NIPS'06: Proceedings of the 19th International Conference on Neural Information Processing Systems
December 2006, 1632 pages

Publisher

MIT Press, Cambridge, MA, United States

Cited By

  • (2023) Early diagnosis and clinical score prediction of Parkinson's disease based on longitudinal neuroimaging data. Neural Computing and Applications, 35:22 (16429-16455). DOI: 10.1007/s00521-023-08508-x. Online publication date: 9-May-2023.
  • (2022) LieGG. Proceedings of the 36th International Conference on Neural Information Processing Systems, pp. 25212-25223. DOI: 10.5555/3600270.3602098. Online publication date: 28-Nov-2022.
  • (2022) Text Adversarial Attacks and Defenses. Security and Communication Networks, 2022. DOI: 10.1155/2022/6458488. Online publication date: 1-Jan-2022.
  • (2022) Research on the Evaluation Algorithm of English Viewing, Listening, and Speaking Teaching Effect Based on DA-BP Neural Network. Mobile Information Systems, 2022. DOI: 10.1155/2022/4621405. Online publication date: 1-Jan-2022.
  • (2022) Towards Interpretable Anomaly Detection: Unsupervised Deep Neural Network Approach using Feedback Loop. NOMS 2022-2022 IEEE/IFIP Network Operations and Management Symposium, pp. 1-9. DOI: 10.1109/NOMS54207.2022.9789914. Online publication date: 25-Apr-2022.
  • (2022) EvoDNN - Evolving Weights, Biases, and Activation Functions in a Deep Neural Network. 2022 IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB), pp. 1-9. DOI: 10.1109/CIBCB55180.2022.9863054. Online publication date: 15-Aug-2022.
  • (2022) Applications of deep learning for phishing detection: a systematic literature review. Knowledge and Information Systems, 64:6 (1457-1500). DOI: 10.1007/s10115-022-01672-x. Online publication date: 1-Jun-2022.
  • (2022) A prediction-based cycle life test optimization method for cross-formula batteries using instance transfer and variable-length-input deep learning model. Neural Computing and Applications, 35:4 (2947-2971). DOI: 10.1007/s00521-022-07322-1. Online publication date: 17-Jun-2022.
  • (2021) Network Intrusion Detection with Nonsymmetric Deep Autoencoding Feature Extraction. Security and Communication Networks, 2021. DOI: 10.1155/2021/2843856. Online publication date: 1-Jan-2021.
  • (2021) Network Anomaly Detection Method Based on Joint Optimization of GAN and Classifier in Few-shot Scenarios. Proceedings of the 2021 5th International Conference on Electronic Information Technology and Computer Engineering, pp. 659-663. DOI: 10.1145/3501409.3501528. Online publication date: 22-Oct-2021.
