DOI: 10.5555/2976456.2976476

Greedy layer-wise training of deep networks

Published: 04 December 2006

Abstract

Complexity theory of circuits strongly suggests that deep architectures can be much more efficient (sometimes exponentially) than shallow architectures, in terms of computational elements required to represent some functions. Deep multi-layer neural networks have many levels of non-linearities allowing them to compactly represent highly non-linear and highly-varying functions. However, until recently it was not clear how to train such deep networks, since gradient-based optimization starting from random initialization appears to often get stuck in poor solutions. Hinton et al. recently introduced a greedy layer-wise unsupervised learning algorithm for Deep Belief Networks (DBN), a generative model with many layers of hidden causal variables. In the context of the above optimization problem, we study this algorithm empirically and explore variants to better understand its success and extend it to cases where the inputs are continuous or where the structure of the input distribution is not revealing enough about the variable to be predicted in a supervised task. Our experiments also confirm the hypothesis that the greedy layer-wise unsupervised training strategy mostly helps the optimization, by initializing weights in a region near a good local minimum, giving rise to internal distributed representations that are high-level abstractions of the input, bringing better generalization.
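
The procedure the abstract describes can be summarized as: train each layer in turn as an unsupervised model of the previous layer's output, then use the stacked weights to initialize a deep network for supervised fine-tuning. Below is a minimal NumPy sketch of that idea, not the authors' code: each layer is a binary restricted Boltzmann machine trained with one step of contrastive divergence (CD-1), and the layer sizes, learning rate, and toy data are illustrative assumptions.

```python
# Minimal sketch of greedy layer-wise unsupervised pre-training with stacked RBMs.
# Hyperparameters and data below are placeholders, not values from the paper.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm(data, n_hidden, epochs=10, lr=0.05, batch=32):
    """Train one binary RBM layer with CD-1; return (W, visible bias, hidden bias)."""
    n_visible = data.shape[1]
    W = 0.01 * rng.standard_normal((n_visible, n_hidden))
    b = np.zeros(n_visible)   # visible bias
    c = np.zeros(n_hidden)    # hidden bias
    for _ in range(epochs):
        for i in range(0, len(data), batch):
            v0 = data[i:i + batch]
            # positive phase: hidden probabilities and a sample given the data
            ph0 = sigmoid(v0 @ W + c)
            h0 = (rng.random(ph0.shape) < ph0).astype(float)
            # negative phase: one Gibbs step back to the visibles and up again
            pv1 = sigmoid(h0 @ W.T + b)
            ph1 = sigmoid(pv1 @ W + c)
            # CD-1 updates: difference of positive and negative statistics
            W += lr * (v0.T @ ph0 - pv1.T @ ph1) / len(v0)
            b += lr * (v0 - pv1).mean(axis=0)
            c += lr * (ph0 - ph1).mean(axis=0)
    return W, b, c

def greedy_pretrain(data, layer_sizes):
    """Stack RBMs: each layer is trained on the previous layer's hidden activations."""
    layers, x = [], data
    for n_hidden in layer_sizes:
        W, _, c = train_rbm(x, n_hidden)
        layers.append((W, c))
        x = sigmoid(x @ W + c)  # deterministic up-pass feeding the next layer
    return layers

# toy binary data standing in for a real dataset
X = (rng.random((256, 64)) < 0.3).astype(float)
stack = greedy_pretrain(X, layer_sizes=[32, 16])
# The learned weights would then initialize a deep network that is fine-tuned
# with supervised gradient descent on the target task.
```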

References

[1] Allender, E. (1996). Circuit complexity before the dawn of the new millennium. In 16th Annual Conference on Foundations of Software Technology and Theoretical Computer Science, pp. 1-18. Lecture Notes in Computer Science 1180.
[2] Bengio, Y., Delalleau, O., & Le Roux, N. (2006). The curse of highly variable functions for local kernel machines. In Weiss, Y., Schölkopf, B., & Platt, J. (Eds.), Advances in Neural Information Processing Systems 18, pp. 107-114. MIT Press, Cambridge, MA.
[3] Bengio, Y., & Le Cun, Y. (2007). Scaling learning algorithms towards AI. In Bottou, L., Chapelle, O., DeCoste, D., & Weston, J. (Eds.), Large Scale Kernel Machines. MIT Press.
[4] Bengio, Y., Le Roux, N., Vincent, P., Delalleau, O., & Marcotte, P. (2006). Convex neural networks. In Weiss, Y., Schölkopf, B., & Platt, J. (Eds.), Advances in Neural Information Processing Systems 18, pp. 123-130. MIT Press, Cambridge, MA.
[5] Chen, H., & Murray, A. (2003). A continuous restricted Boltzmann machine with an implementable training algorithm. IEE Proceedings - Vision, Image and Signal Processing, 150(3), 153-158.
[6] Fahlman, S., & Lebiere, C. (1990). The cascade-correlation learning architecture. In Touretzky, D. (Ed.), Advances in Neural Information Processing Systems 2, pp. 524-532, Denver, CO. Morgan Kaufmann, San Mateo.
[7] Håstad, J. (1987). Computational Limitations for Small Depth Circuits. MIT Press, Cambridge, MA.
[8] Hinton, G. E., Osindero, S., & Teh, Y. (2006). A fast learning algorithm for deep belief nets. Neural Computation, 18, 1527-1554.
[9] Hinton, G. (2002). Training products of experts by minimizing contrastive divergence. Neural Computation, 14(8), 1771-1800.
[10] Hinton, G., Dayan, P., Frey, B., & Neal, R. (1995). The wake-sleep algorithm for unsupervised neural networks. Science, 268, 1158-1161.
[11] Hinton, G., & Salakhutdinov, R. (2006). Reducing the dimensionality of data with neural networks. Science, 313(5786), 504-507.
[12] Lengellé, R., & Denoeux, T. (1996). Training MLPs layer by layer using an objective function for internal representations. Neural Networks, 9, 83-97.
[13] Movellan, J., Mineiro, P., & Williams, R. (2002). A Monte Carlo EM approach for partially observable diffusion processes: theory and applications to neural networks. Neural Computation, 14, 1501-1544.
[14] Tesauro, G. (1992). Practical issues in temporal difference learning. Machine Learning, 8, 257-277.
[15] Utgoff, P., & Stracuzzi, D. (2002). Many-layered learning. Neural Computation, 14, 2497-2539.
[16] Welling, M., Rosen-Zvi, M., & Hinton, G. E. (2005). Exponential family harmoniums with an application to information retrieval. In Advances in Neural Information Processing Systems 17. MIT Press, Cambridge, MA.


Published In

NIPS'06: Proceedings of the 19th International Conference on Neural Information Processing Systems
December 2006, 1632 pages

Publisher

MIT Press, Cambridge, MA, United States

Cited By

  • (2023) Early diagnosis and clinical score prediction of Parkinson's disease based on longitudinal neuroimaging data. Neural Computing and Applications, 35:22 (16429-16455). DOI: 10.1007/s00521-023-08508-x. Online publication date: 9-May-2023.
  • (2022) LieGG. Proceedings of the 36th International Conference on Neural Information Processing Systems, pp. 25212-25223. DOI: 10.5555/3600270.3602098. Online publication date: 28-Nov-2022.
  • (2022) Text Adversarial Attacks and Defenses. Security and Communication Networks, 2022. DOI: 10.1155/2022/6458488. Online publication date: 1-Jan-2022.
  • (2022) Research on the Evaluation Algorithm of English Viewing, Listening, and Speaking Teaching Effect Based on DA-BP Neural Network. Mobile Information Systems, 2022. DOI: 10.1155/2022/4621405. Online publication date: 1-Jan-2022.
  • (2022) Towards Interpretable Anomaly Detection: Unsupervised Deep Neural Network Approach using Feedback Loop. NOMS 2022-2022 IEEE/IFIP Network Operations and Management Symposium, pp. 1-9. DOI: 10.1109/NOMS54207.2022.9789914. Online publication date: 25-Apr-2022.
  • (2022) EvoDNN - Evolving Weights, Biases, and Activation Functions in a Deep Neural Network. 2022 IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB), pp. 1-9. DOI: 10.1109/CIBCB55180.2022.9863054. Online publication date: 15-Aug-2022.
  • (2022) Applications of deep learning for phishing detection: a systematic literature review. Knowledge and Information Systems, 64:6 (1457-1500). DOI: 10.1007/s10115-022-01672-x. Online publication date: 1-Jun-2022.
  • (2022) A prediction-based cycle life test optimization method for cross-formula batteries using instance transfer and variable-length-input deep learning model. Neural Computing and Applications, 35:4 (2947-2971). DOI: 10.1007/s00521-022-07322-1. Online publication date: 17-Jun-2022.
  • (2021) Network Intrusion Detection with Nonsymmetric Deep Autoencoding Feature Extraction. Security and Communication Networks, 2021. DOI: 10.1155/2021/2843856. Online publication date: 1-Jan-2021.
  • (2021) Network Anomaly Detection Method Based on Joint Optimization of GAN and Classifier in Few-shot Scenarios. Proceedings of the 2021 5th International Conference on Electronic Information Technology and Computer Engineering, pp. 659-663. DOI: 10.1145/3501409.3501528. Online publication date: 22-Oct-2021.
