DOI: 10.5555/3045390.3045640

Deconstructing the ladder network architecture

Published: 19 June 2016

Abstract

The Ladder Network is a recent approach to semi-supervised learning that has proven very successful. Despite its impressive performance, the Ladder Network intertwines many components whose individual contributions are not obvious in such a complex architecture. This paper presents an extensive experimental investigation of variants of the Ladder Network in which we replace or remove individual components to learn about their relative importance. For semi-supervised tasks, we conclude that the most important contribution is made by the lateral connections, followed by the application of noise and the choice of what we refer to as the 'combinator function'. As the number of labeled training examples increases, the lateral connections and the reconstruction criterion become less important, with most of the generalization improvement coming from the injection of noise in each layer. Finally, we introduce a combinator function that reduces test error rates on Permutation-Invariant MNIST to 0.57% for the supervised setting, and to 0.97% and 1.0% for semi-supervised settings with 1000 and 100 labeled examples, respectively.
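The combinator function mentioned above merges, at each layer of the decoder, the noisy lateral activation with the top-down reconstruction signal. As a rough illustration only, the sketch below shows an MLP-style combinator that also feeds in the elementwise product of the two signals; the function name, shapes, and single hidden layer are assumptions for the example, not the paper's exact parameterization.

```python
import numpy as np

def mlp_combinator(z_tilde, u, W, b):
    """Hypothetical MLP-style combinator sketch.

    z_tilde : noisy lateral activation from the encoder (shape (d,))
    u       : top-down signal from the decoder layer above (shape (d,))
    W, b    : weights/biases of a small two-layer MLP (assumed shapes:
              W[0] is (3d, d), W[1] is (d, d)).
    Returns a denoised estimate z_hat of the clean activation.
    """
    # Concatenate lateral, top-down, and their elementwise product,
    # so the MLP can model multiplicative interactions between them.
    x = np.concatenate([z_tilde, u, z_tilde * u], axis=-1)
    h = np.maximum(0.0, x @ W[0] + b[0])  # hidden layer with ReLU
    return h @ W[1] + b[1]                # denoised estimate z_hat
```

The elementwise product term is what lets a small network express gating-like interactions between the lateral and vertical paths, which a purely additive combinator cannot.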



Published In

ICML'16: Proceedings of the 33rd International Conference on Machine Learning - Volume 48
June 2016, 3077 pages
Publisher: JMLR.org


Cited By

• (2019) Reconstruction of Hidden Representation for Robust Feature Extraction. ACM Transactions on Intelligent Systems and Technology, 10(2):1-24. DOI: 10.1145/3284174
• (2019) Enhancing Deep Learning with Visual Interactions. ACM Transactions on Interactive Intelligent Systems, 9(1):1-27. DOI: 10.1145/3150977
• (2018) A Distributed Semi-Supervised Platform for DNase-Seq Data Analytics using Deep Generative Convolutional Networks. Proceedings of the 2018 ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics, pp. 244-253. DOI: 10.1145/3233547.3233601
• (2017) Learning hierarchical features from deep generative models. Proceedings of the 34th International Conference on Machine Learning - Volume 70, pp. 4091-4099. DOI: 10.5555/3305890.3306104
• (2016) On multiplicative integration with recurrent neural networks. Proceedings of the 30th International Conference on Neural Information Processing Systems, pp. 2864-2872. DOI: 10.5555/3157382.3157418
