DOI: 10.5555/3045390.3045640

Deconstructing the ladder network architecture

Published: 19 June 2016

Abstract

The Ladder Network is a recent approach to semi-supervised learning that has proven very successful. Despite its impressive performance, the Ladder Network intertwines many components whose individual contributions are not obvious in such a complex architecture. This paper presents an extensive experimental investigation of variants of the Ladder Network in which we replace or remove individual components to learn about their relative importance. For semi-supervised tasks, we conclude that the most important contribution is made by the lateral connections, followed by the application of noise and the choice of what we refer to as the 'combinator function'. As the number of labeled training examples increases, the lateral connections and the reconstruction criterion become less important, with most of the generalization improvement coming from the injection of noise in each layer. Finally, we introduce a combinator function that reduces test error rates on Permutation-Invariant MNIST to 0.57% for the supervised setting, and to 0.97% and 1.0% for semi-supervised settings with 1000 and 100 labeled examples, respectively.
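The combinator function mentioned above merges, at each layer of the decoder, the noisy lateral activation with the top-down reconstruction signal. As a rough illustration only, the sketch below shows an MLP-style combinator that also feeds in the elementwise product of the two signals; the function name, shapes, and single hidden layer are assumptions for the example, not the paper's exact parameterization.

```python
import numpy as np

def mlp_combinator(z_tilde, u, W, b):
    """Hypothetical MLP-style combinator sketch.

    z_tilde : noisy lateral activation from the encoder (shape (d,))
    u       : top-down signal from the decoder layer above (shape (d,))
    W, b    : weights/biases of a small two-layer MLP (assumed shapes:
              W[0] is (3d, d), W[1] is (d, d)).
    Returns a denoised estimate z_hat of the clean activation.
    """
    # Concatenate lateral, top-down, and their elementwise product,
    # so the MLP can model multiplicative interactions between them.
    x = np.concatenate([z_tilde, u, z_tilde * u], axis=-1)
    h = np.maximum(0.0, x @ W[0] + b[0])  # hidden layer with ReLU
    return h @ W[1] + b[1]                # denoised estimate z_hat
```

The elementwise product term is what lets a small network express gating-like interactions between the lateral and vertical paths, which a purely additive combinator cannot.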



Published In

ICML'16: Proceedings of the 33rd International Conference on Machine Learning - Volume 48
June 2016, 3077 pages
Publisher: JMLR.org


Cited By

• (2019) Reconstruction of Hidden Representation for Robust Feature Extraction. ACM Transactions on Intelligent Systems and Technology, 10(2):1-24. DOI: 10.1145/3284174
• (2019) Enhancing Deep Learning with Visual Interactions. ACM Transactions on Interactive Intelligent Systems, 9(1):1-27. DOI: 10.1145/3150977
• (2018) A Distributed Semi-Supervised Platform for DNase-Seq Data Analytics using Deep Generative Convolutional Networks. Proceedings of the 2018 ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics, pp. 244-253. DOI: 10.1145/3233547.3233601
• (2017) Learning hierarchical features from deep generative models. Proceedings of the 34th International Conference on Machine Learning - Volume 70, pp. 4091-4099. DOI: 10.5555/3305890.3306104
• (2016) On multiplicative integration with recurrent neural networks. Proceedings of the 30th International Conference on Neural Information Processing Systems, pp. 2864-2872. DOI: 10.5555/3157382.3157418
