Abstract
Hierarchical neural networks for object recognition have a long history. In recent years, novel methods for incrementally learning a hierarchy of features from unlabeled inputs were proposed as a good starting point for supervised training. These deep learning methods—together with advances in parallel computing—made it possible to successfully attack problems that were previously impractical in terms of depth and input size. In this article, we introduce the reader to the basic concepts of deep learning, discuss selected methods in detail, and present application examples from computer vision and speech recognition.
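To make the idea of incrementally learning a feature hierarchy from unlabeled inputs concrete, the sketch below greedily pretrains a small stack of tied-weight autoencoders: each layer is trained to reconstruct its input, and its hidden activations become the input of the next layer. This is a minimal illustration only, not the implementation discussed in the article; the data, layer sizes, and hyperparameters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_autoencoder(X, n_hidden, epochs=50, lr=0.1):
    """Train one tied-weight autoencoder layer by plain gradient descent."""
    n_in = X.shape[1]
    W = rng.normal(0.0, 0.1, size=(n_in, n_hidden))
    b_h = np.zeros(n_hidden)   # hidden bias
    b_o = np.zeros(n_in)       # output (reconstruction) bias
    for _ in range(epochs):
        H = sigmoid(X @ W + b_h)        # encode
        R = sigmoid(H @ W.T + b_o)      # decode with tied weights
        err = R - X                     # reconstruction error
        # backpropagate through decoder and encoder
        d_o = err * R * (1.0 - R)
        d_h = (d_o @ W) * H * (1.0 - H)
        gW = X.T @ d_h + d_o.T @ H      # tied weights: both paths contribute
        W -= lr * gW / len(X)
        b_h -= lr * d_h.mean(axis=0)
        b_o -= lr * d_o.mean(axis=0)
    return W, b_h

# hypothetical unlabeled data: 200 samples with 32 features
X = rng.random((200, 32))

# greedy layer-wise pretraining: each layer learns features of the one below
layers, inp = [], X
for n_hidden in (16, 8):
    W, b = train_autoencoder(inp, n_hidden)
    layers.append((W, b))
    inp = sigmoid(inp @ W + b)  # hidden code becomes input of the next layer

# the pretrained weights would then initialize a deep network that is
# fine-tuned with supervised backpropagation on labeled data.
```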