Deep, big, simple neural nets for handwritten digit recognition

Authors:

Dan Claudiu Cireşan,

Luca Maria Gambardella,

Jürgen Schmidhuber

Neural Computation, Volume 22, Issue 12

Pages 3207 - 3220

https://doi.org/10.1162/NECO_a_00052

Published: 01 December 2010

Abstract

Good old online backpropagation for plain multilayer perceptrons yields a very low 0.35% error rate on the MNIST handwritten digits benchmark. All we need to achieve this best result so far are many hidden layers, many neurons per layer, numerous deformed training images to avoid overfitting, and graphics cards to greatly speed up learning.
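
As a rough illustration of the ingredients the abstract lists, the following sketch trains a small multilayer perceptron with online backpropagation, meaning the weights are updated after every single training pattern. It is not the authors' implementation. The toy two-class data set, the layer sizes, the learning rate, the tanh units, the squared-error loss, and the `jitter` function (a crude stand-in for image deformations) are all assumptions chosen to keep the example short and runnable on a CPU.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    # One weight row per output neuron; the last entry of each row is the bias.
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_out)]


def forward(layers, x):
    # Returns the activations of every layer, with the input as element 0.
    activations = [list(x)]
    for layer in layers:
        inputs = activations[-1] + [1.0]  # constant input for the bias weight
        activations.append(
            [math.tanh(sum(w * v for w, v in zip(row, inputs))) for row in layer]
        )
    return activations


def online_step(layers, x, target, lr):
    # One online backpropagation step: update all weights after a single pattern.
    activations = forward(layers, x)
    output = activations[-1]
    # Output deltas for squared error with tanh units.
    deltas = [(o - t) * (1.0 - o * o) for o, t in zip(output, target)]
    for k in range(len(layers) - 1, -1, -1):
        layer = layers[k]
        inputs = activations[k] + [1.0]
        if k > 0:
            # Propagate deltas through the old weights before changing them.
            prev = activations[k]
            prev_deltas = [
                (1.0 - prev[j] * prev[j])
                * sum(deltas[i] * layer[i][j] for i in range(len(layer)))
                for j in range(len(prev))
            ]
        for i, row in enumerate(layer):
            for j in range(len(row)):
                row[j] -= lr * deltas[i] * inputs[j]
        if k > 0:
            deltas = prev_deltas


def mean_loss(layers, data):
    # Mean squared error over a data set.
    total = 0.0
    for x, target in data:
        output = forward(layers, x)[-1]
        total += sum((o - t) ** 2 for o, t in zip(output, target))
    return total / len(data)


def make_toy_data(rng, n=40):
    # Hypothetical toy task (not MNIST): label is the sign of x0 + x1.
    data = []
    for _ in range(n):
        x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        data.append((x, [1.0 if x[0] + x[1] > 0 else -1.0]))
    return data


def jitter(x, rng, scale=0.05):
    # Stand-in for image deformation: a fresh random perturbation per presentation.
    return [v + rng.uniform(-scale, scale) for v in x]


def train(epochs=200, lr=0.05, seed=0):
    rng = random.Random(seed)
    data = make_toy_data(rng)
    sizes = [2, 8, 8, 1]  # assumed sizes; two hidden layers of 8 units
    layers = [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]
    initial = mean_loss(layers, data)
    for _ in range(epochs):
        rng.shuffle(data)
        for x, target in data:
            online_step(layers, jitter(x, rng), target, lr)
    return initial, mean_loss(layers, data)


if __name__ == "__main__":
    before, after = train()
    print(f"mean squared error before: {before:.3f}, after: {after:.3f}")
```

Because each pattern is perturbed anew every time it is presented, the network rarely sees exactly the same input twice, which is the intuition behind using deformed training images against overfitting. The GPU speed-up mentioned in the abstract is not reflected in this sketch.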

Published In

Neural Computation Volume 22, Issue 12

December 2010

279 pages

ISSN:0899-7667

Publisher

MIT Press

Cambridge, MA, United States
