Deep, big, simple neural nets for handwritten digit recognition

Published: 01 December 2010

Abstract

Good old online backpropagation for plain multilayer perceptrons yields a very low 0.35% error rate on the MNIST handwritten digits benchmark. All we need to achieve this best result so far are many hidden layers, many neurons per layer, numerous deformed training images to avoid overfitting, and graphics cards to greatly speed up learning.
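To make "online backpropagation for plain multilayer perceptrons" concrete, here is a minimal sketch in pure Python. It is not the authors' GPU implementation. The layer sizes, the learning rate, the uniform initialisation range, the unscaled tanh units, the squared-error loss, and all function names (`init_mlp`, `forward`, `squared_error`, `online_step`) are assumptions chosen for illustration. The point it shows is that "online" means the weights are updated immediately after each single training pattern rather than after a batch.

```python
import math
import random


def init_mlp(layer_sizes, rng, scale=0.05):
    """Build one (weights, biases) pair per layer, drawn uniformly from [-scale, scale].

    The init range is an assumption for this sketch.
    """
    layers = []
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
        biases = [rng.uniform(-scale, scale) for _ in range(n_out)]
        layers.append((weights, biases))
    return layers


def forward(layers, x):
    """Return the activations of every layer, input included (tanh units throughout)."""
    activations = [list(x)]
    for weights, biases in layers:
        prev = activations[-1]
        activations.append([
            math.tanh(sum(w * a for w, a in zip(row, prev)) + b)
            for row, b in zip(weights, biases)
        ])
    return activations


def squared_error(layers, x, target):
    """Half the summed squared difference between network output and target."""
    output = forward(layers, x)[-1]
    return 0.5 * sum((o - t) ** 2 for o, t in zip(output, target))


def online_step(layers, x, target, lr):
    """Backpropagate a single pattern and update the weights in place right away."""
    activations = forward(layers, x)
    # Output error signal: dE/d(net) = (o - t) * tanh'(net) = (o - t) * (1 - o^2).
    deltas = [(o - t) * (1.0 - o * o) for o, t in zip(activations[-1], target)]
    for index in range(len(layers) - 1, -1, -1):
        weights, biases = layers[index]
        prev = activations[index]
        prev_deltas = []
        if index > 0:
            # Propagate the error signal downward using the weights before they change.
            prev_deltas = [
                sum(weights[j][i] * deltas[j] for j in range(len(deltas)))
                * (1.0 - prev[i] ** 2)
                for i in range(len(prev))
            ]
        for j, delta in enumerate(deltas):
            for i, a in enumerate(prev):
                weights[j][i] -= lr * delta * a
            biases[j] -= lr * delta
        deltas = prev_deltas


# Toy usage: a 4-8-8-2 network (sizes are arbitrary) fitted to one made-up pattern.
rng = random.Random(0)
net = init_mlp([4, 8, 8, 2], rng)
x, target = [0.5, -1.0, 0.25, 1.0], [0.8, -0.8]
before = squared_error(net, x, target)
for _ in range(20):
    online_step(net, x, target, lr=0.1)
after = squared_error(net, x, target)
```

In a real training run each call to `online_step` would receive a different, freshly deformed training image, and the inner loops would be replaced by parallel matrix operations on a graphics card, which is where the speed-up mentioned in the abstract comes from.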

Published In

Neural Computation, Volume 22, Issue 12
December 2010
279 pages

Publisher

MIT Press

Cambridge, MA, United States
