Article

Deep learning via Hessian-free optimization

Published: 21 June 2010
DOI: 10.5555/3104322.3104416

Abstract

We develop a 2nd-order optimization method based on the "Hessian-free" approach, and apply it to training deep auto-encoders. Without using pre-training, we obtain results superior to those reported by Hinton & Salakhutdinov (2006) on the same tasks they considered. Our method is practical, easy to use, scales nicely to very large datasets, and isn't limited in applicability to auto-encoders or any other specific model class. We also discuss the issue of "pathological curvature" as a possible explanation for the difficulty of deep learning, and how 2nd-order optimization, and our method in particular, effectively deals with it.
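The core idea behind a "Hessian-free" step can be sketched on a toy problem. This is a minimal illustration under stated assumptions, not the paper's implementation. The objective is a hypothetical quadratic whose curvatures span six orders of magnitude, standing in for pathological curvature. The helper names (`CURVATURES`, `hess_vec`, `cg_solve`, `hessian_free`, `gradient_descent`) are hypothetical. Each step solves a damped Newton system with conjugate gradient (CG) using only curvature-vector products, so the curvature matrix is never formed. Two simplifications are swapped in. `hess_vec` uses a finite difference of gradients, whereas the paper relies on exact curvature-matrix-vector products of the kind described in [8] and [9]. The damping is a fixed constant, whereas the paper adjusts it during training.

```python
import math

# Hypothetical toy objective (an assumption, not from the paper):
# f(x) = 0.5 * sum_i c_i * x_i**2 with curvatures spanning 1e-3 .. 1e3.
CURVATURES = [1e-3, 1e-1, 1.0, 1e1, 1e3]


def loss(x):
    return 0.5 * sum(c * xi * xi for c, xi in zip(CURVATURES, x))


def grad(x):
    return [c * xi for c, xi in zip(CURVATURES, x)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def hess_vec(x, v, eps=1e-4):
    """Approximate H(x) @ v by a forward difference of gradients.

    Only gradient evaluations are used; the Hessian is never formed.
    The step h is scaled so that h * v has Euclidean length eps.
    """
    nv = math.sqrt(dot(v, v))
    if nv == 0.0:
        return [0.0] * len(v)
    h = eps / nv
    g0 = grad(x)
    g1 = grad([xi + h * vi for xi, vi in zip(x, v)])
    return [(b - a) / h for a, b in zip(g0, g1)]


def cg_solve(matvec, b, max_iters=50, tol=1e-10):
    """Truncated conjugate gradient for A d = b, with A symmetric positive definite."""
    d = [0.0] * len(b)
    r = list(b)  # residual b - A d for the initial guess d = 0
    p = list(r)
    rr = dot(r, r)
    b_norm = math.sqrt(rr)
    for _ in range(max_iters):
        if math.sqrt(rr) <= tol * b_norm:
            break
        ap = matvec(p)
        alpha = rr / dot(p, ap)
        d = [di + alpha * pi for di, pi in zip(d, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rr_new = dot(r, r)
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return d


def hessian_free(x, outer_iters=10, damping=1e-4):
    """Each outer step solves (H + damping * I) d = -grad(x) with CG, then x += d."""
    for _ in range(outer_iters):
        g = grad(x)

        def damped_matvec(v, at=x):  # curvature product at the current point
            return [hv + damping * vi for hv, vi in zip(hess_vec(at, v), v)]

        d = cg_solve(damped_matvec, [-gi for gi in g])
        x = [xi + di for xi, di in zip(x, d)]
    return x


def gradient_descent(x, iters=1000):
    # A fixed step must stay below about 2 / c_max to be stable; use 1 / c_max.
    lr = 1.0 / max(CURVATURES)
    for _ in range(iters):
        x = [xi - lr * gi for xi, gi in zip(x, grad(x))]
    return x


x0 = [1.0] * len(CURVATURES)
print(f"start:            {loss(x0):.3e}")
print(f"gradient descent: {loss(gradient_descent(x0)):.3e}")
print(f"Hessian-free:     {loss(hessian_free(x0)):.3e}")
```

On this toy problem, gradient descent's step size is capped by the largest curvature, so the low-curvature directions barely move even after 1000 steps. The CG solve instead rescales each direction by its own curvature, so a handful of outer steps drive the loss to near zero. This is one concrete reading of how a 2nd-order method copes with pathological curvature.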

References

[1] Amari, S., Park, H., and Fukumizu, K. Adaptive method of realizing natural gradient learning for multilayer perceptrons. Neural Computation, 2000.
[2] Bengio, Y., Lamblin, P., Popovici, D., and Larochelle, H. Greedy layer-wise training of deep networks. In NIPS, 2007.
[3] Erhan, D., Bengio, Y., Courville, A., Manzagol, P., Vincent, P., and Bengio, S. Why does unsupervised pre-training help deep learning? Journal of Machine Learning Research, 2010.
[4] Hinton, G. E. and Salakhutdinov, R. R. Reducing the dimensionality of data with neural networks. Science, July 2006.
[5] LeCun, Y., Bottou, L., Orr, G., and Müller, K. Efficient backprop. In Orr, G. and Müller, K. (eds.), Neural Networks: Tricks of the Trade. Springer, 1998.
[6] Mizutani, E. and Dreyfus, S. E. Second-order stagewise back-propagation for Hessian-matrix analyses and investigation of negative curvature. Neural Networks, 21(2-3):193-203, 2008.
[7] Nocedal, J. and Wright, S. J. Numerical Optimization. Springer, 1999.
[8] Pearlmutter, B. A. Fast exact multiplication by the Hessian. Neural Computation, 1994.
[9] Schraudolph, N. N. Fast curvature matrix-vector products for second-order gradient descent. Neural Computation, 2002.

Information

Published In

Guide Proceedings
ICML'10: Proceedings of the 27th International Conference on International Conference on Machine Learning
June 2010
1262 pages
ISBN:9781605589077

Sponsors

  • NSF: National Science Foundation
  • Xerox
  • Microsoft Research
  • Yahoo!
  • IBM

Publisher

Omnipress

Madison, WI, United States

Qualifiers

  • Article

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 0
  • Downloads (last 6 weeks): 0
Reflects downloads up to 09 Nov 2024

Cited By

  • (2024) NavCim: Comprehensive Design Space Exploration for Analog Computing-in-Memory Architectures. Proceedings of the 2024 International Conference on Parallel Architectures and Compilation Techniques, pp. 168-182. DOI: 10.1145/3656019.3676946. Online publication date: 14-Oct-2024.
  • (2023) CoLA. Proceedings of the 37th International Conference on Neural Information Processing Systems, pp. 43894-43917. DOI: 10.5555/3666122.3668026. Online publication date: 10-Dec-2023.
  • (2023) Kronecker-factored approximate curvature for modern neural network architectures. Proceedings of the 37th International Conference on Neural Information Processing Systems, pp. 33624-33655. DOI: 10.5555/3666122.3667583. Online publication date: 10-Dec-2023.
  • (2023) Bayesian numerical integration with neural networks. Proceedings of the Thirty-Ninth Conference on Uncertainty in Artificial Intelligence, pp. 1606-1617. DOI: 10.5555/3625834.3625985. Online publication date: 31-Jul-2023.
  • (2023) Dataset distillation with convexified implicit gradients. Proceedings of the 40th International Conference on Machine Learning, pp. 22649-22674. DOI: 10.5555/3618408.3619350. Online publication date: 23-Jul-2023.
  • (2023) Certified Edge Unlearning for Graph Neural Networks. Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp. 2606-2617. DOI: 10.1145/3580305.3599271. Online publication date: 6-Aug-2023.
  • (2023) Selecting and Composing Learning Rate Policies for Deep Neural Networks. ACM Transactions on Intelligent Systems and Technology 14(2), pp. 1-25. DOI: 10.1145/3570508. Online publication date: 16-Feb-2023.
  • (2021) CHEF. Proceedings of the VLDB Endowment 14(11), pp. 2410-2418. DOI: 10.14778/3476249.3476290. Online publication date: 27-Oct-2021.
  • (2021) Conditional Directed Graph Convolution for 3D Human Pose Estimation. Proceedings of the 29th ACM International Conference on Multimedia, pp. 602-611. DOI: 10.1145/3474085.3475219. Online publication date: 17-Oct-2021.
  • (2020) Efficient continuous pareto exploration in multi-task learning. Proceedings of the 37th International Conference on Machine Learning, pp. 6522-6531. DOI: 10.5555/3524938.3525543. Online publication date: 13-Jul-2020.