Adaptive Method of Realizing Natural Gradient Learning for Multilayer Perceptrons

Published: 01 June 2000

Abstract

The natural gradient learning method is known to have ideal performance for on-line training of multilayer perceptrons: it avoids the plateaus that cause the slow convergence of the backpropagation method, and it is Fisher efficient, whereas the conventional method is not. However, implementing the method requires calculating the Fisher information matrix and its inverse, which is practically very difficult. This article proposes an adaptive method of obtaining the inverse of the Fisher information matrix directly. It generalizes the adaptive Gauss-Newton algorithms and provides a solid theoretical justification for them. Simulations show that the proposed adaptive method works very well for realizing natural gradient learning.
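The key idea in the abstract — updating an estimate of the inverse Fisher information matrix online, so that no explicit matrix inversion is ever needed — can be sketched with a rank-one recursion. The sketch below is a minimal illustration on a toy linear-Gaussian model y = w·x + noise, not the paper's multilayer perceptron, and the step sizes `eta` and `eps` are illustrative assumptions rather than the paper's choices.

```python
import numpy as np

# Hedged sketch of adaptive natural gradient learning on a toy
# linear-Gaussian model y = w.x + noise (an illustrative stand-in for
# the paper's MLP). G_inv is an online estimate of the inverse Fisher
# information matrix, updated directly from the score vector
# s = grad_w log p(y | x; w) via the rank-one recursion
#     G_inv <- (1 + eps) * G_inv - eps * (G_inv s)(G_inv s)^T
# so no matrix is ever inverted explicitly.

rng = np.random.default_rng(0)
d = 3
w_true = np.array([1.0, -2.0, 0.5])   # target parameters (toy example)
w = np.zeros(d)                        # parameters being learned
G_inv = 0.1 * np.eye(d)                # initial inverse-Fisher estimate
eta, eps = 0.01, 0.002                 # learning rate / Fisher-update rate

for t in range(6000):
    x = rng.normal(size=d)
    y = w_true @ x + rng.normal()      # unit-variance observation noise
    err = y - w @ x
    s = err * x                        # score of the Gaussian model
    v = G_inv @ s
    # rank-one adaptive update of the inverse Fisher estimate
    G_inv = (1.0 + eps) * G_inv - eps * np.outer(v, v)
    # natural gradient step: ordinary gradient preconditioned by G_inv
    w = w + eta * G_inv @ s
```

Because `G_inv` tracks the inverse directly, each step costs O(d²), versus the O(d³) that recomputing and inverting the Fisher matrix at every step would require — which is the practical difficulty the abstract points to.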


Published In

Neural Computation, Volume 12, Issue 6 (June 2000), 231 pages.

Publisher: MIT Press, Cambridge, MA, United States.
    Cited By

• (2021) Tensor normal training for deep learning models. Proceedings of the 35th International Conference on Neural Information Processing Systems, pp. 26040–26052. doi:10.5555/3540261.3542255
• (2020) WoodFisher. Proceedings of the 34th International Conference on Neural Information Processing Systems, pp. 18098–18109. doi:10.5555/3495724.3497243
• (2020) Practical quasi-Newton methods for training deep neural networks. Proceedings of the 34th International Conference on Neural Information Processing Systems, pp. 2386–2396. doi:10.5555/3495724.3495925
• (2019) EA-CG. Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence and Thirty-First Innovative Applications of Artificial Intelligence Conference and Ninth AAAI Symposium on Educational Advances in Artificial Intelligence, pp. 3337–3346. doi:10.1609/aaai.v33i01.33013337
• (2018) Exact natural gradient in deep linear networks and application to the nonlinear case. Proceedings of the 32nd International Conference on Neural Information Processing Systems, pp. 5945–5954. doi:10.5555/3327345.3327494
• (2018) Numerical analysis near singularities in RBF networks. Journal of Machine Learning Research, 19(1), 1–39. doi:10.5555/3291125.3291126
• (2018) Dynamics of learning in MLP. Neural Computation, 30(1), 1–33. doi:10.1162/neco_a_01029
• (2017) Active bias. Proceedings of the 31st International Conference on Neural Information Processing Systems, pp. 1003–1013. doi:10.5555/3294771.3294867
• (2017) Probabilistic line searches for stochastic optimization. Journal of Machine Learning Research, 18(1), 4262–4320. doi:10.5555/3122009.3176863
• (2017) Building Proteins in a Day. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(4), 706–718. doi:10.1109/TPAMI.2016.2627573
