Abstract
For a common class of artificial neural networks, the mean integrated squared error between the estimated network and a target function $f$ is shown to be bounded by
\[
O\!\left(\frac{C_f^2}{n}\right) + O\!\left(\frac{nd}{N}\log N\right),
\]
where $n$ is the number of nodes, $d$ is the input dimension of the function, $N$ is the number of training observations, and $C_f$ is the first absolute moment of the Fourier magnitude distribution of $f$. The two contributions to this total risk are the approximation error and the estimation error. Approximation error refers to the distance between the target function and the closest neural network function of a given architecture; estimation error refers to the distance between this ideal network function and an estimated network function. With $n \sim C_f (N/(d \log N))^{1/2}$ nodes, the order of the bound on the mean integrated squared error is optimized to be $O(C_f ((d/N) \log N)^{1/2})$. The bound demonstrates surprisingly favorable properties of network estimation compared to traditional series and nonparametric curve estimation techniques when $d$ is moderately large. Similar bounds are obtained when the number of nodes $n$ is not preselected as a function of $C_f$ (which is generally not known a priori), but rather is optimized from the observed data by a complexity regularization or minimum description length criterion. The analysis involves Fourier techniques for the approximation error, metric entropy considerations for the estimation error, and a calculation of the index of resolvability of minimum complexity estimation of the family of networks.
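To see where the optimized number of nodes comes from, one can balance the two terms of the bound; the following worked calculation (constants suppressed) recovers both the choice of $n$ and the resulting rate:
\[
R(n) = \frac{C_f^2}{n} + \frac{nd}{N}\log N,
\qquad
\frac{dR}{dn} = -\frac{C_f^2}{n^2} + \frac{d}{N}\log N = 0
\;\Longrightarrow\;
n^{*} = C_f\left(\frac{N}{d\log N}\right)^{1/2}.
\]
Substituting $n^{*}$ back gives $R(n^{*}) = 2\,C_f\,((d/N)\log N)^{1/2}$, i.e., the stated $O(C_f((d/N)\log N)^{1/2})$ rate. The exponent $1/2$ does not deteriorate with the dimension; $d$ enters only through the factor $d^{1/2}$ and through $C_f$, which is the sense in which the bound is favorable when $d$ is moderately large.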
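The data-driven selection of $n$ mentioned above can be illustrated with a minimal sketch of a penalized least-squares criterion of the kind the abstract describes. Here the helper `fit_network`, the unit penalty constant, and the search range are assumptions for illustration, not the paper's algorithm:

```python
import numpy as np

def select_num_nodes(fit_network, X, y, max_nodes):
    """Choose the number of hidden nodes n by a complexity-regularization
    criterion: empirical squared error plus a penalty of the same order
    as the estimation-error term, (n * d / N) * log N.

    `fit_network(X, y, n)` is an assumed helper that fits a one-hidden-layer
    sigmoidal network with n nodes and returns a callable predictor.
    """
    N, d = X.shape
    best_n, best_score = None, np.inf
    for n in range(1, max_nodes + 1):
        predict = fit_network(X, y, n)
        empirical_risk = np.mean((y - predict(X)) ** 2)
        penalty = (n * d / N) * np.log(N)  # order of the estimation-error term
        score = empirical_risk + penalty
        if score < best_score:
            best_n, best_score = n, score
    return best_n
```

Because the penalty grows linearly in $n$ while the approximation error shrinks like $C_f^2/n$, the minimizer of this score tracks the $n \sim C_f(N/(d\log N))^{1/2}$ balance without requiring $C_f$ to be known in advance.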
Cite this article
Barron, A.R. Approximation and estimation bounds for artificial neural networks. Machine Learning 14, 115–133 (1994). https://doi.org/10.1007/BF00993164