A mean-field optimal control formulation of deep learning

Research article, published 2019 in Research in the Mathematical Sciences.

Abstract

Recent work linking deep neural networks to dynamical systems has opened up new avenues for analyzing deep learning. In particular, it has been observed that new insights can be gained by recasting deep learning as an optimal control problem on difference or differential equations. However, the mathematical aspects of such a formulation have not been systematically explored. This paper formulates the population risk minimization problem in deep learning as a mean-field optimal control problem. Mirroring the development of classical optimal control, we state and prove optimality conditions of both the Hamilton–Jacobi–Bellman type and the Pontryagin type. These mean-field results reflect the probabilistic nature of the learning problem. In addition, by appealing to the mean-field Pontryagin maximum principle, we establish quantitative relationships between the population and empirical learning problems. Together, these results lay a mathematical foundation for investigating the algorithmic and theoretical connections between optimal control and deep learning.
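
To fix ideas, the display below sketches the general shape of the mean-field optimal control problem described in the abstract, together with the associated Pontryagin-type conditions. The notation here (x_t for the propagated activations, θ_t for the trainable weights, f for the feed-forward dynamics, Φ for the terminal loss, L for a running regularizer, μ_0 for the joint distribution of inputs x_0 and labels y_0) is an illustrative rendering and may differ in detail from the paper's own conventions.

    % Population risk minimization as a mean-field optimal control problem
    % (schematic sketch; notation is illustrative, not verbatim from the paper)
    \begin{aligned}
      \inf_{\theta \in L^\infty([0,T];\,\Theta)} \; J(\theta)
        &:= \mathbb{E}_{(x_0,\,y_0) \sim \mu_0}
            \Big[ \Phi(x_T, y_0) + \int_0^T L(x_t, \theta_t)\,\mathrm{d}t \Big] \\
      \text{subject to} \quad \dot{x}_t &= f(x_t, \theta_t),
        \qquad t \in [0,T].
    \end{aligned}

    % Hamiltonian and Pontryagin-type conditions: the maximization holds in
    % expectation over the data distribution, which is what makes the maximum
    % principle "mean-field" rather than per-sample.
    \begin{aligned}
      H(x, p, \theta) &:= p \cdot f(x, \theta) - L(x, \theta), \\
      \dot{x}^*_t &= f(x^*_t, \theta^*_t),
        & x^*_0 &= x_0, \\
      \dot{p}^*_t &= -\nabla_x H(x^*_t, p^*_t, \theta^*_t),
        & p^*_T &= -\nabla_x \Phi(x^*_T, y_0), \\
      \mathbb{E}_{\mu_0}\big[ H(x^*_t, p^*_t, \theta^*_t) \big]
        &\ge \mathbb{E}_{\mu_0}\big[ H(x^*_t, p^*_t, \theta) \big],
        & &\forall\, \theta \in \Theta, \ \text{a.e. } t \in [0,T].
    \end{aligned}

In the empirical-risk version of the problem, the expectation over μ_0 is replaced by an average over finitely many training samples; the quantitative population-versus-empirical comparison mentioned in the abstract controls the gap between these two problems.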


Acknowledgements

The work of W. E and J. Han is supported in part by ONR Grant N00014-13-1-0338 and by the Major Program of NNSFC under Grant 91130005. Q. Li is supported by the Agency for Science, Technology and Research, Singapore.

Author information

Corresponding author

Correspondence to Jiequn Han.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

E, W., Han, J. & Li, Q. A mean-field optimal control formulation of deep learning. Res Math Sci 6, 10 (2019). https://doi.org/10.1007/s40687-018-0172-y
