A mean-field optimal control formulation of deep learning

Research article, published 2019 in Research in the Mathematical Sciences.

Abstract

Recent work linking deep neural networks to dynamical systems has opened up new avenues for analyzing deep learning. In particular, it has been observed that new insights can be gained by recasting deep learning as an optimal control problem on difference or differential equations. However, the mathematical aspects of such a formulation have not been systematically explored. This paper formulates the population risk minimization problem in deep learning as a mean-field optimal control problem. Mirroring the development of classical optimal control, we state and prove optimality conditions of both the Hamilton–Jacobi–Bellman type and the Pontryagin type. These mean-field results reflect the probabilistic nature of the learning problem. In addition, by appealing to the mean-field Pontryagin maximum principle, we establish quantitative relationships between the population and empirical learning problems. Together, these results lay a mathematical foundation for investigating the algorithmic and theoretical connections between optimal control and deep learning.
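
To fix ideas, the display below sketches the general shape of the mean-field optimal control problem described in the abstract, together with the associated Pontryagin-type conditions. The notation here (x_t for the propagated activations, θ_t for the trainable weights, f for the feed-forward dynamics, Φ for the terminal loss, L for a running regularizer, μ_0 for the joint distribution of inputs x_0 and labels y_0) is an illustrative rendering and may differ in detail from the paper's own conventions.

    % Population risk minimization as a mean-field optimal control problem
    % (schematic sketch; notation is illustrative, not verbatim from the paper)
    \begin{aligned}
      \inf_{\theta \in L^\infty([0,T];\,\Theta)} \; J(\theta)
        &:= \mathbb{E}_{(x_0,\,y_0) \sim \mu_0}
            \Big[ \Phi(x_T, y_0) + \int_0^T L(x_t, \theta_t)\,\mathrm{d}t \Big] \\
      \text{subject to} \quad \dot{x}_t &= f(x_t, \theta_t),
        \qquad t \in [0,T].
    \end{aligned}

    % Hamiltonian and Pontryagin-type conditions: the maximization holds in
    % expectation over the data distribution, which is what makes the maximum
    % principle "mean-field" rather than per-sample.
    \begin{aligned}
      H(x, p, \theta) &:= p \cdot f(x, \theta) - L(x, \theta), \\
      \dot{x}^*_t &= f(x^*_t, \theta^*_t),
        & x^*_0 &= x_0, \\
      \dot{p}^*_t &= -\nabla_x H(x^*_t, p^*_t, \theta^*_t),
        & p^*_T &= -\nabla_x \Phi(x^*_T, y_0), \\
      \mathbb{E}_{\mu_0}\big[ H(x^*_t, p^*_t, \theta^*_t) \big]
        &\ge \mathbb{E}_{\mu_0}\big[ H(x^*_t, p^*_t, \theta) \big],
        & &\forall\, \theta \in \Theta, \ \text{a.e. } t \in [0,T].
    \end{aligned}

In the empirical-risk version of the problem, the expectation over μ_0 is replaced by an average over finitely many training samples; the quantitative population-versus-empirical comparison mentioned in the abstract controls the gap between these two problems.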


Acknowledgements

The work of W. E and J. Han is supported in part by ONR Grant N00014-13-1-0338 and by the Major Program of NNSFC under Grant 91130005. Q. Li is supported by the Agency for Science, Technology and Research, Singapore.

Author information

Corresponding author

Correspondence to Jiequn Han.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

E, W., Han, J. & Li, Q. A mean-field optimal control formulation of deep learning. Res Math Sci 6, 10 (2019). https://doi.org/10.1007/s40687-018-0172-y
