Abstract
Stochastic control problems with delay are challenging because the system is path-dependent and therefore intrinsically high-dimensional. In this paper, we propose and systematically study deep neural network-based algorithms for solving stochastic control problems with delay. Specifically, we use neural networks designed for sequence modeling (e.g., recurrent neural networks such as long short-term memory) to parameterize the policy and optimize the objective function. The proposed algorithms are tested on three benchmark examples: a linear-quadratic problem, optimal consumption with fixed finite delay, and portfolio optimization with complete memory. In particular, we observe that the recurrent architecture captures the path dependence naturally and flexibly, and that it yields better performance, with more efficient and stable training, than feedforward networks. The advantage is evident even in the case of portfolio optimization with complete memory, which features infinite delay.
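To make the idea of a recurrent policy concrete, the following is a minimal sketch, not the implementation used in the paper. It assumes a hypothetical scalar delayed dynamics dX_t = (a X_t + c X_{t-δ} + u_t) dt + σ dW_t with a quadratic cost, and it replaces the LSTM with a single scalar Elman-style cell so that the example runs with the Python standard library alone. All coefficients, function names, and parameter values are illustrative assumptions. The sketch only evaluates the objective by Monte Carlo for fixed policy parameters; in practice the parameters would presumably be trained with a gradient-based optimizer in a deep learning framework.

```python
import math
import random


def rnn_policy_step(h, x, params):
    """One step of a scalar Elman-style recurrent cell (a stand-in for an LSTM).

    The hidden state h summarizes the past of the state path, so the control
    can depend on history even though only the current x is fed in.
    """
    w_h, w_x, b, w_o = params
    h_new = math.tanh(w_h * h + w_x * x + b)
    u = w_o * h_new
    return h_new, u


def simulate_cost(params, n_paths=200, n_steps=50, lag=10, T=1.0, seed=0):
    """Monte Carlo estimate of a quadratic cost for a toy delayed SDE.

    Assumed dynamics (hypothetical, for illustration only):
        dX = (a*X_t + c*X_{t-delay} + u_t) dt + sigma dW,
    discretized by Euler-Maruyama, with X held at x0 on the initial window.
    """
    a, c, sigma, x0 = -0.5, 0.3, 0.2, 1.0  # hypothetical coefficients
    dt = T / n_steps
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_paths):
        path = [x0] * (lag + 1)  # history on [-delay, 0]
        h, cost = 0.0, 0.0
        for _ in range(n_steps):
            x, x_delayed = path[-1], path[-1 - lag]
            h, u = rnn_policy_step(h, x, params)
            cost += (x * x + u * u) * dt  # running cost
            dw = rng.gauss(0.0, math.sqrt(dt))
            path.append(x + (a * x + c * x_delayed + u) * dt + sigma * dw)
        total += cost + path[-1] ** 2  # add terminal cost
    return total / n_paths


zero_policy = (0.0, 0.0, 0.0, 0.0)       # u = 0 at every step
feedback_policy = (0.5, 1.0, 0.0, -1.0)  # arbitrary untrained parameters
print(simulate_cost(zero_policy), simulate_cost(feedback_policy))
```

The same seed is reused for every call, so two parameter sets are compared on identical noise paths. A feedforward policy would instead need the whole delay window as an explicit input, whereas here the hidden state carries that information forward.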
Acknowledgements
J.H. and R.H. are grateful to the reviewers for their valuable and constructive comments.
R.H. was partially supported by the NSF grant DMS-1953035, the Faculty Career Development Award and the Research Assistance Program Award, University of California, Santa Barbara.
Cite this article
Han, J., Hu, R. Recurrent neural networks for stochastic control problems with delay. Math. Control Signals Syst. 33, 775–795 (2021). https://doi.org/10.1007/s00498-021-00300-3
DOI: https://doi.org/10.1007/s00498-021-00300-3