Recurrent neural networks for stochastic control problems with delay

Han, Jiequn; Hu, Ruimeng

doi:10.1007/s00498-021-00300-3

Original Article
Published: 22 July 2021

Mathematics of Control, Signals, and Systems, volume 33, pages 775–795 (2021)

Jiequn Han¹ & Ruimeng Hu²,³

Abstract

Stochastic control problems with delay are challenging because the system is path-dependent and therefore intrinsically high-dimensional. In this paper, we propose and systematically study deep neural network-based algorithms for solving stochastic control problems with delay. Specifically, we employ neural networks for sequence modeling (e.g., recurrent neural networks such as long short-term memory) to parameterize the policy and optimize the objective function. The proposed algorithms are tested on three benchmark examples: a linear-quadratic problem, optimal consumption with fixed finite delay, and portfolio optimization with complete memory. In particular, we observe that the architecture of recurrent neural networks naturally and flexibly captures the path-dependent feature, and that it yields better performance with more efficient and stable training than feedforward networks. This advantage is evident even in portfolio optimization with complete memory, which features infinite delay.
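The idea of parameterizing the policy with a recurrent network can be illustrated with a minimal sketch. It is not the authors' implementation: the scalar dynamics, the coefficients `a`, `b`, `sigma`, the quadratic cost, and the function name `simulate_cost` are all hypothetical choices made for illustration. The sketch only evaluates the Monte Carlo objective for fixed policy weights. In practice one would presumably minimize this objective over the weights with an automatic-differentiation framework and a richer cell such as an LSTM.

```python
import math
import random


def simulate_cost(params, n_paths=200, n_steps=50, T=1.0, delay_steps=5, seed=0):
    """Monte Carlo estimate of a quadratic cost under a scalar recurrent policy.

    Hypothetical delayed dynamics (an assumption, not taken from the paper):
        dX_t = (a*X_t + b*X_{t-delay} + u_t) dt + sigma dW_t
    discretized with the Euler-Maruyama scheme.
    """
    w_h, w_x, b_h, w_u, b_u = params       # weights of a one-unit recurrent cell
    a, b, sigma = -0.5, 0.3, 0.4           # illustrative coefficients
    dt = T / n_steps
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_paths):
        # History buffer: constant initial segment X_s = 1 on [-delay, 0].
        path = [1.0] * (delay_steps + 1)
        h = 0.0                            # hidden state summarizing the past
        cost = 0.0
        for _ in range(n_steps):
            x = path[-1]
            x_delayed = path[-1 - delay_steps]
            # Recurrent cell: only the current state is fed in; information
            # about the earlier path is carried by the hidden state h.
            h = math.tanh(w_h * h + w_x * x + b_h)
            u = w_u * h + b_u
            cost += (x * x + u * u) * dt   # running cost
            dw = rng.gauss(0.0, math.sqrt(dt))
            path.append(x + (a * x + b * x_delayed + u) * dt + sigma * dw)
        cost += path[-1] ** 2              # terminal cost
        total += cost
    return total / n_paths


if __name__ == "__main__":
    zero_policy = (0.0, 0.0, 0.0, 0.0, 0.0)    # u = 0 at every step
    rnn_policy = (0.5, 1.0, 0.0, -0.5, 0.0)    # arbitrary untrained weights
    print(simulate_cost(zero_policy), simulate_cost(rnn_policy))
```

Because the hidden state is updated at every step, the policy's input size does not grow with the length of the delay. A feedforward policy would instead need the whole delayed segment of the path as input.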

Notes

https://github.com/frankhan91/RNN-ControlwithDelay.

Acknowledgements

J.H. and R.H. are grateful to the reviewers for their valuable and constructive comments.

Author information

Authors and Affiliations

Department of Mathematics, Princeton University, Princeton, NJ, 08544-1000, USA
Jiequn Han
Department of Mathematics, University of California, Santa Barbara, CA, 93106-3080, USA
Ruimeng Hu
Department of Statistics and Applied Probability, University of California, Santa Barbara, CA, 93106-3080, USA
Ruimeng Hu

Corresponding author

Correspondence to Ruimeng Hu.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

R.H. was partially supported by the NSF grant DMS-1953035, the Faculty Career Development Award and the Research Assistance Program Award, University of California, Santa Barbara.

About this article

Cite this article

Han, J., Hu, R. Recurrent neural networks for stochastic control problems with delay. Math. Control Signals Syst. 33, 775–795 (2021). https://doi.org/10.1007/s00498-021-00300-3

Received: 05 January 2021
Accepted: 09 July 2021
Published: 22 July 2021
Issue Date: December 2021
DOI: https://doi.org/10.1007/s00498-021-00300-3
