Abstract
In this paper we study discrete-time Markov decision processes in Borel spaces with a finite number of constraints and with unbounded rewards and costs. Our aim is to provide a simple method to compute constrained optimal control policies when the payoff functions and the constraints are of either infinite-horizon discounted type or average (a.k.a. ergodic) type. To deduce optimality results for the discounted case, we use the Lagrange multipliers method, which rewrites the original problem (with constraints) as a parametric family of unconstrained discounted problems. Based on the dynamic programming technique, along with a simple use of elementary differential calculus, we obtain both suitable Lagrange multipliers and a family of control policies associated with these multipliers; this family turns out to be optimal for the original problem with constraints. We next apply the vanishing discount factor method to obtain, in a straightforward way, optimal control policies for the average problem with constraints. Finally, to illustrate our results, we provide a simple application to linear–quadratic systems (LQ-systems).
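As a rough illustration of the Lagrange step described above, the reformulation can be sketched as follows. The notation here (the symbols V_\alpha, r, c_i, \theta_i, \lambda_i and the orientation "maximize reward subject to cost bounds") is assumed for the purpose of the sketch and is not taken from the paper, whose precise setting and hypotheses may differ.

```latex
% Assumed notation (hypothetical, for illustration only):
% state x, policy \pi, discount factor \alpha \in (0,1),
% reward r, costs c_1,\dots,c_q, constraint bounds \theta_1,\dots,\theta_q.
\begin{align*}
V_\alpha(\pi,x;g) &:= \mathbb{E}_x^{\pi}\Big[\sum_{t=0}^{\infty}\alpha^{t}\, g(x_t,a_t)\Big],\\
\text{constrained problem:}\quad
 &\sup_{\pi}\, V_\alpha(\pi,x;r)
 \quad\text{s.t.}\quad V_\alpha(\pi,x;c_i)\le\theta_i,\quad i=1,\dots,q,\\
\text{Lagrangian:}\quad
L_\alpha(\pi,\lambda) &:= V_\alpha(\pi,x;r)
 -\sum_{i=1}^{q}\lambda_i\big(V_\alpha(\pi,x;c_i)-\theta_i\big),
 \qquad \lambda\in\mathbb{R}^{q}_{+},\\
 &\;= V_\alpha\Big(\pi,x;\; r-\sum_{i=1}^{q}\lambda_i c_i\Big)
 +\sum_{i=1}^{q}\lambda_i\theta_i .
\end{align*}
% For fixed \lambda, maximizing L_\alpha(\cdot,\lambda) is an unconstrained
% discounted problem with running payoff r - \sum_i \lambda_i c_i.
% Weak duality: for every feasible \pi and every \lambda \ge 0,
%   V_\alpha(\pi,x;r) \le L_\alpha(\pi,\lambda) \le \sup_{\pi'} L_\alpha(\pi',\lambda).
```

Under these assumptions, each fixed multiplier \lambda gives an ordinary discounted problem that dynamic programming can handle, and the multiplier is then chosen so that the resulting policy is feasible and the bound above is attained.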
Acknowledgments
The authors wish to thank the editors and the two anonymous referees who have patiently gone through this paper and whose suggestions have improved its presentation and readability.
Additional information
This research was supported in part by CONACyT Grant No. 238045.
About this article
Cite this article
Mendoza-Pérez, A.F., Jasso-Fuentes, H. & De-la-Cruz Courtois, O.A. Constrained Markov decision processes in Borel spaces: from discounted to average optimality. Math Meth Oper Res 84, 489–525 (2016). https://doi.org/10.1007/s00186-016-0551-3
DOI: https://doi.org/10.1007/s00186-016-0551-3
Keywords
- Markov decision processes
- Constrained control problems
- Vanishing discount approach
- Lagrange multipliers