Abstract
We consider a variance minimization problem for semi-Markov decision processes with state-dependent discount factors in Borel spaces. The reward function may be unbounded both from above and from below. Under suitable conditions, we first prove that the discount variance minimization criterion can be transformed into an equivalent expected discount criterion, and then show the existence of a discount variance minimal policy over the class of expected discount optimal stationary policies. Furthermore, we give a value iteration algorithm for computing the expected discount optimal value function. Finally, two examples are used to illustrate our results.
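The value iteration step mentioned above can be illustrated on a toy model. The sketch below is not the paper's Borel-space semi-Markov construction: it assumes a finite state and action space, a discrete-time transition kernel, and bounded rewards, and all names (P, r, alpha, value_iteration, tol) are hypothetical. It only shows the shape of the recursion V_{n+1}(x) = max_a [ r(x,a) + alpha(x) * sum_y P(y|x,a) V_n(y) ] with a state-dependent discount factor alpha(x).

```python
import numpy as np

# A minimal sketch of value iteration with state-dependent discount factors.
# Finite-state, discrete-time analogue for illustration only; all names are
# hypothetical and do not follow the paper's notation exactly.

n_states, n_actions = 3, 2
rng = np.random.default_rng(0)

# Transition kernel P[a, x, y]: probability of moving from x to y under action a.
P = rng.random((n_actions, n_states, n_states))
P /= P.sum(axis=2, keepdims=True)

# One-step rewards r[x, a] and state-dependent discount factors alpha[x] in (0, 1).
r = rng.normal(size=(n_states, n_actions))
alpha = np.array([0.90, 0.95, 0.85])

def value_iteration(P, r, alpha, tol=1e-10, max_iter=10_000):
    """Iterate V_{n+1}(x) = max_a [ r(x,a) + alpha(x) * sum_y P(y|x,a) V_n(y) ]."""
    V = np.zeros(P.shape[1])
    for _ in range(max_iter):
        # Q[x, a] = r(x, a) + alpha(x) * E[ V(next state) | x, a ]
        Q = r + alpha[:, None] * np.einsum("axy,y->xa", P, V)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            V = V_new
            break
        V = V_new
    policy = Q.argmax(axis=1)  # a greedy stationary policy attaining the maximum
    return V, policy

V_star, pi_star = value_iteration(P, r, alpha)
print("approximate optimal values:", V_star)
print("greedy stationary policy:", pi_star)
```

In this simplified setting the iterates converge to the fixed point of the discounted optimality equation whenever all alpha(x) are bounded away from 1; the paper establishes the analogous convergence under its weighted-norm conditions for unbounded rewards.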
Acknowledgments
The research of the first author was supported by the Fundamental Research Funds for the Central Universities of Huaqiao University (No. 14BS114). The research of the second author was supported by NSFC and GDUPS. We are greatly indebted to the anonymous referees for many valuable comments and suggestions that have improved the presentation.
Cite this article
Wei, Q., Guo, X. Semi-Markov decision processes with variance minimization criterion. 4OR-Q J Oper Res 13, 59–79 (2015). https://doi.org/10.1007/s10288-014-0267-2
Keywords
- Semi-Markov decision processes
- State-dependent discount factors
- Discount optimality equation
- Discount variance minimal policy