Semi-Markov decision processes with variance minimization criterion

  • Research paper
  • 4OR

Abstract

We consider a variance minimization problem for semi-Markov decision processes with state-dependent discount factors in Borel spaces. The reward function may be unbounded from below and from above. Under suitable conditions, we first prove that the discount variance minimization criterion can be transformed into an equivalent expected discount criterion, and then show the existence of a discount variance minimal policy over the class of expected discount optimal stationary policies. We also give a value iteration algorithm for calculating the expected discount optimal value function. Finally, two examples illustrate our results.
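
To give a rough feel for the value iteration step mentioned in the abstract, here is a minimal sketch for a finite-state, finite-action toy model. It is not the algorithm of the paper, which works in Borel spaces with semi-Markov sojourn times and possibly unbounded rewards. The function name `value_iteration`, the arrays `r`, `P`, `beta`, and the assumption that each state `x` has a single effective discount factor `beta[x]` in [0, 1) are all hypothetical choices made for illustration.

```python
import numpy as np


def value_iteration(r, P, beta, tol=1e-10, max_iter=10_000):
    """Toy value iteration with state-dependent discount factors.

    r    : (S, A) array of one-step rewards r(x, a)
    P    : (S, A, S) array of transition probabilities P(y | x, a)
    beta : (S,) array of discount factors, each assumed to lie in [0, 1)
    Returns the approximate optimal value function and a greedy policy.
    """
    r = np.asarray(r, dtype=float)
    P = np.asarray(P, dtype=float)
    beta = np.asarray(beta, dtype=float)

    V = np.zeros(r.shape[0])
    for _ in range(max_iter):
        # Q(x, a) = r(x, a) + beta(x) * sum_y P(y | x, a) V(y)
        Q = r + beta[:, None] * (P @ V)
        V_new = Q.max(axis=1)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break

    # Greedy policy with respect to the final value estimate.
    Q = r + beta[:, None] * (P @ V)
    return V, Q.argmax(axis=1)


# Hypothetical two-state, two-action example.
# Action 0 moves to (or stays in) state 0; action 1 moves to (or stays in) state 1.
r = np.array([[1.0, 0.0],
              [0.0, 2.0]])
P = np.zeros((2, 2, 2))
P[:, 0, 0] = 1.0
P[:, 1, 1] = 1.0
beta = np.array([0.9, 0.5])

V, policy = value_iteration(r, P, beta)
print(V, policy)
```

In this toy example the discount factor depends on the current state only, so the update is a contraction with modulus `max(beta)` and the iteration converges from any starting point. The setting in the paper needs additional conditions to handle unbounded rewards and random sojourn times.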

Acknowledgments

The research of the first author was supported by the Fundamental Research Funds for the Central Universities of Huaqiao University (No. 14BS114). The research of the second author was supported by NSFC and GDUPS. We are greatly indebted to the anonymous referees for many valuable comments and suggestions that have improved the presentation.

Author information

Corresponding author

Correspondence to Xianping Guo.

About this article

Cite this article

Wei, Q., Guo, X. Semi-Markov decision processes with variance minimization criterion. 4OR-Q J Oper Res 13, 59–79 (2015). https://doi.org/10.1007/s10288-014-0267-2

  • DOI: https://doi.org/10.1007/s10288-014-0267-2
