
Constrained Markov decision processes with total cost criteria: Occupation measures and primal LP

  • Published in: Mathematical Methods of Operations Research (1996)

Abstract

This paper is the third in a series on constrained Markov decision processes (CMDPs) with a countable state space and unbounded costs. The previous papers treated the expected average cost and the discounted cost; here we analyze the total cost criterion. We study the properties of the set of occupation measures achieved by different classes of policies; we then focus on stationary policies and on mixed deterministic policies, and present conditions under which optimal policies exist within these classes. We conclude by introducing an equivalent infinite linear program.
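For orientation, the occupation-measure approach replaces optimization over policies by optimization over measures, which turns the CMDP into a linear program. A standard form of such a primal LP is sketched below; the notation (occupation measure $f$, initial distribution $\beta$, transition probabilities $\mathcal{P}_{xay}$, cost $c$, constraint costs $d^k$ with bounds $V_k$) is assumed here for illustration and may differ from the paper's exact formulation:

```latex
\begin{align*}
\min_{f \ge 0} \quad
  & \sum_{x \in \mathbf{X}} \sum_{a \in \mathsf{A}(x)} c(x,a)\, f(x,a)
  && \text{(total cost of the occupation measure)} \\
\text{s.t.} \quad
  & \sum_{a \in \mathsf{A}(y)} f(y,a)
    \;-\; \sum_{x \in \mathbf{X}} \sum_{a \in \mathsf{A}(x)} \mathcal{P}_{xay}\, f(x,a)
    \;=\; \beta(y),
  && \forall\, y \in \mathbf{X}
     \quad \text{(balance/flow constraints)} \\
  & \sum_{x \in \mathbf{X}} \sum_{a \in \mathsf{A}(x)} d^k(x,a)\, f(x,a) \;\le\; V_k,
  && k = 1, \dots, K
     \quad \text{(cost constraints)}
\end{align*}
```

With a countable state space this is an infinite LP: one variable $f(x,a)$ per state-action pair and one balance constraint per state. A feasible $f$ corresponds to the expected total number of visits to each state-action pair under some policy, and an optimal basic solution relates to the stationary and mixed deterministic policies discussed in the abstract.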


References

  1. Altman E (1993) Asymptotic properties of constrained markov decision processes. ZOR — Mathematical Method of Operations Research 37:151–170

    Google Scholar 

  2. Altman E (1994) Denumerable constrained Markov Decision Processes and finite approximations. Math of Operations Research 19:169–191

    Google Scholar 

  3. Altman E (1996) Constrained Markov decision processes with total cost criteria: Lagrange approach and dual LP. Submitted

  4. Altman E (1995) Constrained Markov decision processes. INRIA Report 2574

  5. Altman E, Shwartz A (1989) Optimal priority assignment: A time sharing approach. IEEE Transactions on Automatic Control AC- 34:1089–1102

    Google Scholar 

  6. Altman E, Shwartz A (1991) Markov decision problems and state-action frequencies. SIAM J Control and Optimization 29:786–809

    Google Scholar 

  7. Altman E, Shwartz A (1991) Sensitivity of constrained Markov decision problems. Annals of Operations Research 32:1–22

    Google Scholar 

  8. Aumann RJ (1964) Mixed and behavior strategies in infinite extensive games. Advances in Game Theory Ann Math Study 52:627–650

    Google Scholar 

  9. Bernhard P (1992) Information and strategies in dynamic games. SIAM J Cont and Opt 30:212–228

    Google Scholar 

  10. Beutler FJ, Ross KW (1985) Optimal policies for controlled Markov chains with a constraint. J Mathematical Analysis and Applications 112:236–252

    Google Scholar 

  11. Beutler FJ. Ross KW (1986) Time-average optimal constrained semi-markov decision processes. Advances of Applied Probability 18:341–359

    Google Scholar 

  12. Billingsley P (1968) Convergence of probability measures. J. Wiley, New York

    Google Scholar 

  13. Borkar VS (1988) A convex analytic approach to Markov decision processes. Prob Th Rel Fields 78:583–602

    Google Scholar 

  14. Borkar VS (1990) Topics in controlled markov chains. Longman Scientific & Technical

    Google Scholar 

  15. Borkar VS (1994) Ergodic control of Markov chains with constraints-the general case. SIAM J Control and Optimization 32:176–186

    Google Scholar 

  16. Dekker R, Hordijk A (1988) Average, sensitive and Blackwell optimal policies in denumerable Markov decision chains with unbounded rewards. Mathematics of Operations Research 13:395–421

    Google Scholar 

  17. Derman C (1970) Finite state markovian decision processes. Academic Press

  18. Derman C, Klein M (1965) Some remarks on finite horizon Markovian decision models. Operations research 13:272–278

    Google Scholar 

  19. Derman C, Veinott Jr AF (1972) Constrained Markov decision chains. Management Science 19:389–390

    Google Scholar 

  20. Derman C, Strauch RE (1966) A note on memoryless rules for controlling sequential control processes. Ann Math Stat 37:276–278

    Google Scholar 

  21. Feinberg EA (1995) Constrained semi-Markov decision processes with average rewards. ZOR—Mathematical Method of Operations Research 39:257–288

    Google Scholar 

  22. Feinberg EA, Reiman MI (1994) Optimality of randomized trunk reservation. Probability in the Engineering and Informational Sciences 8:463–489

    Google Scholar 

  23. Feinberg EA, Sonin I (1993) The existence of an equivalent stationary strategy in the case of discount factor equal one. Unpublished Draft

  24. Feinberg EA, Sonin I (1995) Notes on equivalent stationary policies in Markov decision processes with total rewards. Submitted to ZOR—Methematical Methods of Operations Research

  25. Feinberg EA, Shwartz A (1995) Constrained discounted dynamic programming, to appear in Math of Operations Research

  26. Haviv M (1995) On constrained Markov decision processes. Submitted to OR letters

  27. Hinderer K (1970) Foundation of non-stationary dynamic programming with discrete time parameter, vol 33. Lecture Notes in Operations Research and Mathematical Systems. Springer-Verlag, Berlin

    Google Scholar 

  28. Hordijk A (1977) Dynamic programming and markov potential theory. Second Edition. Mathematical Centre Tracts 51, Mathematisch Centrum, Amsterdam

    Google Scholar 

  29. Hordijk A, Kallenberg LCM (1984) Constrained undiscounted stochastic dynamic programming. Mathematics of Operations Research 9:276–289

    Google Scholar 

  30. Hordijk A, Spieksma F (1989) Constrained admission control to a queuing system. Advances of Applied Probability 21:409–431

    Google Scholar 

  31. Kadelka D (1983) On randomized policies and mixtures of deterministic policies in Dynamic Programming. Methods of Operations Research 46:67–75

    Google Scholar 

  32. Kallenberg LCM (1983) Linear programming and finite markovian control problems. Mathematical Centre Tracts 148, Amsterdam

  33. Kemeney JG, Snell JL, Knapp AW (1976) Denumerable markov chains. Springer-Verlag

  34. Krylov N (1985) Once more about the connection between elliptic operators and Ito's stochastic equations. In: Krylov N et al. (eds) Statistics and control of stochastic processes Steklov Seminar 1984 Optimization Software, New York 69–101

    Google Scholar 

  35. Kuhn HW (1953) Extensive games and the problem of information. Ann Math Stud 28:193–216

    Google Scholar 

  36. Lazar A (1983) Optimal flow control of a class of queuing networks in equilibrium. IEEE Transactions on Automatic Control 28:1001–1007

    Google Scholar 

  37. Nain P, Ross KW (1986) Optimal priority assignment with hard constraint. Transactions on Automatic Control 31:883–888

    Google Scholar 

  38. Piunovskiy AB (1994) Control of jump processes with constraints. Automatika i telemekhanika 4:75–89

    Google Scholar 

  39. Ross KW (1989) Randomized and past-dependent policies for Markov decision processes with multiple constraints. Operations Research 37:474–477

    Google Scholar 

  40. Ross KW, Chen B (1988) Optimal scheduling of interactive and non interactive traffic in telecommunication systems. IEEE Trans on Auto Control 33:261–267

    Google Scholar 

  41. Ross KW, Varadarajan R (1989) Markov decision processes with sample path constraints: The communicating case. Operations Research 37:780–790

    Google Scholar 

  42. Royden HL (1988) Real Analysis. 3rd Edition. Macmillan publishing Company. New York

    Google Scholar 

  43. Sennott LI (1991) Constrained discounted Markov decision chains. Probability in the Engineering and Informational Sciences 5:463–475

    Google Scholar 

  44. Sennott LI (1993) Constrained average cost Markov decision chains. Probability in the Engineering and Informational Sciences 7:69–83

    Google Scholar 

  45. Spieksma FM (1990) Geometrically ergodic markov chains and the optimal control of queues. PhD thesis University of Leiden

  46. Van Der Wal (1990) Stochastic dynamic programming. Mathematisch Centrum, Amsterdam

Download references


Cite this article

Altman, E. Constrained Markov decision processes with total cost criteria: Occupation measures and primal LP. Mathematical Methods of Operations Research 43, 45–72 (1996). https://doi.org/10.1007/BF01303434
