Abstract
This paper is the third in a series on constrained Markov decision processes (CMDPs) with a countable state space and unbounded cost. The previous papers studied the expected average cost and the discounted cost; here we analyze the total cost criterion. We study the properties of the set of occupation measures achieved by different classes of policies, then focus on stationary policies and on mixed deterministic policies and present conditions under which optimal policies exist within these classes. We conclude by introducing an equivalent infinite linear program (LP).
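As a rough illustration of the objects the abstract refers to, the sketch below computes the total-cost occupation measure rho(x, u) = sum over t of P(X_t = x, A_t = u) of a stationary policy. It uses a hypothetical two-state transient model with a finite state space, so it does not capture the countable state space or unbounded costs treated in the paper. The arrays P, c, d, beta, the policy w and the bound V are invented for illustration. The code checks a flow-balance equation, sum_u rho(y,u) - sum_{x,u} rho(x,u) P(y|x,u) = beta(y), of the kind that appears in total-cost LP formulations of CMDPs, and shows how a stationary policy can be read back from rho. Whether this matches the paper's infinite LP exactly is an assumption.

```python
import numpy as np

# Hypothetical toy model (illustration only, not taken from the paper).
# States 0 and 1 are transient; any probability mass missing from a row of P
# is absorbed into a cost-free state that is not represented explicitly.
# P[x, u, y] = probability of moving from state x to transient state y under action u.
P = np.array([
    [[0.5, 0.3], [0.1, 0.2]],  # from state 0: action 0, action 1
    [[0.2, 0.4], [0.0, 0.1]],  # from state 1: action 0, action 1
])
c = np.array([[1.0, 3.0], [2.0, 5.0]])  # cost to be minimized, c[x, u]
d = np.array([[4.0, 1.0], [3.0, 0.5]])  # constrained cost, d[x, u]
beta = np.array([1.0, 0.0])             # initial distribution
V = 5.0                                 # hypothetical bound on the total d-cost


def occupation_measure(w):
    """Total-cost occupation measure rho[x, u] of the stationary policy w,
    where w[x, u] is the probability of choosing action u in state x."""
    # Substochastic transition matrix among transient states under w.
    P_w = np.einsum("xu,xuy->xy", w, P)
    # Expected number of visits q solves q = beta + q P_w, i.e. (I - P_w)^T q = beta.
    q = np.linalg.solve((np.eye(len(beta)) - P_w).T, beta)
    # Each visit to x selects action u with probability w[x, u].
    return q[:, None] * w


w = np.array([[0.7, 0.3], [0.5, 0.5]])
rho = occupation_measure(w)

# Total costs are linear in the occupation measure.
total_c = (rho * c).sum()
total_d = (rho * d).sum()
print("total cost:", total_c, "constrained cost:", total_d, "feasible:", total_d <= V)

# Flow-balance constraint: sum_u rho(y,u) - sum_{x,u} rho(x,u) P[x,u,y] = beta(y).
balance = rho.sum(axis=1) - np.einsum("xu,xuy->y", rho, P)
assert np.allclose(balance, beta)

# Recover the stationary policy from rho wherever the state is visited.
w_back = rho / rho.sum(axis=1, keepdims=True)
assert np.allclose(w_back, w)
```

Because both costs are linear in rho, optimizing over occupation measures subject to such balance and cost constraints is a linear program; in this finite toy setting that LP could be handed to any standard LP solver.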
References
Altman E (1993) Asymptotic properties of constrained Markov decision processes. ZOR – Mathematical Methods of Operations Research 37:151–170
Altman E (1994) Denumerable constrained Markov decision processes and finite approximations. Mathematics of Operations Research 19:169–191
Altman E (1996) Constrained Markov decision processes with total cost criteria: Lagrange approach and dual LP. Submitted
Altman E (1995) Constrained Markov decision processes. INRIA Report 2574
Altman E, Shwartz A (1989) Optimal priority assignment: A time sharing approach. IEEE Transactions on Automatic Control AC-34:1089–1102
Altman E, Shwartz A (1991) Markov decision problems and state-action frequencies. SIAM J Control and Optimization 29:786–809
Altman E, Shwartz A (1991) Sensitivity of constrained Markov decision problems. Annals of Operations Research 32:1–22
Aumann RJ (1964) Mixed and behavior strategies in infinite extensive games. Advances in Game Theory Ann Math Study 52:627–650
Bernhard P (1992) Information and strategies in dynamic games. SIAM J Control and Optimization 30:212–228
Beutler FJ, Ross KW (1985) Optimal policies for controlled Markov chains with a constraint. J Mathematical Analysis and Applications 112:236–252
Beutler FJ, Ross KW (1986) Time-average optimal constrained semi-Markov decision processes. Advances in Applied Probability 18:341–359
Billingsley P (1968) Convergence of probability measures. J. Wiley, New York
Borkar VS (1988) A convex analytic approach to Markov decision processes. Prob Th Rel Fields 78:583–602
Borkar VS (1990) Topics in controlled Markov chains. Longman Scientific & Technical
Borkar VS (1994) Ergodic control of Markov chains with constraints: the general case. SIAM J Control and Optimization 32:176–186
Dekker R, Hordijk A (1988) Average, sensitive and Blackwell optimal policies in denumerable Markov decision chains with unbounded rewards. Mathematics of Operations Research 13:395–421
Derman C (1970) Finite state Markovian decision processes. Academic Press
Derman C, Klein M (1965) Some remarks on finite horizon Markovian decision models. Operations Research 13:272–278
Derman C, Veinott Jr AF (1972) Constrained Markov decision chains. Management Science 19:389–390
Derman C, Strauch RE (1966) A note on memoryless rules for controlling sequential control processes. Ann Math Stat 37:276–278
Feinberg EA (1995) Constrained semi-Markov decision processes with average rewards. ZOR – Mathematical Methods of Operations Research 39:257–288
Feinberg EA, Reiman MI (1994) Optimality of randomized trunk reservation. Probability in the Engineering and Informational Sciences 8:463–489
Feinberg EA, Sonin I (1993) The existence of an equivalent stationary strategy in the case of discount factor equal one. Unpublished Draft
Feinberg EA, Sonin I (1995) Notes on equivalent stationary policies in Markov decision processes with total rewards. Submitted to ZOR – Mathematical Methods of Operations Research
Feinberg EA, Shwartz A (1995) Constrained discounted dynamic programming. To appear in Mathematics of Operations Research
Haviv M (1995) On constrained Markov decision processes. Submitted to Operations Research Letters
Hinderer K (1970) Foundations of non-stationary dynamic programming with discrete time parameter, vol 33. Lecture Notes in Operations Research and Mathematical Systems. Springer-Verlag, Berlin
Hordijk A (1977) Dynamic programming and Markov potential theory. Second Edition. Mathematical Centre Tracts 51, Mathematisch Centrum, Amsterdam
Hordijk A, Kallenberg LCM (1984) Constrained undiscounted stochastic dynamic programming. Mathematics of Operations Research 9:276–289
Hordijk A, Spieksma F (1989) Constrained admission control to a queuing system. Advances in Applied Probability 21:409–431
Kadelka D (1983) On randomized policies and mixtures of deterministic policies in Dynamic Programming. Methods of Operations Research 46:67–75
Kallenberg LCM (1983) Linear programming and finite Markovian control problems. Mathematical Centre Tracts 148, Amsterdam
Kemeny JG, Snell JL, Knapp AW (1976) Denumerable Markov chains. Springer-Verlag
Krylov N (1985) Once more about the connection between elliptic operators and Ito's stochastic equations. In: Krylov N et al. (eds) Statistics and control of stochastic processes Steklov Seminar 1984 Optimization Software, New York 69–101
Kuhn HW (1953) Extensive games and the problem of information. Ann Math Stud 28:193–216
Lazar A (1983) Optimal flow control of a class of queuing networks in equilibrium. IEEE Transactions on Automatic Control 28:1001–1007
Nain P, Ross KW (1986) Optimal priority assignment with hard constraint. IEEE Transactions on Automatic Control 31:883–888
Piunovskiy AB (1994) Control of jump processes with constraints. Automatika i telemekhanika 4:75–89
Ross KW (1989) Randomized and past-dependent policies for Markov decision processes with multiple constraints. Operations Research 37:474–477
Ross KW, Chen B (1988) Optimal scheduling of interactive and non interactive traffic in telecommunication systems. IEEE Trans on Auto Control 33:261–267
Ross KW, Varadarajan R (1989) Markov decision processes with sample path constraints: The communicating case. Operations Research 37:780–790
Royden HL (1988) Real Analysis. 3rd Edition. Macmillan Publishing Company, New York
Sennott LI (1991) Constrained discounted Markov decision chains. Probability in the Engineering and Informational Sciences 5:463–475
Sennott LI (1993) Constrained average cost Markov decision chains. Probability in the Engineering and Informational Sciences 7:69–83
Spieksma FM (1990) Geometrically ergodic Markov chains and the optimal control of queues. PhD thesis, University of Leiden
Van Der Wal (1990) Stochastic dynamic programming. Mathematisch Centrum, Amsterdam
Cite this article
Altman, E. Constrained Markov decision processes with total cost criteria: Occupation measures and primal LP. Mathematical Methods of Operations Research 43, 45–72 (1996). https://doi.org/10.1007/BF01303434