Iterative Aggregation-Disaggregation Procedures for Discounted Semi-Markov Reward Processes

Authors:

Paul J. Schweitzer,

Martin L. Puterman,

Kyle W. Kindle

Operations Research, Volume 33, Issue 3

Pages 589-605

https://doi.org/10.1287/opre.33.3.589

Published: 01 June 1985

Abstract

The equation v = q + Mv, where M is a matrix with nonnegative elements and spectral radius less than one, arises in Markovian decision processes and input-output models. In this paper, we solve the equation using an iterative aggregation-disaggregation procedure that alternates between solving an aggregated problem and disaggregating the variables, one block at a time, in terms of the aggregate variables of the other blocks. The disaggregated variables are then used to guide the choice of weights in the subsequent aggregation. Computational experiments on randomly generated problems and on inventory problems indicate that this algorithm is significantly faster than successive approximations when the spectral radius of M is near one, but slower on unstructured problems with spectral radii in the neighborhood of 0.8. The algorithm appears promising for large structured problems, where it can often reduce computation time and main-memory storage requirements and offer greater robustness to the choice of initial values.
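The procedure summarized in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation. The state partition `blocks` is supplied by the caller. Within each block, the aggregation weights w are taken proportional to the current iterate, which is one plausible reading of how the disaggregated variables guide the choice of weights. The aggregated system V = Q + AV has Q_k equal to the sum of q_i over block k and A_kl equal to the sum of M_ij w_j over i in block k and j in block l. The sketch solves that system and each diagonal block exactly by dense Gaussian elimination, and it assumes q > 0 and a positive starting vector so that the weights are well defined. All function names are hypothetical. Successive approximations, the baseline used in the comparisons, is included for reference.

```python
from typing import List, Sequence

Matrix = List[List[float]]
Vector = List[float]


def solve_linear(a: Matrix, b: Vector) -> Vector:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented copy [a | b] so the caller's data is left untouched.
    aug = [[float(x) for x in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


def successive_approximations(m: Matrix, q: Vector, v0: Vector, sweeps: int) -> Vector:
    """Baseline: repeat v <- q + M v, which converges because rho(M) < 1."""
    v = list(v0)
    for _ in range(sweeps):
        v = [qi + sum(mij * vj for mij, vj in zip(row, v)) for qi, row in zip(q, m)]
    return v


def iad_step(m: Matrix, q: Vector, v: Vector, blocks: Sequence[Sequence[int]]) -> Vector:
    """One aggregation-disaggregation pass (hypothetical sketch); requires v > 0."""
    n = len(v)
    # Assumed weights: within each block, proportional to the current iterate.
    w = [0.0] * n
    for blk in blocks:
        total = sum(v[i] for i in blk)
        for i in blk:
            w[i] = v[i] / total
    # Aggregated problem V = Q + A V, with one unknown per block.
    big_q = [sum(q[i] for i in blk) for blk in blocks]
    a = [[sum(m[i][j] * w[j] for i in bk for j in bl) for bl in blocks] for bk in blocks]
    k = len(blocks)
    big_v = solve_linear(
        [[(1.0 if r == c else 0.0) - a[r][c] for c in range(k)] for r in range(k)], big_q
    )
    # Spread each aggregate over its block: x_j = w_j * V_idx for j in block idx.
    x = [0.0] * n
    for idx, blk in enumerate(blocks):
        for j in blk:
            x[j] = w[j] * big_v[idx]
    # Disaggregate one block at a time, holding the other blocks at x.
    new_v = [0.0] * n
    for blk in blocks:
        inside = set(blk)
        rhs = [q[i] + sum(m[i][j] * x[j] for j in range(n) if j not in inside) for i in blk]
        lhs = [[(1.0 if i == j else 0.0) - m[i][j] for j in blk] for i in blk]
        for i, vi in zip(blk, solve_linear(lhs, rhs)):
            new_v[i] = vi
    return new_v


def iad(m: Matrix, q: Vector, v0: Vector, blocks: Sequence[Sequence[int]], iterations: int) -> Vector:
    """Repeat iad_step; this sketch claims no convergence guarantee."""
    v = list(v0)
    for _ in range(iterations):
        v = iad_step(m, q, v, blocks)
    return v
```

For instance, `iad(m, q, [1.0] * 4, [[0, 1], [2, 3]], 20)` applies 20 passes to a 4-state problem split into two blocks. If v already solves v = q + Mv, a pass returns v unchanged, because the aggregated equations are then exact. Beyond that fixed-point property, the sketch claims no convergence rate. The timing results in the abstract come from the authors' experiments, not from this code.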

Published In

Operations Research, Volume 33, Issue 3 (June 1985), 236 pages

ISSN: 0030-364X

Publisher

INFORMS, Linthicum, MD, United States

Bibliometrics

Total citations: 6

Total downloads: 0 (last 12 months: 0; last 6 weeks: 0; reflects downloads up to 19 Feb 2025)

Cited By

Petrik, M., and R. Luss. 2016. Interpretable policies for dynamic product recommendations. In Proceedings of the Thirty-Second Conference on Uncertainty in Artificial Intelligence, pp. 607-616. https://dl.acm.org/doi/10.5555/3020948.3021011

Boutilier, C., T. Dean, and S. Hanks. 1999. Decision-theoretic planning. Journal of Artificial Intelligence Research 11(1), 1-94. https://dl.acm.org/doi/10.5555/3013545.3013546

Dean, T., R. Givan, and S. Leach. 1997. Model reduction techniques for computing approximately optimal solutions for Markov decision processes. In Proceedings of the Thirteenth Conference on Uncertainty in Artificial Intelligence, pp. 124-131. https://dl.acm.org/doi/10.5555/2074226.2074241

Boutilier, C. 1996. Planning, learning and coordination in multiagent decision processes. In Proceedings of the 6th Conference on Theoretical Aspects of Rationality and Knowledge, pp. 195-210. https://dl.acm.org/doi/10.5555/1029693.1029710

Boutilier, C., and R. Dearden. 1994. Using abstractions for decision-theoretic planning with time constraints. In Proceedings of the Twelfth AAAI National Conference on Artificial Intelligence, pp. 1016-1022. https://dl.acm.org/doi/10.5555/2891730.2891887

1991. Aggregation and Disaggregation Techniques and Methodology in Optimization. Operations Research 39(4), 553-582. https://dl.acm.org/doi/10.1287/opre.39.4.553
