Acceleration Operators in the Value Iteration Algorithms for Markov Decision Processes

Authors:

Oleksandr Shlakhter,

Chi-Guhn Lee,

Dmitry Khmelev,

Nasser Jaber

Operations Research, Volume 58, Issue 1

Pages 193 - 202

https://doi.org/10.1287/opre.1090.0705

Published: 01 January 2010

Abstract

We study a general approach to accelerating the convergence of the most widely used solution method for Markov decision processes (MDPs) with the total expected discounted reward criterion. Inspired by the monotone behavior of contraction mappings in the feasible set of the linear programming problem equivalent to the MDP, we establish a class of operators that can be used in combination with a contraction mapping operator in the standard value iteration algorithm and its variants. We then propose two such operators, which can be easily implemented as part of the value iteration algorithm and its variants. Numerical studies show that the computational savings can be significant, especially when the discount factor approaches one and the transition probability matrix becomes dense, conditions under which the standard value iteration algorithm and its variants suffer from slow convergence.
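For orientation, the baseline that the abstract refers to can be sketched as follows. This is a minimal illustration of standard value iteration for a discounted MDP, not the authors' accelerated method: the two proposed operators are not described on this page and are not reproduced here. The function names (`bellman`, `value_iteration`, `random_dense_mdp`), the array layout, and the random test instance are assumptions made for the example. The final loop illustrates the monotone behavior mentioned in the abstract: starting from a vector v with v >= Tv (a point in the feasible set of the equivalent linear program), each application of the Bellman operator T stays feasible and does not increase any component.

```python
import numpy as np


def bellman(v, P, r, gamma):
    """Apply the Bellman optimality operator T to the value vector v.

    Assumed layout: P[a, s, j] = Pr(next state j | state s, action a),
    r[s, a] = one-step reward, gamma = discount factor in [0, 1).
    """
    # q[s, a] = r[s, a] + gamma * sum_j P[a, s, j] * v[j]
    q = r + gamma * np.einsum("asj,j->sa", P, v)
    return q.max(axis=1)


def value_iteration(P, r, gamma, v0=None, tol=1e-8, max_iter=100_000):
    """Iterate v <- T v until the sup-norm change drops below tol.

    Returns the final value vector and the number of iterations used.
    """
    v = np.zeros(r.shape[0]) if v0 is None else np.asarray(v0, dtype=float)
    for k in range(1, max_iter + 1):
        v_new = bellman(v, P, r, gamma)
        if np.max(np.abs(v_new - v)) < tol:
            return v_new, k
        v = v_new
    return v, max_iter


def random_dense_mdp(n_states=20, n_actions=3, seed=0):
    """Hypothetical test instance: dense transitions, rewards in [0, 1)."""
    rng = np.random.default_rng(seed)
    P = rng.random((n_actions, n_states, n_states))
    P /= P.sum(axis=2, keepdims=True)  # normalize each row to a distribution
    r = rng.random((n_states, n_actions))
    return P, r


if __name__ == "__main__":
    P, r = random_dense_mdp()

    # Iteration counts grow as the discount factor approaches one.
    for gamma in (0.9, 0.99):
        _, iters = value_iteration(P, r, gamma)
        print(f"gamma={gamma}: {iters} iterations")

    # Monotone behavior inside the LP feasible set {v : v >= T v}.
    gamma = 0.9
    v = np.full(r.shape[0], r.max() / (1.0 - gamma))  # satisfies v >= T v
    for _ in range(5):
        tv = bellman(v, P, r, gamma)
        print("T v <= v componentwise:", bool(np.all(tv <= v + 1e-9)))
        v = tv
```

Running the script on the assumed instance prints the iteration counts for the two discount factors, which gives a rough sense of the slow convergence that motivates acceleration.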

Cited By

Lee J, Ryu E, Oh A, Naumann T, Globerson A, Saenko K, Hardt M, Levine S (2023) Accelerating value iteration with anchoring. Proceedings of the 37th International Conference on Neural Information Processing Systems, pp. 53924-53963. Online publication date: 10-Dec-2023
https://dl.acm.org/doi/10.5555/3666122.3668468
Goyal V, Grand-Clément J (2023) A First-Order Approach to Accelerated Value Iteration. Operations Research 71:2, pp. 517-535. Online publication date: 1-Mar-2023
https://dl.acm.org/doi/10.1287/opre.2022.2269
Cao H, Tian H, Cai J, Alfa A, Huang S (2017) Dynamic Load-Balancing Spectrum Decision for Heterogeneous Services Provisioning in Multi-Channel Cognitive Radio Networks. IEEE Transactions on Wireless Communications 16:9, pp. 5911-5924. Online publication date: 1-Sep-2017
https://dl.acm.org/doi/10.1109/TWC.2017.2717403
Chang H (2014) Technical communique. Automatica (Journal of IFAC) 50:7, pp. 1940-1943. Online publication date: 1-Jul-2014
https://dl.acm.org/doi/10.1016/j.automatica.2014.05.009

Index Terms

Acceleration Operators in the Value Iteration Algorithms for Markov Decision Processes
1. Mathematics of computing
  1. Probability and statistics
    1. Probabilistic representations
      1. Markov networks
    2. Stochastic processes
      1. Markov processes

Index terms have been assigned to the content through auto-classification.

Published In

Operations Research Volume 58, Issue 1

January 2010

255 pages

ISSN:0030-364X

Publisher

INFORMS

Linthicum, MD, United States

Publication History

Published: 01 January 2010

Accepted: 01 July 2008

Received: 01 January 2005
