Acceleration Operators in the Value Iteration Algorithms for Markov Decision Processes

Published: 01 January 2010

Abstract

We study a general approach to accelerating the convergence of the most widely used solution method for Markov decision processes (MDPs) with the total expected discounted reward criterion. Inspired by the monotone behavior of contraction mappings in the feasible set of the linear programming problem equivalent to the MDP, we establish a class of operators that can be used in combination with a contraction mapping operator in the standard value iteration algorithm and its variants. We then propose two such operators, which can be easily implemented as part of the value iteration algorithm and its variants. Numerical studies show that the computational savings can be significant, especially when the discount factor approaches one and the transition probability matrix becomes dense, the cases in which the standard value iteration algorithm and its variants suffer from slow convergence.
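
To make the slow-convergence claim concrete, here is a minimal sketch of standard (unaccelerated) value iteration on a small random MDP with a dense transition matrix. It does not implement the acceleration operators proposed in the article. The array layout, the stopping tolerance, the random test problem, and the helper names `bellman`, `value_iteration`, and `is_lp_feasible` are all assumptions made for illustration. The feasibility check assumes the standard linear programming formulation, in which a vector v is feasible when v >= r_a + gamma * P_a v for every action a.

```python
import numpy as np


def bellman(v, P, r, gamma):
    """Apply the Bellman optimality operator once.

    P has shape (A, S, S): P[a, s, t] is the probability of moving s -> t under action a.
    r has shape (A, S): r[a, s] is the one-step reward for action a in state s.
    """
    return np.max(r + gamma * (P @ v), axis=0)


def value_iteration(P, r, gamma, tol=1e-6, max_iter=100_000):
    """Standard value iteration; returns the final iterate and the iteration count."""
    v = np.zeros(P.shape[1])
    for k in range(1, max_iter + 1):
        v_new = bellman(v, P, r, gamma)
        if np.max(np.abs(v_new - v)) < tol:
            return v_new, k
        v = v_new
    return v, max_iter


def is_lp_feasible(v, P, r, gamma, slack=1e-9):
    """Check v >= r_a + gamma * P_a v for all actions (assumed LP constraints)."""
    return bool(np.all(v + slack >= bellman(v, P, r, gamma)))


# Hypothetical test problem: 3 actions, 20 states, fully dense transitions.
rng = np.random.default_rng(0)
A, S = 3, 20
P = rng.random((A, S, S))
P /= P.sum(axis=2, keepdims=True)  # normalize each row into a distribution
r = rng.random((A, S))

for gamma in (0.9, 0.99, 0.999):
    _, iters = value_iteration(P, r, gamma)
    print(f"gamma={gamma}: {iters} iterations")
```

On this toy problem the iteration count grows sharply as the discount factor approaches one, which is the regime the abstract singles out. The helper `is_lp_feasible` can be used to observe the monotone behavior the abstract alludes to: starting from a feasible vector such as `r.max() / (1 - gamma)` in every component, one application of `bellman` yields another feasible vector that is componentwise no larger.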

Published In

Operations Research, Volume 58, Issue 1
January 2010
255 pages

Publisher

INFORMS
Linthicum, MD, United States

Publication History

Published: 01 January 2010
Accepted: 01 July 2008
Received: 01 January 2005

Author Tags

1. Markov decision processes
2. accelerated convergence
3. linear programming
4. value iteration

Cited By

  • (2023) Accelerating value iteration with anchoring. Proceedings of the 37th International Conference on Neural Information Processing Systems, pp. 53924-53963. DOI: 10.5555/3666122.3668468. Online publication date: 10-Dec-2023
  • (2023) A First-Order Approach to Accelerated Value Iteration. Operations Research, 71:2, pp. 517-535. DOI: 10.1287/opre.2022.2269. Online publication date: 1-Mar-2023
  • (2017) Dynamic Load-Balancing Spectrum Decision for Heterogeneous Services Provisioning in Multi-Channel Cognitive Radio Networks. IEEE Transactions on Wireless Communications, 16:9, pp. 5911-5924. DOI: 10.1109/TWC.2017.2717403. Online publication date: 1-Sep-2017
  • (2014) Technical communique. Automatica (Journal of IFAC), 50:7, pp. 1940-1943. DOI: 10.1016/j.automatica.2014.05.009. Online publication date: 1-Jul-2014
