Quality-based Rewards for Monte-Carlo Tree Search Simulations

Pepels, Tom; Tak, Mandy J.W.; Lanctot, Marc; Winands, Mark H. M.

doi:10.3233/978-1-61499-419-0-705

Abstract

Monte-Carlo Tree Search is a best-first search technique based on simulations to sample the state space of a decision-making problem. In games, positions are evaluated based on estimates obtained from rewards of numerous randomized play-outs. Generally, rewards from play-outs are discrete values representing the outcome of the game (loss, draw, or win), e.g., r∊{−1,0,1}, which are backpropagated from expanded leaf nodes to the root node. However, a play-out may provide additional information. In this paper, we introduce new measures for assessing the a posteriori quality of a simulation. We show that altering the rewards of play-outs based on their assessed quality improves results in six distinct two-player games and in the General Game Playing agent CADIAPLAYER. We propose two specific enhancements, the Relative Bonus and Qualitative Bonus. Both are used as control variates, a variance reduction method for statistical simulation. Relative Bonus is based on the number of moves made during a simulation and Qualitative Bonus relies on a domain-dependent assessment of the game's terminal state. We show that the proposed enhancements, both separate and combined, lead to significant performance increases in the domains discussed.
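
To make the idea of a length-based reward adjustment concrete, the following is a minimal sketch in Python. It is not the authors' implementation, because the abstract does not give the formulas. The sketch makes these assumptions: play-out lengths are tracked with a running mean and standard deviation, the normalized deviation from the mean is squashed into (-1, 1) with a sigmoid, and shorter-than-average play-outs strengthen the outcome. All names (`RunningStats`, `squash`, `adjusted_reward`) and the constants `a` and `k` are hypothetical.

```python
import math


class RunningStats:
    """Online mean and sample standard deviation (Welford's algorithm)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def std(self):
        return math.sqrt(self._m2 / (self.n - 1)) if self.n > 1 else 0.0


def squash(z, k=1.0):
    """Map z to the open interval (-1, 1); k is an assumed steepness constant."""
    z = max(-50.0, min(50.0, z))  # clamp to keep math.exp well within range
    return -1.0 + 2.0 / (1.0 + math.exp(-k * z))


def adjusted_reward(r, length, stats, a=0.25, k=1.0):
    """Return r in {-1, 0, 1} plus a bonus based on play-out length.

    Assumption: a play-out shorter than the running average pushes the
    reward further from zero, a longer one pulls it closer. Draws (r == 0)
    are left unchanged. The statistics are updated after the bonus is
    computed, so the current play-out does not influence its own bonus.
    """
    bonus = 0.0
    sigma = stats.std
    if sigma > 0.0:
        z = (stats.mean - length) / sigma  # positive when shorter than average
        bonus = squash(z, k)
    stats.update(length)
    sign = (r > 0) - (r < 0)
    return r + sign * a * bonus
```

In an MCTS loop, the value returned by `adjusted_reward` would be backpropagated in place of the raw outcome. A terminal-state quality score could be handled the same way by passing that score instead of the play-out length and reversing the sign of the deviation, again as an assumption and not as the paper's definition.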

IOS Press Copyright 2024
