Abstract
In general, the purpose of a reinforcement learning system is to learn an optimal policy. In two-player games such as Othello, however, it is also important to acquire a penalty-avoiding policy, i.e., a policy that avoids losing the game. The penalty avoiding rational policy making algorithm (PARP) is known to learn such a policy, but applying it to large-scale problems leads to an explosion in the number of states. In this article we focus on Othello, a game with a huge state space, and introduce several ideas and heuristics to adapt PARP to it. We show that our learning player beats the well-known Othello program KITTY.
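To make the notion of a penalty-avoiding policy concrete, the sketch below shows one coarse way to suppress moves that have previously led to a loss. This is an illustrative assumption, not the paper's actual PARP procedure: the class name, the tabular representation, and the blanket marking of every state-action pair on a lost episode are all simplifications introduced here.

    import random
    from collections import defaultdict

    class PenaltyAvoidingPlayer:
        """Coarse illustration of penalty avoidance (not the paper's PARP)."""

        def __init__(self):
            # penalty_rules[state] holds the actions known to have led to a loss.
            self.penalty_rules = defaultdict(set)

        def select_action(self, state, legal_actions):
            # Prefer moves that have never been marked as penalty rules.
            safe = [a for a in legal_actions if a not in self.penalty_rules[state]]
            # If every legal move is marked, fall back to a random legal move.
            return random.choice(safe if safe else list(legal_actions))

        def record_loss(self, episode):
            # After a lost game, mark every (state, action) pair visited in the
            # episode as a penalty rule. PARP itself identifies penalty rules far
            # more selectively (only rules that inevitably lead to a penalty);
            # this blanket marking is a deliberate simplification.
            for state, action in episode:
                self.penalty_rules[state].add(action)

Here a state would be some encoding of the board and an episode the sequence of (state, move) pairs from one game; for Othello's huge state space, heuristics such as those the paper introduces would be needed, since a plain table like this would not scale.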
Cite this article
Miyazaki, K., Tsuboi, S. & Kobayashi, S. Development of a reinforcement learning system to play Othello. Artificial Life and Robotics 7, 177–181 (2004). https://doi.org/10.1007/BF02471202