Abstract
Many enhancements have been proposed for Monte-Carlo Tree Search (MCTS), and some of them have been applied successfully in General Game Playing (GGP). MCTS and its enhancements are usually controlled by multiple parameters that require extensive and time-consuming computation to be tuned in advance. Moreover, in GGP the optimal parameter values may vary depending on the game being played. This paper proposes a method to automatically tune search-control parameters on-line for GGP. The method treats the tuning problem as a Combinatorial Multi-Armed Bandit (CMAB). Four strategies designed to deal with CMABs are evaluated for this particular problem. Experiments show that on-line tuning in GGP almost reaches the same performance as off-line tuning. On-line tuning can therefore be considered a valid alternative for domains where off-line parameter tuning is costly or infeasible.
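As an illustration of the CMAB view of parameter tuning, the following is a minimal sketch, not the implementation evaluated in the paper. It assumes one simple way of handling a CMAB: each parameter gets its own independent UCB1 bandit over a small set of candidate values, one combination is sampled per simulation, and the simulation reward is credited to every value in that combination. The parameter names `C` and `K`, their candidate values, the exploration constant, and the `toy_simulate` reward model are all hypothetical stand-ins for a real MCTS simulation.

```python
import math
import random


class LocalUCB:
    """UCB1 bandit over the candidate values of a single parameter."""

    def __init__(self, values, c=0.7):
        self.values = list(values)
        self.c = c  # exploration constant (assumed value)
        self.counts = [0] * len(self.values)
        self.totals = [0.0] * len(self.values)

    def select(self):
        # Try every value once before applying the UCB1 formula.
        for i, n in enumerate(self.counts):
            if n == 0:
                return i
        total = sum(self.counts)

        def ucb(i):
            mean = self.totals[i] / self.counts[i]
            return mean + self.c * math.sqrt(math.log(total) / self.counts[i])

        return max(range(len(self.values)), key=ucb)

    def update(self, i, reward):
        self.counts[i] += 1
        self.totals[i] += reward

    def best(self):
        # Recommend the most frequently sampled value.
        i = max(range(len(self.values)), key=lambda j: self.counts[j])
        return self.values[i]


def tune_online(param_space, simulate, n_simulations, seed=0):
    """Pick one value per parameter before each simulation, then update all bandits."""
    rng = random.Random(seed)
    bandits = {name: LocalUCB(vals) for name, vals in param_space.items()}
    for _ in range(n_simulations):
        choice = {name: b.select() for name, b in bandits.items()}
        config = {name: bandits[name].values[i] for name, i in choice.items()}
        reward = simulate(config, rng)
        for name, i in choice.items():
            bandits[name].update(i, reward)
    return {name: b.best() for name, b in bandits.items()}


def toy_simulate(config, rng):
    # Hypothetical reward model: win probability peaks at C=0.7 and K=250.
    p = 0.2 + 0.4 * (config["C"] == 0.7) + 0.3 * (config["K"] == 250)
    return 1.0 if rng.random() < p else 0.0


if __name__ == "__main__":
    space = {"C": [0.1, 0.7, 1.4], "K": [10, 250, 1000]}
    print(tune_online(space, toy_simulate, n_simulations=5000))
```

Treating the parameters independently keeps the number of arms equal to the sum of the candidate-set sizes rather than their product, at the cost of ignoring interactions between parameters.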
Notes
1. Available on request at https://bitbucket.org/CFSironi/ggp-project.
2. Version of November 18, 2012. Downloaded from the CadiaPlayer project website: http://cadia.ru.is/wiki/public:cadiaplayer:main.
Acknowledgments
This work is funded by the Netherlands Organisation for Scientific Research (NWO) in the framework of the project GoGeneral, grant number 612.001.121.
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Sironi, C.F., Winands, M.H.M. (2018). On-Line Parameter Tuning for Monte-Carlo Tree Search in General Game Playing. In: Cazenave, T., Winands, M., Saffidine, A. (eds) Computer Games. CGW 2017. Communications in Computer and Information Science, vol 818. Springer, Cham. https://doi.org/10.1007/978-3-319-75931-9_6
DOI: https://doi.org/10.1007/978-3-319-75931-9_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-75930-2
Online ISBN: 978-3-319-75931-9
eBook Packages: Computer Science, Computer Science (R0)