On-Line Parameter Tuning for Monte-Carlo Tree Search in General Game Playing

Sironi, Chiara F.; Winands, Mark H. M.

doi:10.1007/978-3-319-75931-9_6

Part of the book series: Communications in Computer and Information Science (CCIS, volume 818)

Included in the following conference series:

Workshop on Computer Games

Abstract

Many enhancements have been proposed for Monte-Carlo Tree Search (MCTS), and some of them have been applied successfully in General Game Playing (GGP). MCTS and its enhancements are usually controlled by multiple parameters, and tuning these in advance requires extensive, time-consuming computation. Moreover, in GGP the optimal parameter values may vary from game to game. This paper proposes a method that automatically tunes search-control parameters on-line for GGP by treating the tuning problem as a Combinatorial Multi-Armed Bandit (CMAB). Four strategies designed to deal with CMABs are evaluated for this particular problem. Experiments show that on-line tuning in GGP reaches almost the same performance as off-line tuning. It can therefore be considered a valid alternative for domains where off-line parameter tuning is costly or infeasible.
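
To make the CMAB framing more concrete, the following is a minimal sketch of one simple way to tune several parameters on-line: each parameter gets its own local bandit over its candidate values, a full combination is assembled from the local choices, and the reward observed for that combination is fed back to every local bandit. This is an illustration only and is not claimed to be any of the four strategies evaluated in the paper. The parameter names `C` and `K`, their candidate values, the UCB1-style selection rule with exploration constant `c`, and the function `toy_reward` are all assumptions made for the example.

```python
import math


class LocalBandit:
    """UCB1-style bandit over the candidate values of a single parameter."""

    def __init__(self, values, c=0.7):
        self.values = list(values)
        self.c = c  # exploration constant (assumed value)
        self.counts = [0] * len(self.values)
        self.totals = [0.0] * len(self.values)

    def select(self):
        # Try every value once before using the upper confidence bound.
        for i, n in enumerate(self.counts):
            if n == 0:
                return i
        total = sum(self.counts)

        def ucb(i):
            mean = self.totals[i] / self.counts[i]
            bonus = self.c * math.sqrt(math.log(total) / self.counts[i])
            return mean + bonus

        return max(range(len(self.values)), key=ucb)

    def update(self, i, reward):
        self.counts[i] += 1
        self.totals[i] += reward

    def most_visited(self):
        i = max(range(len(self.values)), key=lambda j: self.counts[j])
        return self.values[i]


def tune_online(param_space, evaluate, n_samples):
    """Sample parameter combinations and credit each local bandit.

    param_space: dict mapping a parameter name to its candidate values.
    evaluate:    callable taking a configuration dict and returning a
                 reward in [0, 1] (for instance, a simulation outcome).
    """
    bandits = {name: LocalBandit(vals) for name, vals in param_space.items()}
    for _ in range(n_samples):
        picks = {name: b.select() for name, b in bandits.items()}
        config = {name: bandits[name].values[i] for name, i in picks.items()}
        reward = evaluate(config)
        # The same reward updates the chosen value of every parameter.
        for name, i in picks.items():
            bandits[name].update(i, reward)
    return {name: b.most_visited() for name, b in bandits.items()}


def toy_reward(config):
    # Hypothetical stand-in for a simulation result: each parameter
    # contributes 0.5 when it is set to its (made-up) best value.
    return 0.5 * (config["C"] == 0.7) + 0.5 * (config["K"] == 250)


if __name__ == "__main__":
    space = {"C": [0.1, 0.4, 0.7, 1.0], "K": [10, 250, 1000]}
    print(tune_online(space, toy_reward, n_samples=2000))
```

In an actual MCTS agent, `evaluate` would correspond to running one simulation with the sampled configuration and returning its outcome, so the tuning samples are spent during the search itself rather than in a separate off-line phase. Treating the parameters independently keeps the number of arms small, at the cost of ignoring interactions between parameters.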

Notes

1. Available on request at https://bitbucket.org/CFSironi/ggp-project.
2. Version of November 18, 2012. Downloaded from the CadiaPlayer project website: http://cadia.ru.is/wiki/public:cadiaplayer:main.

Acknowledgments

This work is funded by the Netherlands Organisation for Scientific Research (NWO) in the framework of the project GoGeneral, grant number 612.001.121.

Author information

Authors and Affiliations

Games and AI Group, Department of Data Science and Knowledge Engineering, Maastricht University, Maastricht, The Netherlands
Chiara F. Sironi & Mark H. M. Winands

Corresponding author

Correspondence to Chiara F. Sironi.

Editor information

Editors and Affiliations

Université Paris-Dauphine, Paris, France
Tristan Cazenave
Maastricht University, Maastricht, The Netherlands
Mark H.M. Winands
The University of New South Wales, Sydney, New South Wales, Australia
Abdallah Saffidine

About this paper

Cite this paper

Sironi, C.F., Winands, M.H.M. (2018). On-Line Parameter Tuning for Monte-Carlo Tree Search in General Game Playing. In: Cazenave, T., Winands, M., Saffidine, A. (eds) Computer Games. CGW 2017. Communications in Computer and Information Science, vol 818. Springer, Cham. https://doi.org/10.1007/978-3-319-75931-9_6

DOI: https://doi.org/10.1007/978-3-319-75931-9_6
Published: 15 February 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-75930-2
Online ISBN: 978-3-319-75931-9
eBook Packages: Computer Science, Computer Science (R0)
