On-Line Parameter Tuning for Monte-Carlo Tree Search in General Game Playing

  • Conference paper
Computer Games (CGW 2017)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 818)

Abstract

Many enhancements have been proposed for Monte-Carlo Tree Search (MCTS). Some of them have been applied successfully in the context of General Game Playing (GGP). MCTS and its enhancements are usually controlled by multiple parameters that require extensive and time-consuming computation to be tuned in advance. Moreover, in GGP optimal parameter values may vary depending on the considered game. This paper proposes a method to automatically tune search-control parameters on-line for GGP. This method considers the tuning problem as a Combinatorial Multi-Armed Bandit (CMAB). Four strategies designed to deal with CMABs are evaluated for this particular problem. Experiments show that on-line tuning in GGP almost reaches the same performance as off-line tuning. It can be considered as a valid alternative for domains where off-line parameter tuning is costly or infeasible.
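
To make the CMAB framing concrete, here is a minimal sketch of on-line tuning in which each parameter gets its own UCB1 bandit and all bandits are updated with the reward of the joint configuration. This is an illustration under stated assumptions, not the allocation strategies evaluated in the paper: the parameter names `C` and `K`, their candidate values, the exploration constant, and the `OnlineTuner` interface are all hypothetical.

```python
import math


class LocalUCB1:
    """UCB1 bandit over the candidate values of a single parameter."""

    def __init__(self, values, exploration=0.7):
        self.values = list(values)
        self.exploration = exploration  # assumed constant, not from the paper
        self.counts = [0] * len(self.values)
        self.totals = [0.0] * len(self.values)

    def select(self):
        # Try every candidate value once before applying the UCB1 formula.
        for i, n in enumerate(self.counts):
            if n == 0:
                return i
        total = sum(self.counts)

        def ucb(i):
            mean = self.totals[i] / self.counts[i]
            bonus = self.exploration * math.sqrt(math.log(total) / self.counts[i])
            return mean + bonus

        return max(range(len(self.values)), key=ucb)

    def update(self, index, reward):
        self.counts[index] += 1
        self.totals[index] += reward

    def most_visited(self):
        best = max(range(len(self.values)), key=lambda i: self.counts[i])
        return self.values[best]


class OnlineTuner:
    """Treats a parameter configuration as one combinatorial arm,
    assembled from one independent local bandit per parameter."""

    def __init__(self, space, exploration=0.7):
        self.bandits = {
            name: LocalUCB1(values, exploration) for name, values in space.items()
        }
        self._last = {}

    def next_configuration(self):
        # Pick one value per parameter; call once before each simulation.
        self._last = {name: b.select() for name, b in self.bandits.items()}
        return {name: self.bandits[name].values[i] for name, i in self._last.items()}

    def report(self, reward):
        # Credit the simulation result to every value used in the configuration.
        for name, i in self._last.items():
            self.bandits[name].update(i, reward)

    def best_configuration(self):
        return {name: b.most_visited() for name, b in self.bandits.items()}


def run_toy_example(iterations=200):
    """Toy stand-in for MCTS simulations: reward 1.0 only when C == 0.4."""
    tuner = OnlineTuner({"C": [0.1, 0.4, 0.9], "K": [10, 100, 1000]})
    for _ in range(iterations):
        config = tuner.next_configuration()
        tuner.report(1.0 if config["C"] == 0.4 else 0.0)
    return tuner
```

In a real agent, the reward passed to `report` would be the outcome of one MCTS simulation played with the selected configuration, so that tuning and search share the same simulation budget. Treating the parameters independently ignores interactions between them, which is one reason dedicated CMAB strategies exist.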

Notes

  1. Available on request at https://bitbucket.org/CFSironi/ggp-project.

  2. Version of November 18, 2012. Downloaded from the CadiaPlayer project website: http://cadia.ru.is/wiki/public:cadiaplayer:main.


Acknowledgments

This work is funded by the Netherlands Organisation for Scientific Research (NWO) in the framework of the project GoGeneral, grant number 612.001.121.

Author information

Corresponding author

Correspondence to Chiara F. Sironi.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Sironi, C.F., Winands, M.H.M. (2018). On-Line Parameter Tuning for Monte-Carlo Tree Search in General Game Playing. In: Cazenave, T., Winands, M., Saffidine, A. (eds) Computer Games. CGW 2017. Communications in Computer and Information Science, vol 818. Springer, Cham. https://doi.org/10.1007/978-3-319-75931-9_6

  • DOI: https://doi.org/10.1007/978-3-319-75931-9_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-75930-2

  • Online ISBN: 978-3-319-75931-9

  • eBook Packages: Computer Science, Computer Science (R0)
