
Adaptive Warm-Start MCTS in AlphaZero-Like Deep Reinforcement Learning

  • Conference paper
  • Published in: PRICAI 2021: Trends in Artificial Intelligence (PRICAI 2021)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13033)

Abstract

AlphaZero has achieved impressive performance in deep reinforcement learning with an architecture that combines search and the training of a neural network in self-play. Many researchers are looking for ways to reproduce and improve its results for other games and tasks. However, the architecture is designed to learn from scratch, tabula rasa, and thus accepts a cold-start problem in self-play. Recently, a warm-start enhancement method for Monte Carlo Tree Search was proposed to improve the starting phase of self-play. It employs a fixed parameter \(I^\prime \) to control the warm-start length, and improved performance was reported in small board games. In this paper, we present results with an adaptive switch method. Experiments show that our approach works better than the fixed \(I^\prime \), especially for "deep", tactical games (Othello and Connect Four). We conjecture that the adaptive value for \(I^\prime \) is also influenced by the size of the game, and that on average \(I^\prime \) will increase with game size. We conclude that AlphaZero-like deep reinforcement learning benefits from adaptive rollout-based warm-start, as Rapid Action Value Estimate did for rollout-based reinforcement learning 15 years ago.
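
To make the mechanism concrete, below is a minimal Python sketch of the control flow the abstract describes: MCTS leaf evaluation by random rollouts during a warm-start phase, network evaluation afterwards, with the switch either fixed at iteration \(I^\prime \) or made adaptively. Everything here is an illustrative assumption (the DummyNet class, the self_play and arena_win_rate stand-ins, and the arena-based switching criterion), not the authors' implementation.

```python
class DummyNet:
    """Stand-in for the AlphaZero policy/value network (hypothetical)."""

    def __init__(self):
        self.strength = 0.0  # toy proxy for playing strength

    def train(self, examples):
        # Pretend each batch of self-play examples improves the net a bit.
        self.strength += 0.01 * len(examples)


def self_play(use_rollout, net):
    """Stand-in for one self-play iteration. In the real algorithm, MCTS
    leaf nodes are scored by random rollouts during the warm-start phase
    and by the network's value head afterwards."""
    evaluator = "rollout" if use_rollout else "network"
    return [(f"state-{i}", evaluator) for i in range(10)]  # fake examples


def arena_win_rate(net):
    """Stand-in arena match of the current net against the rollout-based
    player; the toy net 'wins' more often as its strength grows."""
    return min(net.strength, 1.0)


def train(num_iterations, adaptive, i_prime=5):
    net = DummyNet()
    warm = True
    for i in range(num_iterations):
        if adaptive:
            # Adaptive switch: keep the rollout warm start only until the
            # network beats the rollout-based player in arena games (one
            # plausible criterion; the paper's exact test may differ).
            if warm and arena_win_rate(net) > 0.5:
                warm = False
                print(f"iteration {i}: switching to network evaluation")
        else:
            warm = i < i_prime  # fixed I' from the earlier enhancement
        net.train(self_play(warm, net))
    return net


if __name__ == "__main__":
    train(num_iterations=20, adaptive=True)
```

The design point the sketch isolates: the fixed variant commits to a warm-start length \(I^\prime \) in advance, while the adaptive variant lets observed playing strength decide when rollouts stop being useful, which is why a suitable \(I^\prime \) can vary with the game.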



Acknowledgments

Hui Wang acknowledges financial support from the China Scholarship Council (CSC), CSC No. 201706990015.

Author information

Correspondence to Hui Wang.


Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

Wang, H., Preuss, M., Plaat, A. (2021). Adaptive Warm-Start MCTS in AlphaZero-Like Deep Reinforcement Learning. In: Pham, D.N., Theeramunkong, T., Governatori, G., Liu, F. (eds) PRICAI 2021: Trends in Artificial Intelligence. PRICAI 2021. Lecture Notes in Computer Science, vol 13033. Springer, Cham. https://doi.org/10.1007/978-3-030-89370-5_5


  • DOI: https://doi.org/10.1007/978-3-030-89370-5_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-89369-9

  • Online ISBN: 978-3-030-89370-5

  • eBook Packages: Computer Science, Computer Science (R0)
