Article, DOI: 10.5555/1620270.1620327

Achieving master level play in 9×9 computer go

Published: 13 July 2008

Abstract

The UCT algorithm uses Monte-Carlo simulation to estimate the value of states in a search tree from the current state. However, the first time a state is encountered, UCT has no knowledge, and is unable to generalise from previous experience. We describe two extensions that address these weaknesses. Our first algorithm, heuristic UCT, incorporates prior knowledge in the form of a value function. The value function can be learned offline, using a linear combination of a million binary features, with weights trained by temporal-difference learning. Our second algorithm, UCT-RAVE, forms a rapid online generalisation based on the value of moves. We applied our algorithms to the domain of 9 × 9 Computer Go, using the program MoGo. Using both heuristic UCT and RAVE, MoGo became the first program to achieve human master level in competitive play.
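
The two extensions named in the abstract can be illustrated with a small sketch. The code below is not taken from the paper or from MoGo. It assumes that heuristic UCT seeds each action's statistics with a prior value and an equivalent number of prior visits, and that RAVE keeps a second, "all moves as first" estimate that is blended with the Monte-Carlo estimate using a weight that decays as real visits accumulate. The names `ActionStats`, `init_with_prior`, `blended_value`, and `select_action`, the blending schedule, and the constants `k` and `c` are all illustrative assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class ActionStats:
    visits: float = 0.0       # N(s, a); may start at a prior "equivalent experience"
    value: float = 0.0        # Q(s, a); running mean of simulation outcomes
    rave_visits: float = 0.0  # number of RAVE (all-moves-as-first) updates
    rave_value: float = 0.0   # running mean of RAVE outcomes


def init_with_prior(prior_value: float, prior_visits: float) -> ActionStats:
    """Heuristic initialisation (assumed form): act as if the action had
    already been tried prior_visits times with mean outcome prior_value."""
    return ActionStats(visits=prior_visits, value=prior_value)


def update(stats: ActionStats, outcome: float) -> None:
    """Incremental mean update after one simulation through this action."""
    stats.visits += 1
    stats.value += (outcome - stats.value) / stats.visits


def update_rave(stats: ActionStats, outcome: float) -> None:
    """Same update, applied when the move was played anywhere later in the simulation."""
    stats.rave_visits += 1
    stats.rave_value += (outcome - stats.rave_value) / stats.rave_visits


def blended_value(stats: ActionStats, k: float = 1000.0) -> float:
    """Mix the RAVE and Monte-Carlo estimates. The schedule
    beta = sqrt(k / (3n + k)) and the constant k are assumptions."""
    if stats.rave_visits == 0:
        return stats.value
    beta = math.sqrt(k / (3.0 * stats.visits + k))
    return beta * stats.rave_value + (1.0 - beta) * stats.value


def select_action(actions: dict, c: float = 1.0):
    """UCB1-style selection: blended value plus an exploration bonus."""
    total = max(sum(s.visits for s in actions.values()), 1.0)

    def score(s: ActionStats) -> float:
        if s.visits == 0:
            return math.inf  # try actions with no experience first
        return blended_value(s) + c * math.sqrt(math.log(total) / s.visits)

    return max(actions, key=lambda a: score(actions[a]))
```

With a prior in place, a newly encountered state already has informative estimates, so selection no longer depends on arbitrary ordering; without one, the infinite score forces each untried action to be sampled once.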

Published In

AAAI'08: Proceedings of the 23rd national conference on Artificial intelligence - Volume 3
July 2008
1892 pages
ISBN: 9781577353683

Sponsors

  • Association for the Advancement of Artificial Intelligence

Publisher

AAAI Press

Cited By

  • (2019) Multiagent Monte Carlo Tree Search. Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems, pp. 2309-2311. DOI: 10.5555/3306127.3332094. Online publication date: 8-May-2019
  • (2017) DUCT. ACM Transactions on Intelligent Systems and Technology 8:5, pp. 1-27. DOI: 10.1145/3066156. Online publication date: 12-Jul-2017
  • (2013) Learning non-myopically from human-generated reward. Proceedings of the 2013 international conference on Intelligent user interfaces, pp. 191-202. DOI: 10.1145/2449396.2449422. Online publication date: 19-Mar-2013
  • (2012) Strong mitigation. Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems - Volume 1, pp. 407-414. DOI: 10.5555/2343576.2343634. Online publication date: 4-Jun-2012
  • (2012) Learning data transformation rules through examples. Proceedings of the Ninth International Workshop on Information Integration on the Web, pp. 1-6. DOI: 10.1145/2331801.2331809. Online publication date: 20-May-2012
  • (2012) Evolutionary learning of policies for MCTS simulations. Proceedings of the International Conference on the Foundations of Digital Games, pp. 212-219. DOI: 10.1145/2282338.2282379. Online publication date: 29-May-2012
  • (2012) Guiding combinatorial optimization with UCT. Proceedings of the 9th international conference on Integration of AI and OR Techniques in Constraint Programming for Combinatorial Optimization Problems, pp. 356-361. DOI: 10.1007/978-3-642-29828-8_23. Online publication date: 28-May-2012
  • (2011) Trade-offs in sampling-based adversarial planning. Proceedings of the Twenty-First International Conference on International Conference on Automated Planning and Scheduling, pp. 202-209. DOI: 10.5555/3038485.3038512. Online publication date: 11-Jun-2011
  • (2011) Learning is planning. Proceedings of the Twenty-Seventh Conference on Uncertainty in Artificial Intelligence, pp. 19-26. DOI: 10.5555/3020548.3020552. Online publication date: 14-Jul-2011
  • (2011) Applying UCT to boolean satisfiability. Proceedings of the 14th international conference on Theory and application of satisfiability testing, pp. 373-374. DOI: 10.5555/2023474.2023519. Online publication date: 19-Jun-2011
