
Algorithms for computing strategies in two-player simultaneous move games

Published: 01 August 2016

Abstract

Simultaneous move games model discrete, multistage interactions where at each stage players simultaneously choose their actions. At each stage, a player does not know what action the other player will take, but otherwise knows the full state of the game. This formalism has been used to express games in general game playing and can also model many discrete approximations of real-world scenarios. In this paper, we describe both novel and existing algorithms that compute strategies for the class of two-player zero-sum simultaneous move games. The algorithms include exact backward induction methods with efficient pruning, as well as Monte Carlo sampling algorithms. We evaluate the algorithms in two different settings: the offline case, where computational resources are abundant and closely approximating the optimal strategy is a priority, and the online search case, where computational resources are limited and acting quickly is necessary. We perform a thorough experimental evaluation on six substantially different games for both settings. For the exact algorithms, the results show that our pruning techniques for backward induction dramatically improve the computation time required by the previous exact algorithms. For the sampling algorithms, the results provide unique insights into their performance and identify favorable settings and domains for different sampling algorithms.

Highlights

  • We present algorithms for computing strategies in zero-sum simultaneous move games.
  • The algorithms include exact algorithms and Monte Carlo sampling algorithms.
  • We compare the algorithms in offline computation and online game playing.
  • Our novel exact algorithm dominates in offline equilibrium strategy computation.
  • Our novel sampling algorithms can guarantee convergence to optimal strategies.
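To illustrate the structure of the exact approach described in the abstract: backward induction on a simultaneous move game evaluates each stage as a two-player zero-sum matrix game whose entries are the values of the subgames below it. The sketch below is a minimal illustration under our own naming, not the paper's pruned algorithms; it uses regret matching (a simple no-regret self-play dynamic) as the stage-game solver in place of an exact linear-programming solve.

```python
# Backward induction for two-player zero-sum simultaneous move games.
# Each interior node is a matrix of subgames; its value is the value of
# the zero-sum matrix game over the children's values. Stage games are
# solved approximately by regret matching (Hart & Mas-Colell style).

def _positive_part_strategy(regrets, k):
    """Strategy proportional to positive cumulative regrets (uniform if none)."""
    pos = [max(r, 0.0) for r in regrets]
    total = sum(pos)
    return [p / total for p in pos] if total > 0 else [1.0 / k] * k

def solve_matrix_game(A, iterations=20000):
    """Approximate value and mixed strategies of the zero-sum matrix game A
    (row player maximizes) via self-play regret matching."""
    m, n = len(A), len(A[0])
    row_regret, col_regret = [0.0] * m, [0.0] * n
    row_sum, col_sum = [0.0] * m, [0.0] * n
    for _ in range(iterations):
        x = _positive_part_strategy(row_regret, m)
        y = _positive_part_strategy(col_regret, n)
        for i in range(m):
            row_sum[i] += x[i]
        for j in range(n):
            col_sum[j] += y[j]
        # payoff of each pure action against the opponent's current mix
        u_row = [sum(A[i][j] * y[j] for j in range(n)) for i in range(m)]
        u_col = [-sum(A[i][j] * x[i] for i in range(m)) for j in range(n)]
        v_row = sum(x[i] * u_row[i] for i in range(m))
        v_col = sum(y[j] * u_col[j] for j in range(n))
        for i in range(m):
            row_regret[i] += u_row[i] - v_row
        for j in range(n):
            col_regret[j] += u_col[j] - v_col
    x_avg = [s / iterations for s in row_sum]   # average strategies converge
    y_avg = [s / iterations for s in col_sum]   # to an approximate equilibrium
    value = sum(x_avg[i] * A[i][j] * y_avg[j]
                for i in range(m) for j in range(n))
    return value, x_avg, y_avg

def backward_induction(node):
    """Value of a game tree: terminals are numbers, interior nodes are
    matrices (lists of lists) of subgames, one row/column per joint action."""
    if isinstance(node, (int, float)):
        return float(node)
    A = [[backward_induction(child) for child in row] for row in node]
    return solve_matrix_game(A)[0]

# Two-stage example: Matching Pennies appears as a subgame of the root stage.
pennies = [[1.0, -1.0], [-1.0, 1.0]]          # value 0
root = [[pennies, 1.0], [1.0, pennies]]       # value 0.5
print(round(backward_induction(root), 2))     # prints 0.5
```

The paper's exact algorithms solve each stage with linear programming and prune subgames whose values cannot affect the stage solution; the sampling algorithms it studies instead descend the tree with bandit-style selection. This sketch only shows the shared stage-by-stage structure.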




Published In

Artificial Intelligence  Volume 237, Issue C
August 2016
228 pages

Publisher

Elsevier Science Publishers Ltd.

United Kingdom


Qualifiers

  • Research-article


Cited By

  • (2022) Analysis of the Impact of Randomization of Search-Control Parameters in Monte-Carlo Tree Search, Journal of Artificial Intelligence Research 72, 717-757. Online publication date: 4-Jan-2022. doi:10.1613/jair.1.12065
  • (2020) Analysis of Hannan consistent selection for Monte Carlo tree search in simultaneous move games, Machine Learning 109(1), 1-50. Online publication date: 1-Jan-2020. doi:10.1007/s10994-019-05832-z
  • (2019) Convergence of Multi-Agent Learning with a Finite Step Size in General-Sum Games, Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems, 935-943. Online publication date: 8-May-2019. doi:10.5555/3306127.3331788
  • (2019) A TCM acupoints ranking approach towards post-stroke dysphagia based on an improved MCTS decision method, Technology and Health Care 27(S1), 367-381. Online publication date: 1-Jan-2019. doi:10.3233/THC-199034
  • (2019) Comparing Randomization Strategies for Search-Control Parameters in Monte-Carlo Tree Search, 2019 IEEE Conference on Games (CoG), 1-8. Online publication date: 20-Aug-2019. doi:10.1109/CIG.2019.8848056
  • (2018) Actor-critic policy optimization in partially observable multiagent environments, Proceedings of the 32nd International Conference on Neural Information Processing Systems, 3426-3439. Online publication date: 3-Dec-2018. doi:10.5555/3327144.3327261
  • (2017) Multi-view decision processes, Proceedings of the 31st International Conference on Neural Information Processing Systems, 5449-5458. Online publication date: 4-Dec-2017. doi:10.5555/3295222.3295296
  • (2017) A unified game-theoretic approach to multiagent reinforcement learning, Proceedings of the 31st International Conference on Neural Information Processing Systems, 4193-4206. Online publication date: 4-Dec-2017. doi:10.5555/3294996.3295174
  • (2017) An algorithm for constructing and solving imperfect recall abstractions of large extensive-form games, Proceedings of the 26th International Joint Conference on Artificial Intelligence, 936-942. Online publication date: 19-Aug-2017. doi:10.5555/3171642.3171776
  • (2017) Multi-agent Reinforcement Learning in Sequential Social Dilemmas, Proceedings of the 16th Conference on Autonomous Agents and MultiAgent Systems, 464-473. Online publication date: 8-May-2017. doi:10.5555/3091125.3091194
