
Algorithms for computing strategies in two-player simultaneous move games

Published: 01 August 2016

Abstract

Simultaneous move games model discrete, multistage interactions where at each stage players simultaneously choose their actions. At each stage, a player does not know what action the other player will take, but otherwise knows the full state of the game. This formalism has been used to express games in general game playing and can also model many discrete approximations of real-world scenarios. In this paper, we describe both novel and existing algorithms that compute strategies for the class of two-player zero-sum simultaneous move games. The algorithms include exact backward induction methods with efficient pruning, as well as Monte Carlo sampling algorithms. We evaluate the algorithms in two different settings: the offline case, where computational resources are abundant and closely approximating the optimal strategy is a priority, and the online search case, where computational resources are limited and acting quickly is necessary. We perform a thorough experimental evaluation on six substantially different games for both settings. For the exact algorithms, the results show that our pruning techniques for backward induction dramatically improve the computation time required by the previous exact algorithms. For the sampling algorithms, the results provide unique insights into their performance and identify favorable settings and domains for different sampling algorithms.

Highlights

  • We present algorithms for computing strategies in zero-sum simultaneous move games.
  • The algorithms include exact algorithms and Monte Carlo sampling algorithms.
  • We compare the algorithms in offline computation and online game playing.
  • Our novel exact algorithm dominates in offline equilibrium strategy computation.
  • Our novel sampling algorithms can guarantee convergence to optimal strategies.
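To illustrate the structure of the exact approach described in the abstract: backward induction on a simultaneous move game evaluates each stage as a two-player zero-sum matrix game whose entries are the values of the subgames below it. The sketch below is a minimal illustration under our own naming, not the paper's pruned algorithms; it uses regret matching (a simple no-regret self-play dynamic) as the stage-game solver in place of an exact linear-programming solve.

```python
# Backward induction for two-player zero-sum simultaneous move games.
# Each interior node is a matrix of subgames; its value is the value of
# the zero-sum matrix game over the children's values. Stage games are
# solved approximately by regret matching (Hart & Mas-Colell style).

def _positive_part_strategy(regrets, k):
    """Strategy proportional to positive cumulative regrets (uniform if none)."""
    pos = [max(r, 0.0) for r in regrets]
    total = sum(pos)
    return [p / total for p in pos] if total > 0 else [1.0 / k] * k

def solve_matrix_game(A, iterations=20000):
    """Approximate value and mixed strategies of the zero-sum matrix game A
    (row player maximizes) via self-play regret matching."""
    m, n = len(A), len(A[0])
    row_regret, col_regret = [0.0] * m, [0.0] * n
    row_sum, col_sum = [0.0] * m, [0.0] * n
    for _ in range(iterations):
        x = _positive_part_strategy(row_regret, m)
        y = _positive_part_strategy(col_regret, n)
        for i in range(m):
            row_sum[i] += x[i]
        for j in range(n):
            col_sum[j] += y[j]
        # payoff of each pure action against the opponent's current mix
        u_row = [sum(A[i][j] * y[j] for j in range(n)) for i in range(m)]
        u_col = [-sum(A[i][j] * x[i] for i in range(m)) for j in range(n)]
        v_row = sum(x[i] * u_row[i] for i in range(m))
        v_col = sum(y[j] * u_col[j] for j in range(n))
        for i in range(m):
            row_regret[i] += u_row[i] - v_row
        for j in range(n):
            col_regret[j] += u_col[j] - v_col
    x_avg = [s / iterations for s in row_sum]   # average strategies converge
    y_avg = [s / iterations for s in col_sum]   # to an approximate equilibrium
    value = sum(x_avg[i] * A[i][j] * y_avg[j]
                for i in range(m) for j in range(n))
    return value, x_avg, y_avg

def backward_induction(node):
    """Value of a game tree: terminals are numbers, interior nodes are
    matrices (lists of lists) of subgames, one row/column per joint action."""
    if isinstance(node, (int, float)):
        return float(node)
    A = [[backward_induction(child) for child in row] for row in node]
    return solve_matrix_game(A)[0]

# Two-stage example: Matching Pennies appears as a subgame of the root stage.
pennies = [[1.0, -1.0], [-1.0, 1.0]]          # value 0
root = [[pennies, 1.0], [1.0, pennies]]       # value 0.5
print(round(backward_induction(root), 2))     # prints 0.5
```

The paper's exact algorithms solve each stage with linear programming and prune subgames whose values cannot affect the stage solution; the sampling algorithms it studies instead descend the tree with bandit-style selection. This sketch only shows the shared stage-by-stage structure.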




Published In

Artificial Intelligence  Volume 237, Issue C
August 2016
228 pages

Publisher

Elsevier Science Publishers Ltd.

United Kingdom


Qualifiers

  • Research-article


Cited By

  • (2022) Analysis of the Impact of Randomization of Search-Control Parameters in Monte-Carlo Tree Search, Journal of Artificial Intelligence Research 72, 717-757. Online publication date: 4-Jan-2022. doi:10.1613/jair.1.12065
  • (2020) Analysis of Hannan consistent selection for Monte Carlo tree search in simultaneous move games, Machine Learning 109(1), 1-50. Online publication date: 1-Jan-2020. doi:10.1007/s10994-019-05832-z
  • (2019) Convergence of Multi-Agent Learning with a Finite Step Size in General-Sum Games, Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems, 935-943. Online publication date: 8-May-2019. doi:10.5555/3306127.3331788
  • (2019) A TCM acupoints ranking approach towards post-stroke dysphagia based on an improved MCTS decision method, Technology and Health Care 27(S1), 367-381. Online publication date: 1-Jan-2019. doi:10.3233/THC-199034
  • (2019) Comparing Randomization Strategies for Search-Control Parameters in Monte-Carlo Tree Search, 2019 IEEE Conference on Games (CoG), 1-8. Online publication date: 20-Aug-2019. doi:10.1109/CIG.2019.8848056
  • (2018) Actor-critic policy optimization in partially observable multiagent environments, Proceedings of the 32nd International Conference on Neural Information Processing Systems, 3426-3439. Online publication date: 3-Dec-2018. doi:10.5555/3327144.3327261
  • (2017) Multi-view decision processes, Proceedings of the 31st International Conference on Neural Information Processing Systems, 5449-5458. Online publication date: 4-Dec-2017. doi:10.5555/3295222.3295296
  • (2017) A unified game-theoretic approach to multiagent reinforcement learning, Proceedings of the 31st International Conference on Neural Information Processing Systems, 4193-4206. Online publication date: 4-Dec-2017. doi:10.5555/3294996.3295174
  • (2017) An algorithm for constructing and solving imperfect recall abstractions of large extensive-form games, Proceedings of the 26th International Joint Conference on Artificial Intelligence, 936-942. Online publication date: 19-Aug-2017. doi:10.5555/3171642.3171776
  • (2017) Multi-agent Reinforcement Learning in Sequential Social Dilemmas, Proceedings of the 16th Conference on Autonomous Agents and MultiAgent Systems, 464-473. Online publication date: 8-May-2017. doi:10.5555/3091125.3091194
