DOI: 10.5555/3091125.3091194 · research-article · AAMAS Conference Proceedings

Multi-agent Reinforcement Learning in Sequential Social Dilemmas

Published: 08 May 2017

Abstract

Matrix games like the Prisoner's Dilemma have guided research on social dilemmas for decades. However, they necessarily treat the choice to cooperate or defect as an atomic action. In real-world social dilemmas these choices are temporally extended: cooperativeness is a property of policies, not of elementary actions. We introduce sequential social dilemmas, which share the mixed incentive structure of matrix-game social dilemmas but also require agents to learn policies that implement their strategic intentions. We analyze the dynamics of policies learned by multiple self-interested independent learning agents, each using its own deep Q-network, on two Markov games we introduce here: (1) a fruit Gathering game and (2) a Wolfpack hunting game. We characterize how learned behavior in each domain changes as a function of environmental factors, including resource abundance. Our experiments show how conflict can emerge from competition over shared resources and shed light on how the sequential nature of real-world social dilemmas affects cooperation.
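
A minimal sketch may help make the learning setup concrete: each agent is an independent, self-interested learner that treats the other agent as part of its environment (which therefore appears non-stationary from its point of view). The Python sketch below illustrates that independence assumption only; it substitutes a tabular Q-function for the paper's deep Q-networks, and an iterated Prisoner's Dilemma, with the previous joint action as the Markov state, for the Gathering and Wolfpack environments. All names, payoffs, and hyperparameters are illustrative, not the authors'.

    import random
    from collections import defaultdict

    # Payoff matrix for a two-player Prisoner's Dilemma.
    # Actions: 0 = cooperate (C), 1 = defect (D).
    PAYOFFS = {(0, 0): (3, 3), (0, 1): (0, 4),
               (1, 0): (4, 0), (1, 1): (1, 1)}

    class IndependentQLearner:
        """Self-interested learner; the other agent is folded into the
        environment, so the dynamics this agent sees are non-stationary."""
        def __init__(self, eps=0.1, alpha=0.1, gamma=0.95):
            self.q = defaultdict(lambda: [0.0, 0.0])  # state -> [Q(s, C), Q(s, D)]
            self.eps, self.alpha, self.gamma = eps, alpha, gamma

        def act(self, state):
            if random.random() < self.eps:            # epsilon-greedy exploration
                return random.randrange(2)
            return max((0, 1), key=lambda a: self.q[state][a])

        def update(self, state, action, reward, next_state):
            # One-step Q-learning backup on this agent's own reward stream.
            target = reward + self.gamma * max(self.q[next_state])
            self.q[state][action] += self.alpha * (target - self.q[state][action])

    agents = [IndependentQLearner(), IndependentQLearner()]
    state = (0, 0)  # Markov state: the previous joint action
    for _ in range(50_000):
        actions = tuple(agent.act(state) for agent in agents)
        rewards = PAYOFFS[actions]
        for i, agent in enumerate(agents):
            agent.update(state, actions[i], rewards[i], actions)
        state = actions

    greedy = tuple(max((0, 1), key=lambda a: ag.q[(0, 0)][a]) for ag in agents)
    print("Greedy joint action after mutual cooperation:", ["CD"[a] for a in greedy])

With these payoffs the two independent learners often settle into mutual defection, the classic matrix-game outcome; the paper's point is to study what the analogue of "defection" looks like when it must be implemented as a temporally extended policy under competition for shared resources.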



Information

Published In

AAMAS '17: Proceedings of the 16th Conference on Autonomous Agents and MultiAgent Systems
May 2017
1914 pages

Sponsors

  • IFAAMAS

Publisher

International Foundation for Autonomous Agents and Multiagent Systems

Richland, SC

Author Tags

  1. agent-based social simulation
  2. cooperation
  3. markov games
  4. non-cooperative games
  5. social dilemmas

Qualifiers

  • Research-article

Acceptance Rates

AAMAS '17 paper acceptance rate: 127 of 457 submissions (28%)
Overall acceptance rate: 1,155 of 5,036 submissions (23%)

Article Metrics

  • Downloads (last 12 months): 64
  • Downloads (last 6 weeks): 9
Reflects downloads up to 26 Dec 2024

Cited By
  • (2024) Fairness and Cooperation between Independent Reinforcement Learners through Indirect Reciprocity. Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems, pp. 2468-2470. DOI: 10.5555/3635637.3663196
  • (2024) The Selfishness Level of Social Dilemmas. Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems, pp. 2441-2443. DOI: 10.5555/3635637.3663187
  • (2024) Emergent Dominance Hierarchies in Reinforcement Learning Agents. Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems, pp. 2426-2428. DOI: 10.5555/3635637.3663182
  • (2024) Emergent Cooperation under Uncertain Incentive Alignment. Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems, pp. 1521-1530. DOI: 10.5555/3635637.3663012
  • (2023) Selectively sharing experiences improves multi-agent reinforcement learning. Proceedings of the 37th International Conference on Neural Information Processing Systems, pp. 59543-59565. DOI: 10.5555/3666122.3668724
  • (2023) Information design in multi-agent reinforcement learning. Proceedings of the 37th International Conference on Neural Information Processing Systems, pp. 25584-25597. DOI: 10.5555/3666122.3667235
  • (2023) IQ-Flow: Mechanism Design for Inducing Cooperative Behavior to Self-Interested Agents in Sequential Social Dilemmas. Proceedings of the 2023 International Conference on Autonomous Agents and Multiagent Systems, pp. 2143-2151. DOI: 10.5555/3545946.3598888
  • (2023) Reward Function Design for Crowd Simulation via Reinforcement Learning. Proceedings of the 16th ACM SIGGRAPH Conference on Motion, Interaction and Games, pp. 1-7. DOI: 10.1145/3623264.3624452
  • (2023) Eco-evolutionary Dynamics of Non-episodic Neuroevolution in Large Multi-agent Environments. Proceedings of the Companion Conference on Genetic and Evolutionary Computation, pp. 143-146. DOI: 10.1145/3583133.3590703
  • (2022) How and why to manipulate your own agent. Proceedings of the 36th International Conference on Neural Information Processing Systems, pp. 28080-28094. DOI: 10.5555/3600270.3602306
