DOI: 10.5555/3091125.3091194
research-article

Multi-agent Reinforcement Learning in Sequential Social Dilemmas

Published: 08 May 2017 Publication History

Abstract

Matrix games like Prisoner's Dilemma have guided research on social dilemmas for decades. However, they necessarily treat the choice to cooperate or defect as an atomic action. In real-world social dilemmas these choices are temporally extended. Cooperativeness is a property that applies to policies, not elementary actions. We introduce sequential social dilemmas that share the mixed incentive structure of matrix game social dilemmas but also require agents to learn policies that implement their strategic intentions. We analyze the dynamics of policies learned by multiple self-interested independent learning agents, each using its own deep Q-network, on two Markov games we introduce here: (1) a fruit Gathering game and (2) a Wolfpack hunting game. We characterize how learned behavior in each domain changes as a function of environmental factors including resource abundance. Our experiments show how conflict can emerge from competition over shared resources and shed light on how the sequential nature of real-world social dilemmas affects cooperation.
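
The "mixed incentive structure of matrix game social dilemmas" mentioned in the abstract can be illustrated with a small sketch. The abstract does not spell out the conditions. The inequalities below follow the formulation common in the matrix-game literature, where R, P, S and T denote the payoffs for mutual cooperation, mutual defection, being exploited, and exploiting. The names `Payoffs`, `is_social_dilemma` and `motives`, and all payoff values, are hypothetical choices made for illustration only.

```python
from typing import List, NamedTuple


class Payoffs(NamedTuple):
    """Row-player payoffs of a symmetric two-player, two-action matrix game."""
    R: float  # reward: both players cooperate
    P: float  # punishment: both players defect
    S: float  # sucker: cooperate against a defector
    T: float  # temptation: defect against a cooperator


def motives(g: Payoffs) -> List[str]:
    """Return which incentives to defect are present in the game."""
    found = []
    if g.T > g.R:  # greed: exploiting a cooperator beats mutual cooperation
        found.append("greed")
    if g.P > g.S:  # fear: mutual defection beats being exploited
        found.append("fear")
    return found


def is_social_dilemma(g: Payoffs) -> bool:
    """Check the usual matrix-game social dilemma inequalities (assumed form)."""
    return (
        g.R > g.P                # mutual cooperation beats mutual defection
        and g.R > g.S            # mutual cooperation beats being exploited
        and 2 * g.R > g.T + g.S  # and beats taking turns exploiting each other
        and bool(motives(g))     # at least one of greed or fear is present
    )


if __name__ == "__main__":
    # Illustrative payoff values, not taken from the paper.
    games = {
        "Prisoner's Dilemma": Payoffs(R=3, P=1, S=0, T=5),
        "Chicken": Payoffs(R=3, P=0, S=1, T=4),
        "Stag Hunt": Payoffs(R=4, P=1, S=0, T=3),
    }
    for name, game in games.items():
        print(name, is_social_dilemma(game), motives(game))
```

With these example values, Prisoner's Dilemma exhibits both greed and fear, Chicken only greed, and Stag Hunt only fear. A sequential social dilemma, as the abstract describes it, keeps this incentive structure but applies it to learned policies rather than to single atomic actions.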


Published In

AAMAS '17: Proceedings of the 16th Conference on Autonomous Agents and MultiAgent Systems
May 2017
1914 pages

Sponsors

  • IFAAMAS

Publisher

International Foundation for Autonomous Agents and Multiagent Systems

Richland, SC

Author Tags

  1. agent-based social simulation
  2. cooperation
  3. markov games
  4. non-cooperative games
  5. social dilemmas

Qualifiers

  • Research-article

Acceptance Rates

AAMAS '17 Paper Acceptance Rate 127 of 457 submissions, 28%;
Overall Acceptance Rate 1,155 of 5,036 submissions, 23%

Article Metrics

  • Downloads (last 12 months): 64
  • Downloads (last 6 weeks): 9

Reflects downloads up to 28 Dec 2024

Cited By

  • (2024) Fairness and Cooperation between Independent Reinforcement Learners through Indirect Reciprocity. Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems, pp. 2468-2470. DOI: 10.5555/3635637.3663196. Online publication date: 6-May-2024
  • (2024) The Selfishness Level of Social Dilemmas. Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems, pp. 2441-2443. DOI: 10.5555/3635637.3663187. Online publication date: 6-May-2024
  • (2024) Emergent Dominance Hierarchies in Reinforcement Learning Agents. Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems, pp. 2426-2428. DOI: 10.5555/3635637.3663182. Online publication date: 6-May-2024
  • (2024) Emergent Cooperation under Uncertain Incentive Alignment. Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems, pp. 1521-1530. DOI: 10.5555/3635637.3663012. Online publication date: 6-May-2024
  • (2023) Selectively sharing experiences improves multi-agent reinforcement learning. Proceedings of the 37th International Conference on Neural Information Processing Systems, pp. 59543-59565. DOI: 10.5555/3666122.3668724. Online publication date: 10-Dec-2023
  • (2023) Information design in multi-agent reinforcement learning. Proceedings of the 37th International Conference on Neural Information Processing Systems, pp. 25584-25597. DOI: 10.5555/3666122.3667235. Online publication date: 10-Dec-2023
  • (2023) IQ-Flow: Mechanism Design for Inducing Cooperative Behavior to Self-Interested Agents in Sequential Social Dilemmas. Proceedings of the 2023 International Conference on Autonomous Agents and Multiagent Systems, pp. 2143-2151. DOI: 10.5555/3545946.3598888. Online publication date: 30-May-2023
  • (2023) Reward Function Design for Crowd Simulation via Reinforcement Learning. Proceedings of the 16th ACM SIGGRAPH Conference on Motion, Interaction and Games, pp. 1-7. DOI: 10.1145/3623264.3624452. Online publication date: 15-Nov-2023
  • (2023) Eco-evolutionary Dynamics of Non-episodic Neuroevolution in Large Multi-agent Environments. Proceedings of the Companion Conference on Genetic and Evolutionary Computation, pp. 143-146. DOI: 10.1145/3583133.3590703. Online publication date: 15-Jul-2023
  • (2022) How and why to manipulate your own agent. Proceedings of the 36th International Conference on Neural Information Processing Systems, pp. 28080-28094. DOI: 10.5555/3600270.3602306. Online publication date: 28-Nov-2022
