DOI: 10.5555/3545946.3598670
research-article

Get It in Writing: Formal Contracts Mitigate Social Dilemmas in Multi-Agent RL

Published: 30 May 2023

Abstract

Multi-agent reinforcement learning (MARL) is a powerful tool for training automated systems acting independently in a common environment. However, it can lead to sub-optimal behavior when individual incentives and group incentives diverge. Humans are remarkably capable at solving these social dilemmas. It is an open problem in MARL to replicate such cooperative behaviors in selfish agents. In this work, we draw upon the idea of formal contracting from economics to overcome diverging incentives between agents in MARL. We propose an augmentation to a Markov game where agents voluntarily agree to binding state-dependent transfers of reward, under pre-specified conditions. Our contributions are theoretical and empirical. First, we show that this augmentation makes all subgame-perfect equilibria of all fully observed Markov games exhibit socially optimal behavior, given a sufficiently rich space of contracts. Next, we complement our game-theoretic analysis with experiments running deep RL on the contracting augmentation for various social dilemmas. We discuss some practical issues with learning in the contracting augmentation, and provide a training methodology that leads to high-welfare outcomes, Multi-Objective Contract Augmentation Learning (MOCA). We test our methodology in static, single-move games, as well as dynamic domains that simulate traffic, pollution management and common pool resource management.
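
The contracting idea described in the abstract can be illustrated on a one-shot Prisoner's Dilemma. The sketch below is not the paper's formal augmentation or its MOCA training method. It is a minimal example under stated assumptions: the payoff numbers, the transfer size of 3, and the helper names `with_contract` and `pure_nash` are all hypothetical. The contract is a binding, zero-sum transfer of reward: any player who defects pays a fixed amount to the other player. The code enumerates pure-strategy Nash equilibria with and without the contract.

```python
from itertools import product

C, D = 0, 1  # cooperate, defect

# Hypothetical Prisoner's Dilemma payoffs: BASE[(a0, a1)] = (reward0, reward1).
BASE = {(C, C): (3, 3), (C, D): (0, 5), (D, C): (5, 0), (D, D): (1, 1)}


def with_contract(payoffs, transfer):
    """Apply an illustrative contract: each defecting player pays `transfer`
    to the other player. Transfers are zero-sum, so total welfare is unchanged."""
    out = {}
    for (a0, a1), (r0, r1) in payoffs.items():
        pay0 = transfer if a0 == D else 0  # amount paid by player 0
        pay1 = transfer if a1 == D else 0  # amount paid by player 1
        out[(a0, a1)] = (r0 - pay0 + pay1, r1 - pay1 + pay0)
    return out


def pure_nash(payoffs):
    """Return all pure-strategy Nash equilibria of a 2x2 game."""
    eqs = []
    for a0, a1 in product((C, D), repeat=2):
        # Each player's action must be a best response to the other's action.
        best0 = all(payoffs[(a0, a1)][0] >= payoffs[(b, a1)][0] for b in (C, D))
        best1 = all(payoffs[(a0, a1)][1] >= payoffs[(a0, b)][1] for b in (C, D))
        if best0 and best1:
            eqs.append((a0, a1))
    return eqs


contracted = with_contract(BASE, transfer=3)
print(pure_nash(BASE))        # [(1, 1)]: mutual defection
print(pure_nash(contracted))  # [(0, 0)]: mutual cooperation
```

In this toy game both players earn 3 under the contract's equilibrium versus 1 without it, so each would voluntarily accept the contract. That is the intuition behind voluntary agreement; the paper's result concerns general fully observed Markov games and subgame-perfect equilibria, which this sketch does not model.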

Cited By

  • (2024) The Selfishness Level of Social Dilemmas. Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems, pp. 2441-2443. DOI: 10.5555/3635637.3663187. Online publication date: 6-May-2024
  • (2023) Similarity-based cooperative equilibrium. Proceedings of the 37th International Conference on Neural Information Processing Systems, pp. 24434-24465. DOI: 10.5555/3666122.3667185. Online publication date: 10-Dec-2023

      Published In

      AAMAS '23: Proceedings of the 2023 International Conference on Autonomous Agents and Multiagent Systems
      May 2023
      3131 pages
      ISBN:9781450394321
      General Chairs: Noa Agmon, Bo An
      Program Chairs: Alessandro Ricci, William Yeoh

      Publisher

      International Foundation for Autonomous Agents and Multiagent Systems

      Richland, SC

      Author Tags

      1. decentralized training
      2. formal contracts
      3. social dilemma

      Qualifiers

      • Research-article

      Conference

      AAMAS '23

      Acceptance Rates

      Overall Acceptance Rate 1,155 of 5,036 submissions, 23%

      Article Metrics

      • Downloads (last 12 months): 36
      • Downloads (last 6 weeks): 3
      Reflects downloads up to 23 Dec 2024
