10.5555/1029693.1029710
Article
Free access

Planning, learning and coordination in multiagent decision processes

Published: 17 March 1996

Abstract

There has been a growing interest in AI in the design of multiagent systems, especially in multiagent cooperative planning. In this paper, we investigate the extent to which methods from single-agent planning and learning can be applied in multiagent settings. We survey a number of different techniques from decision-theoretic planning and reinforcement learning and describe a number of interesting issues that arise with regard to coordinating the policies of individual agents. To this end, we describe multiagent Markov decision processes as a general model in which to frame this discussion. These are special n-person cooperative games in which agents share the same utility function. We discuss coordination mechanisms based on imposed conventions (or social laws) as well as learning methods for coordination. Our focus is on the decomposition of sequential decision processes so that coordination can be learned (or imposed) locally, at the level of individual states. We also discuss the use of structured problem representations and their role in the generalization of learned conventions and in approximation.
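
For a concrete picture of the coordination problem described above, the sketch below sets up a minimal, hypothetical multiagent MDP with a single state: two agents share one reward function and are paid only when their actions match. It contrasts independent Q-learning, which may or may not settle on matching actions, with an imposed convention that breaks the symmetry. All names and parameter values are illustrative assumptions, not taken from the paper.

    # Minimal sketch (assumed example, not from the paper): a one-state
    # multiagent MDP in which two agents share the same utility function.
    import random

    ACTIONS = ["a", "b"]        # each agent's individual action set
    EPISODES = 2000
    ALPHA, EPSILON = 0.1, 0.1   # assumed learning rate / exploration rate

    def reward(joint):
        # Shared utility: both agents receive the same payoff, and they are
        # rewarded only for choosing matching actions.
        return 1.0 if joint[0] == joint[1] else 0.0

    # Independent Q-learners: each agent values only its own actions, so the
    # partner's (possibly changing) choice shows up as noise in the reward.
    q = [{a: 0.0 for a in ACTIONS} for _ in range(2)]

    def pick(qi):
        if random.random() < EPSILON:
            return random.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: qi[a])

    for _ in range(EPISODES):
        joint = (pick(q[0]), pick(q[1]))
        r = reward(joint)
        for i in range(2):
            q[i][joint[i]] += ALPHA * (r - q[i][joint[i]])

    # The learners may or may not end up coordinated; that ambiguity is the
    # coordination problem.
    print("greedy joint action after learning:",
          tuple(max(ACTIONS, key=lambda a: q[i][a]) for i in range(2)))

    # A convention (social law) removes the ambiguity without any learning,
    # e.g. "every agent plays the lexicographically smallest action".
    print("joint action under the convention:",
          tuple(min(ACTIONS) for _ in range(2)))

In an MMDP with many states, the same issue recurs state by state, which is why the paper emphasizes decomposing the decision process so that coordination can be learned or imposed locally.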

Information

Published In

TARK '96: Proceedings of the 6th conference on Theoretical aspects of rationality and knowledge
March 1996
306 pages
ISBN: 1558604179
  • Editor: Yoav Shoham

Publisher

Morgan Kaufmann Publishers Inc., San Francisco, CA, United States

Publication History

Published: 17 March 1996

Qualifiers

  • Article

Acceptance Rates

Overall Acceptance Rate 61 of 177 submissions, 34%

Bibliometrics & Citations

Article Metrics

  • Downloads (last 12 months): 107
  • Downloads (last 6 weeks): 17
Reflects downloads up to 18 Feb 2025

Citations

Cited By

  • (2024) Agent-specific effects. Proceedings of the 41st International Conference on Machine Learning, 10.5555/3692070.3694055, pp. 48578-48607. Online publication date: 21-Jul-2024.
  • (2024) Scalable safe policy improvement for factored multi-agent MDPs. Proceedings of the 41st International Conference on Machine Learning, 10.5555/3692070.3692229, pp. 3952-3973. Online publication date: 21-Jul-2024.
  • (2024) A Framework for Simultaneous Task Allocation and Planning under Uncertainty. ACM Transactions on Autonomous and Adaptive Systems, 19(4), pp. 1-30, 10.1145/3665499. Online publication date: 28-May-2024.
  • (2023) Diverse conventions for human-AI collaboration. Proceedings of the 37th International Conference on Neural Information Processing Systems, 10.5555/3666122.3667125, pp. 23115-23139. Online publication date: 10-Dec-2023.
  • (2023) Models as agents. Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence and Thirty-Fifth Conference on Innovative Applications of Artificial Intelligence and Thirteenth Symposium on Educational Advances in Artificial Intelligence, 10.1609/aaai.v37i9.26241, pp. 10435-10443. Online publication date: 7-Feb-2023.
  • (2023) Maximum entropy population-based training for zero-shot human-AI coordination. Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence and Thirty-Fifth Conference on Innovative Applications of Artificial Intelligence and Thirteenth Symposium on Educational Advances in Artificial Intelligence, 10.1609/aaai.v37i5.25758, pp. 6145-6153. Online publication date: 7-Feb-2023.
  • (2023) Real-time Road Network Optimization with Coordinated Reinforcement Learning. ACM Transactions on Intelligent Systems and Technology, 14(4), pp. 1-30, 10.1145/3603379. Online publication date: 21-Jul-2023.
  • (2022) Learning to mitigate AI collusion on economic platforms. Proceedings of the 36th International Conference on Neural Information Processing Systems, 10.5555/3600270.3603016, pp. 37892-37904. Online publication date: 28-Nov-2022.
  • (2022) Multi-agent dynamic algorithm configuration. Proceedings of the 36th International Conference on Neural Information Processing Systems, 10.5555/3600270.3601735, pp. 20147-20161. Online publication date: 28-Nov-2022.
  • (2022) Context-Aware Modelling for Multi-Robot Systems Under Uncertainty. Proceedings of the 21st International Conference on Autonomous Agents and Multiagent Systems, 10.5555/3535850.3535987, pp. 1228-1236. Online publication date: 9-May-2022.