10.5555/1029693.1029710
Article
Free access

Planning, learning and coordination in multiagent decision processes

Published: 17 March 1996

Abstract

There has been a growing interest in AI in the design of multiagent systems, especially in multiagent cooperative planning. In this paper, we investigate the extent to which methods from single-agent planning and learning can be applied in multiagent settings. We survey a number of different techniques from decision-theoretic planning and reinforcement learning and describe a number of interesting issues that arise with regard to coordinating the policies of individual agents. To this end, we describe multiagent Markov decision processes as a general model in which to frame this discussion. These are special n-person cooperative games in which agents share the same utility function. We discuss coordination mechanisms based on imposed conventions (or social laws) as well as learning methods for coordination. Our focus is on the decomposition of sequential decision processes so that coordination can be learned (or imposed) locally, at the level of individual states. We also discuss the use of structured problem representations and their role in the generalization of learned conventions and in approximation.
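
For a concrete picture of the coordination problem described above, the sketch below sets up a minimal, hypothetical multiagent MDP with a single state: two agents share one reward function and are paid only when their actions match. It contrasts independent Q-learning, which may or may not settle on matching actions, with an imposed convention that breaks the symmetry. All names and parameter values are illustrative assumptions, not taken from the paper.

    # Minimal sketch (assumed example, not from the paper): a one-state
    # multiagent MDP in which two agents share the same utility function.
    import random

    ACTIONS = ["a", "b"]        # each agent's individual action set
    EPISODES = 2000
    ALPHA, EPSILON = 0.1, 0.1   # assumed learning rate / exploration rate

    def reward(joint):
        # Shared utility: both agents receive the same payoff, and they are
        # rewarded only for choosing matching actions.
        return 1.0 if joint[0] == joint[1] else 0.0

    # Independent Q-learners: each agent values only its own actions, so the
    # partner's (possibly changing) choice shows up as noise in the reward.
    q = [{a: 0.0 for a in ACTIONS} for _ in range(2)]

    def pick(qi):
        if random.random() < EPSILON:
            return random.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: qi[a])

    for _ in range(EPISODES):
        joint = (pick(q[0]), pick(q[1]))
        r = reward(joint)
        for i in range(2):
            q[i][joint[i]] += ALPHA * (r - q[i][joint[i]])

    # The learners may or may not end up coordinated; that ambiguity is the
    # coordination problem.
    print("greedy joint action after learning:",
          tuple(max(ACTIONS, key=lambda a: q[i][a]) for i in range(2)))

    # A convention (social law) removes the ambiguity without any learning,
    # e.g. "every agent plays the lexicographically smallest action".
    print("joint action under the convention:",
          tuple(min(ACTIONS) for _ in range(2)))

In an MMDP with many states, the same issue recurs state by state, which is why the paper emphasizes decomposing the decision process so that coordination can be learned or imposed locally.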

Information

Published In

TARK '96: Proceedings of the 6th conference on Theoretical aspects of rationality and knowledge
March 1996
306 pages
ISBN: 1558604179
  • Editor: Yoav Shoham

Publisher

Morgan Kaufmann Publishers Inc., San Francisco, CA, United States

Publication History

Published: 17 March 1996

Qualifiers

  • Article

Acceptance Rates

Overall Acceptance Rate 61 of 177 submissions, 34%

Bibliometrics & Citations

Article Metrics

  • Downloads (last 12 months): 107
  • Downloads (last 6 weeks): 17
Reflects downloads up to 18 Feb 2025

Citations

Cited By

  • (2024) Agent-specific effects. Proceedings of the 41st International Conference on Machine Learning, 10.5555/3692070.3694055, pp. 48578-48607. Online publication date: 21-Jul-2024.
  • (2024) Scalable safe policy improvement for factored multi-agent MDPs. Proceedings of the 41st International Conference on Machine Learning, 10.5555/3692070.3692229, pp. 3952-3973. Online publication date: 21-Jul-2024.
  • (2024) A Framework for Simultaneous Task Allocation and Planning under Uncertainty. ACM Transactions on Autonomous and Adaptive Systems, 19(4), pp. 1-30, 10.1145/3665499. Online publication date: 28-May-2024.
  • (2023) Diverse conventions for human-AI collaboration. Proceedings of the 37th International Conference on Neural Information Processing Systems, 10.5555/3666122.3667125, pp. 23115-23139. Online publication date: 10-Dec-2023.
  • (2023) Models as agents. Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence and Thirty-Fifth Conference on Innovative Applications of Artificial Intelligence and Thirteenth Symposium on Educational Advances in Artificial Intelligence, 10.1609/aaai.v37i9.26241, pp. 10435-10443. Online publication date: 7-Feb-2023.
  • (2023) Maximum entropy population-based training for zero-shot human-AI coordination. Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence and Thirty-Fifth Conference on Innovative Applications of Artificial Intelligence and Thirteenth Symposium on Educational Advances in Artificial Intelligence, 10.1609/aaai.v37i5.25758, pp. 6145-6153. Online publication date: 7-Feb-2023.
  • (2023) Real-time Road Network Optimization with Coordinated Reinforcement Learning. ACM Transactions on Intelligent Systems and Technology, 14(4), pp. 1-30, 10.1145/3603379. Online publication date: 21-Jul-2023.
  • (2022) Learning to mitigate AI collusion on economic platforms. Proceedings of the 36th International Conference on Neural Information Processing Systems, 10.5555/3600270.3603016, pp. 37892-37904. Online publication date: 28-Nov-2022.
  • (2022) Multi-agent dynamic algorithm configuration. Proceedings of the 36th International Conference on Neural Information Processing Systems, 10.5555/3600270.3601735, pp. 20147-20161. Online publication date: 28-Nov-2022.
  • (2022) Context-Aware Modelling for Multi-Robot Systems Under Uncertainty. Proceedings of the 21st International Conference on Autonomous Agents and Multiagent Systems, 10.5555/3535850.3535987, pp. 1228-1236. Online publication date: 9-May-2022.