DOI: 10.1145/1160633.1160683
Article

Winning back the CUP for distributed POMDPs: planning over continuous belief spaces

Published: 08 May 2006

Abstract

Distributed Partially Observable Markov Decision Problems (Distributed POMDPs) are evolving as a popular approach for modeling multiagent systems, and many different algorithms have been proposed to obtain locally or globally optimal policies. Unfortunately, most of these algorithms have either been explicitly designed or experimentally evaluated assuming knowledge of a starting belief point, an assumption that often does not hold in complex, uncertain domains. Instead, in such domains, it is important for agents to explicitly plan over continuous belief spaces. This paper provides a novel algorithm to explicitly compute finite horizon policies over continuous belief spaces, without restricting the space of policies. By marrying an efficient single-agent POMDP solver with a heuristic distributed POMDP policy-generation algorithm, locally optimal joint policies are obtained, each of which dominates within a different part of the belief region. We provide heuristics that significantly improve the efficiency of the resulting algorithm and provide detailed experimental results. To the best of our knowledge, these are the first run-time results for analytically generating policies over continuous belief spaces in distributed POMDPs.
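
The construct the abstract alludes to is standard in POMDP planning: each candidate policy has a value function that is linear in the belief state (an "alpha vector"), so a finite set of policies partitions the continuous belief simplex into regions, each dominated by a different policy. The following minimal Python sketch (not code from the paper; all names are hypothetical) illustrates that dominance test at a belief point.

# Minimal sketch, assuming policies are valued by linear "alpha vectors"
# over beliefs as in standard POMDP planning; this is an illustration of
# the dominance idea, not the paper's algorithm.
from typing import List, Tuple

def value(alpha: List[float], belief: List[float]) -> float:
    # Expected value of a policy at a belief point: dot(alpha, belief).
    return sum(a * b for a, b in zip(alpha, belief))

def dominating_policy(candidates: List[Tuple[str, List[float]]],
                      belief: List[float]) -> str:
    # The policy whose alpha vector is maximal at this belief dominates here.
    return max(candidates, key=lambda c: value(c[1], belief))[0]

# Two hypothetical joint policies over a two-state belief space; each
# dominates in a different part of the belief region, as the abstract says.
candidates = [("policy_A", [10.0, 0.0]), ("policy_B", [2.0, 6.0])]
print(dominating_policy(candidates, [0.9, 0.1]))  # policy_A dominates here
print(dominating_policy(candidates, [0.2, 0.8]))  # policy_B dominates here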


Cited By

  • (2008) Optimal and approximate Q-value functions for decentralized POMDPs. Journal of Artificial Intelligence Research, 32(1):289-353. DOI: 10.5555/1622673.1622680. Online publication date: 1-May-2008.
  • (2008) Towards faster planning with continuous resources in stochastic domains. Proceedings of the 23rd National Conference on Artificial Intelligence - Volume 2, 1049-1055. DOI: 10.5555/1620163.1620235. Online publication date: 13-Jul-2008.
  • (2006) Point-based dynamic programming for DEC-POMDPs. Proceedings of the 21st National Conference on Artificial Intelligence - Volume 2, 1233-1238. DOI: 10.5555/1597348.1597384. Online publication date: 16-Jul-2006.



Published In

AAMAS '06: Proceedings of the fifth international joint conference on Autonomous agents and multiagent systems
May 2006
1631 pages
ISBN: 1595933034
DOI: 10.1145/1160633
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]


Publisher

Association for Computing Machinery

New York, NY, United States



Author Tags

  1. continuous initial beliefs
  2. distributed POMDP
  3. multi-agent systems
  4. partially observable Markov decision process (POMDP)

Qualifiers

  • Article

Conference

AAMAS '06

Acceptance Rates

Overall acceptance rate: 1,155 of 5,036 submissions (23%)


Article Metrics

  • Downloads (last 12 months): 5
  • Downloads (last 6 weeks): 0

Reflects downloads up to 08 Feb 2025.

