DOI: 10.1145/1160633.1160683
Article

Winning back the CUP for distributed POMDPs: planning over continuous belief spaces

Published: 08 May 2006

Abstract

Distributed Partially Observable Markov Decision Problems (Distributed POMDPs) are evolving as a popular approach for modeling multiagent systems, and many different algorithms have been proposed to obtain locally or globally optimal policies. Unfortunately, most of these algorithms have either been explicitly designed or experimentally evaluated assuming knowledge of a starting belief point, an assumption that often does not hold in complex, uncertain domains. Instead, in such domains, it is important for agents to explicitly plan over continuous belief spaces. This paper provides a novel algorithm to explicitly compute finite horizon policies over continuous belief spaces, without restricting the space of policies. By marrying an efficient single-agent POMDP solver with a heuristic distributed POMDP policy-generation algorithm, locally optimal joint policies are obtained, each of which dominates within a different part of the belief region. We provide heuristics that significantly improve the efficiency of the resulting algorithm and provide detailed experimental results. To the best of our knowledge, these are the first run-time results for analytically generating policies over continuous belief spaces in distributed POMDPs.
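
The construct the abstract alludes to is standard in POMDP planning: each candidate policy has a value function that is linear in the belief state (an "alpha vector"), so a finite set of policies partitions the continuous belief simplex into regions, each dominated by a different policy. The following minimal Python sketch (not code from the paper; all names are hypothetical) illustrates that dominance test at a belief point.

# Minimal sketch, assuming policies are valued by linear "alpha vectors"
# over beliefs as in standard POMDP planning; this is an illustration of
# the dominance idea, not the paper's algorithm.
from typing import List, Tuple

def value(alpha: List[float], belief: List[float]) -> float:
    # Expected value of a policy at a belief point: dot(alpha, belief).
    return sum(a * b for a, b in zip(alpha, belief))

def dominating_policy(candidates: List[Tuple[str, List[float]]],
                      belief: List[float]) -> str:
    # The policy whose alpha vector is maximal at this belief dominates here.
    return max(candidates, key=lambda c: value(c[1], belief))[0]

# Two hypothetical joint policies over a two-state belief space; each
# dominates in a different part of the belief region, as the abstract says.
candidates = [("policy_A", [10.0, 0.0]), ("policy_B", [2.0, 6.0])]
print(dominating_policy(candidates, [0.9, 0.1]))  # policy_A dominates here
print(dominating_policy(candidates, [0.2, 0.8]))  # policy_B dominates here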


Cited By

  • (2008) Optimal and approximate Q-value functions for decentralized POMDPs. Journal of Artificial Intelligence Research, 32(1):289-353. DOI: 10.5555/1622673.1622680. Online publication date: 1-May-2008.
  • (2008) Towards faster planning with continuous resources in stochastic domains. Proceedings of the 23rd National Conference on Artificial Intelligence - Volume 2, 1049-1055. DOI: 10.5555/1620163.1620235. Online publication date: 13-Jul-2008.
  • (2006) Point-based dynamic programming for DEC-POMDPs. Proceedings of the 21st National Conference on Artificial Intelligence - Volume 2, 1233-1238. DOI: 10.5555/1597348.1597384. Online publication date: 16-Jul-2006.



Published In

AAMAS '06: Proceedings of the fifth international joint conference on Autonomous agents and multiagent systems
May 2006
1631 pages
ISBN: 1595933034
DOI: 10.1145/1160633
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]


Publisher

Association for Computing Machinery

New York, NY, United States



Author Tags

  1. continuous initial beliefs
  2. distributed POMDP
  3. multi-agent systems
  4. partially observable Markov decision process (POMDP)

Qualifiers

  • Article

Conference

AAMAS '06

Acceptance Rates

Overall acceptance rate: 1,155 of 5,036 submissions (23%)


Article Metrics

  • Downloads (last 12 months): 5
  • Downloads (last 6 weeks): 0

Reflects downloads up to 08 Feb 2025.

