research-article

DCOPs and bandits: exploration and exploitation in decentralised coordination

Authors: Ruben Stranders, Long Tran-Thanh, Francesco M. Delle Fave, Alex Rogers, and Nicholas R. Jennings

AAMAS '12: Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems - Volume 1

June 2012

Pages 289 - 296

Published: 04 June 2012

Abstract

Real-life coordination problems are characterised by stochasticity and a lack of a priori knowledge about the interactions between agents. However, decentralised constraint optimisation problems (DCOPs), a widely adopted framework for modelling decentralised coordination tasks, assume perfect knowledge of these factors, which limits their practical applicability. To address this shortcoming, we introduce the MAB-DCOP, in which the interactions between agents are modelled by multi-armed bandits (MABs). Unlike a canonical DCOP, a MAB-DCOP is not a single-shot optimisation problem. Rather, it is a sequential one in which agents need to coordinate in order to strike a balance between acquiring knowledge about the a priori unknown and stochastic interactions (exploration) and taking the joint action currently believed to be optimal (exploitation), so as to maximise the cumulative global utility over a finite time horizon. We propose Heist, the first asymptotically optimal algorithm for coordination under stochasticity and lack of prior knowledge. Heist solves MAB-DCOPs in a decentralised fashion, using a generalised distributive law (GDL) message-passing phase to find the joint action with the highest upper confidence bound (UCB) on global utility. We demonstrate that Heist outperforms other state-of-the-art techniques from the MAB and DCOP literature by up to 1.5 orders of magnitude on MAB-DCOPs in experimental settings.
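To make the exploration/exploitation loop described in the abstract concrete, the following is a minimal sketch and not the authors' Heist implementation. It rests on several assumptions that the abstract does not confirm: each utility function is treated as a bandit whose arms are the local joint actions of the agents in its scope, rewards are Bernoulli, the confidence bound is the standard UCB1 index (Auer, Cesa-Bianchi and Fischer, 2002), and the maximising joint action is found by centralised brute-force enumeration rather than by GDL message passing. The names `FactorBandit`, `run` and `example_problem` are hypothetical.

```python
import itertools
import math
import random


class FactorBandit:
    """One utility function whose local joint actions act as bandit arms."""

    def __init__(self, scope, means):
        self.scope = scope    # indices of the agents this function depends on
        self.means = means    # hidden true mean reward per local joint action
        self.counts = {arm: 0 for arm in means}
        self.sums = {arm: 0.0 for arm in means}

    def local_arm(self, joint_action):
        return tuple(joint_action[i] for i in self.scope)

    def ucb(self, joint_action, t):
        """UCB1 index of the arm selected by this joint action (assumed bound)."""
        n = self.counts[self.local_arm(joint_action)]
        if n == 0:
            return math.inf   # untried arms are explored first
        mean = self.sums[self.local_arm(joint_action)] / n
        return mean + math.sqrt(2.0 * math.log(t) / n)

    def pull(self, joint_action, rng):
        """Sample a Bernoulli reward and update the arm's statistics."""
        arm = self.local_arm(joint_action)
        reward = 1.0 if rng.random() < self.means[arm] else 0.0
        self.counts[arm] += 1
        self.sums[arm] += reward
        return reward


def run(factors, domains, horizon, seed=0):
    """Play the joint action with the highest summed UCB at every time step."""
    rng = random.Random(seed)
    joint_actions = list(itertools.product(*domains))  # brute force, not GDL
    total = 0.0
    for t in range(1, horizon + 1):
        best = max(joint_actions,
                   key=lambda a: sum(f.ucb(a, t) for f in factors))
        total += sum(f.pull(best, rng) for f in factors)
    return total


def example_problem():
    """Three agents in a chain with binary actions (illustrative numbers only)."""
    f1 = FactorBandit((0, 1), {(0, 0): 0.2, (0, 1): 0.5, (1, 0): 0.4, (1, 1): 0.9})
    f2 = FactorBandit((1, 2), {(0, 0): 0.3, (0, 1): 0.6, (1, 0): 0.8, (1, 1): 0.1})
    return [f1, f2], [(0, 1), (0, 1), (0, 1)]


if __name__ == "__main__":
    factors, domains = example_problem()
    print("cumulative utility:", run(factors, domains, horizon=2000))
    for f in factors:
        print(f.scope, f.counts)
```

In this toy problem the joint action (1, 1, 0) has the highest expected global utility (0.9 + 0.8), so the pull counts should concentrate on arm (1, 1) of the first function and arm (1, 0) of the second as the horizon grows. The enumeration is exponential in the number of agents; exploiting the factorised structure through message passing, as the abstract describes, is what would avoid that cost in a decentralised setting.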

Index Terms

DCOPs and bandits: exploration and exploitation in decentralised coordination
1. Computing methodologies
  1. Artificial intelligence
    1. Distributed artificial intelligence
      1. Multi-agent systems

Published In

AAMAS '12: Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems - Volume 1

592 pages

ISBN: 0981738117

General Chairs:
Wiebe van der Hoek (University of Liverpool, UK)
Lin Padgham (RMIT University, Australia)

Program Chairs:
Vincent Conitzer (Duke University)
Michael Winikoff (University of Otago, New Zealand)

Sponsors

The International Foundation for Autonomous Agents and Multiagent Systems

In-Cooperation

SIGAI: ACM Special Interest Group on Artificial Intelligence

Publisher

International Foundation for Autonomous Agents and Multiagent Systems

Richland, SC

Conference

AAMAS '12: International Conference on Autonomous Agents and Multiagent Systems

June 4 - 8, 2012

Valencia, Spain

Acceptance Rates

Overall Acceptance Rate 1,155 of 5,036 submissions, 23%

Cited By

Verstraeten T, Daems P, Bargiacchi E, Roijers D, Libin P, Helsen J, Dignum F, Lomuscio A, Endriss U, Nowé A (2021). Scalable Optimization for Wind Farm Control using Coordination Graphs. Proceedings of the 20th International Conference on Autonomous Agents and MultiAgent Systems, pages 1362-1370. DOI: 10.5555/3463952.3464109. Online publication date: 3-May-2021.
https://dl.acm.org/doi/10.5555/3463952.3464109

Martínez-Rubio D, Kanade V, Rebeschini P, Wallach H, Larochelle H, Beygelzimer A, d'Alché-Buc F, Fox E (2019). Decentralized cooperative stochastic bandits. Proceedings of the 33rd International Conference on Neural Information Processing Systems, pages 4529-4540. DOI: 10.5555/3454287.3454694. Online publication date: 8-Dec-2019.
https://dl.acm.org/doi/10.5555/3454287.3454694

Cesa-Bianchi N, Gentile C, Mansour Y (2019). Delay and cooperation in nonstochastic bandits. The Journal of Machine Learning Research, 20:1, pages 613-650. DOI: 10.5555/3322706.3322723. Online publication date: 1-Jan-2019.
https://dl.acm.org/doi/10.5555/3322706.3322723

Ottens B, Dimitrakakis C, Faltings B (2017). DUCT. ACM Transactions on Intelligent Systems and Technology, 8:5, pages 1-27. DOI: 10.1145/3066156. Online publication date: 12-Jul-2017.
https://dl.acm.org/doi/10.1145/3066156

Xia Y, Qin T, Yu N, Liu T, Jonker C, Marsella S, Thangarajah J, Tuyls K (2016). Best Action Selection in a Stochastic Environment. Proceedings of the 2016 International Conference on Autonomous Agents & Multiagent Systems, pages 758-766. DOI: 10.5555/2936924.2937036. Online publication date: 9-May-2016.
https://dl.acm.org/doi/10.5555/2936924.2937036

Nguyen D, Yeoh W, Lau H, Zilberstein S, Zhang C, Bazzan A, Huhns M, Lomuscio A, Scerri P (2014). Decentralized multi-agent reinforcement learning in average-reward dynamic DCOPs. Proceedings of the 2014 international conference on Autonomous agents and multi-agent systems, pages 1341-1342. DOI: 10.5555/2615731.2617463. Online publication date: 5-May-2014.
https://dl.acm.org/doi/10.5555/2615731.2617463
