DOI: 10.5555/2343576.2343617
Research article

DCOPs and bandits: exploration and exploitation in decentralised coordination

Published: 04 June 2012

Abstract

Real-life coordination problems are characterised by stochasticity and a lack of a priori knowledge about the interactions between agents. However, decentralised constraint optimisation problems (DCOPs), a widely adopted framework for modelling decentralised coordination tasks, assume perfect knowledge of these factors, which limits their practical applicability. To address this shortcoming, we introduce the MAB-DCOP, in which the interactions between agents are modelled by multi-armed bandits (MABs). Unlike canonical DCOPs, a MAB-DCOP is not a single-shot optimisation problem. Rather, it is a sequential one in which agents need to coordinate in order to strike a balance between acquiring knowledge about the a priori unknown and stochastic interactions (exploration) and taking the currently believed optimal joint action (exploitation), so as to maximise the cumulative global utility over a finite time horizon. We propose Heist, the first asymptotically optimal algorithm for coordination under stochasticity and lack of prior knowledge. Heist solves MAB-DCOPs in a decentralised fashion, using a generalised distributive law (GDL) message-passing phase to find the joint action with the highest upper confidence bound (UCB) on global utility. In our experiments, Heist outperforms other state-of-the-art techniques from the MAB and DCOP literature by up to 1.5 orders of magnitude on MAB-DCOPs.
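The exploration-exploitation trade-off the abstract describes is, for each individual agent interaction, the classic multi-armed bandit problem. As an illustration only, here is a minimal sketch of the single-agent UCB1 index rule (Auer et al., 2002) that underlies upper-confidence-bound selection; the paper's actual contribution is maximising such a bound over *joint* actions via decentralised GDL message passing, which this sketch does not attempt. The function name `ucb1_index` is my own.

```python
import math

def ucb1_index(counts, means, t):
    """Return the index of the arm to pull next under the UCB1 rule.

    counts[i] -- number of times arm i has been pulled so far
    means[i]  -- empirical mean reward of arm i
    t         -- total number of pulls so far
    """
    # Pull any untried arm first so every confidence bound is defined.
    for i, n in enumerate(counts):
        if n == 0:
            return i
    # UCB1: empirical mean plus an exploration bonus that shrinks
    # as an arm accumulates pulls, balancing exploitation against
    # exploration of under-sampled arms.
    return max(range(len(counts)),
               key=lambda i: means[i] + math.sqrt(2.0 * math.log(t) / counts[i]))
```

With equal pull counts the rule exploits the higher empirical mean; with unequal counts the bonus steers it towards under-sampled arms even when their means look worse.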


Cited By

  • Decentralized cooperative stochastic bandits. Proceedings of the 33rd International Conference on Neural Information Processing Systems, pages 4529-4540, December 2019. doi: 10.5555/3454287.3454694
  • Delay and cooperation in nonstochastic bandits. The Journal of Machine Learning Research, 20(1):613-650, January 2019. doi: 10.5555/3322706.3322723
  • DUCT. ACM Transactions on Intelligent Systems and Technology, 8(5):1-27, July 2017. doi: 10.1145/3066156
  • Best Action Selection in a Stochastic Environment. Proceedings of the 2016 International Conference on Autonomous Agents & Multiagent Systems, pages 758-766, May 2016. doi: 10.5555/2936924.2937036
  • Decentralized multi-agent reinforcement learning in average-reward dynamic DCOPs. Proceedings of the 2014 International Conference on Autonomous Agents and Multi-agent Systems, pages 1341-1342, May 2014. doi: 10.5555/2615731.2617463


    Published In

    AAMAS '12: Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems - Volume 1
    June 2012
    592 pages
ISBN: 0981738117

    Sponsors

• The International Foundation for Autonomous Agents and Multiagent Systems


Publisher

International Foundation for Autonomous Agents and Multiagent Systems, Richland, SC


    Author Tags

    1. coordination
    2. distributed problem solving
    3. uncertainty


Conference

AAMAS '12

    Acceptance Rates

Overall acceptance rate: 1,155 of 5,036 submissions, 23%

