10.5555/2343576.2343617
research-article

DCOPs and bandits: exploration and exploitation in decentralised coordination

Published: 04 June 2012

    Abstract

    Real-life coordination problems are characterised by stochasticity and a lack of a priori knowledge about the interactions between agents. However, decentralised constraint optimisation problems (DCOPs), a widely adopted framework for modelling decentralised coordination tasks, assume perfect knowledge of these factors, which limits their practical applicability. To address this shortcoming, we introduce the MAB-DCOP, in which the interactions between agents are modelled by multi-armed bandits (MABs). Unlike canonical DCOPs, a MAB-DCOP is not a single-shot optimisation problem. Rather, it is a sequential one in which agents need to coordinate in order to strike a balance between acquiring knowledge about the a priori unknown and stochastic interactions (exploration) and taking the currently believed optimal joint action (exploitation), so as to maximise the cumulative global utility over a finite time horizon. We propose Heist, the first asymptotically optimal algorithm for coordination under stochasticity and lack of prior knowledge. Heist solves MAB-DCOPs in a decentralised fashion, using a generalised distributive law (GDL) message-passing phase to find the joint action with the highest upper confidence bound (UCB) on global utility. We demonstrate that Heist outperforms other state-of-the-art techniques from the MAB and DCOP literature by up to 1.5 orders of magnitude on MAB-DCOPs in experimental settings.
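
To make the exploration and exploitation trade-off in the abstract concrete, the following is a minimal Python sketch of choosing the joint action with the highest optimistic estimate of global utility. It is not Heist itself and rests on several assumptions: the global utility is a sum of local utilities, each treated as a bandit over its local joint action; each local estimate uses the standard UCB1 index (sample mean plus sqrt(2 ln t / n)) rather than whatever bound the paper derives; rewards are Bernoulli; and the argmax is found by centralised brute-force enumeration instead of decentralised GDL message passing. All names (`ucb1_index`, `best_joint_action`, `run`, `factors`, `domains`, `stats`, `true_means`) are hypothetical.

```python
import itertools
import math
import random


def ucb1_index(mean, pulls, t):
    """UCB1 index: sample mean plus sqrt(2 ln t / pulls); infinite if never pulled."""
    if pulls == 0:
        return math.inf
    return mean + math.sqrt(2.0 * math.log(t) / pulls)


def best_joint_action(factors, domains, stats, t):
    """Brute-force argmax over joint actions of the summed per-factor UCB1 indices.

    factors: list of tuples of variable indices (the scope of each local utility)
    domains: list of value lists, one per variable
    stats:   list of dicts, one per factor, mapping local action -> (mean, pulls)
    """
    best, best_score = None, -math.inf
    for joint in itertools.product(*domains):
        score = 0.0
        for scope, table in zip(factors, stats):
            local = tuple(joint[i] for i in scope)
            mean, pulls = table.get(local, (0.0, 0))
            score += ucb1_index(mean, pulls, t)
        if score > best_score:  # ties keep the first joint action found
            best, best_score = joint, score
    return best


def run(factors, domains, true_means, horizon, seed=0):
    """Play `horizon` rounds; true_means[j][local] is the Bernoulli mean of factor j."""
    rng = random.Random(seed)
    stats = [dict() for _ in factors]
    total = 0.0
    for t in range(1, horizon + 1):
        joint = best_joint_action(factors, domains, stats, t)
        for scope, table, means in zip(factors, stats, true_means):
            local = tuple(joint[i] for i in scope)
            reward = 1.0 if rng.random() < means[local] else 0.0
            mean, pulls = table.get(local, (0.0, 0))
            # Incremental update of the sample mean for this local action.
            table[local] = ((mean * pulls + reward) / (pulls + 1), pulls + 1)
            total += reward
    return total, stats


if __name__ == "__main__":
    # Three binary agents on a chain: factor 0 couples agents 0-1, factor 1 couples 1-2.
    factors = [(0, 1), (1, 2)]
    domains = [[0, 1], [0, 1], [0, 1]]
    true_means = [
        {(0, 0): 0.2, (0, 1): 0.8, (1, 0): 0.5, (1, 1): 0.4},
        {(0, 0): 0.3, (0, 1): 0.1, (1, 0): 0.9, (1, 1): 0.6},
    ]
    total, stats = run(factors, domains, true_means, horizon=500)
    print("cumulative global utility:", total)
```

Enumerating joint actions is exponential in the number of agents, which is why a message-passing scheme that exploits the factor structure, as the abstract describes, matters in practice; the enumeration here only illustrates the quantity being maximised.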

      Published In

      AAMAS '12: Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems - Volume 1
      June 2012
      592 pages
      ISBN: 0981738117

      Sponsors

      • The International Foundation for Autonomous Agents and Multiagent Systems

      Publisher

      International Foundation for Autonomous Agents and Multiagent Systems

      Richland, SC

      Author Tags

      1. coordination
      2. distributed problem solving
      3. uncertainty

      Qualifiers

      • Research-article

      Conference

      AAMAS '12
      Sponsor:
      • The International Foundation for Autonomous Agents and Multiagent Systems

      Acceptance Rates

      Overall Acceptance Rate 1,155 of 5,036 submissions, 23%

      Cited By

      • (2021) Scalable Optimization for Wind Farm Control using Coordination Graphs. Proceedings of the 20th International Conference on Autonomous Agents and MultiAgent Systems, pages 1362-1370. 10.5555/3463952.3464109. Online publication date: 3-May-2021
      • (2019) Decentralized cooperative stochastic bandits. Proceedings of the 33rd International Conference on Neural Information Processing Systems, pages 4529-4540. 10.5555/3454287.3454694. Online publication date: 8-Dec-2019
      • (2019) Delay and cooperation in nonstochastic bandits. The Journal of Machine Learning Research, 20:1, pages 613-650. 10.5555/3322706.3322723. Online publication date: 1-Jan-2019
      • (2017) DUCT. ACM Transactions on Intelligent Systems and Technology, 8:5, pages 1-27. 10.1145/3066156. Online publication date: 12-Jul-2017
      • (2016) Best Action Selection in a Stochastic Environment. Proceedings of the 2016 International Conference on Autonomous Agents & Multiagent Systems, pages 758-766. 10.5555/2936924.2937036. Online publication date: 9-May-2016
      • (2014) Decentralized multi-agent reinforcement learning in average-reward dynamic DCOPs. Proceedings of the 2014 international conference on Autonomous agents and multi-agent systems, pages 1341-1342. 10.5555/2615731.2617463. Online publication date: 5-May-2014
