DOI: 10.5555/2343576.2343626
Research article

Comparative evaluation of MAL algorithms in a diverse set of ad hoc team problems

Published: 04 June 2012

Abstract

This paper is concerned with evaluating different multiagent learning (MAL) algorithms in problems where individual agents may be heterogeneous, in the sense of utilizing different learning strategies, without the opportunity for prior agreements or information regarding coordination. Such a situation arises in ad hoc team problems, a model of many practical multiagent systems applications. Prior work in multiagent learning has often focussed on homogeneous groups of agents, meaning that all agents were identical and a priori aware of this fact. Moreover, algorithms that are specifically designed for ad hoc team problems are typically evaluated in teams of agents with fixed behaviours, as opposed to agents which adapt their behaviours. In this work, we empirically evaluate five MAL algorithms, representing major approaches to multiagent learning but originally developed with the homogeneous setting in mind, to understand their behaviour in a set of ad hoc team problems. All teams consist of agents which are continuously adapting their behaviours. The algorithms are evaluated with respect to a comprehensive characterisation of repeated matrix games, using performance criteria that include considerations such as attainment of equilibrium, social welfare and fairness. Our main conclusion is that there is no clear winner. However, the comparative evaluation also highlights the relative strengths of different algorithms with respect to the type of performance criterion, e.g., social welfare vs. attainment of equilibrium.
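To make the experimental setting concrete, below is a minimal sketch of the kind of evaluation described above: two independently adapting agents repeatedly play a 2x2 matrix game, and the run is scored by social welfare and a simple fairness measure. The epsilon-greedy learners and the Prisoner's Dilemma payoffs here are illustrative stand-ins, not the five MAL algorithms or the full set of games evaluated in the paper.

```python
# A minimal sketch (not the paper's actual setup): two independent
# learners repeatedly play a 2x2 matrix game; the run is scored by
# average social welfare and by the payoff gap between the agents.
import random

# Prisoner's Dilemma payoffs: PAYOFFS[i][j] = (row reward, column reward)
PAYOFFS = [[(3, 3), (0, 5)],
           [(5, 0), (1, 1)]]

class EpsilonGreedyLearner:
    """Illustrative adapting agent: tracks average reward per action."""
    def __init__(self, n_actions=2, epsilon=0.1):
        self.values = [0.0] * n_actions
        self.counts = [0] * n_actions
        self.epsilon = epsilon

    def act(self):
        if random.random() < self.epsilon:
            return random.randrange(len(self.values))
        return max(range(len(self.values)), key=lambda a: self.values[a])

    def update(self, action, reward):
        # Incremental average of the reward observed for this action.
        self.counts[action] += 1
        self.values[action] += (reward - self.values[action]) / self.counts[action]

def evaluate(rounds=10000):
    row, col = EpsilonGreedyLearner(), EpsilonGreedyLearner()
    totals = [0.0, 0.0]
    for _ in range(rounds):
        a, b = row.act(), col.act()
        r_row, r_col = PAYOFFS[a][b]
        row.update(a, r_row)
        col.update(b, r_col)
        totals[0] += r_row
        totals[1] += r_col
    welfare = sum(totals) / rounds                  # average joint payoff per round
    unfairness = abs(totals[0] - totals[1]) / rounds  # 0 = perfectly equal split
    return welfare, unfairness

if __name__ == "__main__":
    welfare, unfairness = evaluate()
    print(f"social welfare: {welfare:.2f}, unfairness: {unfairness:.2f}")
```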




    Published In

    AAMAS '12: Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems - Volume 1
    June 2012, 592 pages
    ISBN: 0981738117

    Sponsor

    • The International Foundation for Autonomous Agents and Multiagent Systems

    Publisher

    International Foundation for Autonomous Agents and Multiagent Systems, Richland, SC

    Author Tags

    1. ad hoc teams
    2. agent coordination
    3. multiagent learning

    Acceptance Rates

    Overall Acceptance Rate: 1,155 of 5,036 submissions (23%)


    Cited By

    • (2017) Allocating training instances to learning agents for team formation. Autonomous Agents and Multi-Agent Systems 31(4): 905-940. DOI: 10.1007/s10458-016-9355-3
    • (2016) Belief and truth in hypothesised behaviours. Artificial Intelligence 235: 63-94. DOI: 10.1016/j.artint.2016.02.004
    • (2014) On convergence and optimality of best-response learning with policy types in multiagent systems. Proceedings of the Thirtieth Conference on Uncertainty in Artificial Intelligence, 12-21. DOI: 10.5555/3020751.3020754
    • (2014) Modeling uncertainty in leading ad hoc teams. Proceedings of the 2014 International Conference on Autonomous Agents and Multi-Agent Systems, 397-404. DOI: 10.5555/2615731.2615797
    • (2013) Ad hoc coordination in multiagent systems with applications to human-machine interaction. Proceedings of the 2013 International Conference on Autonomous Agents and Multi-Agent Systems, 1415-1416. DOI: 10.5555/2484920.2485253
    • (2013) A game-theoretic model and best-response learning method for ad hoc coordination in multiagent systems. Proceedings of the 2013 International Conference on Autonomous Agents and Multi-Agent Systems, 1155-1156. DOI: 10.5555/2484920.2485118
