Ad hoc teamwork by learning teammates' task

Published: 01 March 2016

Abstract

This paper addresses the problem of ad hoc teamwork, where a learning agent engages in a cooperative task with other (unknown) agents. The agent must effectively coordinate with the other agents towards completion of the intended task, without relying on any pre-defined coordination strategy. We contribute a new perspective on the ad hoc teamwork problem and propose that, in general, the learning agent should not only identify (and coordinate with) the teammates' strategy but also identify the task to be completed. In our approach to the ad hoc teamwork problem, we represent tasks as fully cooperative matrix games. Relying exclusively on observations of the behavior of the teammates, the learning agent must identify the task at hand (namely, the corresponding payoff function) from a set of possible tasks and adapt to the teammates' behavior. Teammates are assumed to follow a bounded-rationality best-response model and thus also adapt their behavior to that of the learning agent. We formalize the ad hoc teamwork problem as a sequential decision problem and propose two novel approaches to address it. In particular, we propose (i) an online learning approach that weighs the different tasks according to how well they predict the behavior of the teammates; and (ii) a decision-theoretic approach that models the ad hoc teamwork problem as a partially observable Markov decision problem. We provide theoretical bounds on the performance of both approaches and evaluate their performance in several domains of varying complexity.
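
As a concrete illustration of the online-learning idea in (i), the following minimal Python sketch (an illustration under simplifying assumptions, not the algorithm from the paper) keeps a weight for each candidate task, models the teammate as a softmax best responder to the agent's previous action, updates the weights multiplicatively by how well each task predicted the teammate's observed action, and then picks the action with the highest expected payoff under the resulting mixture over tasks. All function names and the example payoff matrices are hypothetical.

import numpy as np

def softmax(values, beta=5.0):
    # Bounded-rationality (softmax) response over a vector of payoffs.
    z = beta * (values - values.max())
    p = np.exp(z)
    return p / p.sum()

def predicted_teammate_policy(payoff, my_last_action, beta=5.0):
    # Teammate's assumed softmax best response to my previous action,
    # given a candidate payoff matrix (rows: my actions, cols: teammate's).
    return softmax(payoff[my_last_action, :], beta)

def ad_hoc_step(payoffs, weights, my_last_action, observed_teammate_action, eta=1.0):
    # 1. Multiplicative-weights update: score each candidate task by the
    #    probability its teammate model assigned to the observed action.
    likelihoods = np.array([
        predicted_teammate_policy(P, my_last_action)[observed_teammate_action]
        for P in payoffs
    ])
    weights = weights * likelihoods ** eta
    weights = weights / weights.sum()

    # 2. Choose my next action by its expected payoff, averaging over tasks;
    #    each task contributes its own teammate prediction (conditioned, for
    #    simplicity, on my previous action).
    expected = sum(w * (P @ predicted_teammate_policy(P, my_last_action))
                   for w, P in zip(weights, payoffs))
    return int(np.argmax(expected)), weights

# Example with two candidate 2x2 fully cooperative tasks.
payoffs = [np.array([[1.0, 0.0], [0.0, 0.2]]),
           np.array([[0.2, 0.0], [0.0, 1.0]])]
weights = np.ones(len(payoffs)) / len(payoffs)
action, weights = ad_hoc_step(payoffs, weights,
                              my_last_action=1, observed_teammate_action=1)
print(action, weights)  # weights shift toward the task that explains the observation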

      Published In

Autonomous Agents and Multi-Agent Systems, Volume 30, Issue 2
      March 2016
      228 pages

      Publisher

      Kluwer Academic Publishers

      United States

      Author Tags

      1. Ad hoc teamwork
      2. Online learning
      3. POMDP

Cited By

• (2024) A Middleware Architecture for Enhancing Multimedia Flows with High-Level Semantic Information. Proceedings of the 2024 ACM International Conference on Interactive Media Experiences Workshops, pp. 1-6. DOI: 10.1145/3672406.3672407. Online publication date: 12-Jun-2024.
• (2023) Controlling type confounding in ad hoc teamwork with instance-wise teammate feedback rectification. Proceedings of the 40th International Conference on Machine Learning, pp. 38272-38285. DOI: 10.5555/3618408.3620001. Online publication date: 23-Jul-2023.
• (2022) Deep reinforcement learning for multi-agent interaction. AI Communications, 35(4), 357-368. DOI: 10.3233/AIC-220116. Online publication date: 1-Jan-2022.
• (2022) A Survey of Ad Hoc Teamwork Research. Multi-Agent Systems, pp. 275-293. DOI: 10.1007/978-3-031-20614-6_16. Online publication date: 14-Sep-2022.
• (2022) Learning to Cooperate with Completely Unknown Teammates. Progress in Artificial Intelligence, pp. 739-750. DOI: 10.1007/978-3-031-16474-3_60. Online publication date: 31-Aug-2022.
• (2021) Ad Hoc Teamwork in the Presence of Non-stationary Teammates. Progress in Artificial Intelligence, pp. 648-660. DOI: 10.1007/978-3-030-86230-5_51. Online publication date: 7-Sep-2021.
• (2021) Helping People on the Fly: Ad Hoc Teamwork for Human-Robot Teams. Progress in Artificial Intelligence, pp. 635-647. DOI: 10.1007/978-3-030-86230-5_50. Online publication date: 7-Sep-2021.
• (2019) ATSIS. Proceedings of the 28th International Joint Conference on Artificial Intelligence, pp. 172-179. DOI: 10.5555/3367032.3367058. Online publication date: 10-Aug-2019.
• (2018) Learning Sharing Behaviors with Arbitrary Numbers of Agents. Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems, pp. 1232-1240. DOI: 10.5555/3237383.3237882. Online publication date: 9-Jul-2018.
• (2017) Allocating training instances to learning agents for team formation. Autonomous Agents and Multi-Agent Systems, 31(4), 905-940. DOI: 10.1007/s10458-016-9355-3. Online publication date: 1-Jul-2017.
