
Can bounded and self-interested agents be teammates? Application to planning in ad hoc teams

Published: 01 July 2017
  Abstract

    Planning for ad hoc teamwork is challenging because it involves agents collaborating without any prior coordination or communication. The focus is on principled methods that allow a single agent to cooperate with others. This motivates investigating the ad hoc teamwork problem in the context of self-interested decision-making frameworks. Agents engaged in individual decision making in multiagent settings face the task of reasoning about other agents' actions, which may in turn involve reasoning about others, and so on. An established approximation that operationalizes this approach is to bound the otherwise infinite nesting from below by introducing level 0 models. For the purposes of this study, individual, self-interested decision making in multiagent settings is modeled using interactive dynamic influence diagrams (I-DIDs). These are graphical models with the benefit that they naturally offer a factored representation of the problem, allowing agents to ascribe dynamic models to others and reason about them. We demonstrate that an implication of bounded, finitely-nested reasoning is that a self-interested agent, when it is part of a team, may not obtain optimal team solutions in cooperative settings. We address this limitation by including models at level 0 whose solutions involve reinforcement learning. We show how the learning is integrated into planning in the context of I-DIDs. This facilitates optimal teammate behavior, and we demonstrate its applicability to ad hoc teamwork in several problem domains and configurations.
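
    To make the role of the level-0 models concrete, the sketch below shows one way a level-0 policy could be obtained by reinforcement learning rather than being fixed in advance, which is the general idea the abstract describes. This is an illustrative Python sketch only, assuming a tabular Q-learning solver and a simple environment interface (reset/step with a terminal flag); the function and parameter names are hypothetical and the code does not reproduce the paper's I-DID-based integration.

        # Minimal sketch (assumed interface, not the paper's implementation):
        # learn a level-0 policy by tabular Q-learning, then return the greedy
        # policy so that a higher-level model can ascribe it to a teammate.
        import random
        from collections import defaultdict

        def q_learn_level0_policy(env, actions, episodes=5000,
                                  alpha=0.1, gamma=0.95, epsilon=0.1):
            """Return a state -> action mapping learned by Q-learning."""
            Q = defaultdict(float)                      # Q[(state, action)]
            for _ in range(episodes):
                state, done = env.reset(), False        # assumed env interface
                while not done:
                    # epsilon-greedy exploration over the given action set
                    if random.random() < epsilon:
                        action = random.choice(actions)
                    else:
                        action = max(actions, key=lambda a: Q[(state, a)])
                    next_state, reward, done = env.step(action)
                    # one-step temporal-difference update
                    best_next = max(Q[(next_state, a)] for a in actions)
                    Q[(state, action)] += alpha * (reward + gamma * best_next
                                                   - Q[(state, action)])
                    state = next_state
            # extract the greedy policy from the learned Q-values
            states = {s for (s, _) in Q}
            return {s: max(actions, key=lambda a: Q[(s, a)]) for s in states}

    In the paper's setting, a policy obtained this way would populate a level-0 model of the other agent inside the level-1 I-DID's model node, replacing a default level-0 model such as a random or fixed-heuristic policy, so that the planner can anticipate teammate behavior that improves with experience.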


    Cited By

    • (2022) Higher-order theory of mind is especially useful in unpredictable negotiations. Autonomous Agents and Multi-Agent Systems, 36(2). DOI: 10.1007/s10458-022-09558-6. Online publication date: 1-Oct-2022.
    • (2019) ATSIS. Proceedings of the 28th International Joint Conference on Artificial Intelligence, pp. 172-179. DOI: 10.5555/3367032.3367058. Online publication date: 10-Aug-2019.


      Published In

      Autonomous Agents and Multi-Agent Systems, Volume 31, Issue 4
      July 2017
      176 pages

      Publisher

      Kluwer Academic Publishers, United States


      Author Tags

      1. Ad hoc teamwork
      2. Multiagent systems
      3. Reinforcement learning
      4. Sequential decision making and planning

