Multi-view decision processes: the helper-AI problem

Published: 04 December 2017

Abstract

We consider a two-player sequential game in which the agents share a reward function but may disagree on the transition probabilities of an underlying Markovian model of the world. By committing to play a specific policy, the agent with the correct model can steer the behavior of the other agent and thereby improve utility. We model this setting as a multi-view decision process, which we use to formally analyze the positive effect of steering policies. We also develop an algorithm for computing the agents' achievable joint policy, and we show experimentally that it can yield a large gain in utility when the agents' models diverge.
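The steering idea can be made concrete with a small sketch. The Python snippet below is a hypothetical toy instance, not the paper's algorithm: it assumes a simultaneous-move finite MDP with random models P1 and P2, enumerates agent 1's deterministic commitments by brute force, computes agent 2's best response under its own (divergent) model, and scores each commitment under the correct model.

```python
import itertools
import numpy as np

# Hypothetical toy instance (not from the paper): S states, A actions per agent.
# P[s, a1, a2] gives the next-state distribution; both agents share reward R.
S, A, gamma = 3, 2, 0.9
rng = np.random.default_rng(0)

def random_model():
    P = rng.random((S, A, A, S))
    return P / P.sum(axis=-1, keepdims=True)   # normalize to valid transitions

P1 = random_model()        # agent 1's model, assumed correct
P2 = random_model()        # agent 2's divergent model
R = rng.random((S, A, A))  # shared reward

def best_response(pi1, P):
    """Agent 2's optimal deterministic policy under its model P,
    given agent 1's committed policy pi1 (array: state -> action)."""
    Pi = P[np.arange(S), pi1]   # induced (S, A, S) transitions for agent 2
    Ri = R[np.arange(S), pi1]   # induced (S, A) rewards
    V = np.zeros(S)
    for _ in range(500):        # value iteration on the induced MDP
        Q = Ri + gamma * Pi @ V
        V = Q.max(axis=1)
    return Q.argmax(axis=1)

def joint_value(pi1, pi2, P, s0=0):
    """Exact discounted value of the joint policy (pi1, pi2) under model P."""
    Pj = P[np.arange(S), pi1, pi2]   # (S, S) Markov chain of the joint policy
    Rj = R[np.arange(S), pi1, pi2]   # (S,) rewards along that chain
    return np.linalg.solve(np.eye(S) - gamma * Pj, Rj)[s0]

# Agent 1 commits to the policy whose induced best response (computed under
# agent 2's model P2) maximizes the TRUE value (under P1): a brute-force
# Stackelberg search over all A**S deterministic commitments, feasible
# only at toy sizes.
pi1 = np.array(max(itertools.product(range(A), repeat=S),
                   key=lambda c: joint_value(np.array(c),
                                             best_response(np.array(c), P2),
                                             P1)))
pi2 = best_response(pi1, P2)
print("value of steering commitment:", joint_value(pi1, pi2, P1))
```

Comparing the printed value with the value obtained when agent 1 naively plans as if agent 2 shared its model illustrates, in miniature, the utility gap the paper analyzes.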



Published In

NIPS'17: Proceedings of the 31st International Conference on Neural Information Processing Systems, December 2017, 7104 pages.
Publisher: Curran Associates Inc., Red Hook, NY, United States.
