Multi-view decision processes: the helper-AI problem

Authors: Christos Dimitrakakis, David C. Parkes, Goran Radanovic, Paul Tylkin

NIPS'17: Proceedings of the 31st International Conference on Neural Information Processing Systems

Pages 5449 - 5458

Published: 04 December 2017

Abstract

We consider a two-player sequential game in which agents have the same reward function but may disagree on the transition probabilities of an underlying Markovian model of the world. By committing to play a specific policy, the agent with the correct model can steer the behavior of the other agent, and seek to improve utility. We model this setting as a multi-view decision process, which we use to formally analyze the positive effect of steering policies. Furthermore, we develop an algorithm for computing the agents' achievable joint policy, and we experimentally show that it can lead to a large utility increase when the agents' models diverge.

Published In

NIPS'17 proceedings, December 2017, 7104 pages

ISBN: 9781510860964

Publisher

Curran Associates Inc., Red Hook, NY, United States
