Abstract
This paper addresses the problem of automated advice provision in scenarios that involve repeated interactions between people and computer agents. This problem arises in many applications, such as route-selection systems, office assistants, and climate-control systems. To succeed in such settings, agents must reason about how their advice influences people's future actions and decisions over time. This work models such scenarios as a family of repeated bilateral interactions called "choice selection processes", in which humans and computer agents may share certain goals but are essentially self-interested. We propose a social agent for advice provision (SAP) for such environments that generates advice using a social utility function, a weighted sum of the individual utilities of the agent and the human participant. The SAP agent models human choice selection using hyperbolic discounting and samples the model to infer the best weights for its social utility function. We demonstrate the effectiveness of SAP in two separate domains that vary in the complexity of modeling human behavior as well as in the information available to people when they decide whether to accept the agent's advice. In both domains, we evaluated SAP in extensive empirical studies involving hundreds of human subjects, comparing it to agents using alternative models of choice selection processes informed by behavioral economics and psychological models of decision-making. Our results show that in both domains the SAP agent outperformed the alternative models. This work demonstrates the efficacy of combining computational methods with behavioral economics to model how people reason about machine-generated advice, and it presents a general methodology for agent design in such repeated advice settings.
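The two components named in the abstract — hyperbolic discounting of delayed value and a social utility that weighs the agent's and the human's utilities — can be sketched as follows. This is an illustrative sketch only: the function names, the discount parameter `k`, the weight grid, and the simulated-payoff callback are assumptions for exposition, not the paper's actual SAP implementation or fitted parameters.

```python
def hyperbolic_discount(value, delay, k=1.0):
    """Hyperbolically discounted value: v / (1 + k * delay).

    Unlike exponential discounting, the discount factor falls off
    more steeply for near-term delays, a common model of human
    intertemporal choice."""
    return value / (1.0 + k * delay)


def social_utility(agent_utility, human_utility, w):
    """Weighted sum of the agent's and the human's utility (0 <= w <= 1)."""
    return w * agent_utility + (1.0 - w) * human_utility


def best_weight(candidate_ws, simulated_agent_payoff):
    """Pick the weight whose simulated long-run agent payoff is highest.

    `simulated_agent_payoff` stands in for sampling the learned human
    model: it maps a candidate weight w to the agent's expected payoff
    when advice is generated with that weight."""
    return max(candidate_ws, key=simulated_agent_payoff)
```

For example, `best_weight([0.0, 0.5, 1.0], f)` scans a small grid of candidate weights and returns the one `f` scores highest; purely self-interested advice (`w = 1.0`) need not win, since a human who distrusts the advice may stop following it.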
Notes
We use the term “world state” to disambiguate the states of an MDP from those of a selection process.
This method is more common in POMDPs; however, since our state space is very large, we use it here as well.
This model does not require an additional parameter for the actual cost to the receiver (\(c_R(a,v)\)), since \(c_R(a,v)\) is already a linear combination of the comfort level and the energy consumption.
In fact, the exact equivalent of the road-selection domain would be to assume that the user assigns a cost to each possible combination of heat load and power level. However, such an assumption would result in far too many arms, most of which would never be sampled, or sampled only once, and thus would not yield a good human model.
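The receiver cost described in the note above — a linear combination of the comfort level and the energy consumption — can be sketched as a one-line function. The coefficients `alpha` and `beta`, and the use of a discomfort score rather than a raw comfort level, are hypothetical choices for illustration; the paper's actual coefficients and units are not given here.

```python
def receiver_cost(discomfort, energy_kwh, alpha=1.0, beta=0.5):
    """Receiver cost as a linear combination of discomfort and energy use.

    alpha weighs the (dis)comfort term and beta the energy term; both
    are illustrative placeholders, not values from the paper."""
    return alpha * discomfort + beta * energy_kwh
```

Because the cost is linear in its two components, no separate cost parameter is needed per (heat load, power level) pair, which is exactly why the note's bandit-style formulation with one arm per combination would be wastefully large.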
Azaria, A., Gal, Y., Kraus, S. et al. Strategic advice provision in repeated human-agent interactions. Auton Agent Multi-Agent Syst 30, 4–29 (2016). https://doi.org/10.1007/s10458-015-9284-6