Abstract
We consider the problem of adapting the parameters of an influence diagram in an online fashion for real-time personalization. This problem is important when we use the influence diagram repeatedly to make decisions and we are uncertain about its parameters. We describe learning algorithms to solve this problem. In particular, we show how to modify various explore-versus-exploit strategies that are known to work well for Markov decision processes to the more general influence-diagram model. As an illustration, we describe how our techniques for online personalization allow a voice-enabled browser to adapt to a particular speaker for spoken dialogue management. We evaluate all the explore-versus-exploit strategies in this domain.
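As a concrete illustration of the kind of explore-versus-exploit strategy the abstract refers to, the following minimal Python sketch implements Thompson sampling for a single decision with Bernoulli-rewarded actions and Beta(1, 1) priors. The class name, action labels, and reward model here are illustrative assumptions for this sketch, not the paper's implementation, which operates on full influence diagrams.

    import random

    class ThompsonSamplingDecision:
        """Thompson sampling for one decision node with Bernoulli rewards (illustrative sketch)."""

        def __init__(self, actions):
            # Beta posterior parameters (successes + 1, failures + 1) per action.
            self.posteriors = {a: [1, 1] for a in actions}

        def choose(self):
            # Sample a reward probability from each action's posterior,
            # then act greedily with respect to the sampled values.
            draws = {a: random.betavariate(s, f)
                     for a, (s, f) in self.posteriors.items()}
            return max(draws, key=draws.get)

        def update(self, action, success):
            # Online adaptation: fold the observed outcome into the posterior.
            self.posteriors[action][0 if success else 1] += 1

    # Hypothetical dialogue-management usage: pick an action, observe whether
    # the interaction succeeded, and update the posterior online.
    policy = ThompsonSamplingDecision(["confirm", "repeat", "proceed"])
    action = policy.choose()
    policy.update(action, success=True)

Sampling from the posterior rather than acting on its mean is what balances exploration against exploitation: poorly understood actions occasionally draw high sampled values and get tried, while actions with well-established high reward are chosen most of the time.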
Cite this article
Chickering, D.M., Paek, T. Personalizing influence diagrams: applying online learning strategies to dialogue management. User Model User-Adap Inter 17, 71–91 (2007). https://doi.org/10.1007/s11257-006-9020-7