Personalizing influence diagrams: applying online learning strategies to dialogue management

  • Original Paper
  • User Modeling and User-Adapted Interaction

Abstract

We consider the problem of adapting the parameters of an influence diagram online for real-time personalization. This problem matters when the influence diagram is used repeatedly to make decisions and its parameters are uncertain. We describe learning algorithms that address it. In particular, we show how to adapt various explore-versus-exploit strategies, known to work well for Markov decision processes, to the more general influence-diagram model. As an illustration, we describe how our techniques for online personalization allow a voice-enabled browser to adapt to a particular speaker for spoken dialogue management. We evaluate all of the explore-versus-exploit strategies in this domain.
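
To make the setting concrete, the following is a minimal sketch of one well-known explore-versus-exploit strategy, posterior sampling (often called Thompson sampling), applied to a toy influence diagram. It is an illustration only and is not taken from the paper: the observed context node, the two dialogue actions, the utility table, and the simulated user probabilities are all hypothetical. The diagram has one observed chance node (recognizer confidence), one decision node (the dialogue action), and one outcome node whose unknown parameters carry Beta posteriors that are updated after every interaction.

```python
import random

ACTIONS = ["execute", "confirm"]        # hypothetical dialogue actions
CONTEXTS = ["low_conf", "high_conf"]    # hypothetical observed chance node

# Hypothetical utility node: u(action, success). Confirming is safer but slower.
UTILITY = {
    ("execute", True): 1.0, ("execute", False): -1.0,
    ("confirm", True): 0.6, ("confirm", False): -0.2,
}

# Hypothetical "true" user, unknown to the learner: P(success | context, action).
TRUE_P = {
    ("low_conf", "execute"): 0.30, ("low_conf", "confirm"): 0.90,
    ("high_conf", "execute"): 0.95, ("high_conf", "confirm"): 0.95,
}


class PersonalizedDecision:
    """Toy influence diagram with Beta posteriors over the outcome parameters."""

    def __init__(self, rng):
        self.rng = rng
        # Beta(1, 1) prior stored as [alpha, beta] per (context, action) pair.
        self.counts = {(c, a): [1, 1] for c in CONTEXTS for a in ACTIONS}

    @staticmethod
    def expected_utility(action, p_success):
        return (p_success * UTILITY[(action, True)]
                + (1.0 - p_success) * UTILITY[(action, False)])

    def choose(self, context):
        # Sample one parameter setting from the posterior, then act greedily
        # with respect to that sample. Uncertain actions still get tried.
        best_action, best_eu = None, float("-inf")
        for action in ACTIONS:
            alpha, beta = self.counts[(context, action)]
            p = self.rng.betavariate(alpha, beta)
            eu = self.expected_utility(action, p)
            if eu > best_eu:
                best_action, best_eu = action, eu
        return best_action

    def update(self, context, action, success):
        # Conjugate update: increment alpha on success, beta on failure.
        self.counts[(context, action)][0 if success else 1] += 1


def simulate(steps=3000, seed=0):
    rng = random.Random(seed)
    agent = PersonalizedDecision(rng)
    history = []
    for _ in range(steps):
        context = rng.choice(CONTEXTS)
        action = agent.choose(context)
        success = rng.random() < TRUE_P[(context, action)]
        agent.update(context, action, success)
        history.append((context, action))
    return agent, history


if __name__ == "__main__":
    agent, history = simulate()
    for key in sorted(agent.counts):
        print(key, agent.counts[key])
```

Under these assumed numbers, confirming has the higher expected utility when recognizer confidence is low and executing has the higher expected utility when it is high, so the learner's choices should concentrate on those actions as evidence accumulates. Sampling from the posterior, rather than always acting on its mean, is what supplies the exploration.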

Author information

Corresponding author

Correspondence to David Maxwell Chickering.

About this article

Cite this article

Chickering, D.M., Paek, T. Personalizing influence diagrams: applying online learning strategies to dialogue management. User Model User-Adap Inter 17, 71–91 (2007). https://doi.org/10.1007/s11257-006-9020-7

  • DOI: https://doi.org/10.1007/s11257-006-9020-7
