
Learning automata-based approach to learn dialogue policies in large state space

Published: 01 March 2012

Abstract

This paper addresses the scalable optimisation of dialogue policies in speech-based conversational systems using reinforcement learning. Large state spaces raise several difficulties: large tables, accounting for prior knowledge, and data sparsity. We therefore present an online policy learning algorithm, based on hierarchically structured learning automata with eligibility traces, that finds optimal dialogue strategies covering large state spaces. The algorithm derives an optimal policy that prescribes which action to take in each state of the conversation so as to maximise the expected total reward for attaining the goal, and it balances exploration and exploitation in its updates to improve the naturalness of human-computer interaction. The proposed model is evaluated with the PARADISE framework on the task of accessing a travel information system.
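As a rough illustration of the kind of update a learning automaton performs, the sketch below keeps one automaton per dialogue state and applies the standard linear reward-inaction rule from the learning automata literature. It is not the algorithm proposed in the paper: the hierarchical structure and the eligibility traces are not modelled, and the state names, success probabilities, and the helper `train_toy_policy` are hypothetical.

```python
import random


class LearningAutomaton:
    """Finite-action automaton with a linear reward-inaction update."""

    def __init__(self, n_actions, learning_rate=0.1, rng=None):
        # Start from a uniform action-probability vector.
        self.probs = [1.0 / n_actions] * n_actions
        self.learning_rate = learning_rate
        self.rng = rng or random.Random()

    def choose(self):
        # Sampling from the probability vector provides exploration early on
        # and exploitation as the vector concentrates on one action.
        return self.rng.choices(range(len(self.probs)), weights=self.probs)[0]

    def update(self, action, reward):
        # reward is expected in [0, 1]; a reward of 0 changes nothing.
        step = self.learning_rate * reward
        for a in range(len(self.probs)):
            if a == action:
                self.probs[a] += step * (1.0 - self.probs[a])
            else:
                self.probs[a] -= step * self.probs[a]


def train_toy_policy(steps=500, seed=0):
    """Hypothetical two-state, two-action toy problem (an assumption)."""
    rng = random.Random(seed)
    # Assumed probability that each action succeeds in each state.
    success = {"slot_empty": [0.8, 0.2], "slot_filled": [0.3, 0.9]}
    automata = {s: LearningAutomaton(2, 0.05, rng) for s in success}
    for _ in range(steps):
        state = rng.choice(list(success))
        action = automata[state].choose()
        reward = 1.0 if rng.random() < success[state][action] else 0.0
        automata[state].update(action, reward)
    return automata
```

Calling `train_toy_policy()` returns one automaton per toy state; each `probs` list is that state's stochastic policy over the two actions. Because the update only moves probability mass towards rewarded actions, the vector always sums to one.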

Published in: International Journal of Intelligent Information and Database Systems, Volume 6, Issue 2, March 2012, 107 pages
ISSN: 1751-5858
EISSN: 1751-5866
Publisher: Inderscience Publishers, Geneva 15, Switzerland
