Learning automata-based approach to learn dialogue policies in large state space

Authors:

G. Kumaravelan,

R. Sivakumar

International Journal of Intelligent Information and Database Systems, Volume 6, Issue 2

Pages 180 - 199

https://doi.org/10.1504/IJIIDS.2012.045844

Published: 01 March 2012

Abstract

This paper addresses the problem of scalable optimisation of dialogue policies in speech-based conversational systems using reinforcement learning. Large state spaces raise several difficulties, notably large lookup tables, the need to account for prior knowledge, and data sparsity. We therefore present an online policy learning algorithm, based on a hierarchical structure of learning automata with eligibility traces, that finds optimal dialogue strategies covering large state spaces. The algorithm derives an optimal policy that prescribes which action to take in each state of the conversation so as to maximise the expected total reward for reaching the goal, and its updates balance exploration and exploitation to improve the naturalness of human-computer interaction. The proposed model is tested with the PARADISE evaluation framework on a system for accessing travel information.
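
The abstract does not give the update rule, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes one learning automaton per dialogue state, the standard linear reward-inaction (L_R-I) update, and a simple trace-like decay that gives earlier decisions in a dialogue a smaller step size. The names `LearningAutomaton`, `update_episode`, `alpha`, `decay`, and the example state labels are all hypothetical.

```python
import random


class LearningAutomaton:
    """Linear reward-inaction (L_R-I) automaton over a fixed set of actions."""

    def __init__(self, n_actions):
        # Start from a uniform action-probability vector.
        self.probs = [1.0 / n_actions] * n_actions

    def choose(self, rng=random):
        # Sample an action index according to the current probabilities.
        return rng.choices(range(len(self.probs)), weights=self.probs)[0]

    def update(self, action, reward, step):
        # reward is assumed to lie in [0, 1]; a reward of 0 changes nothing
        # ("inaction"). The chosen action gains probability mass and every
        # other action loses mass proportionally, so the vector still sums to 1.
        gain = step * reward
        self.probs = [
            p + gain * (1.0 - p) if i == action else p - gain * p
            for i, p in enumerate(self.probs)
        ]


def update_episode(automata, trajectory, reward, alpha=0.1, decay=0.8):
    """Credit every (state, action) pair of one dialogue with the final reward.

    Assumption: a trace-like scheme in which the most recent decision uses
    step size alpha and each earlier decision is discounted by `decay`.
    """
    for steps_back, (state, action) in enumerate(reversed(trajectory)):
        automata[state].update(action, reward, alpha * decay ** steps_back)


# Hypothetical two-state dialogue with two actions per state.
automata = {"greet": LearningAutomaton(2), "ask_city": LearningAutomaton(2)}
trajectory = [("greet", 0), ("ask_city", 1)]
update_episode(automata, trajectory, reward=1.0)
```

Because each automaton stores only a probability vector over its own actions, the memory cost grows with the number of visited states times the number of actions rather than with a full state-action value table, which is one plausible reading of why automata are attractive for large state spaces.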


Published In

International Journal of Intelligent Information and Database Systems, Volume 6, Issue 2

March 2012

107 pages

ISSN:1751-5858

EISSN:1751-5866

Publisher

Inderscience Publishers

Geneva 15, Switzerland
