Abstract
Compared to conventional hand-crafted rule-based dialogue management systems, statistical POMDP-based dialogue managers offer the promise of increased robustness, reduced development and maintenance costs, and scalability to large open domains. As a consequence, there has been considerable research activity in statistical spoken dialogue systems in recent years. However, building and deploying a real-time spoken dialogue system is expensive, and even when a system is operational, it is hard to recruit enough users to obtain statistically significant results. Instead, researchers have tended to evaluate using user simulators or by reprocessing existing corpora, both of which are unconvincing predictors of actual real-world performance. This paper describes the deployment of a real-world restaurant information system and its evaluation in a motor car using locally recruited subjects and by remote users recruited via Amazon Mechanical Turk. The paper explores three key questions: are statistical dialogue systems more robust than conventional hand-crafted systems; how does the performance of a system evaluated on a user simulator compare to its performance with real users; and can the performance of a system tested over the telephone network be used to predict performance in more hostile environments such as a motor car? The results show that the statistical approach is indeed more robust, but that results from a simulator significantly over-estimate both absolute and relative performance. Finally, when word error rates (WERs) are matched, performance results obtained over the telephone can provide useful predictors of performance in noisier environments such as the motor car, but again they tend to over-estimate performance.
Notes
- 1.
- 2. As well as being used to train the POMDP-based system, the user simulator was used to tune the rules in the conventional hand-crafted system.
© 2016 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Young, S. et al. (2016). Evaluation of Statistical POMDP-Based Dialogue Systems in Noisy Environments. In: Rudnicky, A., Raux, A., Lane, I., Misu, T. (eds) Situated Dialog in Speech-Based Human-Computer Interaction. Signals and Communication Technology. Springer, Cham. https://doi.org/10.1007/978-3-319-21834-2_1
DOI: https://doi.org/10.1007/978-3-319-21834-2_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-21833-5
Online ISBN: 978-3-319-21834-2
eBook Packages: Engineering, Engineering (R0)