Extrinsic Versus Intrinsic Evaluation of Natural Language Generation for Spoken Dialogue Systems and Social Robotics

Hastie, Helen; Cuayáhuitl, Heriberto; Dethlefs, Nina; Keizer, Simon; Liu, Xingkun

doi:10.1007/978-981-10-2585-3_24

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 427)

Abstract

In the past 10 years, very few published studies have included any kind of extrinsic evaluation of an NLG component in an end-to-end system, be it for phone or mobile-based dialogues or social robotic interaction. This may be because such evaluations are very costly to set up and run for a single component. The question therefore arises whether anything is to be gained over and above the intrinsic quality measures obtained in off-line experiments. In this article, we describe a case study evaluating two variants of an NLG surface realiser and show that there are significant differences in both extrinsic and intrinsic measures. These differences can be used to inform further iterations of component and system development.

Notes

1. http://www.parlance-project.eu
2. http://crowdflower.com
3. Extrinsic user-task-success was hand-annotated by a single annotator, being set to 1 if the caller received information on a restaurant that matched their request and if other information (e.g. address, name, phoneNumber) was asked for and correctly received.
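
The annotation rule in Note 3 can be read as a binary scoring function. The following is a minimal sketch of one such reading, not the authors' annotation tooling: the class `AnnotatedCall`, its field names, and the helper `success_rate` are all hypothetical. It assumes that a call counts as successful when a matching restaurant was offered and every additional detail the caller asked for was correctly received, and that a call with no additional requests needs only the matching restaurant.

```python
from dataclasses import dataclass, field


@dataclass
class AnnotatedCall:
    """One hand-annotated dialogue. All field names are hypothetical."""
    # True if the caller received a restaurant matching their request.
    venue_matched_request: bool
    # Extra details the caller asked for, e.g. {"address", "phoneNumber"}.
    requested_info: set = field(default_factory=set)
    # Extra details the annotator judged as correctly received.
    correctly_received_info: set = field(default_factory=set)


def task_success(call: AnnotatedCall) -> int:
    """Return 1 for a successful call, else 0 (assumed reading of Note 3)."""
    if not call.venue_matched_request:
        return 0
    # Every requested detail must be among the correctly received ones.
    # With no extra requests, the subset test is trivially true.
    return int(call.requested_info <= call.correctly_received_info)


def success_rate(calls: list) -> float:
    """Mean task success over a non-empty list of annotated calls."""
    return sum(task_success(c) for c in calls) / len(calls)
```

Under this reading, a call in which the caller asked for an address and a phone number but correctly received only the address scores 0, even though the restaurant itself matched the request.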

Acknowledgements

This research was funded by the European Commission FP7 programme FP7/2011-14 under grant agreement no. 287615 (PARLANCE). We thank all members of the PARLANCE consortium for their help in designing, building and testing the Parlance end-to-end spoken dialogue system. We would also like to acknowledge other members of the Heriot-Watt Parlance team, in particular Prof. Oliver Lemon and Dr. Verena Rieser.

Author information

Authors and Affiliations

School of Mathematical and Computer Sciences, Heriot-Watt University, Edinburgh, UK
Helen Hastie, Simon Keizer & Xingkun Liu
School of Computer Science, University of Lincoln, Lincoln, UK
Heriberto Cuayáhuitl
School of Engineering and Computer Science, University of Hull, Hull, UK
Nina Dethlefs

Corresponding author

Correspondence to Helen Hastie.

Editor information

Editors and Affiliations

Institute of Behavioural Sciences, University of Helsinki, Helsinki, Finland
Kristiina Jokinen
University of Helsinki, Helsinki, Finland
Graham Wilcock

About this chapter

Cite this chapter

Hastie, H., Cuayáhuitl, H., Dethlefs, N., Keizer, S., Liu, X. (2017). Extrinsic Versus Intrinsic Evaluation of Natural Language Generation for Spoken Dialogue Systems and Social Robotics. In: Jokinen, K., Wilcock, G. (eds) Dialogues with Social Robots. Lecture Notes in Electrical Engineering, vol 427. Springer, Singapore. https://doi.org/10.1007/978-981-10-2585-3_24

DOI: https://doi.org/10.1007/978-981-10-2585-3_24
Published: 25 December 2016
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-2584-6
Online ISBN: 978-981-10-2585-3
eBook Packages: Engineering, Engineering (R0)
