DOI: 10.1145/3139491.3139504

A review of evaluation techniques for social dialogue systems

Published: 13 November 2017

Abstract

In contrast with goal-oriented dialogue, social dialogue has no clear measure of task success. Consequently, evaluation of these systems is notoriously hard. In this paper, we review current evaluation methods, focusing on automatic metrics. We conclude that turn-based metrics often ignore the context and do not account for the fact that several replies are valid, while end-of-dialogue rewards are mainly hand-crafted. Both lack grounding in human perceptions.
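The abstract's point that turn-based metrics "do not account for the fact that several replies are valid" can be illustrated with a toy sketch (not from the paper): a crude unigram-overlap score, standing in for word-overlap metrics such as BLEU or ROUGE, rewards a reply only for matching the single reference response, so an equally valid reply with different wording scores zero.

```python
# Illustrative sketch: why word-overlap metrics can penalise valid
# dialogue responses. The metric below is a simplified stand-in for
# BLEU-1/ROUGE-style overlap, not the evaluation method of any cited work.

def unigram_overlap(candidate: str, reference: str) -> float:
    """Fraction of candidate tokens that also occur in the reference."""
    cand = candidate.lower().split()
    ref = set(reference.lower().split())
    if not cand:
        return 0.0
    return sum(tok in ref for tok in cand) / len(cand)

# Two acceptable answers to "How are you?", scored against one reference.
reference = "i am doing great thanks for asking"
close_reply = "i am doing great"          # paraphrases the reference
valid_reply = "pretty good how about you" # equally valid, different words

print(unigram_overlap(close_reply, reference))  # 1.0
print(unigram_overlap(valid_reply, reference))  # 0.0
```

Both replies are fine in context, yet the overlap score ranks one perfect and the other worthless, which is the grounding problem the paper raises for turn-based automatic metrics.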



Published In

ISIAA 2017: Proceedings of the 1st ACM SIGCHI International Workshop on Investigating Social Interactions with Artificial Agents
November 2017
48 pages
ISBN: 9781450355582
DOI: 10.1145/3139491
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Automatic Evaluation
  2. Conversational Agents
  3. Evaluation Metrics
  4. Social Dialogue Systems

Qualifiers

  • Abstract

Funding Sources

  • RAEng/Leverhulme Trust Senior Research Fellowship Scheme
  • EPSRC

Conference

ICMI '17


Article Metrics

  • Downloads (last 12 months): 15
  • Downloads (last 6 weeks): 2
Reflects downloads up to 09 Nov 2024

Cited By

  • (2024) An Automatic Evaluation Framework for Social Conversations with Robots. Proceedings of the 2024 International Symposium on Technological Advances in Human-Robot Interaction, 56-64. DOI: 10.1145/3648536.3648543. Online publication date: 9-Mar-2024.
  • (2023) Cooperative Attention-Based Learning between Diverse Data Sources. Algorithms 16(5), 240. DOI: 10.3390/a16050240. Online publication date: 4-May-2023.
  • (2023) Lost in Dialogue: A Review and Categorisation of Current Dialogue System Approaches and Technical Solutions. KI 2023: Advances in Artificial Intelligence, 98-113. DOI: 10.1007/978-3-031-42608-7_9. Online publication date: 18-Sep-2023.
  • (2021) Computational Grounding: An Overview of Common Ground Applications in Conversational Agents. Italian Journal of Computational Linguistics 7(1|2), 133-156. DOI: 10.4000/ijcol.890. Online publication date: 1-Dec-2021.
  • (2020) Pragmatics Research and Non-task Dialog Technology. Proceedings of the 2nd Conference on Conversational User Interfaces, 1-3. DOI: 10.1145/3405755.3406142. Online publication date: 22-Jul-2020.
  • (2019) Exploring Interaction with Remote Autonomous Systems using Conversational Agents. Proceedings of the 2019 on Designing Interactive Systems Conference, 1543-1556. DOI: 10.1145/3322276.3322318. Online publication date: 18-Jun-2019.
