DOI: 10.1145/3409256.3409814

Interactive Evaluation of Conversational Agents: Reflections on the Impact of Search Task Design

Published: 14 September 2020

Abstract

Undertaking an interactive evaluation of goal-oriented conversational agents (CAs) is challenging: it requires the search task to be realistic and relatable while accounting for the users' cognitive limitations. In this paper we discuss the findings of two Wizard of Oz studies and reflect on the impact of different interactive search task designs on participants' performance, satisfaction and cognitive workload. In the first study, we tasked participants with finding the cheapest flight that met a certain departure time. In the second study we added an additional criterion, travel time, and asked participants to find a flight option that offered a good trade-off between price and travel time. We found that using search tasks in which participants need to decide between several competing search criteria (price vs. time) led to higher search involvement and lower variance in usability and cognitive workload ratings between different CAs. We hope that our results will provoke discussion on how to make the evaluation of voice-only goal-oriented CAs more reliable and ecologically valid.
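
The single-criterion task in the first study has one correct answer (the cheapest qualifying flight), whereas the second study's task forces participants to weigh two competing criteria. As a rough illustration of that difference (a minimal Python sketch with made-up flight options, not material from the paper), the trade-off task can be viewed as choosing among the Pareto-optimal options, i.e. those that cannot be beaten on both price and travel time at once:

```python
# Illustrative sketch only: hypothetical flight options, not data from the studies.
from dataclasses import dataclass

@dataclass
class Flight:
    label: str
    price: float        # ticket price (arbitrary currency, hypothetical)
    travel_time: float  # total travel time in hours (hypothetical)

def pareto_optimal(flights):
    """Return flights that no other option beats on both price and travel time."""
    front = []
    for f in flights:
        dominated = any(
            g.price <= f.price and g.travel_time <= f.travel_time
            and (g.price < f.price or g.travel_time < f.travel_time)
            for g in flights
        )
        if not dominated:
            front.append(f)
    return front

options = [
    Flight("A", price=90,  travel_time=9.5),  # cheapest, but slow
    Flight("B", price=140, travel_time=4.0),  # fastest, but expensive
    Flight("C", price=110, travel_time=6.0),  # middle ground
    Flight("D", price=150, travel_time=8.0),  # dominated by B and C
]

for f in pareto_optimal(options):
    print(f.label, f.price, f.travel_time)    # prints A, B and C; D is filtered out
```

With a single criterion the filter collapses to "pick the minimum price"; with two criteria several non-dominated options remain, so the participant still has a genuine decision to make, which is the kind of deliberation the abstract associates with higher search involvement.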

Cited By

  • (2023) A Systematic Review of Cost, Effort, and Load Research in Information Search and Retrieval, 1972–2020. ACM Transactions on Information Systems 42, 1 (2023), 1–39. https://doi.org/10.1145/3583069

Published In

ICTIR '20: Proceedings of the 2020 ACM SIGIR on International Conference on Theory of Information Retrieval
September 2020
207 pages
ISBN: 9781450380676
DOI: 10.1145/3409256

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. conversational search
  2. performance evaluation
  3. user study

Qualifiers

  • Short-paper

Conference

ICTIR '20

Acceptance Rates

Overall Acceptance Rate 235 of 527 submissions, 45%
