DOI: 10.1145/3409256.3409814

Interactive Evaluation of Conversational Agents: Reflections on the Impact of Search Task Design

Published: 14 September 2020

Abstract

Undertaking an interactive evaluation of goal-oriented conversational agents (CAs) is challenging: it requires the search task to be realistic and relatable while accounting for the users' cognitive limitations. In this paper we discuss the findings of two Wizard of Oz studies and reflect on the impact of different interactive search task designs on participants' performance, satisfaction and cognitive workload. In the first study, we tasked participants with finding the cheapest flight that met a certain departure time. In the second study we added an additional criterion, travel time, and asked participants to find a flight option that offered a good trade-off between price and travel time. We found that using search tasks in which participants need to decide between several competing search criteria (price vs. time) led to higher search involvement and lower variance in usability and cognitive workload ratings between different CAs. We hope that our results will provoke discussion on how to make the evaluation of voice-only goal-oriented CAs more reliable and ecologically valid.
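
The single-criterion task in the first study has one correct answer (the cheapest qualifying flight), whereas the second study's task forces participants to weigh two competing criteria. As a rough illustration of that difference (a minimal Python sketch with made-up flight options, not material from the paper), the trade-off task can be viewed as choosing among the Pareto-optimal options, i.e. those that cannot be beaten on both price and travel time at once:

```python
# Illustrative sketch only: hypothetical flight options, not data from the studies.
from dataclasses import dataclass

@dataclass
class Flight:
    label: str
    price: float        # ticket price (arbitrary currency, hypothetical)
    travel_time: float  # total travel time in hours (hypothetical)

def pareto_optimal(flights):
    """Return flights that no other option beats on both price and travel time."""
    front = []
    for f in flights:
        dominated = any(
            g.price <= f.price and g.travel_time <= f.travel_time
            and (g.price < f.price or g.travel_time < f.travel_time)
            for g in flights
        )
        if not dominated:
            front.append(f)
    return front

options = [
    Flight("A", price=90,  travel_time=9.5),  # cheapest, but slow
    Flight("B", price=140, travel_time=4.0),  # fastest, but expensive
    Flight("C", price=110, travel_time=6.0),  # middle ground
    Flight("D", price=150, travel_time=8.0),  # dominated by B and C
]

for f in pareto_optimal(options):
    print(f.label, f.price, f.travel_time)    # prints A, B and C; D is filtered out
```

With a single criterion the filter collapses to "pick the minimum price"; with two criteria several non-dominated options remain, so the participant still has a genuine decision to make, which is the kind of deliberation the abstract associates with higher search involvement.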

Cited By

  • (2023) A Systematic Review of Cost, Effort, and Load Research in Information Search and Retrieval, 1972–2020. ACM Transactions on Information Systems 42, 1 (2023), 1–39. https://doi.org/10.1145/3583069

Published In

ICTIR '20: Proceedings of the 2020 ACM SIGIR on International Conference on Theory of Information Retrieval
September 2020
207 pages
ISBN: 9781450380676
DOI: 10.1145/3409256

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. conversational search
  2. performance evaluation
  3. user study

Qualifiers

  • Short-paper

Conference

ICTIR '20

Acceptance Rates

Overall Acceptance Rate 235 of 527 submissions, 45%
