DOI: 10.1145/3576840.3578314 · CHIIR Conference Proceedings · Short paper · Open access

Beyond Accurate Answers: Evaluating Open-Domain Question Answering in Enterprise Search

Published: 20 March 2023

Abstract

Open-domain question answering (OpenQA) research has grown rapidly in recent years, but the usability of OpenQA in real-world applications remains largely understudied. In this paper, we evaluate the actual user experience of an OpenQA model deployed in a large tech company’s production enterprise search portal. From qualitative query-log analysis and user interviews, our preliminary findings are: 1) there exist a large number of “contingency answers” that cannot be evaluated simply on their face textual value, owing to noisy source passages and the ambiguous intents of short keyword queries; 2) contingency answers contribute to a positive search experience by providing “information scents”; 3) click-through rate (CTR) is a good user-behavior metric for measuring OpenQA result quality, despite rare occurrences of “good abandonment”. This exploratory study reveals an often-neglected gap between existing OpenQA research and its search-engine applications, one that disconnects offline research effort from online user experience. We call for reformulating OpenQA model objectives beyond answer face value, and for developing new datasets and metrics for better evaluation protocols.
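The abstract proposes click-through rate as a behavioral proxy for answer quality. As an illustrative sketch only (the log schema and field names below are assumptions, not taken from the paper), per-query CTR over an answer-card impression log could be computed like this:

```python
from collections import defaultdict

def answer_ctr(log):
    """Compute per-query click-through rate from an impression log.

    `log` is a list of (query, clicked) pairs, where `clicked` is True
    when the user clicked the displayed answer card. The schema is
    hypothetical, for illustration only.
    """
    shown = defaultdict(int)
    clicks = defaultdict(int)
    for query, clicked in log:
        shown[query] += 1
        if clicked:
            clicks[query] += 1
    return {q: clicks[q] / shown[q] for q in shown}

log = [("vpn setup", True), ("vpn setup", False),
       ("holiday calendar", True), ("holiday calendar", True)]
print(answer_ctr(log))  # → {'vpn setup': 0.5, 'holiday calendar': 1.0}
```

Note that, as the paper's third finding cautions, "good abandonment" (a user reading the answer without clicking) registers as a non-click and can deflate CTR for queries the system actually answered well.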


Published In

CHIIR '23: Proceedings of the 2023 Conference on Human Information Interaction and Retrieval
March 2023, 520 pages
ISBN: 9798400700354
DOI: 10.1145/3576840
Editors: Jacek Gwizdka, Soo Young Rieh
This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher: Association for Computing Machinery, New York, NY, United States


Qualifiers: Short paper · Research · Refereed limited

Conference: CHIIR '23
Overall acceptance rate: 55 of 163 submissions, 34%
