DOI: 10.1145/3576840.3578314 · CHIIR Conference Proceedings · Short paper · Open access

Beyond Accurate Answers: Evaluating Open-Domain Question Answering in Enterprise Search

Published: 20 March 2023

Abstract

Open-domain question answering (OpenQA) research has grown rapidly in recent years, but the usability of OpenQA in real-world applications remains largely understudied. In this paper, we evaluate the actual user experience of an OpenQA model deployed in a large tech company’s production enterprise search portal. From qualitative query-log analysis and user interviews, our preliminary findings are: 1) there exist a large number of “contingency answers” that cannot be evaluated simply on their face textual value, owing to noisy source passages and the ambiguous intents of short keyword queries; 2) contingency answers contribute to a positive search experience by providing “information scents”; 3) click-through rate (CTR) is a good user-behavior metric for measuring OpenQA result quality, despite rare occurrences of “good abandonment”. This exploratory study reveals an often-neglected gap between existing OpenQA research and its search-engine applications, one that disconnects offline research effort from online user experience. We call for reformulating OpenQA model objectives beyond answer face value, and for developing new datasets and metrics for better evaluation protocols.
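The abstract proposes click-through rate as a behavioral proxy for answer quality. As an illustrative sketch only (the log schema and field names below are assumptions, not taken from the paper), per-query CTR over an answer-card impression log could be computed like this:

```python
from collections import defaultdict

def answer_ctr(log):
    """Compute per-query click-through rate from an impression log.

    `log` is a list of (query, clicked) pairs, where `clicked` is True
    when the user clicked the displayed answer card. The schema is
    hypothetical, for illustration only.
    """
    shown = defaultdict(int)
    clicks = defaultdict(int)
    for query, clicked in log:
        shown[query] += 1
        if clicked:
            clicks[query] += 1
    return {q: clicks[q] / shown[q] for q in shown}

log = [("vpn setup", True), ("vpn setup", False),
       ("holiday calendar", True), ("holiday calendar", True)]
print(answer_ctr(log))  # → {'vpn setup': 0.5, 'holiday calendar': 1.0}
```

Note that, as the paper's third finding cautions, "good abandonment" (a user reading the answer without clicking) registers as a non-click and can deflate CTR for queries the system actually answered well.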


Published In

CHIIR '23: Proceedings of the 2023 Conference on Human Information Interaction and Retrieval
March 2023, 520 pages
ISBN: 9798400700354
DOI: 10.1145/3576840
Editors: Jacek Gwizdka, Soo Young Rieh
This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher: Association for Computing Machinery, New York, NY, United States


Qualifiers: Short paper · Research · Refereed limited

Conference: CHIIR '23
Overall acceptance rate: 55 of 163 submissions, 34%
