Towards Question-based High-recall Information Retrieval: Locating the Last Few Relevant Documents for Technology-assisted Reviews

Published: 18 May 2020
Abstract

    While continuous active learning algorithms have proven effective at finding most of the relevant documents in a collection, the cost of locating the last few remains high for applications such as Technology-assisted Reviews (TAR). To locate these last few but significant documents efficiently, Zou et al. [2018] proposed a novel interactive algorithm that constructs questions about the presence or absence of entities in the missing relevant documents. The underlying hypothesis is that entities play a central role in documents carrying key information, and that users are able to answer questions about the presence or absence of an entity in the missing relevant documents. Based on this hypothesis, a Sequential Bayesian Search approach was devised that selects the optimal sequence of questions to ask. In this work, we extend Zou et al. [2018] by (a) investigating the noise tolerance of the proposed algorithm; (b) proposing an alternative objective function that accounts for erroneous user answers; (c) proposing a method that sequentially decides the best point at which to stop asking the user questions; and (d) conducting a small user study to validate some of the assumptions made by Zou et al. [2018]. Furthermore, all experiments are extended to demonstrate the effectiveness of the proposed algorithms not only in abstract appraisal (i.e., finding the abstracts of potentially relevant documents in a collection) but also in finding the documents to be included in the review (i.e., finding the subset of those relevant abstracts for which the full article remains relevant). The experimental results demonstrate that the proposed algorithms can greatly improve performance, requiring the review of fewer irrelevant documents to find the last relevant ones compared to state-of-the-art methods, even in the case of noisy answers. Further, they show that our algorithm learns to stop asking questions at the right time.
    Last, we conduct a small user study involving an expert reviewer. The study validates some of the assumptions made in this work regarding the user's willingness to answer the system's questions and the extent of that willingness, as well as the user's ability to answer them.
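
    The question-asking loop summarized in the abstract can be illustrated with a small sketch: a posterior is maintained over which candidate document is the missing relevant one, each yes/no entity question updates that posterior under an assumed answer-error rate, the next question is chosen greedily to minimize expected posterior entropy, and questioning stops once the posterior is confident enough. This is an illustrative reconstruction under simplifying assumptions, not the authors' actual Sequential Bayesian Search implementation; the error rate `eps`, the stopping threshold `stop_conf`, and the toy entity sets are all hypothetical.

    ```python
    import math

    def entropy(p):
        # Shannon entropy of a discrete distribution (0 * log 0 treated as 0).
        return -sum(x * math.log2(x) for x in p if x > 0)

    def bayes_update(belief, has_entity, answer_yes, eps):
        # Posterior over candidate documents after a noisy yes/no answer:
        # the answer matches a document's true entity presence w.p. 1 - eps.
        post = [b * ((1 - eps) if (h == answer_yes) else eps)
                for b, h in zip(belief, has_entity)]
        z = sum(post)
        return [x / z for x in post]

    def select_question(belief, doc_entities, entities, eps):
        # Greedy step: ask about the entity minimizing expected posterior entropy.
        best_e, best_h = None, float("inf")
        for e in entities:
            has = [e in ents for ents in doc_entities]
            p_yes = sum(b * ((1 - eps) if h else eps)
                        for b, h in zip(belief, has))
            h_exp = 0.0
            for ans, p_ans in ((True, p_yes), (False, 1.0 - p_yes)):
                if p_ans > 0:
                    h_exp += p_ans * entropy(bayes_update(belief, has, ans, eps))
            if h_exp < best_h:
                best_e, best_h = e, h_exp
        return best_e

    def run(doc_entities, entities, answer_fn, eps=0.1, stop_conf=0.9, max_q=20):
        # Start from a uniform prior; stop when confident or out of questions.
        n = len(doc_entities)
        belief = [1.0 / n] * n
        for _ in range(max_q):
            if max(belief) >= stop_conf:
                break
            e = select_question(belief, doc_entities, entities, eps)
            has = [e in ents for ents in doc_entities]
            belief = bayes_update(belief, has, answer_fn(e), eps)
        return belief
    ```

    A uniform prior and a single missing relevant document are assumed for brevity; the paper's setting generalizes this to locating multiple missing documents and to learned, per-user answer models.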

    References

    [1]
    Mustafa Abualsaud, Nimesh Ghelani, Haotian Zhang, Mark D. Smucker, Gordon V. Cormack, and Maura R. Grossman. 2018. A system for efficient high-recall retrieval. In Proceedings of the 41st International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR’18). Association for Computing Machinery, New York, NY, 1317--1320.
    [2]
    Avi Arampatzis, Jaap Kamps, and Stephen Robertson. 2009. Where to stop reading a ranked list? Threshold optimization using truncated score distributions. In Proceedings of the 32nd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR’09). Association for Computing Machinery, New York, NY, 524--531.
    [3]
    Kyoungman Bae and Youngjoong Ko. 2019. Improving question retrieval in community question answering service using dependency relations and question classification. J. Assoc. Info. Sci. Technol. 70, 11 (2019), 1194--1209.
    [4]
    Jason R. Baron, David D. Lewis, and Douglas W. Oard. 2006. TREC 2006 legal track overview. In Proceedings of the Text Retrieval Conference (TREC’06). Citeseer.
    [5]
    Chris Buckley and Stephen Robertson. 2008. Relevance Feedback Track Overview: TREC 2008. Technical Report. Microsoft Corporation, Redmond, WA.
    [6]
    Ricardo Campos, Vítor Mangaravite, Arian Pasquali, Alípio Mário Jorge, Célia Nunes, and Adam Jatowt. 2018. A text feature-based automatic keyword extraction method for single documents. In Advances in Information Retrieval. Springer International Publishing, Cham, 684--691.
    [7]
    Joyce Y. Chai, Chen Zhang, and Rong Jin. 2007. An empirical investigation of user term feedback in text-based targeted image search. ACM Trans. Info. Syst. 25, 1 (2007), 3.
    [8]
    Zheqian Chen, Chi Zhang, Zhou Zhao, Chengwei Yao, and Deng Cai. 2018. Question retrieval for community-based question answering via heterogeneous social influential network. Neurocomputing 285 (2018), 117--124.
    [9]
    Gordon V. Cormack and Maura R. Grossman. 2014. Evaluation of machine-learning protocols for technology-assisted review in electronic discovery. In Proceedings of the 37th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR’14). Association for Computing Machinery, New York, NY, 153--162.
    [10]
    Gordon V. Cormack and Maura R. Grossman. 2015a. Autonomy and reliability of continuous active learning for technology-assisted review. arXiv preprint arXiv:1504.06868.
    [11]
    Gordon V. Cormack and Maura R. Grossman. 2015b. Waterloo (Cormack) participation in the TREC 2015 total recall track. In Proceedings of the Text Retrieval Conference (TREC’15).
    [12]
    Gordon V. Cormack and Maura R. Grossman. 2016a. “When to stop”: Waterloo (Cormack) participation in the TREC 2016 total recall track. In Proceedings of the Text Retrieval Conference (TREC’16).
    [13]
    Gordon V. Cormack and Maura R. Grossman. 2016b. Engineering quality and reliability in technology-assisted review. In Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR’16). Association for Computing Machinery, New York, NY, 75--84.
    [14]
    Gordon V. Cormack and Maura R. Grossman. 2016c. Scalability of continuous active learning for reliable high-recall text classification. In Proceedings of the 25th ACM International on Conference on Information and Knowledge Management (CIKM’16). Association for Computing Machinery, New York, NY, 1039--1048.
    [15]
    Gordon V. Cormack and Maura R. Grossman. 2017. Technology-assisted review in empirical medicine: Waterloo participation in CLEF eHealth 2017. In Proceedings of the Conference and Labs of the Evaluation Forum (CLEF’17). Retrieved from http://ceur-ws.org/Vol-1866/paper_51.pdf.
    [16]
    Gordon V. Cormack and Maura R. Grossman. 2018. Beyond pooling. In Proceedings of the 41st International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR’18). Association for Computing Machinery, New York, NY, 1169--1172.
    [17]
    Gordon V. Cormack, Maura R. Grossman, Bruce Hedin, and Douglas W. Oard. 2010. Overview of the TREC 2010 legal track. In Proceedings of the 19th Text Retrieval Conference (TREC’10).
    [18]
    Gordon V. Cormack and Thomas R. Lynam. 2005. TREC 2005 spam track overview. In Proceedings of the Text Retrieval Conference (TREC’05). NIST Special Publication 500-274.
    [19]
    Gordon V. Cormack and Mona Mojdeh. 2009. Machine learning for information retrieval: TREC 2009 web, relevance feedback and legal tracks. In Proceedings of the Text Retrieval Conference (TREC’09).
    [20]
    Gordon V. Cormack, Christopher R. Palmer, and Charles L. A. Clarke. 1998. Efficient construction of large test collections. In Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR’98). Association for Computing Machinery, New York, NY, 282--289.
    [21]
    Marco Cornolti, Paolo Ferragina, Massimiliano Ciaramita, Stefan Rüd, and Hinrich Schütze. 2018. SMAPH: A Piggyback approach for entity-linking in web queries. ACM Trans. Info. Syst. 37, 1 (2018), 13.
    [22]
    Van Dang and Bruce W. Croft. 2010. Query reformulation using anchor text. In Proceedings of the 3rd ACM International Conference on Web Search and Data Mining (WSDM’10). Association for Computing Machinery, New York, NY, 41--50.
    [23]
    Giorgio Maria Di Nunzio. 2018. A study of an automatic stopping strategy for technologically assisted medical reviews. In Advances in Information Retrieval. Springer International Publishing, Cham, 672--677.
    [24]
    Harris Drucker, Behzad Shahrary, and David Gibbon. 2001. Relevance feedback using support vector machines. In Proceedings of the 18th International Conference on Machine Learning (ICML’01). Morgan Kaufmann Publishers Inc., San Francisco, CA, 122--129.
    [25]
    Miroslav Dudík, Katja Hofmann, Robert E. Schapire, Aleksandrs Slivkins, and Masrour Zoghi. 2015. Contextual dueling bandits. arXiv preprint arXiv:1502.06362.
    [26]
    Patrick Ernst, Arunav Mishra, Avishek Anand, and Vinay Setty. 2017. BioNex: A system for biomedical news event exploration. In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR’17). Association for Computing Machinery, New York, NY, 1277--1280.
    [27]
    Elena Erosheva, Stephen Fienberg, and John Lafferty. 2004. Mixed-membership models of scientific publications. Proc. Natl. Acad. Sci. U.S.A. 101, suppl. 1 (2004), 5220--5227.
    [28]
    Paolo Ferragina and Ugo Scaiella. 2010. TAGME: On-the-fly annotation of short text fragments (by Wikipedia entities). In Proceedings of the 19th ACM International Conference on Information and Knowledge Management (CIKM’10). ACM, New York, NY, 1625--1628.
    [29]
    Nicola Ferro and Carol Peters. 2019. Information Retrieval Evaluation in a Changing World: Lessons Learned from 20 Years of CLEF. Vol. 41. Springer.
    [30]
    Robin Genuer, Jean-Michel Poggi, and Christine Tuleau-Malot. 2010. Variable selection using random forests. Pattern Recogn. Lett. 31, 14 (2010), 2225--2236.
    [31]
    Lorraine Goeuriot, Liadh Kelly, Hanna Suominen, Aurélie Névéol, Aude Robert, Evangelos Kanoulas, Rene Spijker, João Palotti, and Guido Zuccon. 2017. CLEF 2017 eHealth evaluation lab overview. In Experimental IR Meets Multilinguality, Multimodality, and Interaction. Springer International Publishing, Cham, 291--303.
    [32]
    Maura R. Grossman and Gordon V. Cormack. 2010. Technology-assisted review in e-discovery can be more effective and more efficient than exhaustive manual review. Rich. JL Tech. 17 (2010), 1.
    [33]
    Maura R. Grossman, Gordon V. Cormack, and Adam Roegiest. 2016. TREC 2016 total recall track overview. In Proceedings of the 25th Text Retrieval Conference (TREC’16). Retrieved from http://trec.nist.gov/pubs/trec25/papers/Overview-TR.pdf.
    [34]
    Maura R. Grossman, Gordon V. Cormack, and Adam Roegiest. 2017. Automatic and semi-automatic document selection for technology-assisted review. In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR’17). ACM, New York, NY, 905--908.
    [35]
    Kai Hakala and Sampo Pyysalo. 2019. Biomedical named entity recognition with multilingual BERT. In Proceedings of The 5th Workshop on BioNLP Open Shared Tasks. Association for Computational Linguistics, Hong Kong, China, 56--61.
    [36]
    Faegheh Hasibi, Krisztian Balog, and Svein Erik Bratsberg. 2015. Entity linking in queries: Tasks and evaluation. In Proceedings of the International Conference on the Theory of Information Retrieval (ICTIR’15). Association for Computing Machinery, New York, NY, 171--180.
    [37]
    Bruce Hedin, Stephen Tomlinson, Jason R. Baron, and Douglas W. Oard. 2009. Overview of the TREC 2009 Legal Track. Technical Report. National Archives and Records Administration, College Park, MD.
    [38]
    Katja Hofmann, Shimon Whiteson, and Maarten de Rijke. 2013. Balancing exploration and exploitation in listwise and pairwise online learning to rank for information retrieval. Info. Retriev. 16, 1 (2013), 63--90.
    [39]
    Evangelos Kanoulas, Dan Li, Leif Azzopardi, and René Spijker. 2017. Technologically assisted reviews in empirical medicine overview. In Proceedings of the Conference and Labs of the Evaluation Forum (CLEF’17). Retrieved from http://ceur-ws.org/Vol-1866/invited_paper_12.pdf.
    [40]
    Anita Krishnakumar. 2007. Active Learning Literature Survey. Technical Report. University of California, Santa Cruz.
    [41]
    Dipankar Kundu and Deba Prasad Mandal. 2019. Formulation of a hybrid expertise retrieval system in community question answering services. Appl. Intell. 49, 2 (Feb. 2019), 463--477.
    [42]
    Branislav Kveton and Shlomo Berkovsky. 2015. Minimal interaction search in recommender systems. In Proceedings of the 20th International Conference on Intelligent User Interfaces (IUI’15). Association for Computing Machinery, New York, NY, 236--246.
    [43]
    Branislav Kveton and Shlomo Berkovsky. 2016. Minimal interaction content discovery in recommender systems. ACM Trans. Interact. Intell. Syst. 6, 2 (2016), 15.
    [44]
    Victor Lavrenko and W. Bruce Croft. 2017. Relevance-based language models. SIGIR Forum 51, 2 (2017), 260--267.
    [45]
    Grace E. Lee and Aixin Sun. 2018. Seed-driven document ranking for systematic reviews in evidence-based medicine. In Proceedings of the 41st International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR’18). Association for Computing Machinery, New York, NY, 455--464.
    [46]
    Baichuan Li and Irwin King. 2010. Routing questions to appropriate answerers in community question answering services. In Proceedings of the 19th ACM International Conference on Information and Knowledge Management (CIKM’10). Association for Computing Machinery, New York, NY, 1585--1588.
    [47]
    Dekang Lin and Patrick Pantel. 2001. Discovery of inference rules for question-answering. Nat. Lang. Eng. 7, 4 (2001), 343--360.
    [48]
    Ming Liu, Lei Chen, Bingquan Liu, Guidong Zheng, and Xiaoming Zhang. 2017. DBpedia-based entity linking via greedy search and adjusted Monte Carlo random walk. ACM Trans. Info. Syst. 36, 2 (2017), 16.
    [49]
    Ming Liu, Gu Gong, Bing Qin, and Ting Liu. 2019. A multi-view--based collective entity-linking method. ACM Trans. Info. Syst. 37, 2 (2019), 23.
    [50]
    Yuanjie Liu, Shasha Li, Yunbo Cao, Chin-Yew Lin, Dingyi Han, and Yong Yu. 2008. Understanding and summarizing answers in community-based question answering services. In Proceedings of the 22nd International Conference on Computational Linguistics (COLING’08). Association for Computational Linguistics, 497--504.
    [51]
    David E. Losada, Javier Parapar, and Alvaro Barreiro. 2019. When to stop making relevance judgments? A study of stopping methods for building information retrieval test collections. J. Assoc. Info. Sci. Technol. 70, 1 (2019), 49--60.
    [52]
    Yuanhua Lv and ChengXiang Zhai. 2009. Adaptive relevance feedback in information retrieval. In Proceedings of the 18th ACM Conference on Information and Knowledge Management (CIKM’09). Association for Computing Machinery, New York, NY, 255--264.
    [53]
    Mstislav Maslennikov and Tat-Seng Chua. 2010. Combining relations for information extraction from free text. ACM Trans. Info. Syst. 28, 3 (2010), 14.
    [54]
    Graham McDonald, Craig Macdonald, and Iadh Ounis. 2018. Active learning strategies for technology-assisted sensitivity review. In Advances in Information Retrieval. Springer International Publishing, Cham, 439--453.
    [55]
    Douglas W. Oard, Bruce Hedin, Stephen Tomlinson, and Jason R. Baron. 2008. Overview of the TREC 2008 Legal Track. Technical Report. University of Maryland College of Information Studies, College Park, MD.
    [56]
    Douglas W. Oard, Fabrizio Sebastiani, and Jyothi K. Vinjumur. 2018. Jointly minimizing the expected costs of review for responsiveness and privilege in E-discovery. ACM Trans. Info. Syst. 37, 1 (2018), 11.
    [57]
    Alison O’Mara-Eves, James Thomas, John McNaught, Makoto Miwa, and Sophia Ananiadou. 2015. Using text mining for study identification in systematic reviews: A systematic review of current approaches. System. Rev. 4, 1 (Jan. 2015), 5.
    [58]
    Meeyoung Park, Hariprasad Sampathkumar, Bo Luo, and Xue-wen Chen. 2013. Content-based assessment of the credibility of online healthcare information. In Proceedings of the IEEE International Conference on Big Data. IEEE, 51--58.
    [59]
    Filip Radlinski, Robert Kleinberg, and Thorsten Joachims. 2008. Learning diverse rankings with multi-armed bandits. In Proceedings of the 25th International Conference on Machine Learning (ICML’08). Association for Computing Machinery, New York, NY, 784--791.
    [60]
    Hadas Raviv, Oren Kurland, and David Carmel. 2016. Document retrieval using entity-based language models. In Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR’16). Association for Computing Machinery, New York, NY, 65--74.
    [61]
    Sathish Reddy, Dinesh Raghu, Mitesh M. Khapra, and Sachindra Joshi. 2017. Generating natural language question-answer pairs from a knowledge graph using a RNN-based question generation model. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics. Association for Computational Linguistics, 376--385. Retrieved from https://www.aclweb.org/anthology/E17-1036.
    [62]
    Ellen Riloff and Wendy Lehnert. 1994. Information extraction as a basis for high-precision text classification. ACM Trans. Info. Syst. 12, 3 (1994), 296--333.
    [63]
    Stephen E. Robertson and K. Spärck Jones. 1976. Relevance weighting of search terms. J. Assoc. Info. Sci. Technol. 27, 3 (1976), 129--146.
    [64]
    J. Rocchio. 1971. Relevance feedback in information retrieval. In The SMART Retrieval System: Experiments in Automatic Document Processing. Prentice-Hall, Englewood Cliffs, NJ, 313--323. Retrieved from https://ci.nii.ac.jp/naid/10000074359/en/.
    [65]
    Adam Roegiest, Gordon V. Cormack, Maura R. Grossman, and Charles Clarke. 2015. TREC 2015 total recall track overview. In Proceedings of the Text Retrieval Conference (TREC’15).
    [66]
    Michal Rosen-Zvi, Thomas Griffiths, Mark Steyvers, and Padhraic Smyth. 2004. The author-topic model for authors and documents. In Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence (UAI’04). AUAI Press, Arlington, VA, 487--494.
    [67]
    Tuukka Ruotsalo, Jaakko Peltonen, Manuel J. A. Eugster, Dorota Głowacka, Patrik Floréen, Petri Myllymäki, Giulio Jacucci, and Samuel Kaski. 2018. Interactive intent modeling for exploratory search. ACM Trans. Info. Syst. 36, 4 (2018), 44.
    [68]
    Gerard Salton and Chris Buckley. 1990. Improving retrieval performance by relevance feedback. J. Amer. Soc. Info. Sci. 41, 4 (1990), 288--297.
    [69]
    Mark Sanderson. 1998. Accurate user directed summarization from existing tools. In Proceedings of the 7th International Conference on Information and Knowledge Management (CIKM’98). Association for Computing Machinery, New York, NY, 45--51.
    [70]
    Mark Sanderson and Hideo Joho. 2004. Forming test collections with no system pooling. In Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR’04). ACM, New York, NY, 33--40.
    [71]
    V. Satopaa, J. Albrecht, D. Irwin, and B. Raghavan. 2011. Finding a “Kneedle” in a Haystack: Detecting knee points in system behavior. In Proceedings of the 31st International Conference on Distributed Computing Systems Workshops. 166--171.
    [72]
    Harrisen Scells, Leif Azzopardi, Guido Zuccon, and Bevan Koopman. 2018. Query variation performance prediction for systematic reviews. In Proceedings of the 41st International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR’18). Association for Computing Machinery, New York, NY, 1089--1092.
    [73]
    Ian Soboroff and Stephen Robertson. 2003. Building a filtering test collection for TREC 2002. In Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR’03). Association for Computing Machinery, New York, NY, 243--250.
    [74]
    K. Spärck Jones. 1975. Report on the need for and provision of an “ideal” information retrieval test collection. Technical Report. University of Cambridge Computer Laboratory. Retrieved from https://ci.nii.ac.jp/naid/10000151848/en/.
    [75]
    Hanna Suominen, Liadh Kelly, Lorraine Goeuriot, Aurélie Névéol, Lionel Ramadier, Aude Robert, Evangelos Kanoulas, Rene Spijker, Leif Azzopardi, Dan Li, Jimmy, João Palotti, and Guido Zuccon. 2018. Overview of the CLEF eHealth evaluation lab 2018. In Experimental IR Meets Multilinguality, Multimodality, and Interaction. Springer International Publishing, Cham, 286--301.
    [76]
    Anastasios Tombros and Mark Sanderson. 1998. Advantages of query biased summaries in information retrieval. In Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR’98). Association for Computing Machinery, New York, NY, 2--10.
    [77]
    Stephen Tomlinson, Douglas W. Oard, Jason R. Baron, and Paul Thompson. 2007. Overview of the TREC 2007 legal track. In Proceedings of the Text Retrieval Conference (TREC’07). Citeseer.
    [78]
    Ellen M. Voorhees and Donna K. Harman (Eds.). 2005. TREC: Experiment and Evaluation in Information Retrieval. MIT Press, Cambridge, MA.
    [79]
    Byron C. Wallace, Issa J. Dahabreh, Kelly H. Moran, Carla E. Brodley, and Thomas A. Trikalinos. 2013. Active literature discovery for scoping evidence reviews: How many needles are there? In Proceedings of the KDD Workshop on Data Mining for Healthcare (KDD-DMH’13).
    [80]
    Byron C. Wallace, Joël Kuiper, Aakash Sharma, Mingxi Zhu, and Iain J. Marshall. 2016. Extracting PICO sentences from clinical trial reports using supervised distant supervision. J. Mach. Learn. Res. 17, 1 (2016), 4572--4596.
    [81]
    Byron C. Wallace, Thomas A. Trikalinos, Joseph Lau, Carla Brodley, and Christopher H. Schmid. 2010. Semi-automated screening of biomedical citations for systematic reviews. BMC Bioinform. 11, 1 (2010), 55.
    [82]
    Yao Wan, Guandong Xu, Liang Chen, Zhou Zhao, and Jian Wu. 2018. Exploiting cross-source knowledge for warming up community question answering services. Neurocomputing 320 (2018), 25--34.
    [83]
    Xuan Wang, Yu Zhang, Xiang Ren, Yuhao Zhang, Marinka Zitnik, Jingbo Shang, Curtis Langlotz, and Jiawei Han. 2019. Cross-type biomedical named entity recognition with deep multi-task learning. Bioinformatics 35, 10 (2019), 1745--1752.
    [84]
    Zheng Wen, Branislav Kveton, Brian Eriksson, and Sandilya Bhamidipati. 2013. Sequential Bayesian search. In Proceedings of the 30th International Conference on International Conference on Machine Learning (ICML’13). JMLR.org, II--226--II--234. Retrieved from http://dl.acm.org/citation.cfm?id=3042817.3042919.
    [85]
    Chenyan Xiong and Jamie Callan. 2015. EsdRank: Connecting query and documents through external semi-structured data. In Proceedings of the 24th ACM International on Conference on Information and Knowledge Management (CIKM’15). Association for Computing Machinery, New York, NY, 951--960.
    [86]
    Chenyan Xiong, Jamie Callan, and Tie-Yan Liu. 2017. Word-entity duet representations for document ranking. In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR’17). Association for Computing Machinery, New York, NY, 763--772.
    [87]
    Zhe Yu, Nicholas A. Kraft, and Tim Menzies. 2016. How to read less: Better machine-assisted reading methods for systematic literature reviews. arXiv preprint arXiv:1612.03224 (2016).
    [88]
    Zhe Yu, Nicholas A. Kraft, and Tim Menzies. 2018. Finding better active learners for faster literature reviews. Empir. Softw. Eng. 23, 6 (2018), 3161--3186.
    [89]
    Chengxiang Zhai and John Lafferty. 2001. Model-based feedback in the language modeling approach to information retrieval. In Proceedings of the 10th International Conference on Information and Knowledge Management (CIKM’01). Association for Computing Machinery, New York, NY, 403--410.
    [90]
    Haotian Zhang, Mustafa Abualsaud, Nimesh Ghelani, Mark D. Smucker, Gordon V. Cormack, and Maura R. Grossman. 2018a. Effective user interaction for high-recall retrieval: Less is more. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management (CIKM’18). Association for Computing Machinery, New York, NY, 187--196.
    [91]
    Haotian Zhang, Gordon V. Cormack, Maura R. Grossman, and Mark D. Smucker. 2018c. Evaluating sentence-level relevance feedback for high-recall information retrieval. arXiv preprint arXiv:1803.08988.
    [92]
    Haotian Zhang, Wu Lin, Yipeng Wang, Charles L. A. Clarke, and Mark D. Smucker. 2015. WaterlooClarke: TREC 2015 total recall track. In Proceedings of the Text Retrieval Conference (TREC’15).
    [93]
    Yongfeng Zhang, Xu Chen, Qingyao Ai, Liu Yang, and W. Bruce Croft. 2018b. Toward conversational search and recommendation: System ask, user respond. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management (CIKM’18). ACM, New York, NY, 177--186.
    [94]
    Zhou Zhao, Qifan Yang, Deng Cai, Xiaofei He, and Yueting Zhuang. 2016. Expert finding for community-based question answering via ranking metric network learning. In Proceedings of the 25th International Joint Conference on Artificial Intelligence (IJCAI’16). AAAI Press, 3000--3006.
    [95]
    Zhou Zhao, Lijun Zhang, Xiaofei He, and Wilfred Ng. 2014. Expert finding for question answering via graph regularized matrix completion. IEEE Trans. Knowl. Data Eng. 27, 4 (2014), 993--1004.
    [96]
    Jie Zou and Evangelos Kanoulas. 2019. Learning to ask: Question-based sequential Bayesian product search. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management (CIKM’19). Association for Computing Machinery, New York, NY, 369--378.
    [97]
    Jie Zou, Dan Li, and Evangelos Kanoulas. 2018. Technology-assisted reviews: Finding the last few relevant documents by asking yes/no questions to reviewers. In Proceedings of the 41st International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR’18). Association for Computing Machinery, New York, NY, 949--952.
    [98]
    Jie Zou, Ling Xu, Weikang Guo, Meng Yan, Dan Yang, and Xiaohong Zhang. 2015. Which non-functional requirements do developers focus on? An empirical study on stack overflow using topic analysis. In Proceedings of the IEEE/ACM 12th Working Conference on Mining Software Repositories. IEEE, 446--449.
    [99]
    Jie Zou, Ling Xu, Mengning Yang, Xiaohong Zhang, and Dan Yang. 2017. Toward comprehending the non-functional requirements through developers’ eyes: An exploration of Stack Overflow using topic analysis. Info. Softw. Technol. 84 (2017), 19--32.
    [100]
    Jie Zou, Ling Xu, Mengning Yang, Xiaohong Zhang, Jun Zeng, and Sachio Hirokawa. 2016. Automated duplicate bug report detection using multi-factor analysis. IEICE Trans. Info. Syst. 99, 7 (2016), 1762--1775.


        Published In

        ACM Transactions on Information Systems, Volume 38, Issue 3 (July 2020), 311 pages.
        ISSN: 1046-8188. EISSN: 1558-2868. DOI: 10.1145/3394096

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        Published: 18 May 2020
        Online AM: 07 May 2020
        Accepted: 01 March 2020
        Revised: 01 February 2020
        Received: 01 August 2019
        Published in TOIS Volume 38, Issue 3


        Author Tags

        1. SBSTAR
        2. SBSTARext
        3. Technology-assisted reviews
        4. asking questions
        5. interactive search

        Qualifiers

        • Research-article
        • Research
        • Refereed

        Funding Sources

        • Google Faculty Research Awards
        • China Scholarship Council, the European Union
        • Netherlands Organisation for Scientific Research (NWO)
        • Societal Challenges: Smart, Green, and Integrated Transport

