DOI: 10.1145/3539618.3591902

SPRINT: A Unified Toolkit for Evaluating and Demystifying Zero-shot Neural Sparse Retrieval

Published: 18 July 2023

Abstract

Traditionally, sparse retrieval systems that rely on lexical representations to retrieve documents, such as BM25, dominated information retrieval tasks. With the advent of pre-trained transformer models such as BERT, neural sparse retrieval has introduced a new paradigm within retrieval. Despite this success, there has been limited software support for running different sparse retrievers in a unified, common environment, which hinders practitioners from fairly comparing sparse models and obtaining realistic evaluation results. Another missing piece is that a majority of prior work evaluates sparse retrieval models only in-domain, i.e. on a single dataset: MS MARCO. However, practical retrieval systems require models that generalize well to unseen out-of-domain, i.e. zero-shot, retrieval tasks. In this work, we provide SPRINT, a unified Python toolkit based on Pyserini and Lucene that supports a common interface for evaluating neural sparse retrieval. The toolkit currently includes five built-in models: uniCOIL, DeepImpact, SPARTA, TILDEv2 and SPLADEv2. Users can also easily add customized models by defining their own term-weighting method. Using our toolkit, we establish strong and reproducible zero-shot sparse retrieval baselines on the well-acknowledged BEIR benchmark. Our results show that SPLADEv2 achieves the best average score among all neural sparse retrievers, 0.470 nDCG@10 on BEIR. We further uncover the reasons behind this performance gain: SPLADEv2 produces sparse representations in which a majority of tokens lie outside the original query and document, which is often crucial for its gains and a limitation of its other sparse counterparts. We provide our SPRINT toolkit, models, and the data used in our experiments publicly at: https://github.com/thakur-nandan/sprint
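The term-weighting idea described in the abstract can be made concrete with a small sketch. The snippet below is illustrative only, not SPRINT's actual API: all names and weights are hypothetical. A neural sparse retriever maps each text to a sparse vector of per-token weights, and ranking reduces to a dot product over the shared tokens, which an inverted index can compute efficiently.

```python
# Minimal sketch of neural sparse retrieval scoring (illustrative only;
# this is NOT the SPRINT API). A sparse retriever maps a text to a
# dict of {token: weight}; ranking is a dot product over shared tokens.

def sparse_dot(query_rep, doc_rep):
    """Score = sum of query weight * document weight over shared tokens."""
    # Iterate over the smaller representation for efficiency.
    if len(doc_rep) < len(query_rep):
        query_rep, doc_rep = doc_rep, query_rep
    return sum(w * doc_rep[t] for t, w in query_rep.items() if t in doc_rep)

# Toy term weights. A model like SPLADEv2 would also assign non-zero
# weight to expansion tokens absent from the original text, e.g. "virus"
# for a query that only mentions "flu".
query = {"flu": 1.2, "symptoms": 0.8, "virus": 0.3}
docs = {
    "d1": {"flu": 1.5, "fever": 0.9, "virus": 0.6},
    "d2": {"weather": 1.1, "symptoms": 0.4},
}

ranking = sorted(docs, key=lambda d: sparse_dot(query, docs[d]), reverse=True)
print(ranking)  # d1 scores 1.2*1.5 + 0.3*0.6 = 1.98; d2 scores 0.8*0.4 = 0.32
```

Expansion tokens are exactly the behavior the paper identifies as key to SPLADEv2's zero-shot gains: most of the non-zero weights fall on tokens outside the original query and document, letting lexically non-overlapping texts still match.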


Cited By

  • (2024) Towards Effective and Efficient Sparse Neural Information Retrieval. ACM Transactions on Information Systems, 42(5), 1-46. https://doi.org/10.1145/3634912. Online publication date: 29-Apr-2024.
  • (2024) Systematic Evaluation of Neural Retrieval Models on the Touché 2020 Argument Retrieval Subset of BEIR. In Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, 1420-1430. https://doi.org/10.1145/3626772.3657861. Online publication date: 10-Jul-2024.
  • (2023) Privacy-aware document retrieval with two-level inverted indexing. Information Retrieval, 26(1-2). https://doi.org/10.1007/s10791-023-09428-z. Online publication date: 17-Nov-2023.


    Published In

    SIGIR '23: Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval
    July 2023
    3567 pages
    ISBN:9781450394086
    DOI:10.1145/3539618


    Publisher

    Association for Computing Machinery

    New York, NY, United States



    Qualifiers

    • Research-article

    Conference

    SIGIR '23

    Acceptance Rates

    Overall Acceptance Rate 792 of 3,983 submissions, 20%
