Abstract
In this paper, we present TWOLAR: a two-stage pipeline for passage reranking based on the distillation of knowledge from Large Language Models (LLMs). TWOLAR introduces a new scoring strategy and a distillation process consisting of the creation of a novel and diverse training dataset. The dataset consists of 20K queries, each associated with a set of documents retrieved via four distinct retrieval methods to ensure diversity, and then reranked by exploiting the zero-shot reranking capabilities of an LLM. Our ablation studies demonstrate the contribution of each new component we introduced. Our experimental results show that TWOLAR significantly enhances the document reranking ability of the underlying model, matching and in some cases even outperforming state-of-the-art models with three orders of magnitude more parameters on the TREC-DL test sets and the zero-shot evaluation benchmark BEIR. To facilitate future work, we release our dataset, fine-tuned models, and code (Code: https://github.com/Dundalia/TWOLAR; Models and Dataset: https://huggingface.co/Dundalia).
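To make the dataset-construction step described in the abstract more concrete, the following Python sketch is a purely illustrative outline, not the authors' released code (which is linked above): for each query, candidate documents are pooled from several retrievers and then reordered by an LLM acting as a zero-shot teacher. All helper names and parameters (`retrievers`, `llm_rerank`, `docs_per_retriever`) are hypothetical placeholders.

```python
# Illustrative sketch only: hypothetical helpers, not the TWOLAR release.
from typing import Callable, Dict, List


def build_distillation_examples(
    queries: List[str],
    retrievers: Dict[str, Callable[[str, int], List[str]]],
    llm_rerank: Callable[[str, List[str]], List[str]],
    docs_per_retriever: int = 30,
) -> List[dict]:
    """For each query, pool candidates from several retrieval methods and let
    an LLM produce a zero-shot ranking that a smaller reranker can be trained on."""
    examples = []
    for query in queries:
        # Pool candidates from the different retrieval methods to keep the
        # training data diverse, removing exact duplicates.
        pooled: List[str] = []
        for retrieve in retrievers.values():
            for doc in retrieve(query, docs_per_retriever):
                if doc not in pooled:
                    pooled.append(doc)
        # The LLM acts as the teacher: its ordering becomes the supervision
        # signal for the distilled student reranker.
        ranked = llm_rerank(query, pooled)
        examples.append({"query": query, "ranking": ranked})
    return examples
```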
Notes
- 1.
The average intersection rates were then calculated to provide a comprehensive view of the overall overlap among the retrieved documents from all sources:
$$\frac{1}{|\mathcal{Q}|}\sum_{q\in \mathcal{Q}} \frac{|S^1_q \cap S^2_q|}{N} \qquad (3)$$

where \(\mathcal{Q}\) is the whole query set, \(S^1_q\) and \(S^2_q\) are the sets of documents retrieved from the two sources for query q, and N denotes the number of documents retrieved per source. This process was carried out separately for both types of queries: the cropped-sentence queries and the docT5query-generated queries. A minimal code sketch of this computation is given after the notes.
- 2.
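As referenced in note 1, here is a minimal sketch of the average intersection rate in Eq. (3), assuming each source returns exactly N documents per query; the function and variable names are ours, not the paper's.

```python
from typing import Dict, List


def average_intersection_rate(
    source1: Dict[str, List[str]],
    source2: Dict[str, List[str]],
    n: int,
) -> float:
    """Average over all queries of |S1_q ∩ S2_q| / N, as in Eq. (3).

    `source1` and `source2` map each query to the list of document ids
    retrieved by the two sources; `n` is the retrieval depth per source.
    """
    total = sum(
        len(set(source1[q]) & set(source2[q])) / n for q in source1
    )
    return total / len(source1)
```

For instance, if the two sources share 15 of the N = 30 documents they retrieve for every query, the function returns 0.5.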
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Baldelli, D., Jiang, J., Aizawa, A., Torroni, P. (2024). TWOLAR: A TWO-Step LLM-Augmented Distillation Method for Passage Reranking. In: Goharian, N., et al. Advances in Information Retrieval. ECIR 2024. Lecture Notes in Computer Science, vol 14608. Springer, Cham. https://doi.org/10.1007/978-3-031-56027-9_29
DOI: https://doi.org/10.1007/978-3-031-56027-9_29
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-56026-2
Online ISBN: 978-3-031-56027-9
eBook Packages: Computer Science (R0)