Abstract
In this paper, we present TWOLAR: a two-stage pipeline for passage reranking based on the distillation of knowledge from Large Language Models (LLMs). TWOLAR introduces a new scoring strategy and a distillation process consisting of the creation of a novel and diverse training dataset. The dataset consists of 20K queries, each associated with a set of documents retrieved via four distinct retrieval methods to ensure diversity, and then reranked by exploiting the zero-shot reranking capabilities of an LLM. Our ablation studies demonstrate the contribution of each new component we introduced. Our experimental results show that TWOLAR significantly enhances the document reranking ability of the underlying model, matching and in some cases even outperforming state-of-the-art models with three orders of magnitude more parameters on the TREC-DL test sets and the zero-shot evaluation benchmark BEIR. To facilitate future work, we release our dataset, fine-tuned models, and code (Code: https://github.com/Dundalia/TWOLAR; Models and Dataset: https://huggingface.co/Dundalia).
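To make the dataset-construction step described in the abstract more concrete, the following Python sketch is a purely illustrative outline, not the authors' released code (which is linked above): for each query, candidate documents are pooled from several retrievers and then reordered by an LLM acting as a zero-shot teacher. All helper names and parameters (`retrievers`, `llm_rerank`, `docs_per_retriever`) are hypothetical placeholders.

```python
# Illustrative sketch only: hypothetical helpers, not the TWOLAR release.
from typing import Callable, Dict, List


def build_distillation_examples(
    queries: List[str],
    retrievers: Dict[str, Callable[[str, int], List[str]]],
    llm_rerank: Callable[[str, List[str]], List[str]],
    docs_per_retriever: int = 30,
) -> List[dict]:
    """For each query, pool candidates from several retrieval methods and let
    an LLM produce a zero-shot ranking that a smaller reranker can be trained on."""
    examples = []
    for query in queries:
        # Pool candidates from the different retrieval methods to keep the
        # training data diverse, removing exact duplicates.
        pooled: List[str] = []
        for retrieve in retrievers.values():
            for doc in retrieve(query, docs_per_retriever):
                if doc not in pooled:
                    pooled.append(doc)
        # The LLM acts as the teacher: its ordering becomes the supervision
        # signal for the distilled student reranker.
        ranked = llm_rerank(query, pooled)
        examples.append({"query": query, "ranking": ranked})
    return examples
```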
Notes
- 1.
The average intersection rates were then calculated to provide a comprehensive view of the overall overlap among the retrieved documents from all sources:
$$\frac{1}{|\mathcal{Q}|}\sum_{q\in \mathcal{Q}} \frac{|S^1_q \cap S^2_q|}{N} \qquad (3)$$

where \(\mathcal{Q}\) is the whole query set, \(S^1_q\) and \(S^2_q\) are the sets of documents retrieved from the two sources for query q, and N denotes the number of documents retrieved per source. This process was carried out separately for both types of queries: the cropped-sentence queries and the docT5query-generated queries. A minimal code sketch of this computation is given after the notes.
- 2.
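As referenced in note 1, here is a minimal sketch of the average intersection rate in Eq. (3), assuming each source returns exactly N documents per query; the function and variable names are ours, not the paper's.

```python
from typing import Dict, List


def average_intersection_rate(
    source1: Dict[str, List[str]],
    source2: Dict[str, List[str]],
    n: int,
) -> float:
    """Average over all queries of |S1_q ∩ S2_q| / N, as in Eq. (3).

    `source1` and `source2` map each query to the list of document ids
    retrieved by the two sources; `n` is the retrieval depth per source.
    """
    total = sum(
        len(set(source1[q]) & set(source2[q])) / n for q in source1
    )
    return total / len(source1)
```

For instance, if the two sources share 15 of the N = 30 documents they retrieve for every query, the function returns 0.5.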
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Baldelli, D., Jiang, J., Aizawa, A., Torroni, P. (2024). TWOLAR: A TWO-Step LLM-Augmented Distillation Method for Passage Reranking. In: Goharian, N., et al. Advances in Information Retrieval. ECIR 2024. Lecture Notes in Computer Science, vol 14608. Springer, Cham. https://doi.org/10.1007/978-3-031-56027-9_29
DOI: https://doi.org/10.1007/978-3-031-56027-9_29
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-56026-2
Online ISBN: 978-3-031-56027-9
eBook Packages: Computer Science (R0)