
TWOLAR: A TWO-Step LLM-Augmented Distillation Method for Passage Reranking

  • Conference paper
  • In: Advances in Information Retrieval (ECIR 2024)

Abstract

In this paper, we present TWOLAR: a two-stage pipeline for passage reranking based on the distillation of knowledge from Large Language Models (LLMs). TWOLAR introduces a new scoring strategy and a distillation process consisting of the creation of a novel and diverse training dataset. The dataset consists of 20K queries, each associated with a set of documents retrieved via four distinct retrieval methods to ensure diversity, and then reranked by exploiting the zero-shot reranking capabilities of an LLM. Our ablation studies demonstrate the contribution of each component we introduced. Our experimental results show that TWOLAR significantly enhances the document reranking ability of the underlying model, matching and in some cases even outperforming state-of-the-art models with three orders of magnitude more parameters, on the TREC-DL test sets and on the zero-shot evaluation benchmark BEIR. To facilitate future work, we release our dataset, fine-tuned models, and code (code: https://github.com/Dundalia/TWOLAR; models and dataset: https://huggingface.co/Dundalia).


Notes

  1.

    The average intersection rates were then calculated to provide a comprehensive view of the overall overlap among the retrieved documents from all sources:

    $$\begin{aligned} \frac{1}{|\mathcal {Q}|}\sum _{q\in \mathcal {Q}} \frac{|S^1_q \cap S^2_q|}{N} \end{aligned}$$
    (3)

    where \(\mathcal {Q}\) is the whole query set, \(S^1_q\) and \(S^2_q\) are the sets of documents retrieved from the two sources for query q, and N is the number of documents retrieved per source. This process was carried out separately for both types of queries: the cropped-sentence queries and the docT5query-generated queries.

  2.

    https://github.com/sunnweiwei/RankGPT.
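The average intersection rate of Eq. (3) can be sketched in a few lines of code. This is an illustrative implementation, not the authors' released code; the function name and the dictionary representation of retrieval results are assumptions.

```python
def avg_intersection_rate(results_a: dict, results_b: dict, n: int) -> float:
    """Eq. (3): mean over queries of |S1_q ∩ S2_q| / N.

    results_a, results_b map each query to the set of its top-n
    retrieved document ids from the two sources being compared.
    """
    queries = results_a.keys()
    return sum(len(results_a[q] & results_b[q]) / n for q in queries) / len(queries)


# Toy example with N = 3: the two sources overlap on 2 of 3 docs
# for q1 and on none for q2, so the average rate is (2/3 + 0/3) / 2.
a = {"q1": {"d1", "d2", "d3"}, "q2": {"d4", "d5", "d6"}}
b = {"q1": {"d1", "d2", "d9"}, "q2": {"d7", "d8", "d9"}}
rate = avg_intersection_rate(a, b, 3)
```

A low rate between two retrieval methods indicates they surface largely different documents, which is the diversity property the dataset construction relies on.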



Author information


Corresponding author

Correspondence to Davide Baldelli .


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Baldelli, D., Jiang, J., Aizawa, A., Torroni, P. (2024). TWOLAR: A TWO-Step LLM-Augmented Distillation Method for Passage Reranking. In: Goharian, N., et al. Advances in Information Retrieval. ECIR 2024. Lecture Notes in Computer Science, vol 14608. Springer, Cham. https://doi.org/10.1007/978-3-031-56027-9_29


  • DOI: https://doi.org/10.1007/978-3-031-56027-9_29

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-56026-2

  • Online ISBN: 978-3-031-56027-9

  • eBook Packages: Computer Science; Computer Science (R0)
