Utilizing BERT for Information Retrieval: Survey, Applications, Resources, and Challenges

Published: 09 April 2024

Abstract

Recent years have witnessed a substantial increase in the use of deep learning to solve various natural language processing (NLP) problems. Early deep learning models were constrained by their sequential or unidirectional nature and thus struggled to capture the contextual relationships across text inputs. The introduction of bidirectional encoder representations from transformers (BERT) provided a robust transformer encoder that can capture broader context and deliver state-of-the-art performance across various NLP tasks. This has inspired researchers and practitioners to apply BERT to practical problems, such as information retrieval (IR). A survey that provides a comprehensive analysis of prevalent approaches applying pretrained transformer encoders like BERT to IR can thus be useful for both academia and industry. In light of this, we revisit a variety of BERT-based methods in this survey, covering a wide range of IR techniques and grouping them into six high-level categories: (i) handling long documents, (ii) integrating semantic information, (iii) balancing effectiveness and efficiency, (iv) predicting the weights of terms, (v) query expansion, and (vi) document expansion. We also provide links to resources, including datasets and toolkits, for BERT-based IR systems. Additionally, we highlight the advantages of employing encoder-based BERT models in contrast to recent large language models like ChatGPT, which are decoder-based and demand extensive computational resources. Finally, we summarize the survey's main findings and suggest directions for future research in the area.
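To make the family of approaches surveyed here concrete, the sketch below shows monoBERT-style cross-encoder reranking, in which the encoder scores each query-passage pair jointly (the approach introduced in Nogueira and Cho's "Passage Re-ranking with BERT"). This is a minimal illustration rather than code from the survey itself: it assumes the Hugging Face transformers library, PyTorch, and the publicly released cross-encoder/ms-marco-MiniLM-L-6-v2 checkpoint fine-tuned on MS MARCO.

```python
# Minimal sketch (not the survey's code): monoBERT-style cross-encoder
# reranking with a publicly released MS MARCO checkpoint. Assumes the
# Hugging Face `transformers` library and PyTorch are installed.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_NAME = "cross-encoder/ms-marco-MiniLM-L-6-v2"  # assumed checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)
model.eval()

query = "how does bert improve document ranking"
candidates = [
    "BERT encodes the query and passage jointly, letting self-attention "
    "model term interactions across the whole pair.",
    "The vector space model represents documents as sparse "
    "term-frequency vectors and ranks them by cosine similarity.",
]

# The tokenizer's text-pair interface produces [CLS] query [SEP] passage [SEP],
# the input format a BERT-family cross-encoder is fine-tuned on.
batch = tokenizer(
    [query] * len(candidates),
    candidates,
    padding=True,
    truncation=True,
    return_tensors="pt",
)

with torch.no_grad():
    scores = model(**batch).logits.squeeze(-1)  # one relevance score per pair

# Higher score means more relevant; rerank the first-stage candidates.
for score, passage in sorted(zip(scores.tolist(), candidates), reverse=True):
    print(f"{score:+7.3f}  {passage[:60]}...")
```

Because a cross-encoder must run a full forward pass per query-passage pair, in practice it is applied only to a small candidate set produced by a cheap first-stage ranker such as BM25, which is precisely the effectiveness-versus-efficiency trade-off discussed in the survey.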

    Published In

    ACM Computing Surveys, Volume 56, Issue 7
    July 2024
    1006 pages
    EISSN: 1557-7341
    DOI: 10.1145/3613612

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 09 April 2024
    Online AM: 14 February 2024
    Accepted: 31 January 2024
    Revised: 21 December 2023
    Received: 19 July 2021
    Published in CSUR Volume 56, Issue 7

    Author Tags

    1. BERT
    2. information retrieval
    3. natural language processing
    4. artificial intelligence

    Qualifiers

    • Survey

    Funding Sources

    • Natural Sciences and Engineering Research Council (NSERC) of Canada
    • York Research Chairs (YRC)
    • Ontario Research Fund-Research Excellence (ORF-RE)

    Article Metrics

    • Downloads (Last 12 months): 1,919
    • Downloads (Last 6 weeks): 236
    Reflects downloads up to 26 Jan 2025

    Cited By

    • (2025) Spectral clustering and query expansion using embeddings on the graph-based extension of the set-based information retrieval model. Expert Systems with Applications 263, 125771. DOI: 10.1016/j.eswa.2024.125771. Online publication date: Mar-2025.
    • (2025) PRAGyan - Connecting the Dots in Tweets. Social Networks Analysis and Mining, 338-354. DOI: 10.1007/978-3-031-78548-1_25. Online publication date: 24-Jan-2025.
    • (2024) Two-layer network evolutionary game model applied to complex systems. The European Physical Journal B 97, 11. DOI: 10.1140/epjb/s10051-024-00809-x. Online publication date: 1-Nov-2024.
    • (2024) StructmRNA: a BERT-based model with dual level and conditional masking for mRNA representation. Scientific Reports 14, 1. DOI: 10.1038/s41598-024-77172-5. Online publication date: 29-Oct-2024.
    • (2024) Bi-directional information fusion-driven deep network for ship trajectory prediction in intelligent transportation systems. Transportation Research Part E: Logistics and Transportation Review 192, 103770. DOI: 10.1016/j.tre.2024.103770. Online publication date: Dec-2024.
    • (2024) DanXe: An extended artificial intelligence framework to analyze and promote dance heritage. Digital Applications in Archaeology and Cultural Heritage 33, e00343. DOI: 10.1016/j.daach.2024.e00343. Online publication date: Jun-2024.
