Utilizing BERT for Information Retrieval: Survey, Applications, Resources, and Challenges

Published: 09 April 2024

Abstract

Recent years have witnessed a substantial increase in the use of deep learning to solve various natural language processing (NLP) problems. Early deep learning models were constrained by their sequential or unidirectional nature and thus struggled to capture the contextual relationships across text inputs. The introduction of bidirectional encoder representations from transformers (BERT) provided a robust transformer encoder that can capture broader context and deliver state-of-the-art performance across various NLP tasks. This has inspired researchers and practitioners to apply BERT to practical problems, such as information retrieval (IR). A survey that provides a comprehensive analysis of prevalent approaches applying pretrained transformer encoders like BERT to IR can thus be useful for both academia and industry. In light of this, we revisit a variety of BERT-based methods in this survey, covering a wide range of IR techniques and grouping them into six high-level categories: (i) handling long documents, (ii) integrating semantic information, (iii) balancing effectiveness and efficiency, (iv) predicting the weights of terms, (v) query expansion, and (vi) document expansion. We also provide links to resources, including datasets and toolkits, for BERT-based IR systems. Additionally, we highlight the advantages of employing encoder-based BERT models in contrast to recent large language models like ChatGPT, which are decoder-based and demand extensive computational resources. Finally, we summarize the survey's main findings and suggest directions for future research in the area.
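To make the family of approaches surveyed here concrete, the sketch below shows monoBERT-style cross-encoder reranking, in which the encoder scores each query-passage pair jointly (the approach introduced in Nogueira and Cho's "Passage Re-ranking with BERT"). This is a minimal illustration rather than code from the survey itself: it assumes the Hugging Face transformers library, PyTorch, and the publicly released cross-encoder/ms-marco-MiniLM-L-6-v2 checkpoint fine-tuned on MS MARCO.

```python
# Minimal sketch (not the survey's code): monoBERT-style cross-encoder
# reranking with a publicly released MS MARCO checkpoint. Assumes the
# Hugging Face `transformers` library and PyTorch are installed.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_NAME = "cross-encoder/ms-marco-MiniLM-L-6-v2"  # assumed checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)
model.eval()

query = "how does bert improve document ranking"
candidates = [
    "BERT encodes the query and passage jointly, letting self-attention "
    "model term interactions across the whole pair.",
    "The vector space model represents documents as sparse "
    "term-frequency vectors and ranks them by cosine similarity.",
]

# The tokenizer's text-pair interface produces [CLS] query [SEP] passage [SEP],
# the input format a BERT-family cross-encoder is fine-tuned on.
batch = tokenizer(
    [query] * len(candidates),
    candidates,
    padding=True,
    truncation=True,
    return_tensors="pt",
)

with torch.no_grad():
    scores = model(**batch).logits.squeeze(-1)  # one relevance score per pair

# Higher score means more relevant; rerank the first-stage candidates.
for score, passage in sorted(zip(scores.tolist(), candidates), reverse=True):
    print(f"{score:+7.3f}  {passage[:60]}...")
```

Because a cross-encoder must run a full forward pass per query-passage pair, in practice it is applied only to a small candidate set produced by a cheap first-stage ranker such as BM25, which is precisely the effectiveness-versus-efficiency trade-off discussed in the survey.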

    Published In

    ACM Computing Surveys, Volume 56, Issue 7
    July 2024
    1006 pages
    EISSN: 1557-7341
    DOI: 10.1145/3613612

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 09 April 2024
    Online AM: 14 February 2024
    Accepted: 31 January 2024
    Revised: 21 December 2023
    Received: 19 July 2021
    Published in CSUR Volume 56, Issue 7

    Author Tags

    1. BERT
    2. information retrieval
    3. natural language processing
    4. artificial intelligence

    Qualifiers

    • Survey

    Funding Sources

    • Natural Sciences and Engineering Research Council (NSERC) of Canada
    • York Research Chairs (YRC)
    • Ontario Research Fund-Research Excellence (ORF-RE)

    Article Metrics

    • Downloads (Last 12 months): 1,919
    • Downloads (Last 6 weeks): 236
    Reflects downloads up to 26 Jan 2025

    Cited By

    • (2025) Spectral clustering and query expansion using embeddings on the graph-based extension of the set-based information retrieval model. Expert Systems with Applications 263, 125771. DOI: 10.1016/j.eswa.2024.125771. Online publication date: Mar-2025.
    • (2025) PRAGyan - Connecting the Dots in Tweets. Social Networks Analysis and Mining, 338-354. DOI: 10.1007/978-3-031-78548-1_25. Online publication date: 24-Jan-2025.
    • (2024) Two-layer network evolutionary game model applied to complex systems. The European Physical Journal B 97, 11. DOI: 10.1140/epjb/s10051-024-00809-x. Online publication date: 1-Nov-2024.
    • (2024) StructmRNA: a BERT-based model with dual level and conditional masking for mRNA representation. Scientific Reports 14, 1. DOI: 10.1038/s41598-024-77172-5. Online publication date: 29-Oct-2024.
    • (2024) Bi-directional information fusion-driven deep network for ship trajectory prediction in intelligent transportation systems. Transportation Research Part E: Logistics and Transportation Review 192, 103770. DOI: 10.1016/j.tre.2024.103770. Online publication date: Dec-2024.
    • (2024) DanXe: An extended artificial intelligence framework to analyze and promote dance heritage. Digital Applications in Archaeology and Cultural Heritage 33, e00343. DOI: 10.1016/j.daach.2024.e00343. Online publication date: Jun-2024.
