Beyond Precision: A Study on Recall of Initial Retrieval with Neural Representations

Xiao, Yan; Fan, Yixing; Zhang, Ruqing; Guo, Jiafeng

doi:10.1007/978-3-031-24755-2_7

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13819)

Included in the following conference series: China Conference on Information Retrieval

Abstract

Vocabulary mismatch is a central problem in information retrieval (IR): relevant documents may not contain the same (symbolic) terms as the query. Recently, neural representations have shown great success in capturing semantic relatedness, opening new possibilities for alleviating the vocabulary mismatch problem in IR. However, most existing efforts in this direction have been devoted to the re-ranking stage, that is, to leveraging neural representations to re-rank a set of candidate documents, which are typically obtained from an initial retrieval stage based on some symbolic index and search scheme (e.g., BM25 over the inverted index). This raises a natural concern: if the relevant documents are not found in the initial retrieval stage due to vocabulary mismatch, there is no chance to re-rank them to the top positions later. Therefore, in this paper, we study how to employ neural representations to improve the recall of relevant documents in the initial retrieval stage. Specifically, to meet the efficiency requirement of the initial stage, we introduce a neural index for the neural representations of documents and propose two hybrid search schemes based on both neural and symbolic indices, namely the parallel search scheme and the sequential search scheme. Our experiments show that both hybrid index and search schemes can improve the recall of the initial retrieval stage with small overhead.
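The idea of a parallel hybrid search can be illustrated with a toy sketch. This is not the authors' implementation: the corpus, the 2-dimensional word vectors, the term-overlap scoring (a stand-in for BM25), the brute-force cosine search (a stand-in for a neural index), and all function names are hypothetical choices made for illustration. Only the parallel scheme is sketched, since the abstract does not say how the sequential scheme chains the two indices.

```python
import math
from collections import defaultdict

# Toy corpus (hypothetical). Doc 2 is relevant to "car repair" but shares no term with it.
DOCS = {
    0: "car repair manual",
    1: "cooking pasta recipes",
    2: "automobile maintenance guide",
}

# Hypothetical 2-d word vectors; a real system would use trained embeddings.
WORD_VECS = {
    "car": (1.0, 0.0), "automobile": (0.9, 0.1), "repair": (0.8, 0.2),
    "maintenance": (0.8, 0.3), "manual": (0.6, 0.4), "guide": (0.6, 0.5),
    "cooking": (0.0, 1.0), "pasta": (0.1, 0.9), "recipes": (0.1, 1.0),
}

def embed(text):
    """Average the word vectors of the known words in the text."""
    vecs = [WORD_VECS[w] for w in text.split() if w in WORD_VECS]
    return tuple(sum(component) / len(vecs) for component in zip(*vecs))

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

# Symbolic index: term -> set of doc ids (an inverted index).
INVERTED = defaultdict(set)
for doc_id, text in DOCS.items():
    for term in text.split():
        INVERTED[term].add(doc_id)

# "Neural index": doc id -> vector. Brute force here; an approximate
# nearest-neighbor structure would replace this at scale.
DOC_VECS = {doc_id: embed(text) for doc_id, text in DOCS.items()}

def symbolic_search(query, k):
    """Rank docs by number of matching query terms (simplified stand-in for BM25)."""
    scores = defaultdict(int)
    for term in query.split():
        for doc_id in INVERTED.get(term, ()):
            scores[doc_id] += 1
    return sorted(scores, key=lambda d: (-scores[d], d))[:k]

def neural_search(query, k):
    """Rank docs by cosine similarity to the query vector."""
    q = embed(query)
    return sorted(DOC_VECS, key=lambda d: (-cosine(q, DOC_VECS[d]), d))[:k]

def parallel_search(query, k):
    """Query both indices independently and merge the candidate lists."""
    sym = symbolic_search(query, k)
    neu = neural_search(query, k)
    return sym + [d for d in neu if d not in sym]

def recall(retrieved, relevant):
    return len(set(retrieved) & relevant) / len(relevant)

relevant = {0, 2}
print(recall(symbolic_search("car repair", 2), relevant))
print(recall(parallel_search("car repair", 2), relevant))
```

On this toy data the symbolic search alone retrieves only document 0 (recall 0.5), while the merged candidate set also contains document 2 (recall 1.0), which is the kind of recall gain the abstract attributes to vocabulary-mismatch-aware initial retrieval.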

Notes

1. http://lucene.apache.org.
2. https://code.google.com/p/word2vec/.
3. The dump from https://archive.org/download/stackexchange dated March 10th 2016.
4. https://webscope.sandbox.yahoo.com.

Acknowledgement

This work was funded by the National Natural Science Foundation of China (NSFC) under Grants No. 61902381, No. 62006218, and No. 61732008, the Youth Innovation Promotion Association CAS under Grants No. 2021100 and No. 20144310, the Young Elite Scientist Sponsorship Program by CAST under Grant No. YESS20200121, and the Lenovo-CAS Joint Lab Youth Scientist Project.

Author information

Authors and Affiliations

CAS Key Lab of Network Data Science and Technology, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, 100190, China
Yan Xiao, Yixing Fan, Ruqing Zhang & Jiafeng Guo

Corresponding author

Correspondence to Yixing Fan.

Editor information

Editors and Affiliations

Dingxin Building C403, Jilin University, Changchun, China
Yi Chang
College of Computer Science and Engineering, Chongqing University of Technology, Chongqing, China
Xiaofei Zhu

About this paper

Cite this paper

Xiao, Y., Fan, Y., Zhang, R., Guo, J. (2023). Beyond Precision: A Study on Recall of Initial Retrieval with Neural Representations. In: Chang, Y., Zhu, X. (eds) Information Retrieval. CCIR 2022. Lecture Notes in Computer Science, vol 13819. Springer, Cham. https://doi.org/10.1007/978-3-031-24755-2_7

DOI: https://doi.org/10.1007/978-3-031-24755-2_7
Published: 03 February 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-24754-5
Online ISBN: 978-3-031-24755-2
eBook Packages: Computer Science, Computer Science (R0)
