
Beyond Precision: A Study on Recall of Initial Retrieval with Neural Representations

  • Conference paper
Information Retrieval (CCIR 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13819)


Abstract

Vocabulary mismatch is a central problem in information retrieval (IR): relevant documents may not contain the same (symbolic) terms as the query. Recently, neural representations have shown great success in capturing semantic relatedness, opening new possibilities for alleviating the vocabulary mismatch problem in IR. However, most existing efforts in this direction have been devoted to the re-ranking stage, that is, to leveraging neural representations to help re-rank a set of candidate documents, which are typically obtained from an initial retrieval stage based on some symbolic index and search scheme (e.g., BM25 over the inverted index). This raises a natural concern: if the relevant documents are not found in the initial retrieval stage because of vocabulary mismatch, there is no chance to re-rank them to the top positions later. Therefore, in this paper, we study the problem of how to employ neural representations to improve the recall of relevant documents in the initial retrieval stage. Specifically, to meet the efficiency requirement of the initial stage, we introduce a neural index for the neural representations of documents, and propose two hybrid search schemes based on both neural and symbolic indices, namely the parallel search scheme and the sequential search scheme. Our experiments show that both hybrid index and search schemes can improve the recall of the initial retrieval stage with small overhead.
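The two hybrid schemes named in the abstract can be illustrated with a minimal Python sketch. The abstract does not specify how the schemes are implemented, so everything here is an assumption made for illustration: the toy corpus and 2-d embeddings, the term-overlap scorer standing in for BM25, the brute-force cosine search standing in for a neural index, and in particular the reading of "sequential" as "symbolic search first, neural search fills the remaining slots". It is not the authors' method.

```python
import math
from collections import defaultdict

# Hypothetical toy corpus: doc id -> (text, made-up 2-d embedding).
DOCS = {
    "d1": ("cheap car rental", (0.9, 0.1)),
    "d2": ("affordable automobile hire", (0.85, 0.2)),  # relevant, but shares no query term
    "d3": ("car engine repair", (0.3, 0.9)),
}

def build_inverted_index(docs):
    """Symbolic index: term -> set of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, (text, _) in docs.items():
        for term in text.split():
            index[term].add(doc_id)
    return index

def symbolic_search(index, query, k):
    """Rank by number of matching query terms (a simple stand-in for BM25)."""
    scores = defaultdict(int)
    for term in query.split():
        for doc_id in index.get(term, ()):
            scores[doc_id] += 1
    return sorted(scores, key=lambda d: (-scores[d], d))[:k]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def neural_search(docs, query_vec, k, exclude=()):
    """Brute-force nearest neighbours (a stand-in for an approximate k-NN index)."""
    candidates = [d for d in docs if d not in exclude]
    return sorted(candidates, key=lambda d: (-cosine(query_vec, docs[d][1]), d))[:k]

def parallel_search(index, docs, query, query_vec, k):
    """Assumed parallel scheme: query both indices, then take the union."""
    merged = symbolic_search(index, query, k) + neural_search(docs, query_vec, k)
    return list(dict.fromkeys(merged))  # order-preserving de-duplication

def sequential_search(index, docs, query, query_vec, k):
    """Assumed sequential scheme: symbolic first, neural fills what is left of k."""
    found = symbolic_search(index, query, k)
    missing = k - len(found)
    if missing <= 0:
        return found
    return found + neural_search(docs, query_vec, missing, exclude=found)

INDEX = build_inverted_index(DOCS)
QUERY, QUERY_VEC = "cheap car", (1.0, 0.0)  # hypothetical query and its embedding

print(symbolic_search(INDEX, QUERY, 3))
print(parallel_search(INDEX, DOCS, QUERY, QUERY_VEC, 2))
print(sequential_search(INDEX, DOCS, QUERY, QUERY_VEC, 3))
```

In this toy setting the symbolic search alone never returns d2, because d2 shares no term with the query, while both hybrid variants recover it through the embedding space. That is the recall effect the abstract describes, shown here only on fabricated illustrative data.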


Notes

  1. http://lucene.apache.org

  2. https://code.google.com/p/word2vec/

  3. The dump from https://archive.org/download/stackexchange, dated March 10th, 2016.

  4. https://webscope.sandbox.yahoo.com


Acknowledgement

This work was funded by the National Natural Science Foundation of China (NSFC) under Grants No. 61902381, No. 62006218, and No. 61732008; the Youth Innovation Promotion Association CAS under Grants No. 2021100 and No. 20144310; the Young Elite Scientist Sponsorship Program by CAST under Grant No. YESS20200121; and the Lenovo-CAS Joint Lab Youth Scientist Project.

Author information

Corresponding author

Correspondence to Yixing Fan.


Copyright information

© 2023 Springer Nature Switzerland AG

About this paper


Cite this paper

Xiao, Y., Fan, Y., Zhang, R., Guo, J. (2023). Beyond Precision: A Study on Recall of Initial Retrieval with Neural Representations. In: Chang, Y., Zhu, X. (eds) Information Retrieval. CCIR 2022. Lecture Notes in Computer Science, vol 13819. Springer, Cham. https://doi.org/10.1007/978-3-031-24755-2_7


  • DOI: https://doi.org/10.1007/978-3-031-24755-2_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-24754-5

  • Online ISBN: 978-3-031-24755-2

  • eBook Packages: Computer Science, Computer Science (R0)
