DOI: 10.1145/3539618.3591715
research-article

Lexically-Accelerated Dense Retrieval

Published: 18 July 2023

    Abstract

    Retrieval approaches that score documents based on learned dense vectors (i.e., dense retrieval) rather than lexical signals (i.e., conventional retrieval) are increasingly popular. Their ability to identify related documents that do not necessarily contain the same terms as those appearing in the user's query (thereby improving recall) is one of their key advantages. However, to actually achieve these gains, dense retrieval approaches typically require an exhaustive search over the document collection, making them considerably more expensive at query-time than conventional lexical approaches. Several techniques aim to reduce this computational overhead by approximating the results of a full dense retriever. Although these approaches reasonably approximate the top results, they suffer in terms of recall -- one of the key advantages of dense retrieval. We introduce 'LADR' (Lexically-Accelerated Dense Retrieval), a simple-yet-effective approach that improves the efficiency of existing dense retrieval models without compromising on retrieval effectiveness. LADR uses lexical retrieval techniques to seed a dense retrieval exploration that uses a document proximity graph. Through extensive experiments, we find that LADR establishes a new dense retrieval effectiveness-efficiency Pareto frontier among approximate k nearest neighbor techniques. When tuned to take around 8ms per query in retrieval latency on our hardware, LADR consistently achieves both precision and recall that are on par with an exhaustive search on standard benchmarks. Importantly, LADR accomplishes this using only a single CPU -- no hardware accelerators such as GPUs -- which reduces the deployment cost of dense retrieval systems.
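
    The description above suggests a short sketch of the algorithm: lexical results (e.g., from BM25) seed a candidate set, the candidates are scored against the query's dense vector, and the neighbours of the current top-scoring documents in a precomputed document proximity graph are added as new candidates until no unseen documents remain. The following Python sketch illustrates this idea only; it is not the authors' implementation, and the function name, the neighbours structure, the depth parameter, and the convergence criterion are all illustrative assumptions.

        import numpy as np

        def ladr_search(query_vec, doc_vecs, neighbours, lexical_seeds, k=10, depth=100):
            """Hypothetical sketch of lexically-seeded graph exploration for dense retrieval.

            query_vec:     dense query vector, shape (dim,)
            doc_vecs:      precomputed document vectors, shape (n_docs, dim)
            neighbours:    proximity graph mapping doc_id -> iterable of nearest doc_ids
            lexical_seeds: doc_ids returned by a lexical retriever (e.g., BM25)
            """
            scores = {}                                   # doc_id -> dense (dot-product) score
            frontier = set(lexical_seeds)                 # documents awaiting dense scoring

            while frontier:
                # Score every not-yet-scored document in the frontier with the dense model.
                new_ids = np.array(sorted(frontier - scores.keys()))
                if new_ids.size == 0:
                    break
                new_scores = doc_vecs[new_ids] @ query_vec
                scores.update(zip(new_ids.tolist(), new_scores.tolist()))

                # Expand: follow the proximity graph from the current top-scoring documents,
                # keeping only neighbours that have not been scored yet.
                top = sorted(scores, key=scores.get, reverse=True)[:depth]
                frontier = {n for d in top for n in neighbours[d]} - scores.keys()

            ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
            return ranked[:k]

    Because only seeded and graph-adjacent documents are ever scored, the number of dense scoring operations per query stays far below the size of the collection, which is where the latency savings over an exhaustive search come from.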

    Supplementary Material

    MP4 File (SIGIR23-frp1854.mp4)
    Traditionally, lexical methods have been used to obtain retrieval results efficiently. Retrieval approaches that score documents based on learned dense vectors outperform lexical methods by removing the dependence on term overlap, but this comes at the cost of latency, since an exhaustive search over the document collection is required. Several techniques aim to reduce this computational overhead by approximating the results of a full dense retriever. Although these approximations achieve reasonable top results, they suffer in terms of recall. We introduce LADR (Lexically-Accelerated Dense Retrieval), which improves the efficiency of dense retrieval methods without compromising retrieval effectiveness. Through extensive experiments, we find that LADR establishes a new effectiveness-efficiency Pareto frontier among approximate k nearest neighbor techniques.
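
    For contrast, the exhaustive dense search that approximate methods (including LADR) aim to avoid can be written in a few lines. This brute-force form is shown only to make the per-query cost explicit; it is an illustrative sketch, not code from the paper.

        import numpy as np

        def exhaustive_dense_search(query_vec, doc_vecs, k=10):
            # Brute-force dense retrieval: one dot product per document in the collection.
            scores = doc_vecs @ query_vec
            top = np.argpartition(-scores, k)[:k]    # unordered indices of the k best scores
            return top[np.argsort(-scores[top])]     # ranked doc indices, best first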


    Cited By

    • (2024) Two-Step SPLADE: Simple, Efficient and Effective Approximation of SPLADE. Advances in Information Retrieval. https://doi.org/10.1007/978-3-031-56060-6_23, 349--363. Online publication date: 16-Mar-2024
    • (2023) Genetic Generative Information Retrieval. Proceedings of the ACM Symposium on Document Engineering 2023. https://doi.org/10.1145/3573128.3609340, 1--4. Online publication date: 22-Aug-2023

      Published In

      SIGIR '23: Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval
      July 2023
      3567 pages
      ISBN: 9781450394086
      DOI: 10.1145/3539618
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. adaptive re-ranking
      2. approximate k nearest neighbor
      3. dense retrieval

      Qualifiers

      • Research-article

      Conference

      SIGIR '23

      Acceptance Rates

      Overall Acceptance Rate 792 of 3,983 submissions, 20%
