research-article

Fast top-k retrieval for model based recommendation

Authors:

Deepak Agarwal,

Maxim GurevichAuthors Info & Claims

WSDM '12: Proceedings of the fifth ACM international conference on Web search and data mining

Pages 483 - 492

https://doi.org/10.1145/2124295.2124354

Published: 08 February 2012 Publication History

Get Access

Abstract

A crucial task in many recommender problems like computational advertising, content optimization, and others is to retrieve a small set of items by scoring a large item inventory through some elaborate statistical/machine-learned model. This is challenging since the retrieval has to be fast (few milliseconds) to load the page quickly. Fast retrieval is well studied in the information retrieval (IR) literature, especially in the context of document retrieval for queries. When queries and documents have sparse representation and relevance is measured through cosine similarity (or some variant thereof), one could build highly efficient retrieval algorithms that scale gracefully to increasing item inventory. The key components exploited by such algorithms is sparse query-document representation and the special form of the relevance function. Many machine-learned models used in modern recommender problems do not satisfy these properties and since brute force evaluation is not an option with large item inventory, heuristics that filter out some items are often employed to reduce model computations at runtime.

In this paper, we take a two-stage approach where the first stage retrieves top-K items using our approximate procedures and the second stage selects the desired top-k using brute force model evaluation on the K retrieved items. The main idea of our approach is to reduce the first stage to a standard IR problem, where each item is represented by a sparse feature vector (a.k.a. the vector-space representation) and the query-item relevance score is given by vector dot product. The sparse item representation is learnt to closely approximate the original machine-learned score by using retrospective data. Such a reduction allows leveraging extensive work in IR that resulted in highly efficient retrieval systems. Our approach is model-agnostic, relying only on data generated from the machine-learned model. We obtain significant improvements in the computational cost vs. accuracy tradeoff compared to several baselines in our empirical evaluation on both synthetic models and on a click-through (CTR) model used in online advertising.

Supplementary Material

JPG File (wsdm_day2_session3_2.jpg)

Download
19.47 KB

MP4 File (wsdm_day2_session3_2.mp4)

Download
90.88 MB

References

[1]

D. Agarwal, B.-C. Chen, P. Elango, N. Motgi, S.-T. Park, R. Ramakrishnan, S. Roy, and J. Zachariah. Online models for content optimization. In NIPS, pages 17--24, 2008.

Abstract

Supplementary Material

References

Cited By

Index Terms

Recommendations

Niche Product Retrieval in Top-N Recommendation

A novel log-based relevance feedback technique in content-based image retrieval

Fast Forward Index Methods for Pseudo-Relevance Feedback Retrieval

Comments

Information

Published In

Sponsors

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Acceptance Rates

Upcoming Conference

Contributors

Other Metrics

Bibliometrics

Article Metrics

Other Metrics

Citations

Cited By

Login options

Full Access

View options

PDF

eReader

Figures

Other

Share

Share this Publication link

Share on social media

Affiliations