DOI: 10.1145/2124295.2124354
research-article

Fast top-k retrieval for model based recommendation

Published: 08 February 2012

Abstract

A crucial task in many recommender problems, such as computational advertising and content optimization, is to retrieve a small set of items by scoring a large item inventory through some elaborate statistical/machine-learned model. This is challenging since the retrieval has to be fast (a few milliseconds) to load the page quickly. Fast retrieval is well studied in the information retrieval (IR) literature, especially in the context of document retrieval for queries. When queries and documents have sparse representations and relevance is measured through cosine similarity (or some variant thereof), one can build highly efficient retrieval algorithms that scale gracefully with increasing item inventory. The key components exploited by such algorithms are the sparse query-document representation and the special form of the relevance function. Many machine-learned models used in modern recommender problems do not satisfy these properties, and since brute force evaluation is not an option with a large item inventory, heuristics that filter out some items are often employed to reduce model computations at runtime.
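To make the role of sparsity concrete, here is a minimal sketch of dot-product retrieval over an inverted index. It is an illustration only, not code from the paper: the function names, the toy items, and the feature weights are all assumptions. Only items that share at least one non-zero feature with the query are ever touched, which is why this style of retrieval scales well as the inventory grows.

```python
import heapq
from collections import defaultdict


def build_inverted_index(items):
    """Map each feature to a postings list of (item_id, weight) pairs."""
    index = defaultdict(list)
    for item_id, features in items.items():
        for feature, weight in features.items():
            if weight != 0.0:
                index[feature].append((item_id, weight))
    return index


def top_k_dot_product(index, query, k):
    """Accumulate dot products feature by feature, then keep the k best."""
    scores = defaultdict(float)
    for feature, q_weight in query.items():
        # Only postings for features present in the query are scanned.
        for item_id, weight in index.get(feature, ()):
            scores[item_id] += q_weight * weight
    return heapq.nlargest(k, scores.items(), key=lambda pair: pair[1])


# Hypothetical toy inventory: sparse feature vectors keyed by item id.
items = {
    "ad1": {"sports": 0.9, "shoes": 0.4},
    "ad2": {"travel": 0.8, "hotel": 0.6},
    "ad3": {"sports": 0.2, "travel": 0.5},
}
index = build_inverted_index(items)
query = {"sports": 1.0, "travel": 0.5}
result = top_k_dot_product(index, query, k=2)
print(result)
```

The cost of a query depends on the lengths of the postings lists for its non-zero features rather than on the total number of items.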
In this paper, we take a two-stage approach where the first stage retrieves top-K items using our approximate procedures and the second stage selects the desired top-k using brute force model evaluation on the K retrieved items. The main idea of our approach is to reduce the first stage to a standard IR problem, where each item is represented by a sparse feature vector (a.k.a. the vector-space representation) and the query-item relevance score is given by the vector dot product. The sparse item representation is learnt from retrospective data to closely approximate the original machine-learned score. Such a reduction allows leveraging extensive work in IR that has resulted in highly efficient retrieval systems. Our approach is model-agnostic, relying only on data generated from the machine-learned model. We obtain significant improvements in the computational cost vs. accuracy tradeoff compared to several baselines in our empirical evaluation on both synthetic models and on a click-through rate (CTR) model used in online advertising.
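The two-stage idea can be sketched as follows. This is a hedged illustration under stated assumptions, not the authors' implementation: `expensive_model` is a hypothetical stand-in for the machine-learned scorer, and the sparse item vectors are obtained here by simply dropping small weights, whereas the paper learns them from retrospective data. Stage one is written as a plain scan for brevity; in practice an inverted index would serve it.

```python
import heapq
import math


def expensive_model(user, features):
    """Hypothetical stand-in for an elaborate machine-learned scorer."""
    z = sum(user.get(f, 0.0) * w for f, w in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def stage_one(user, sparse_items, K):
    """Approximate top-K by dot product against sparse item vectors."""
    def approx_score(item_id):
        vec = sparse_items[item_id]
        return sum(user.get(f, 0.0) * w for f, w in vec.items())
    return heapq.nlargest(K, sparse_items, key=approx_score)


def stage_two(user, candidates, catalog, k):
    """Exact top-k by brute force model evaluation on the K candidates."""
    return heapq.nlargest(
        k, candidates, key=lambda i: expensive_model(user, catalog[i])
    )


# Hypothetical catalog with full (dense) item features.
catalog = {
    "a": {"x": 2.0, "y": 0.1, "z": 0.1},
    "b": {"x": 0.1, "y": 1.5, "z": 0.2},
    "c": {"x": 0.2, "y": 0.1, "z": 0.3},
    "d": {"x": 1.0, "y": 1.0, "z": 0.1},
}
# Assumption: sparsify by dropping weights below 0.5 (the paper learns these).
sparse_items = {
    i: {f: w for f, w in feats.items() if w >= 0.5}
    for i, feats in catalog.items()
}
user = {"x": 1.0, "y": 0.5, "z": 0.2}

candidates = stage_one(user, sparse_items, K=3)
top = stage_two(user, candidates, catalog, k=2)
print(candidates, top)
```

Choosing K larger than k gives the second stage room to correct ranking errors introduced by the approximation, at the price of K full model evaluations per request.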

      Published In

      WSDM '12: Proceedings of the fifth ACM international conference on Web search and data mining
      February 2012
      792 pages
      ISBN:9781450307475
      DOI:10.1145/2124295

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. inverted index
      2. lasso
      3. machine learning
      4. support vector machines
      5. top-k retrieval

      Acceptance Rates

      Overall Acceptance Rate: 498 of 2,863 submissions (17%)

      Article Metrics

      • Downloads (last 12 months): 17
      • Downloads (last 6 weeks): 0
      Reflects downloads up to 13 Jan 2025

      Cited By

      • (2023) A Systematic Review of Fairness, Accountability, Transparency and Ethics in Information Retrieval. ACM Computing Surveys. DOI: 10.1145/3637211. Online publication date: 15-Dec-2023
      • (2023) Machine Learning in Online Advertising Research: A Systematic Mapping Study. Industry 4.0: The Power of Data, pages 147-160. DOI: 10.1007/978-3-031-29382-5_16. Online publication date: 8-Jul-2023
      • (2022) The Timeliness Position Recommendation Based on Geographical Impacts and Social Impacts. Genetic and Evolutionary Computing, pages 515-526. DOI: 10.1007/978-981-16-8430-2_47. Online publication date: 4-Jan-2022
      • (2022) Machine Learning Optimization in Computational Advertising—A Systematic Literature Review. Intelligent Systems Modeling and Simulation II, pages 97-111. DOI: 10.1007/978-3-031-04028-3_8. Online publication date: 13-Oct-2022
      • (2021) Point of Interest Recommendation Acceleration Using Clustering. 2021 IEEE 6th International Conference on Big Data Analytics (ICBDA), pages 175-180. DOI: 10.1109/ICBDA51983.2021.9402951. Online publication date: 5-Mar-2021
      • (2017) Efficient Retrieval for Contextual Advertising Utilizing Past Click Logs (コンテキスト広告におけるクリックログを用いた効率的な広告引き当て手法). Transactions of the Japanese Society for Artificial Intelligence, 32:6, pages A-H52_1-10. DOI: 10.1527/tjsai.A-H52. Online publication date: 1-Nov-2017
      • (2017) Latency Reduction via Decision Tree Based Query Construction. Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, pages 1399-1407. DOI: 10.1145/3132847.3132865. Online publication date: 6-Nov-2017
      • (2017) Candidate Selection for Large Scale Personalized Search and Recommender Systems. Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 1391-1393. DOI: 10.1145/3077136.3082066. Online publication date: 7-Aug-2017
      • (2016) Fast First-Phase Candidate Generation for Cascading Rankers. Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval, pages 295-304. DOI: 10.1145/2911451.2911515. Online publication date: 7-Jul-2016
      • (2016) Anytime Algorithms for Recommendation Service Providers. ACM Transactions on Intelligent Systems and Technology, 7:3, pages 1-26. DOI: 10.1145/2835496. Online publication date: 11-Feb-2016
