DOI: 10.1145/3460231.3474606

Jointly Optimize Capacity, Latency and Engagement in Large-scale Recommendation Systems

Published: 13 September 2021

Abstract

As the recommendation systems behind commercial services scale up and apply increasingly sophisticated machine learning models, it becomes important to optimize computational cost (capacity) and runtime latency in addition to the traditional objective of user engagement. Caching recommended results and reusing them later is a common technique for reducing capacity and latency. However, the standard caching approach negatively impacts user engagement. To overcome this challenge, this paper presents an approach to optimizing capacity, latency and engagement simultaneously. We propose a smart caching system that includes a lightweight adjuster model to refresh the cached ranking scores, achieving significant capacity savings without impacting ranking quality. To further optimize latency, we introduce a prefetching strategy that leverages the smart cache. Our production deployment on Facebook Marketplace demonstrates that the approach reduces capacity demand by 50% and p75 end-to-end latency by 35%. While Facebook Marketplace serves as the case study, the approach is applicable to other industrial recommendation systems as well.
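The core idea — serve a stale cached ranking score through a cheap adjuster instead of re-running the full ranking model — can be sketched as follows. This is a minimal illustration, not the paper's actual system: the class and method names, the TTL policy, and the linear adjuster (age decay plus a fresh-signal nudge) are all hypothetical placeholders for the learned lightweight model described in the abstract.

```python
import time


class SmartCacheSketch:
    """Illustrative score cache with a lightweight adjuster.

    On a hit, a cheap adjustment of the stale cached score replaces a
    full ranking-model inference, saving capacity. On a miss (or TTL
    expiry), the caller falls back to the full ranker.
    """

    def __init__(self, ttl_seconds=3600):
        self.ttl = ttl_seconds
        self._store = {}  # (user_id, item_id) -> (score, cached_at)

    def put(self, user_id, item_id, score):
        # Store the full model's score together with its timestamp.
        self._store[(user_id, item_id)] = (score, time.time())

    def get_adjusted(self, user_id, item_id, fresh_features):
        entry = self._store.get((user_id, item_id))
        if entry is None:
            return None  # miss: caller runs the full ranking model
        score, cached_at = entry
        age = time.time() - cached_at
        if age > self.ttl:
            del self._store[(user_id, item_id)]
            return None  # too stale even for adjustment
        return self._adjust(score, age, fresh_features)

    @staticmethod
    def _adjust(stale_score, age_seconds, fresh_features):
        # Toy stand-in for the learned adjuster model: decay the stale
        # score with age and nudge it with a fresh contextual signal
        # (here, a hypothetical recent click-through rate).
        decay = max(0.0, 1.0 - 1e-5 * age_seconds)
        return decay * stale_score + 0.1 * fresh_features.get("recent_ctr", 0.0)
```

In this sketch only the `put` path pays for full-model inference; every adjusted hit costs a few arithmetic operations, which is the source of the capacity savings the abstract reports.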

Supplementary Material

MP4 File (recsys.mp4)
As recommendation systems leverage sophisticated machine learning models and scale up to more users, it becomes important to optimize computational cost (capacity) and runtime latency in addition to the traditional objective of user engagement. Caching and reusing recommendations is a common technique for reducing capacity and latency. However, the standard caching approach has a large negative impact on engagement. To overcome this challenge, we present an approach that jointly optimizes capacity, latency and engagement. We propose a smart caching system that includes a lightweight ML model to refresh the cached ranking scores, achieving significant capacity savings without impacting ranking quality. To further optimize latency, we introduce a prefetching technique that leverages the smart cache. In production deployment on Facebook Marketplace, capacity demand was reduced by 50% and p75 latency by 35%. While Facebook Marketplace serves as the case study, the approach is applicable to most other recommendation systems as well.
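The prefetching idea — score the likely-next results off the request path and park them in the cache, so the follow-up page load is a cache hit rather than a full ranking pass — can be sketched as below. The function name, the plain-dict cache, and the `page_size` heuristic are illustrative assumptions; the paper's production prefetcher is more involved.

```python
def prefetch_next_page(rank_fn, cache, user_id, candidates, page_size=10):
    """Hypothetical prefetch step.

    rank_fn:   callable (user_id, item_id) -> score, the full ranking model
    cache:     dict mapping (user_id, item_id) -> score
    candidates: item ids likely to appear on the user's next page
    """
    for item_id in candidates[:page_size]:
        if (user_id, item_id) not in cache:
            # Run the full model ahead of time, off the request path,
            # so the eventual page load is served from the cache.
            cache[(user_id, item_id)] = rank_fn(user_id, item_id)
```

The latency win comes from moving the expensive `rank_fn` calls out of the user-facing request; the capacity cost of prefetching is bounded by reusing any entries already in the cache.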


Published In

RecSys '21: Proceedings of the 15th ACM Conference on Recommender Systems
September 2021
883 pages
ISBN: 9781450384582
DOI: 10.1145/3460231
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. caching
  2. multi-objective optimization
  3. transfer learning

Qualifiers

  • Abstract
  • Research
  • Refereed limited

Conference

RecSys '21: Fifteenth ACM Conference on Recommender Systems
September 27 - October 1, 2021
Amsterdam, Netherlands

Acceptance Rates

Overall Acceptance Rate 254 of 1,295 submissions, 20%

