research-article

Hetero-Rec++: Modelling-based Robust and Optimal Deployment of Embeddings Recommendations

Authors:

Ashwin Krishnan,

Rekha Singhal

AIMLSystems '23: Proceedings of the Third International Conference on AI-ML Systems

Article No.: 22, Pages 1 - 9

https://doi.org/10.1145/3639856.3639878

Published: 17 May 2024

Abstract

Deep Neural Network (DNN)-based recommendation models (RMs) are widely adopted in enterprise applications to suggest products, videos, tweets, and posts to users. These models rely heavily on embedding tables, which contain latent embedding vectors that are looked up to estimate the probability of a user interaction. The recently published Hetero-Rec framework uses the embedding access history to allocate storage partitions across a heterogeneous memory architecture, aiming to maximize the chance that frequently accessed embeddings reside in faster memory. In this work, we extend the study of Hetero-Rec to a larger number of embedding tables (up to 350) and to low-hot embedding tables, and we evaluate the end-to-end speedups. Based on this study, we present Hetero-Rec++, which adds a heuristic-based pre-optimizer and an advanced formulation of the optimizer's cost function. We further demonstrate its effectiveness in reducing the average embedding fetch latency, and hence the inference latency, of RMs deployed on Field Programmable Gate Arrays (FPGAs) with heterogeneous memory architectures.
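
To make the objective concrete, the following minimal sketch greedily places the most frequently accessed embedding rows in the fastest memory tier and then computes the access-weighted average fetch latency of that placement. It is not the Hetero-Rec++ optimizer, which the abstract describes as a heuristic pre-optimizer combined with a cost-function-driven optimizer; plain frequency-greedy placement is a simplifying stand-in. The `MemoryTier` type, the helper function names, and the tier names, capacities, and latencies in the example are hypothetical placeholders, not measured FPGA figures.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class MemoryTier:
    """Hypothetical memory tier: name, capacity in embedding rows, fetch latency in ns."""
    name: str
    capacity_rows: int
    latency_ns: float


def place_by_frequency(access_log, tiers):
    """Greedily assign the most frequently accessed rows to the lowest-latency tiers.

    access_log: iterable of (table_id, row_id) lookups from a recorded trace.
    tiers: list of MemoryTier; they are sorted by latency here.
    Returns a dict mapping (table_id, row_id) -> tier name.
    """
    counts = Counter(access_log)
    hot_first = [key for key, _ in counts.most_common()]  # hottest rows first
    placement = {}
    start = 0
    for tier in sorted(tiers, key=lambda t: t.latency_ns):
        # Fill this tier with the next-hottest rows that are still unplaced.
        for key in hot_first[start:start + tier.capacity_rows]:
            placement[key] = tier.name
        start += tier.capacity_rows
    if start < len(hot_first):
        raise ValueError("tiers lack capacity for all accessed embeddings")
    return placement


def average_fetch_latency(access_log, placement, tiers):
    """Access-weighted mean fetch latency (ns) of a placement over the trace."""
    latency = {t.name: t.latency_ns for t in tiers}
    log = list(access_log)
    if not log:
        raise ValueError("empty access log")
    return sum(latency[placement[key]] for key in log) / len(log)


if __name__ == "__main__":
    # Illustrative tiers only: a small fast on-chip tier and a larger slow tier.
    tiers = [
        MemoryTier("BRAM", capacity_rows=2, latency_ns=5.0),
        MemoryTier("HBM", capacity_rows=4, latency_ns=100.0),
    ]
    trace = [(0, 1), (0, 1), (0, 1), (1, 7), (1, 7), (0, 3), (1, 2)]
    plan = place_by_frequency(trace, tiers)
    print(plan)
    print(average_fetch_latency(trace, plan, tiers))
```

With the toy trace, five of the seven lookups hit the two hottest rows, which the greedy pass places in the fast tier, so the average fetch latency falls well below the 100 ns it would be if every row sat in the slow tier. A real deployment must also respect per-table partitioning and per-bank capacity, which is where a cost-function-based optimizer such as the one the abstract describes goes beyond this greedy sketch.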

Index Terms

1. Computer systems organization
  1. Real-time systems
    1. Real-time system architecture
2. Hardware
  1. Integrated circuits
    1. Reconfigurable logic and FPGAs

Information

Published In

ACM Other conferences

AIMLSystems '23: Proceedings of the Third International Conference on AI-ML Systems

October 2023

381 pages

ISBN:9798400716492

DOI:10.1145/3639856

Copyright © 2023 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

AIMLSystems 2023

AIMLSystems 2023: The Third International Conference on Artificial Intelligence and Machine Learning Systems

October 25 - 28, 2023

Bangalore, India
