DOI: 10.1145/3639856.3639878

Hetero-Rec++: Modelling-based Robust and Optimal Deployment of Embeddings Recommendations

Published: 17 May 2024

Abstract

Deep Neural Network (DNN)-based recommendation models (RMs) are widely adopted in enterprise applications to suggest products, videos, tweets, and posts to users. These models rely heavily on embedding tables, which contain latent embedding vectors and are accessed to evaluate the probability of user interactions. The recently published Hetero-Rec framework leverages the embedding access history to allocate storage partitions in a heterogeneous memory architecture, aiming to maximize the chance that frequently accessed embeddings are available in faster memory. In this work, we extend the study of Hetero-Rec to larger embedding tables (up to 350) and low-hot embedding tables, and evaluate the end-to-end speedups. Consequently, we present Hetero-Rec++ with a heuristic-based pre-optimizer and an advanced formulation of the optimizer’s cost function. Further, we demonstrate its effectiveness in reducing the average fetch latency of embeddings and, hence, improving the inference latency for RMs deployed on Field Programmable Gate Arrays (FPGAs) with heterogeneous memory architectures.
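
The idea of placing frequently accessed embeddings in faster memory to lower the average fetch latency can be illustrated with a minimal sketch. This is not the Hetero-Rec or Hetero-Rec++ optimizer: it is a plain greedy placement, and the tier names, capacities, latencies, and access counts below are hypothetical values chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Tier:
    name: str          # hypothetical tier label
    capacity: int      # number of embedding vectors the tier can hold
    latency_ns: float  # assumed fetch latency per vector


def greedy_placement(access_counts, tiers):
    """Assign the most frequently accessed embeddings to the fastest tiers."""
    hot_first = sorted(access_counts, key=access_counts.get, reverse=True)
    fast_first = sorted(tiers, key=lambda t: t.latency_ns)
    placement, start = {}, 0
    for tier in fast_first:
        for emb_id in hot_first[start:start + tier.capacity]:
            placement[emb_id] = tier
        start += tier.capacity
    if len(placement) < len(hot_first):
        raise ValueError("total tier capacity is smaller than the table")
    return placement


def average_fetch_latency(access_counts, placement):
    """Access-weighted mean latency over all embedding fetches."""
    total = sum(access_counts.values())
    weighted = sum(n * placement[e].latency_ns for e, n in access_counts.items())
    return weighted / total


# Hypothetical access history: embedding id -> number of accesses.
counts = {0: 500, 1: 300, 2: 150, 3: 40, 4: 10}
# Hypothetical memory tiers, listed in arbitrary order.
tiers = [Tier("slow", 10, 100.0), Tier("fast", 1, 10.0), Tier("medium", 2, 40.0)]

placement = greedy_placement(counts, tiers)
avg = average_fetch_latency(counts, placement)
print(f"average fetch latency: {avg:.1f} ns")
```

With this skewed access history, the greedy placement gives an access-weighted average of 28 ns, compared with 100 ns if every vector sat in the slowest tier. A real deployment would also have to handle many tables, bank-level parallelism, and drift in access patterns, none of which this sketch models.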

      Published In

      AIMLSystems '23: Proceedings of the Third International Conference on AI-ML Systems
      October 2023
      381 pages
      ISBN:9798400716492
      DOI:10.1145/3639856
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. Recommendation model deployment
      2. embedding tables
      3. heterogeneous deployment

      Qualifiers

      • Research-article
      • Research
      • Refereed limited

      Conference

      AIMLSystems 2023
