EMS-i: An Efficient Memory System Design with Specialized Caching Mechanism for Recommendation Inference

Published: 09 September 2023

Abstract

Recommendation systems have been widely embedded into many Internet services. For example, Meta’s deep learning recommendation model (DLRM) shows high predictive accuracy of click-through rate in processing large-scale embedding tables. The SparseLengthSum (SLS) kernel of the DLRM dominates the inference time of the DLRM due to intensive irregular memory accesses to the embedding vectors. Some prior works directly adopt near data processing (NDP) solutions to obtain higher memory bandwidth to accelerate SLS. However, their inferior memory hierarchy induces a low performance-cost ratio and fails to fully exploit the data locality. Although some software-managed cache policies were proposed to improve the cache hit rate, the incurred cache miss penalty is unacceptable considering the high overheads of executing the corresponding programs and the communication between the host and the accelerator. To address the aforementioned issues, we propose EMS-i, an efficient memory system design that integrates Solid State Drive (SSD) into the memory hierarchy using Compute Express Link (CXL) for recommendation system inference. We specialize the caching mechanism according to the characteristics of various DLRM workloads and propose a novel prefetching mechanism to further improve the performance. In addition, we carefully design the inference kernel and develop a customized mapping scheme for the SLS operation, considering the multi-level parallelism in SLS and the data locality within a batch of queries. Compared to the state-of-the-art NDP solutions, EMS-i achieves up to 10.9× speedup over RecSSD and performance comparable to RecNMP with 72% energy savings. EMS-i also reduces memory cost by up to 8.7× and 6.6× relative to RecSSD and RecNMP, respectively.
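The SLS kernel at the center of this bottleneck is conceptually simple: for each query it gathers a variable-length set of embedding vectors from a large table and reduces them by summation, which is why its cost is dominated by irregular memory accesses rather than arithmetic. A minimal NumPy sketch of the operation (function name, shapes, and the toy data are illustrative, not taken from the paper):

```python
import numpy as np

def sparse_length_sum(table, indices, lengths):
    """Gather embedding rows by index and sum them per query segment.

    table   : (num_rows, dim) embedding table
    indices : flat array of row indices for all queries, concatenated
    lengths : number of indices belonging to each query
    """
    out = []
    start = 0
    for n in lengths:
        segment = indices[start:start + n]   # irregular gather: the expensive part
        out.append(table[segment].sum(axis=0))
        start += n
    return np.stack(out)

# Toy example: 4 embedding vectors of dimension 3.
table = np.arange(12, dtype=np.float32).reshape(4, 3)
# Two queries: the first sums rows {0, 2}, the second reads row {3}.
result = sparse_length_sum(table, np.array([0, 2, 3]), [2, 1])
# result[0] == table[0] + table[2]; result[1] == table[3]
```

Because each query touches a handful of scattered rows in a table that can span hundreds of gigabytes, the access pattern has little spatial locality, which motivates the specialized caching and prefetching described in the abstract.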


Cited By

  • PIFS-Rec: Process-In-Fabric-Switch for Large-Scale Recommendation System Inferences. 2024 57th IEEE/ACM International Symposium on Microarchitecture (MICRO), 612–626. DOI: 10.1109/MICRO61859.2024.00052. Published 2-Nov-2024.
  • Accelerating Large-Scale DLRM Inference through Dynamic Hot Data Rearrangement. 2024 IEEE International Symposium on Circuits and Systems (ISCAS), 1–5. DOI: 10.1109/ISCAS58744.2024.10558132. Published 19-May-2024.
  • NDSEARCH: Accelerating Graph-Traversal-Based Approximate Nearest Neighbor Search through Near Data Processing. 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA), 368–381. DOI: 10.1109/ISCA59077.2024.00035. Published 29-Jun-2024.
  • Evaluating the Efficiency of Caching Strategies in Reducing Application Latency. Journal of Science & Technology 4, 6, 83–98. DOI: 10.55662/JST.2023.4606. Published 6-Nov-2023.

      Published In

      ACM Transactions on Embedded Computing Systems  Volume 22, Issue 5s
      Special Issue ESWEEK 2023
      October 2023
      1394 pages
      ISSN:1539-9087
      EISSN:1558-3465
      DOI:10.1145/3614235
      • Editor:
      • Tulika Mitra

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 09 September 2023
      Accepted: 13 July 2023
      Revised: 02 June 2023
      Received: 23 March 2023
      Published in TECS Volume 22, Issue 5s


      Author Tags

      1. Recommendation system
      2. Compute Express Link

      Qualifiers

      • Research-article

      Funding Sources

      • National Science Foundation
      • National Science Foundation IUCRC memberships from Samsung and other companies


      Article Metrics

      • Downloads (last 12 months): 520
      • Downloads (last 6 weeks): 17
      Reflects downloads up to 11 Jan 2025
