research-article

RecShard: statistical feature-based memory optimization for industry-scale neural recommendation

Authors:

Christos Kozyrakis,

Caroline Trippel,

Carole-Jean Wu

ASPLOS '22: Proceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems

Pages 344 - 358

https://doi.org/10.1145/3503222.3507777

Published: 22 February 2022

Abstract

We propose RecShard, a fine-grained embedding table (EMB) partitioning and placement technique for deep learning recommendation models (DLRMs). RecShard is designed based on two key observations. First, neither all EMBs nor all rows within an EMB are equal in terms of access patterns. EMBs exhibit distinct memory characteristics, which create performance optimization opportunities for intelligent EMB partitioning and placement across a tiered memory hierarchy. Second, in modern DLRMs, EMBs function as hash tables. As a result, EMBs display hashing phenomena, such as the birthday paradox, that leave EMBs severely under-utilized. RecShard determines an optimal EMB sharding strategy for a set of EMBs based on training data distributions and model characteristics, along with the bandwidth characteristics of the underlying tiered memory hierarchy. In doing so, RecShard achieves over 6 times higher EMB training throughput on average for capacity-constrained DLRMs. The throughput increase comes from improving EMB load balance by over 12 times and from reducing accesses to the slower memory by over 87 times.
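The two observations can be made concrete with a minimal Python sketch; it is an illustration under stated assumptions, not RecShard's implementation. `expected_rows_touched` assumes feature values are hashed independently and uniformly into a table of `num_rows` rows and computes the expected number of rows that receive at least one value, which is the birthday-paradox effect that leaves part of a table unused even when there are as many distinct values as rows. `split_hot_rows` assumes a per-table access trace and a fast tier (for example GPU HBM) holding `fast_capacity` rows; it greedily keeps the most frequently accessed rows in the fast tier, a simplified stand-in for RecShard's own sharding strategy. Both function names are hypothetical.

```python
from __future__ import annotations

from collections import Counter


def expected_rows_touched(num_rows: int, num_values: int) -> float:
    """Expected number of distinct EMB rows that receive at least one value.

    Assumption (illustrative): each of `num_values` distinct feature values is
    hashed independently and uniformly into `num_rows` rows. A given row is
    missed by one value with probability 1 - 1/num_rows, so it is missed by
    every value with probability (1 - 1/num_rows) ** num_values.
    """
    miss_all = (1.0 - 1.0 / num_rows) ** num_values
    return num_rows * (1.0 - miss_all)


def split_hot_rows(accesses: list[int], fast_capacity: int) -> tuple[set[int], float]:
    """Greedy frequency-based placement of one table's rows into two tiers.

    Hypothetical helper: keeps the `fast_capacity` most frequently accessed
    rows in the fast tier and everything else in the slow tier. Returns the
    fast-tier row set and the fraction of accesses that go to the slow tier.
    """
    if not accesses:
        return set(), 0.0
    counts = Counter(accesses)  # row index -> number of accesses
    hot = {row for row, _ in counts.most_common(fast_capacity)}
    slow_hits = sum(n for row, n in counts.items() if row not in hot)
    return hot, slow_hits / len(accesses)


if __name__ == "__main__":
    m = 1_000_000
    # As many distinct values as rows: only about 63% (1 - 1/e) of rows are used.
    print(f"utilization: {expected_rows_touched(m, m) / m:.3f}")

    trace = [0, 0, 0, 1, 1, 2, 3, 0, 1, 4]  # toy, skewed access trace
    hot, slow_frac = split_hot_rows(trace, fast_capacity=2)
    print(sorted(hot), slow_frac)  # prints [0, 1] 0.3
```

When row accesses are skewed, as the first observation suggests, a small fast tier holding only the hottest rows can absorb most lookups. The sketch ignores table-level load balance across devices and memory bandwidth, both of which the abstract names as inputs to RecShard's sharding decisions.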



Index Terms

1. Computer systems organization
  1. Architectures
    1. Other architectures
      1. Neural networks
2. Information systems
  1. Information retrieval
    1. Retrieval tasks and goals
      1. Recommender systems



Published In


ASPLOS '22: Proceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems

February 2022

1164 pages

ISBN: 9781450392051

DOI: 10.1145/3503222

General Chairs:
Babak Falsafi, EPFL, Switzerland
Michael Ferdman, Stony Brook University, USA

Program Chairs:
Shan Lu, University of Chicago, USA
Tom Wenisch, University of Michigan, USA / Google, USA

Copyright © 2022 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.


In-Cooperation

SIGBED: ACM Special Interest Group on Embedded Systems

Publisher

Association for Computing Machinery

New York, NY, United States




Conference

ASPLOS '22: 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems

February 28 – March 4, 2022

Lausanne, Switzerland

Acceptance Rates

Overall Acceptance Rate 535 of 2,713 submissions, 20%


Bibliometrics & Citations

Article Metrics

Total Citations: 36

Total Downloads: 1,254

Downloads (Last 12 months): 274

Downloads (Last 6 weeks): 20

Reflects downloads up to 09 Nov 2024


Cited By

Zhang H, Zhao P, Miao X, Shao Y, Liu Z, Yang T, Cui B (2024). Experimental Analysis of Large-Scale Learnable Vector Storage Compression. Proceedings of the VLDB Endowment 17:4 (808–822). Online publication date: 5-Mar-2024.
https://dl.acm.org/doi/10.14778/3636218.3636234
Chen C, Yen J, Lai Y, Lin Y, Yang C (2024). RecTS: A Temporal-Aware Memory System Optimization for Training Deep Learning Recommendation Models. Proceedings of the 17th ACM International Systems and Storage Conference (104–117). Online publication date: 16-Sep-2024.
https://dl.acm.org/doi/10.1145/3688351.3689155
Liu S, Zheng N, Kang H, Simmons X, Zhang J, Langer M, Zhu W, Lee M, Wang Z (2024). Embedding Optimization for Training Large-scale Deep Learning Recommendation Systems with EMBark. Proceedings of the 18th ACM Conference on Recommender Systems (622–632). Online publication date: 8-Oct-2024.
https://dl.acm.org/doi/10.1145/3640457.3688111
Zhang Q, Teng Z, Wu D, Wang J, Serra E, Spezzano F (2024). An Enhanced Batch Query Architecture in Real-time Recommendation. Proceedings of the 33rd ACM International Conference on Information and Knowledge Management (5078–5085). Online publication date: 21-Oct-2024.
https://dl.acm.org/doi/10.1145/3627673.3680034
He J, Chen S, Zhai J, Lee I, Chabbi M, Steuwer M (2024). POSTER: Pattern-Aware Sparse Communication for Scalable Recommendation Model Training. Proceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming (466–468). Online publication date: 2-Mar-2024.
https://dl.acm.org/doi/10.1145/3627535.3638481
Wang Z, Wang Y, Deng J, Zheng D, Li A, Ding Y, Tsafrir D, Musuvathi M, Gupta R, Abu-Ghazaleh N (2024). RAP: Resource-aware Automated GPU Sharing for Multi-GPU Recommendation Model Training and Input Preprocessing. Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2 (964–979). Online publication date: 27-Apr-2024.
https://dl.acm.org/doi/10.1145/3620665.3640406
Qu Y, Chen T, Nguyen Q, Yin H, Angélica L, Lattanzi S, Muñoz Medina A, Akoglu L, Gionis A, Vassilvitskii S (2024). Budgeted Embedding Table For Recommender Systems. Proceedings of the 17th ACM International Conference on Web Search and Data Mining (557–566). Online publication date: 4-Mar-2024.
https://dl.acm.org/doi/10.1145/3616855.3635778
Wang S, Feng T, Yang H, You X, Chen B, Liu T, Luan Z, Qian D (2024). AtRec: Accelerating Recommendation Model Training on CPUs. IEEE Transactions on Parallel and Distributed Systems 35:6 (905–918). Online publication date: Jun-2024.
https://doi.org/10.1109/TPDS.2024.3381186
Lim J, Kim Y, Chung S, Koushanfar F, Kong J (2024). Near-Memory Computing With Compressed Embedding Table for Personalized Recommendation. IEEE Transactions on Emerging Topics in Computing 12:3 (938–951). Online publication date: Jul-2024.
https://doi.org/10.1109/TETC.2023.3345870
Yang W, Yang Y, Ji S, Jiang J, Jing N, Wang Q, Mao Z, Sheng W (2024). RecPIM: Efficient In-Memory Processing for Personalized Recommendation Inference Using Near-Bank Architecture. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 43:10 (2854–2867). Online publication date: Oct-2024.
https://doi.org/10.1109/TCAD.2024.3386117
