extended-abstract

Toward 100TB Recommendation Models with Embedding Offloading

Authors:

Jian He,

Dennis Van der Staay,

Yu Guo,

Paul Zhang

RecSys '24: Proceedings of the 18th ACM Conference on Recommender Systems

Pages 841 - 843

https://doi.org/10.1145/3640457.3688037

Published: 08 October 2024

Abstract

Training recommendation models becomes memory-bound as embedding tables grow, and fast GPU memory is scarce. In this paper, we explore embedding caches and prefetch pipelines that effectively leverage large but slow host memory for embedding tables. We introduce Locality-Aware Sharding and iterative planning, which automatically choose optimal cache sizes and produce effective sharding plans. Embedding Offloading, a system that combines these components and techniques, is implemented on top of Meta’s open-source libraries FBGEMM GPU and TorchRec, and is used to improve the scalability and efficiency of industry-scale production models. Embedding Offloading scaled model size by 37x, to 100TB, with only a 26% regression in training speed.
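
To make the idea of an embedding cache with prefetching concrete, the following is a minimal sketch, not the authors' implementation and not the FBGEMM GPU or TorchRec API. It assumes a plain LRU eviction policy, models host memory as a Python list and GPU memory as a small `OrderedDict`, and runs the prefetch sequentially rather than overlapped with compute. All names (`CachedEmbeddingTable`, `run_demo`, the cache and batch sizes) are hypothetical.

```python
from collections import OrderedDict
import random


class CachedEmbeddingTable:
    """Toy LRU cache of embedding rows in front of a large, slow table.

    `host_table` stands in for host (CPU) memory and `cache` for scarce
    GPU memory. This is an illustrative assumption, not a real library API.
    """

    def __init__(self, host_table, cache_rows):
        self.host_table = host_table
        self.cache_rows = cache_rows
        self.cache = OrderedDict()  # row id -> cached copy of the row
        self.hits = 0
        self.misses = 0

    def _load(self, row_id):
        """Bring one row into the cache, evicting the LRU row if full."""
        if row_id in self.cache:
            self.cache.move_to_end(row_id)  # mark as most recently used
            return
        if len(self.cache) >= self.cache_rows:
            evicted_id, evicted_row = self.cache.popitem(last=False)
            self.host_table[evicted_id] = evicted_row  # write back to host
        self.cache[row_id] = list(self.host_table[row_id])

    def prefetch(self, row_ids):
        """Load the rows a batch will need before its lookup runs."""
        for row_id in row_ids:
            self._load(row_id)

    def lookup(self, row_ids):
        """Return the rows for a batch, counting cache hits and misses."""
        out = []
        for row_id in row_ids:
            if row_id in self.cache:
                self.hits += 1
            else:
                self.misses += 1
            self._load(row_id)
            out.append(self.cache[row_id])
        return out

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


def run_demo(prefetch, seed=0):
    """Replay skewed lookups and return the hit rate seen by lookup()."""
    rng = random.Random(seed)
    num_rows, dim, batch_size = 10_000, 4, 64
    host = [[float(i)] * dim for i in range(num_rows)]
    table = CachedEmbeddingTable(host, cache_rows=512)
    for _ in range(200):
        # Pareto-distributed ids: a small hot set receives most lookups.
        batch = [min(int(rng.paretovariate(1.2)) - 1, num_rows - 1)
                 for _ in range(batch_size)]
        if prefetch:
            table.prefetch(batch)
        table.lookup(batch)
    return table.hit_rate()


if __name__ == "__main__":
    print("hit rate without prefetch:", run_demo(prefetch=False))
    print("hit rate with prefetch:   ", run_demo(prefetch=True))
```

In this toy setting the cache (512 rows) is larger than a batch (64 ids), so a prefetched batch is always fully resident when its lookup runs. A real system would also have to decide how large the cache should be for each table, which is the sizing problem the abstract attributes to iterative planning.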

Index Terms

Toward 100TB Recommendation Models with Embedding Offloading
1. Computing methodologies
  1. Machine learning
    1. Machine learning approaches
      1. Learning linear models
2. Information systems

Index terms have been assigned to the content through auto-classification.

Information & Contributors

Information

Published In

RecSys '24: Proceedings of the 18th ACM Conference on Recommender Systems

October 2024

1438 pages

ISBN:9798400705052

DOI:10.1145/3640457

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Extended-abstract
Research
Refereed limited

Conference

RecSys '24

RecSys '24: 18th ACM Conference on Recommender Systems

October 14 - 18, 2024

Bari, Italy

Acceptance Rates

Overall Acceptance Rate 254 of 1,295 submissions, 20%

Article Metrics

Total Citations: 0
Total Downloads: 348
Downloads (Last 12 months): 348
Downloads (Last 6 weeks): 51

Reflects downloads up to 23 Jan 2025
