
FEC: Efficient Deep Recommendation Model Training with Flexible Embedding Communication

Published: 20 June 2023

Abstract

Embedding-based deep recommendation models (EDRMs), which combine small dense models with large embedding tables, are widely used in industry. Embedding communication constitutes the main cost of distributed EDRM training, so we propose two strategies to improve its efficiency: embedding tiering and pre-fetching. In particular, embedding tiering uses AllReduce to communicate popular embeddings that are accessed frequently. This is counter-intuitive because these embeddings belong to the sparse embedding tables, but it is reasonable because the access pattern of popular embeddings resembles that of dense models. Pre-fetching starts communication early for embeddings that receive no updates, removing them from the critical path of training. We implement embedding tiering and pre-fetching in a system called FEC and compare it with state-of-the-art systems on real datasets. The results show that FEC consistently outperforms the existing methods on all datasets, achieving speedups of up to 6.65x in embedding communication time and 2.42x in training throughput over the best-performing baseline.
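To make the tiering idea concrete, here is a minimal, hypothetical sketch in Python: embedding rows whose access counts fall in the most popular fraction are treated like dense parameters (replicated on every worker and synchronized with a simulated AllReduce), while the long tail stays sharded. The function names, the `hot_fraction` parameter, and the averaging AllReduce are illustrative assumptions, not FEC's actual API.

```python
# Hypothetical sketch of embedding tiering (not FEC's actual API):
# frequently accessed ("hot") embedding rows behave like dense parameters,
# so they are replicated on every worker and synchronized with AllReduce;
# rarely accessed ("cold") rows stay sharded and are fetched on demand.

def tier_embeddings(access_counts, hot_fraction=0.01):
    """Split row ids into (hot, cold) by access frequency."""
    order = sorted(range(len(access_counts)),
                   key=lambda i: access_counts[i], reverse=True)
    n_hot = max(1, int(len(order) * hot_fraction))
    return order[:n_hot], order[n_hot:]

def allreduce_mean(grad_replicas):
    """Simulated AllReduce: element-wise average of per-worker gradients."""
    n = len(grad_replicas)
    return [sum(col) / n for col in zip(*grad_replicas)]

counts = [900, 3, 1, 850, 2, 0, 700, 1]   # toy per-row access histogram
hot, cold = tier_embeddings(counts, hot_fraction=0.4)
print(sorted(hot))   # the three most popular rows: [0, 3, 6]
```

In the real system, cold rows would still be exchanged with sparse push/pull, and pre-fetching would start the communication of rows that received no gradient updates early, so it overlaps with computation instead of sitting on the critical path.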

Supplemental Material

MP4 File
Presentation video for SIGMOD 2023


Cited By

  • (2024) Press ECCS to Doubt (Your Causal Graph). Proceedings of the Conference on Governance, Understanding and Integration of Data for Effective and Responsible AI, 6-15. https://doi.org/10.1145/3665601.3669842. Online publication date: 9-Jun-2024.
  • (2024) Embedding Optimization for Training Large-scale Deep Learning Recommendation Systems with EMBark. Proceedings of the 18th ACM Conference on Recommender Systems, 622-632. https://doi.org/10.1145/3640457.3688111. Online publication date: 8-Oct-2024.
  • (2024) The Image Calculator: 10x Faster Image-AI Inference by Replacing JPEG with Self-designing Storage Format. Proceedings of the ACM on Management of Data, 2(1), 1-31. https://doi.org/10.1145/3639307. Online publication date: 26-Mar-2024.
  • (2024) SIERRA: A Counterfactual Thinking-based Visual Interface for Property Graph Query Construction. Companion of the 2024 International Conference on Management of Data, 440-443. https://doi.org/10.1145/3626246.3654729. Online publication date: 9-Jun-2024.
  • (2024) Enabling Runtime Verification of Causal Discovery Algorithms with Automated Conditional Independence Reasoning. Proceedings of the IEEE/ACM 46th International Conference on Software Engineering, 1-13. https://doi.org/10.1145/3597503.3623348. Online publication date: 20-May-2024.
  • (2023) Explain Any Concept. Proceedings of the 37th International Conference on Neural Information Processing Systems, 21826-21840. https://doi.org/10.5555/3666122.3667076. Online publication date: 10-Dec-2023.


Published In

Proceedings of the ACM on Management of Data (PACMMOD), Volume 1, Issue 2
June 2023
2310 pages
EISSN: 2836-6573
DOI: 10.1145/3605748

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 20 June 2023
      Published in PACMMOD Volume 1, Issue 2


      Author Tags

      1. deep learning
      2. distributed training
      3. recommendation models


