DOI: 10.1145/3397271.3401113

Distributed Equivalent Substitution Training for Large-Scale Recommender Systems

Published: 25 July 2020

    Abstract

    We present Distributed Equivalent Substitution (DES) training, a novel distributed training framework for large-scale recommender systems with dynamic sparse features. DES introduces fully synchronous training to large-scale recommendation systems for the first time by reducing communication, thus making the training of commercial recommender systems converge faster and achieve better click-through rate (CTR). DES requires much less communication because it substitutes the weights-rich operators with computationally equivalent sub-operators and aggregates partial results instead of transmitting the huge sparse weights directly through the network. Thanks to synchronous training on large-scale Deep Learning Recommendation Models (DLRMs), DES achieves higher AUC (Area Under the ROC Curve). We successfully apply DES training to multiple popular DLRMs from industrial scenarios. Experiments show that our implementation outperforms the state-of-the-art PS-based (parameter-server-based) training framework, achieving up to 68.7% communication savings and higher throughput compared to other PS-based recommender systems.
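    To make the substitution idea concrete, here is a minimal, hypothetical sketch (not the authors' implementation; the table layout, shard assignment, and all names are assumptions): rather than shipping every looked-up embedding row over the network, each worker locally reduces the rows it owns, and only the per-worker partial sums are aggregated. Communication then scales with the number of workers times the embedding dimension, independent of how many sparse features are active per example.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    dim, vocab, n_workers = 4, 12, 3

    # The sparse embedding table is partitioned across workers by feature id.
    table = rng.normal(size=(vocab, dim))
    shards = [set(range(w, vocab, n_workers)) for w in range(n_workers)]

    sample = [1, 5, 7, 10]  # active sparse feature ids for one example

    # Baseline: gather every looked-up embedding row, then sum.
    # Communication cost: len(sample) * dim floats.
    baseline = np.stack([table[f] for f in sample]).sum(axis=0)

    # DES-style: each worker sums only the rows it owns; the partial
    # sums are then aggregated (an all-reduce in a real deployment).
    # Communication cost: n_workers * dim floats, independent of len(sample).
    partials = [
        sum((table[f] for f in sample if f in shards[w]), np.zeros(dim))
        for w in range(n_workers)
    ]
    des_result = np.sum(partials, axis=0)

    assert np.allclose(baseline, des_result)  # computationally equivalent
    ```

    The savings grow with the number of active features per example, which is why this style of substitution matters for models with very wide dynamic sparse inputs.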

    Supplementary Material

    MP4 File (3397271.3401113.mp4)


    Cited By

    • (2024) Horizontal Federated Recommender System: A Survey. ACM Computing Surveys 56(9), 1-42. https://doi.org/10.1145/3656165
    • (2023) OpenEmbedding: A Distributed Parameter Server for Deep Learning Recommendation Models using Persistent Memory. 2023 IEEE 39th International Conference on Data Engineering (ICDE), 2976-2987. https://doi.org/10.1109/ICDE55515.2023.00228
    • (2023) Improving Accuracy of Recommendation Systems with Deep Learning Models. Advances in Data-Driven Computing and Intelligent Systems, 795-806. https://doi.org/10.1007/978-981-99-3250-4_60
    • (2022) Field-aware Variational Autoencoders for Billion-scale User Representation Learning. 2022 IEEE 38th International Conference on Data Engineering (ICDE), 3413-3425. https://doi.org/10.1109/ICDE53745.2022.00321
    • (2021) Training Recommender Systems at Scale: Communication-Efficient Model and Data Parallelism. Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, 2928-2936. https://doi.org/10.1145/3447548.3467080
    • (2021) ScaleFreeCTR: MixCache-based Distributed Training System for CTR Models with Huge Embedding Table. Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, 1269-1278. https://doi.org/10.1145/3404835.3462976
    • (2021) BiPS: Hotness-aware Bi-tier Parameter Synchronization for Recommendation Models. 2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 609-618. https://doi.org/10.1109/IPDPS49936.2021.00069
    • (2021) Deep learning-driven distributed communication systems for cluster online educational platform considering human-computer interaction. International Journal of Communication Systems 35(1). https://doi.org/10.1002/dac.5009

    Published In

    SIGIR '20: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval
    July 2020
    2548 pages
    ISBN:9781450380164
    DOI:10.1145/3397271
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. dynamic sparse features
    2. ranking systems
    3. recommender systems
    4. synchronous training

    Qualifiers

    • Research-article

    Conference

    SIGIR '20

    Acceptance Rates

    Overall Acceptance Rate 792 of 3,983 submissions, 20%
