DOI: 10.1145/3604915.3608769 · Research Article

Gradient Matching for Categorical Data Distillation in CTR Prediction

Published: 14 September 2023

Abstract

The hardware and energy costs of training a click-through rate (CTR) prediction model are prohibitive. A promising recent direction for reducing these costs is data distillation with gradient matching, which synthesizes a small distilled dataset that guides the model toward a parameter space similar to the one reached by training on real data. However, applying this technique in the recommendation field faces two main challenges: (1) categorical recommendation data are high-dimensional, sparse one- or multi-hot vectors that block gradient flow, rendering backpropagation-based data distillation invalid; (2) the distillation process with gradient matching is computationally expensive due to its bi-level optimization. To this end, we investigate efficient data distillation tailored to recommendation data rich in side information, reformulating the discrete data into a dense, continuous format. We further introduce a one-step gradient matching scheme, which performs gradient matching for only a single step to overcome the inefficient training process. The overall method, Categorical data distillation with Gradient Matching (CGM), distills a large dataset into a small set of informative synthetic data for training CTR models from scratch. Experimental results show that our method not only outperforms state-of-the-art coreset selection and data distillation methods but also generalizes remarkably well across architectures. Moreover, we explore applying CGM to model retraining and show that it mitigates the effect of different random seeds on the training results.



Published In

RecSys '23: Proceedings of the 17th ACM Conference on Recommender Systems
September 2023
1406 pages
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. CTR Prediction
  2. Data Efficient Training
  3. Dataset Distillation
  4. Recommendation System

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

RecSys '23: Seventeenth ACM Conference on Recommender Systems
September 18 - 22, 2023
Singapore, Singapore

Acceptance Rates

Overall Acceptance Rate: 254 of 1,295 submissions, 20%


Cited By

  • (2025) Condensing Pre-Augmented Recommendation Data via Lightweight Policy Gradient Estimation. IEEE Transactions on Knowledge and Data Engineering 37(1), 162-173. DOI: 10.1109/TKDE.2024.3484249. Online publication date: Jan-2025.
  • (2024) DANCE. Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, 1679-1687. DOI: 10.24963/ijcai.2024/186. Online publication date: 3-Aug-2024.
  • (2024) Dataset Regeneration for Sequential Recommendation. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 3954-3965. DOI: 10.1145/3637528.3671841. Online publication date: 25-Aug-2024.
  • (2024) Exploring the Impact of Dataset Bias on Dataset Distillation. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 7656-7663. DOI: 10.1109/CVPRW63382.2024.00761. Online publication date: 17-Jun-2024.
