DOI: 10.1145/3580305.3599834
research-article
Open access

Graph-Based Model-Agnostic Data Subsampling for Recommendation Systems

Published: 04 August 2023

Abstract

Data subsampling is widely used to speed up the training of large-scale recommendation systems. Most subsampling methods are model-based and often require a pre-trained pilot model to measure data importance via, e.g., sample hardness. However, when the pilot model is misspecified, model-based subsampling methods deteriorate. Since model misspecification is persistent in real recommendation systems, we instead propose model-agnostic data subsampling methods that rely only on the input data structure, represented as graphs. Specifically, we study the topology of the user-item graph to estimate the importance of each user-item interaction (an edge in the user-item graph) via graph conductance, followed by a propagation step over the network to smooth the estimated importance values. Because our proposed method is model-agnostic, it can be combined with model-based methods to marry the merits of both. Empirically, we show that combining the two consistently improves over either method alone. Experimental results on the KuaiRec and MIND datasets demonstrate that our proposed methods achieve superior results compared to baseline approaches.
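As a rough illustration of the two-step idea in the abstract (score each user-item edge by a topology-based importance, then smooth the scores by propagation over the network), here is a minimal Python sketch. It is not the paper's algorithm: the initial score 1/deg(u) + 1/deg(v) is a simple local stand-in for the conductance-based measure, and the smoothing step averages over edges that share an endpoint, loosely in the spirit of label propagation. All function names and parameters (`edge_scores`, `subsample`, `alpha`, `keep_frac`) are illustrative assumptions.

```python
from collections import defaultdict

def edge_scores(edges, alpha=0.5, iters=2):
    """Score each (user, item) edge, then smooth scores over the graph."""
    # Node degrees in the bipartite user-item graph.
    deg = defaultdict(int)
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    # Initial importance: a simple degree-based proxy (assumption, not the
    # paper's conductance measure). Edges touching low-degree nodes score high.
    score = {e: 1.0 / deg[e[0]] + 1.0 / deg[e[1]] for e in edges}
    # Edges sharing an endpoint are neighbors (line-graph adjacency).
    by_node = defaultdict(list)
    for e in edges:
        by_node[e[0]].append(e)
        by_node[e[1]].append(e)
    # Propagation: blend each edge's score with its neighbors' average.
    for _ in range(iters):
        new = {}
        for e in edges:
            nbrs = [x for n in e for x in by_node[n] if x != e]
            avg = sum(score[x] for x in nbrs) / len(nbrs) if nbrs else score[e]
            new[e] = (1 - alpha) * score[e] + alpha * avg
        score = new
    return score

def subsample(edges, keep_frac=0.5, **kw):
    """Keep the top fraction of edges by smoothed importance."""
    s = edge_scores(edges, **kw)
    k = max(1, int(len(edges) * keep_frac))
    return sorted(edges, key=lambda e: -s[e])[:k]
```

Because the scores depend only on the input graph and never on model predictions, they can be computed once before training and, as the abstract notes, combined with any model-based importance signal.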

Supplementary Material

MP4 File (kdd_promo.mp4)
Model-agnostic data subsampling methods for recommendation systems.


Cited By

  • Matrix Completion of Adaptive Jumping Graph Neural Networks for Recommendation Systems. IEEE Access 11 (2023), 88433-88450. DOI: 10.1109/ACCESS.2023.3305945. Online publication date: 2023.

Published In

KDD '23: Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
August 2023
5996 pages
ISBN:9798400701030
DOI:10.1145/3580305

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. data subsampling
  2. network analysis
  3. recommender systems

Qualifiers

  • Research-article

Conference

KDD '23

Acceptance Rates

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

