DOI: 10.1145/3580305.3599834
research-article
Open access

Graph-Based Model-Agnostic Data Subsampling for Recommendation Systems

Published: 04 August 2023

Abstract

Data subsampling is widely used to speed up the training of large-scale recommendation systems. Most subsampling methods are model-based and often require a pre-trained pilot model to measure data importance via, e.g., sample hardness. However, when the pilot model is misspecified, model-based subsampling methods deteriorate. Since model misspecification is persistent in real recommendation systems, we instead propose model-agnostic data subsampling methods that rely only on the input data structure, represented as graphs. Specifically, we study the topology of the user-item graph to estimate the importance of each user-item interaction (an edge in the user-item graph) via graph conductance, followed by a propagation step over the network to smooth the estimated importance values. Because our proposed method is model-agnostic, it can be combined with model-based methods to marry the merits of both. Empirically, we show that combining the two consistently improves over either method alone. Experimental results on the KuaiRec and MIND datasets demonstrate that our proposed methods achieve superior results compared to baseline approaches.
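As a rough illustration of the two-step idea in the abstract (score each user-item edge by a topology-based importance, then smooth the scores by propagation over the network), here is a minimal Python sketch. It is not the paper's algorithm: the initial score 1/deg(u) + 1/deg(v) is a simple local stand-in for the conductance-based measure, and the smoothing step averages over edges that share an endpoint, loosely in the spirit of label propagation. All function names and parameters (`edge_scores`, `subsample`, `alpha`, `keep_frac`) are illustrative assumptions.

```python
from collections import defaultdict

def edge_scores(edges, alpha=0.5, iters=2):
    """Score each (user, item) edge, then smooth scores over the graph."""
    # Node degrees in the bipartite user-item graph.
    deg = defaultdict(int)
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    # Initial importance: a simple degree-based proxy (assumption, not the
    # paper's conductance measure). Edges touching low-degree nodes score high.
    score = {e: 1.0 / deg[e[0]] + 1.0 / deg[e[1]] for e in edges}
    # Edges sharing an endpoint are neighbors (line-graph adjacency).
    by_node = defaultdict(list)
    for e in edges:
        by_node[e[0]].append(e)
        by_node[e[1]].append(e)
    # Propagation: blend each edge's score with its neighbors' average.
    for _ in range(iters):
        new = {}
        for e in edges:
            nbrs = [x for n in e for x in by_node[n] if x != e]
            avg = sum(score[x] for x in nbrs) / len(nbrs) if nbrs else score[e]
            new[e] = (1 - alpha) * score[e] + alpha * avg
        score = new
    return score

def subsample(edges, keep_frac=0.5, **kw):
    """Keep the top fraction of edges by smoothed importance."""
    s = edge_scores(edges, **kw)
    k = max(1, int(len(edges) * keep_frac))
    return sorted(edges, key=lambda e: -s[e])[:k]
```

Because the scores depend only on the input graph and never on model predictions, they can be computed once before training and, as the abstract notes, combined with any model-based importance signal.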

Supplementary Material

MP4 File (kdd_promo.mp4)
Model-agnostic data subsampling methods for recommendation systems.


Cited By

  • Matrix Completion of Adaptive Jumping Graph Neural Networks for Recommendation Systems. IEEE Access 11 (2023), 88433-88450. DOI: 10.1109/ACCESS.2023.3305945. Online publication date: 2023.

Published In

KDD '23: Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
August 2023
5996 pages
ISBN:9798400701030
DOI:10.1145/3580305

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. data subsampling
  2. network analysis
  3. recommender systems

Qualifiers

  • Research-article

Conference

KDD '23

Acceptance Rates

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

