DOI: 10.1145/3534678.3539429
Research Article | Public Access

Condensing Graphs via One-Step Gradient Matching

Published: 14 August 2022

Abstract

As training deep learning models on large datasets takes substantial time and resources, it is desirable to construct a small synthetic dataset on which deep learning models can be trained sufficiently well. Recent works have explored condensing image datasets through complex bi-level optimization. For instance, dataset condensation (DC) matches network gradients w.r.t. large real data and small synthetic data, where the network weights are optimized for multiple steps at each outer iteration. However, existing approaches have inherent limitations: (1) they are not directly applicable to graphs, where the data is discrete; and (2) the condensation process is computationally expensive due to the nested optimization involved. To bridge the gap, we investigate efficient dataset condensation tailored for graph datasets, where we model the discrete graph structure as a probabilistic model. We further propose a one-step gradient matching scheme, which performs gradient matching for only a single step, without training the network weights. Our theoretical analysis shows that this strategy can generate synthetic graphs that lead to lower classification loss on real graphs. Extensive experiments on various graph datasets demonstrate the effectiveness and efficiency of the proposed method. In particular, we are able to reduce the dataset size by 90% while approximating up to 98% of the original performance, and our method is significantly faster than multi-step gradient matching (e.g., 15× on CIFAR10 when synthesizing 500 graphs).
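To make the abstract's two ideas concrete, below is a minimal, hypothetical PyTorch sketch, not the authors' implementation: it (a) models each synthetic graph's discrete adjacency matrix probabilistically via a binary-concrete (relaxed Bernoulli) reparameterization, and (b) performs one-step gradient matching, comparing real-data and synthetic-data gradients at a freshly initialized network whose weights are never trained. All names and hyperparameters (SimpleGNN, sample_soft_adjacency, n_syn, tau, and so on) are illustrative assumptions.

```python
# Illustrative sketch only: assumed shapes and hyperparameters, not the paper's code.
import torch
import torch.nn.functional as F

class SimpleGNN(torch.nn.Module):
    """One graph-convolution layer + mean pooling + linear classifier."""
    def __init__(self, in_dim, hid_dim, n_classes):
        super().__init__()
        self.lin1 = torch.nn.Linear(in_dim, hid_dim)
        self.lin2 = torch.nn.Linear(hid_dim, n_classes)

    def forward(self, adj, x):
        # adj: (n, n) dense (possibly soft) adjacency; x: (n, in_dim) node features.
        h = F.relu(self.lin1(adj @ x))      # one propagation step
        return self.lin2(h.mean(dim=0))     # graph-level logits via mean pooling

def sample_soft_adjacency(logits, tau=0.5):
    """Binary-concrete relaxation of Bernoulli edges: perturb each edge logit
    with Logistic noise, squash, and symmetrize, so sampling stays differentiable."""
    u = torch.rand_like(logits).clamp(1e-6, 1 - 1e-6)
    noise = torch.log(u) - torch.log(1 - u)           # Logistic(0, 1) sample
    soft = torch.sigmoid((logits + noise) / tau)
    return (soft + soft.T) / 2                        # undirected graph

def condense(real_graphs, real_labels, n_syn=10, n_nodes=8,
             in_dim=16, n_classes=2, outer_steps=200):
    """real_graphs: list of (adj, x) dense tensors; real_labels: (N,) long tensor."""
    edge_logits = torch.zeros(n_syn, n_nodes, n_nodes, requires_grad=True)
    feats = torch.randn(n_syn, n_nodes, in_dim, requires_grad=True)
    syn_labels = torch.arange(n_syn) % n_classes      # balanced, fixed labels
    opt = torch.optim.Adam([edge_logits, feats], lr=0.01)

    for _ in range(outer_steps):
        model = SimpleGNN(in_dim, 32, n_classes)      # fresh init; never trained
        params = tuple(model.parameters())

        loss_real = torch.stack([
            F.cross_entropy(model(a, x).unsqueeze(0), y.view(1))
            for (a, x), y in zip(real_graphs, real_labels)]).mean()
        grad_real = torch.autograd.grad(loss_real, params)  # constants below

        loss_syn = torch.stack([
            F.cross_entropy(
                model(sample_soft_adjacency(edge_logits[i]), feats[i]).unsqueeze(0),
                syn_labels[i].view(1))
            for i in range(n_syn)]).mean()
        grad_syn = torch.autograd.grad(loss_syn, params, create_graph=True)

        # One-step matching: align the two gradients at initialization,
        # updating only the synthetic data (no inner loop over network weights).
        match_loss = sum(1 - F.cosine_similarity(gr.flatten(), gs.flatten(), dim=0)
                         for gr, gs in zip(grad_real, grad_syn))
        opt.zero_grad()
        match_loss.backward()
        opt.step()

    return torch.sigmoid(edge_logits).detach(), feats.detach(), syn_labels
```

After condensation, thresholding the returned edge probabilities (e.g., at 0.5) yields discrete synthetic graphs on which a GNN can be trained from scratch; the paper's actual objective, architecture, and sampling details differ from this simplified sketch.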




    Information & Contributors

    Published In

    KDD '22: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
    August 2022
    5033 pages
    ISBN:9781450393850
    DOI:10.1145/3534678
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery, New York, NY, United States

    Publication History

    Published: 14 August 2022


    Author Tags

    1. data-efficient learning
    2. graph generation
    3. graph neural networks

    Conference

    KDD '22

    Acceptance Rates

    Overall acceptance rate: 1,133 of 8,635 submissions (13%)


    Article Metrics

    • Downloads (last 12 months): 666
    • Downloads (last 6 weeks): 78

    Reflects downloads up to 22 Dec 2024.

    Cited By
    • (2025) PUMA: Efficient Continual Graph Learning for Node Classification With Graph Condensation. IEEE Transactions on Knowledge and Data Engineering 37(1), 449-461. DOI: 10.1109/TKDE.2024.3485691. Online publication date: Jan 2025.
    • (2025) Condensing Pre-Augmented Recommendation Data via Lightweight Policy Gradient Estimation. IEEE Transactions on Knowledge and Data Engineering 37(1), 162-173. DOI: 10.1109/TKDE.2024.3484249. Online publication date: Jan 2025.
    • (2024) Promoting fairness in link prediction with graph enhancement. Frontiers in Big Data 7. DOI: 10.3389/fdata.2024.1489306. Online publication date: 24 Oct 2024.
    • (2024) Scalable Multi-Source Pre-training for Graph Neural Networks. Proceedings of the 32nd ACM International Conference on Multimedia, 1292-1301. DOI: 10.1145/3664647.3680924. Online publication date: 28 Oct 2024.
    • (2024) Distillation vs. Sampling for Efficient Training of Learning to Rank Models. Proceedings of the 2024 ACM SIGIR International Conference on Theory of Information Retrieval, 51-60. DOI: 10.1145/3664190.3672527. Online publication date: 2 Aug 2024.
    • (2024) Graph Condensation for Open-World Graph Learning. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 851-862. DOI: 10.1145/3637528.3671917. Online publication date: 25 Aug 2024.
    • (2024) Graph Data Condensation via Self-expressive Graph Structure Reconstruction. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 1992-2002. DOI: 10.1145/3637528.3671710. Online publication date: 25 Aug 2024.
    • (2024) Self-Supervised Learning for Graph Dataset Condensation. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 3289-3298. DOI: 10.1145/3637528.3671682. Online publication date: 25 Aug 2024.
    • (2024) Dataset Condensation for Time Series Classification via Dual Domain Matching. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 1980-1991. DOI: 10.1145/3637528.3671675. Online publication date: 25 Aug 2024.
    • (2024) Graph Coarsening via Convolution Matching for Scalable Graph Neural Network Training. Companion Proceedings of the ACM Web Conference 2024, 1502-1510. DOI: 10.1145/3589335.3651920. Online publication date: 13 May 2024.
