DOI: 10.1145/3534678.3539429
Research Article | Public Access

Condensing Graphs via One-Step Gradient Matching

Published: 14 August 2022

Abstract

As training deep learning models on large datasets takes substantial time and resources, it is desirable to construct a small synthetic dataset on which deep learning models can be trained sufficiently well. Recent works have explored condensing image datasets through complex bi-level optimization. For instance, dataset condensation (DC) matches network gradients w.r.t. large real data and small synthetic data, where the network weights are optimized for multiple steps at each outer iteration. However, existing approaches have inherent limitations: (1) they are not directly applicable to graphs, where the data is discrete; and (2) the condensation process is computationally expensive due to the nested optimization involved. To bridge the gap, we investigate efficient dataset condensation tailored for graph datasets, where we model the discrete graph structure as a probabilistic model. We further propose a one-step gradient matching scheme, which performs gradient matching for only a single step, without training the network weights. Our theoretical analysis shows that this strategy can generate synthetic graphs that lead to lower classification loss on real graphs. Extensive experiments on various graph datasets demonstrate the effectiveness and efficiency of the proposed method. In particular, we are able to reduce the dataset size by 90% while approximating up to 98% of the original performance, and our method is significantly faster than multi-step gradient matching (e.g., 15× on CIFAR10 when synthesizing 500 graphs).
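To make the abstract's two ideas concrete, below is a minimal, hypothetical PyTorch sketch, not the authors' implementation: it (a) models each synthetic graph's discrete adjacency matrix probabilistically via a binary-concrete (relaxed Bernoulli) reparameterization, and (b) performs one-step gradient matching, comparing real-data and synthetic-data gradients at a freshly initialized network whose weights are never trained. All names and hyperparameters (SimpleGNN, sample_soft_adjacency, n_syn, tau, and so on) are illustrative assumptions.

```python
# Illustrative sketch only: assumed shapes and hyperparameters, not the paper's code.
import torch
import torch.nn.functional as F

class SimpleGNN(torch.nn.Module):
    """One graph-convolution layer + mean pooling + linear classifier."""
    def __init__(self, in_dim, hid_dim, n_classes):
        super().__init__()
        self.lin1 = torch.nn.Linear(in_dim, hid_dim)
        self.lin2 = torch.nn.Linear(hid_dim, n_classes)

    def forward(self, adj, x):
        # adj: (n, n) dense (possibly soft) adjacency; x: (n, in_dim) node features.
        h = F.relu(self.lin1(adj @ x))      # one propagation step
        return self.lin2(h.mean(dim=0))     # graph-level logits via mean pooling

def sample_soft_adjacency(logits, tau=0.5):
    """Binary-concrete relaxation of Bernoulli edges: perturb each edge logit
    with Logistic noise, squash, and symmetrize, so sampling stays differentiable."""
    u = torch.rand_like(logits).clamp(1e-6, 1 - 1e-6)
    noise = torch.log(u) - torch.log(1 - u)           # Logistic(0, 1) sample
    soft = torch.sigmoid((logits + noise) / tau)
    return (soft + soft.T) / 2                        # undirected graph

def condense(real_graphs, real_labels, n_syn=10, n_nodes=8,
             in_dim=16, n_classes=2, outer_steps=200):
    """real_graphs: list of (adj, x) dense tensors; real_labels: (N,) long tensor."""
    edge_logits = torch.zeros(n_syn, n_nodes, n_nodes, requires_grad=True)
    feats = torch.randn(n_syn, n_nodes, in_dim, requires_grad=True)
    syn_labels = torch.arange(n_syn) % n_classes      # balanced, fixed labels
    opt = torch.optim.Adam([edge_logits, feats], lr=0.01)

    for _ in range(outer_steps):
        model = SimpleGNN(in_dim, 32, n_classes)      # fresh init; never trained
        params = tuple(model.parameters())

        loss_real = torch.stack([
            F.cross_entropy(model(a, x).unsqueeze(0), y.view(1))
            for (a, x), y in zip(real_graphs, real_labels)]).mean()
        grad_real = torch.autograd.grad(loss_real, params)  # constants below

        loss_syn = torch.stack([
            F.cross_entropy(
                model(sample_soft_adjacency(edge_logits[i]), feats[i]).unsqueeze(0),
                syn_labels[i].view(1))
            for i in range(n_syn)]).mean()
        grad_syn = torch.autograd.grad(loss_syn, params, create_graph=True)

        # One-step matching: align the two gradients at initialization,
        # updating only the synthetic data (no inner loop over network weights).
        match_loss = sum(1 - F.cosine_similarity(gr.flatten(), gs.flatten(), dim=0)
                         for gr, gs in zip(grad_real, grad_syn))
        opt.zero_grad()
        match_loss.backward()
        opt.step()

    return torch.sigmoid(edge_logits).detach(), feats.detach(), syn_labels
```

After condensation, thresholding the returned edge probabilities (e.g., at 0.5) yields discrete synthetic graphs on which a GNN can be trained from scratch; the paper's actual objective, architecture, and sampling details differ from this simplified sketch.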




    Information & Contributors

    Published In

    KDD '22: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
    August 2022
    5033 pages
    ISBN:9781450393850
    DOI:10.1145/3534678
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery, New York, NY, United States

    Publication History

    Published: 14 August 2022


    Author Tags

    1. data-efficient learning
    2. graph generation
    3. graph neural networks

    Conference

    KDD '22

    Acceptance Rates

    Overall acceptance rate: 1,133 of 8,635 submissions (13%)


    Article Metrics

    • Downloads (last 12 months): 666
    • Downloads (last 6 weeks): 78

    Reflects downloads up to 22 Dec 2024.

    Cited By
    • (2025) PUMA: Efficient Continual Graph Learning for Node Classification With Graph Condensation. IEEE Transactions on Knowledge and Data Engineering 37(1), 449-461. DOI: 10.1109/TKDE.2024.3485691. Online publication date: Jan 2025.
    • (2025) Condensing Pre-Augmented Recommendation Data via Lightweight Policy Gradient Estimation. IEEE Transactions on Knowledge and Data Engineering 37(1), 162-173. DOI: 10.1109/TKDE.2024.3484249. Online publication date: Jan 2025.
    • (2024) Promoting fairness in link prediction with graph enhancement. Frontiers in Big Data 7. DOI: 10.3389/fdata.2024.1489306. Online publication date: 24 Oct 2024.
    • (2024) Scalable Multi-Source Pre-training for Graph Neural Networks. Proceedings of the 32nd ACM International Conference on Multimedia, 1292-1301. DOI: 10.1145/3664647.3680924. Online publication date: 28 Oct 2024.
    • (2024) Distillation vs. Sampling for Efficient Training of Learning to Rank Models. Proceedings of the 2024 ACM SIGIR International Conference on Theory of Information Retrieval, 51-60. DOI: 10.1145/3664190.3672527. Online publication date: 2 Aug 2024.
    • (2024) Graph Condensation for Open-World Graph Learning. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 851-862. DOI: 10.1145/3637528.3671917. Online publication date: 25 Aug 2024.
    • (2024) Graph Data Condensation via Self-expressive Graph Structure Reconstruction. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 1992-2002. DOI: 10.1145/3637528.3671710. Online publication date: 25 Aug 2024.
    • (2024) Self-Supervised Learning for Graph Dataset Condensation. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 3289-3298. DOI: 10.1145/3637528.3671682. Online publication date: 25 Aug 2024.
    • (2024) Dataset Condensation for Time Series Classification via Dual Domain Matching. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 1980-1991. DOI: 10.1145/3637528.3671675. Online publication date: 25 Aug 2024.
    • (2024) Graph Coarsening via Convolution Matching for Scalable Graph Neural Network Training. Companion Proceedings of the ACM Web Conference 2024, 1502-1510. DOI: 10.1145/3589335.3651920. Online publication date: 13 May 2024.
