DOI: 10.1145/3448016.3457325

ALG: Fast and Accurate Active Learning Framework for Graph Convolutional Networks

Published: 18 June 2021

Abstract

Graph Convolutional Networks (GCNs) have become state-of-the-art methods in many supervised and semi-supervised graph representation learning scenarios. To achieve satisfactory performance, GCNs require a sufficient amount of labeled data. However, in real-world scenarios, labeled data is often expensive to obtain. Therefore, we propose ALG, a novel Active Learning framework for GCNs, which employs domain-specific intelligence to achieve much higher performance and efficiency than generic AL frameworks. First, by decoupling GCN models, ALG serves as an effective and efficient AL framework for measuring and combining node representativeness and informativeness. Second, by exploiting the characteristics of the receptive field in GCNs, ALG considers both the importance and the correlation of nodes through a new node selection metric that maximizes the effective receptive field (ERF). We prove that this ERF maximization problem is NP-hard and provide an efficient algorithm accompanied by a provable approximation guarantee. Empirical studies on four public datasets demonstrate that ALG can significantly improve both the performance and efficiency of active learning for GCNs.
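The selection strategy described above can be illustrated with a classical greedy maximum-coverage sketch: at each step, label the node whose k-hop receptive field covers the most not-yet-covered nodes. This is a minimal illustration of the general technique with the standard (1 - 1/e) greedy guarantee, not the paper's actual algorithm (ALG additionally measures node importance and combines representativeness and informativeness); all function and variable names here are illustrative.

```python
from collections import deque

def k_hop_receptive_field(adj, node, k=2):
    """Collect all nodes within k hops via BFS: a node's receptive
    field under a k-layer GCN on adjacency-list graph `adj`."""
    seen = {node}
    frontier = deque([(node, 0)])
    while frontier:
        u, d = frontier.popleft()
        if d == k:
            continue
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                frontier.append((v, d + 1))
    return seen

def greedy_select(adj, budget, k=2):
    """Greedy max-coverage label selection: repeatedly pick the node
    whose receptive field adds the most uncovered nodes. The classical
    greedy algorithm gives a (1 - 1/e) approximation for max coverage."""
    fields = {u: k_hop_receptive_field(adj, u, k) for u in adj}
    covered, chosen = set(), []
    for _ in range(budget):
        best = max((u for u in adj if u not in chosen),
                   key=lambda u: len(fields[u] - covered))
        chosen.append(best)
        covered |= fields[best]
    return chosen, covered
```

On a small path graph with one isolated node, `greedy_select` first picks the center node (largest 1-hop field), then whichever candidate adds one more uncovered node, so a budget of two already covers the whole path.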

Supplementary Material

MP4 File (3448016.3457325.mp4)
Graph Convolutional Networks (GCNs) and their variants have become the de facto methods in many supervised and semi-supervised graph representation learning scenarios, such as node classification and link prediction. To achieve satisfactory performance, GCNs require a sufficient amount of labeled data. However, in real-world scenarios, labeled data is often expensive to obtain. Therefore, we propose ALG, a novel Active Learning framework for GCNs, which employs domain-specific intelligence to achieve much higher performance and efficiency than generic AL frameworks. First, by decoupling GCN models, ALG serves as an effective and efficient AL framework for measuring and combining node representativeness and informativeness. Second, by exploiting the characteristics of the receptive field in GCNs, ALG considers both the importance and the correlation of nodes through a new node selection metric that maximizes the effective receptive field (ERF). We prove that this ERF maximization problem is NP-hard and provide an efficient algorithm accompanied by a provable approximation guarantee. Empirical studies on four public datasets demonstrate that ALG can significantly improve both the performance and efficiency of active learning for GCNs. In particular, on the Reddit dataset, ALG outperforms the competitive baseline AGE by at least two orders of magnitude in our evaluation.




Published In

SIGMOD '21: Proceedings of the 2021 International Conference on Management of Data
June 2021
2969 pages
ISBN:9781450383431
DOI:10.1145/3448016
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. active learning
  2. graph convolutional networks
  3. receptive field

Qualifiers

  • Research-article

Conference

SIGMOD/PODS '21

Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%


Cited By

  • (2025) Data-Centric Graph Learning: A Survey. IEEE Transactions on Big Data 11(1), 1–20. DOI: 10.1109/TBDATA.2024.3489412
  • (2025) Advancing anomaly detection in computational workflows with active learning. Future Generation Computer Systems 166, 107608. DOI: 10.1016/j.future.2024.107608
  • (2024) Accelerating Sampling and Aggregation Operations in GNN Frameworks with GPU Initiated Direct Storage Accesses. Proceedings of the VLDB Endowment 17(6), 1227–1240. DOI: 10.14778/3648160.3648166
  • (2024) Cost-effective Data Labelling for Graph Neural Networks. Proceedings of the ACM Web Conference 2024, 353–364. DOI: 10.1145/3589334.3645339
  • (2024) BIM: Improving Graph Neural Networks with Balanced Influence Maximization. 2024 IEEE 40th International Conference on Data Engineering (ICDE), 2931–2944. DOI: 10.1109/ICDE60146.2024.00228
  • (2024) NC-ALG: Graph-Based Active Learning Under Noisy Crowd. 2024 IEEE 40th International Conference on Data Engineering (ICDE), 2681–2694. DOI: 10.1109/ICDE60146.2024.00210
  • (2024) E2GCL: Efficient and Expressive Contrastive Learning on Graph Neural Networks. 2024 IEEE 40th International Conference on Data Engineering (ICDE), 859–873. DOI: 10.1109/ICDE60146.2024.00071
  • (2024) Finding core labels for maximizing generalization of graph neural networks. Neural Networks, 106635. DOI: 10.1016/j.neunet.2024.106635
  • (2024) Adaptive graph active learning with mutual information via policy learning. Expert Systems with Applications 255, 124773. DOI: 10.1016/j.eswa.2024.124773
  • (2024) FICOM: an effective and scalable active learning framework for GNNs on semi-supervised node classification. The VLDB Journal 33(5), 1723–1742. DOI: 10.1007/s00778-024-00870-z
