ALLIE: Active Learning on Large-scale Imbalanced Graphs

Authors:

Sumeet Katariya,

Pallav Agrawal,

Karthik Subbian,

Dongwon Lee

WWW '22: Proceedings of the ACM Web Conference 2022

Pages 690 - 698

https://doi.org/10.1145/3485447.3512229

Published: 25 April 2022

Abstract

Human labeling is time-consuming and costly. This problem is further exacerbated in extremely imbalanced class label scenarios, such as detecting fraudsters on websites. Active learning selects the most relevant examples for human labelers to annotate, improving model performance at a lower cost. However, existing active learning methods for graph data often assume that both data and label distributions are balanced. These assumptions fail in extreme rare-class classification scenarios, such as classifying abusive reviews on an e-commerce website.

We propose a novel framework, ALLIE, to address the challenge of active learning on large-scale imbalanced graph data. In our approach, we efficiently sample from both majority and minority classes using a reinforcement learning agent with an imbalance-aware reward function. We employ focal loss in the node classification model to focus more on the rare class and improve the accuracy of the downstream model. Finally, we use a graph coarsening strategy to reduce the search space of the reinforcement learning agent. We conduct extensive experiments on benchmark graph datasets and real-world e-commerce datasets. ALLIE significantly outperforms state-of-the-art graph-based active learning methods, with up to 10% improvement in F1 score for the positive class. We also validate ALLIE on a proprietary e-commerce graph dataset by tasking it with detecting abuse. Our coarsening strategy reduces computation time by up to 38% on both proprietary and public datasets.
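
To illustrate why focal loss helps with a rare class, the following is a minimal sketch of the standard binary focal loss, FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t), for a single example. It is not the authors' implementation: the function names and the default values gamma=2.0 and alpha=0.25 are assumptions chosen for illustration, and the abstract does not state which settings ALLIE uses.

```python
import math


def cross_entropy(p, y, eps=1e-12):
    """Binary cross-entropy for one example.

    p: predicted probability of the positive (rare) class.
    y: true label, 1 for positive and 0 for negative.
    """
    p_t = p if y == 1 else 1.0 - p
    p_t = min(max(p_t, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -math.log(p_t)


def binary_focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-12):
    """Binary focal loss for one example (hypothetical helper).

    gamma and alpha are illustrative defaults, not values from the paper.
    The factor (1 - p_t) ** gamma shrinks the loss of examples the model
    already classifies confidently, so hard examples dominate training.
    """
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    p_t = min(max(p_t, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


if __name__ == "__main__":
    # Compare how much of the cross-entropy survives for an easy
    # and a hard positive example.
    for p in (0.95, 0.10):
        ratio = binary_focal_loss(p, 1) / cross_entropy(p, 1)
        print(f"p={p:.2f}  focal/CE ratio={ratio:.6f}")
```

With gamma set to 0 the modulating factor disappears and the loss reduces to alpha-weighted cross-entropy, which is a convenient sanity check when tuning.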

Published In

WWW '22: Proceedings of the ACM Web Conference 2022

April 2022

3764 pages

ISBN: 9781450390965

DOI: 10.1145/3485447

Editors:
Frédérique Laforest (INSA Lyon, France),
Raphaël Troncy (EURECOM, France),
Elena Simperl (King’s College London, UK),
Deepak Agarwal (Pinterest, USA),
Aristides Gionis (KTH Royal Institute of Technology, Sweden),
Ivan Herman (W3C / retired),
Lionel Médini (Université Lyon 1, France)

Copyright © 2022 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Funding Sources

NSF (National Science Foundation)

Conference

WWW '22

Sponsor:

SIGWEB

WWW '22: The ACM Web Conference 2022

April 25 - 29, 2022

Virtual Event, Lyon, France

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%
