SPARC: Self-Paced Network Representation for Few-Shot Rare Category Characterization

Authors:

Wei Fan

KDD '18: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining

Pages 2807 - 2816

https://doi.org/10.1145/3219819.3219968

Published: 19 July 2018

Abstract

In the era of big data, it is often the rare categories that are of great interest in many high-impact applications, ranging from financial fraud detection in online transaction networks to emerging trend detection in social networks, and from network intrusion detection in computer networks to fault detection in manufacturing. As a result, rare category characterization has become a fundamental learning task, which aims to accurately characterize the rare categories given limited label information. The unique challenge of rare category characterization, i.e., the non-separable nature of the rare categories from the majority classes, together with the availability of multi-modal representations of the examples, poses a new research question: how can we learn a salient rare-category-oriented embedding representation such that the rare examples are well separated from the majority class examples in the embedding space, thereby facilitating the follow-up rare category characterization?

To address this question, inspired by the family of curriculum learning methods that simulate the cognitive mechanism of human beings, we propose a self-paced framework named SPARC that gradually learns the rare-category-oriented network representation and the characterization model in a mutually beneficial way by shifting from the 'easy' concept to the target 'difficult' one, in order to facilitate more reliable label propagation to the large number of unlabeled examples. The experimental results on various real data sets demonstrate that our proposed SPARC algorithm (1) shows a significant improvement over state-of-the-art graph embedding methods on representing the rare categories that are non-separable from the majority classes, and (2) outperforms the existing methods on rare category characterization tasks.
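The abstract does not spell out the SPARC objective, so the following is only a minimal sketch of the generic self-paced idea it builds on: alternate between selecting the currently 'easy' examples (loss below a pace threshold) and refitting the model on them, while the threshold grows. The toy model (a scalar mean with squared loss) and the names `select_easy`, `self_paced_mean`, `init`, and `paces` are assumptions made for illustration, not the authors' algorithm or notation.

```python
def select_easy(losses, pace):
    """Hard self-paced weights: 1.0 if the loss is below the pace threshold, else 0.0."""
    return [1.0 if loss < pace else 0.0 for loss in losses]


def self_paced_mean(values, init, paces):
    """Estimate a mean by alternating example selection and refitting.

    values: observed scalars (hypothetical toy data).
    init:   starting estimate, assumed to be given.
    paces:  increasing loss thresholds, one per curriculum stage.
    Returns the final estimate and the number of examples used at each stage.
    """
    mu = init
    used = []
    for pace in paces:
        # Per-example squared loss under the current model.
        losses = [(x - mu) ** 2 for x in values]
        # Step 1: with the model fixed, keep only the easy examples.
        v = select_easy(losses, pace)
        total = sum(v)
        used.append(int(total))
        # Step 2: with the selection fixed, refit the model on the kept examples.
        if total > 0:
            mu = sum(w * x for w, x in zip(v, values)) / total
    return mu, used


if __name__ == "__main__":
    estimate, used = self_paced_mean([1.0, 1.2, 0.8, 3.0], init=1.0, paces=[0.1, 1.0, 10.0])
    print(estimate, used)
```

In this toy run the distant point 3.0 is excluded during the early stages and admitted only once the pace threshold is large enough, which mirrors the shift from 'easy' to 'difficult' examples. A real system would replace the scalar mean with an embedding and characterization model.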

Index Terms

Mathematics of computing → Discrete mathematics → Graph theory → Graph algorithms

Published In

KDD '18: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining

July 2018

2925 pages

ISBN: 9781450355520

DOI: 10.1145/3219819

General Chairs: Yike Guo (Imperial College London), Faisal Farooq (IBM)

Copyright © 2018 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Funding Sources

U.S. Department of Homeland Security
Defense Sciences Office, DARPA
National Science Foundation

Conference

KDD '18

KDD '18: The 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

August 19 - 23, 2018

London, United Kingdom

Acceptance Rates

KDD '18 Paper Acceptance Rate 107 of 983 submissions, 11%;

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%
