DOI: 10.1145/3219819.3219968

SPARC: Self-Paced Network Representation for Few-Shot Rare Category Characterization

Published: 19 July 2018

Abstract

In the era of big data, it is often the rare categories that are of greatest interest in high-impact applications: financial fraud detection in online transaction networks, emerging trend detection in social networks, network intrusion detection in computer networks, and fault detection in manufacturing. Rare category characterization, which aims to accurately characterize the rare categories given limited label information, has therefore become a fundamental learning task. Its unique challenge, namely that the rare categories are not separable from the majority classes, together with the availability of multi-modal representations of the examples, poses a new research question: how can we learn a salient, rare-category-oriented embedding in which the rare examples are well separated from the majority-class examples, so as to facilitate the follow-up rare category characterization?
To address this question, inspired by curriculum learning, which simulates the cognitive mechanism of human beings, we propose a self-paced framework named SPARC. It gradually learns the rare-category-oriented network representation and the characterization model in a mutually beneficial way by shifting from the 'easy' concept to the target 'difficult' one, which facilitates more reliable label propagation to the large number of unlabeled examples. Experimental results on various real data sets demonstrate that SPARC (1) significantly improves over state-of-the-art graph embedding methods at representing rare categories that are non-separable from the majority classes, and (2) outperforms existing methods on rare category characterization tasks.
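
The shift from 'easy' to 'difficult' examples can be illustrated with a minimal sketch of generic self-paced learning using hard 0/1 example weights. This is not the SPARC algorithm, which the abstract does not specify in detail. The least-squares model, the function name `self_paced_fit`, the parameters `lam` and `growth`, and the toy data are all assumptions made for illustration.

```python
import numpy as np

def self_paced_fit(X, y, lam=0.5, growth=1.5, n_rounds=6):
    """Generic self-paced learning with hard 0/1 weights (illustrative only)."""
    n = X.shape[0]
    v = np.ones(n)        # initial fit uses every example
    history = []          # number of examples kept after each round
    w = np.zeros(X.shape[1])
    for _ in range(n_rounds):
        # Step 1: with the weights v fixed, fit the model on the kept examples.
        sel = v > 0
        w, *_ = np.linalg.lstsq(X[sel], y[sel], rcond=None)
        # Step 2: with the model fixed, keep only the "easy" examples,
        # i.e. those whose squared loss is below the current pace lam.
        losses = (X @ w - y) ** 2
        v = (losses < lam).astype(float)
        if not v.any():   # guard: always keep at least the easiest example
            v[np.argmin(losses)] = 1.0
        history.append(int(v.sum()))
        # Step 3: raise the pace so harder examples can enter later rounds.
        lam *= growth
    return w, v, history

# Toy data (assumed): a clean line y = 2x with four heavily perturbed labels.
x = np.linspace(-1.0, 1.0, 40)
X = np.column_stack([x, np.ones_like(x)])   # columns: slope feature, intercept
y = 2.0 * x
y[::10] += 8.0                              # the "difficult" examples

w, v, history = self_paced_fit(X, y)
print("weights:", np.round(w, 3))
print("examples kept per round:", history)
```

In this toy setting the four perturbed points never fall below the pace threshold, so they keep weight 0 and the fit recovers the clean line. A framework such as the one described above would alternate a comparable selection step with learning an embedding and a characterization model, instead of a single linear fit.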

    Published In

    KDD '18: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
    July 2018
    2925 pages
    ISBN: 9781450355520
    DOI: 10.1145/3219819

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. network embedding
    2. rare category analysis
    3. self-paced learning

    Qualifiers

    • Research-article

    Conference

    KDD '18

    Acceptance Rates

    KDD '18 paper acceptance rate: 107 of 983 submissions (11%)
    Overall acceptance rate: 1,133 of 8,635 submissions (13%)

    Article Metrics

    • Downloads (last 12 months): 163
    • Downloads (last 6 weeks): 24
    Reflects downloads up to 03 Oct 2024

    Cited By

    • (2024) Cost-Sensitive Weighted Contrastive Learning Based on Graph Convolutional Networks for Imbalanced Alzheimer’s Disease Staging. IEEE Transactions on Medical Imaging 43:9 (3126-3136). DOI: 10.1109/TMI.2024.3389747. Online publication date: Sep-2024
    • (2023) When sparsity meets contrastive models. Proceedings of the 40th International Conference on Machine Learning (41133-41150). DOI: 10.5555/3618408.3620132. Online publication date: 23-Jul-2023
    • (2023) A Review of the Evaluation System for Curriculum Learning. Electronics 12:7 (1676). DOI: 10.3390/electronics12071676. Online publication date: 1-Apr-2023
    • (2023) Rare Category Analysis for Complex Data: A Review. ACM Computing Surveys 56:5 (1-35). DOI: 10.1145/3626520. Online publication date: 27-Nov-2023
    • (2023) Towards Reliable Rare Category Analysis on Graphs via Individual Calibration. Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (2629-2638). DOI: 10.1145/3580305.3599525. Online publication date: 6-Aug-2023
    • (2023) Self-Paced Co-Training of Graph Neural Networks for Semi-Supervised Node Classification. IEEE Transactions on Neural Networks and Learning Systems 34:11 (9234-9247). DOI: 10.1109/TNNLS.2022.3157688. Online publication date: Nov-2023
    • (2023) Improving Generalizability of Graph Anomaly Detection Models via Data Augmentation. IEEE Transactions on Knowledge and Data Engineering 35:12 (12721-12735). DOI: 10.1109/TKDE.2023.3271771. Online publication date: 1-Dec-2023
    • (2023) Multi-Layer Collaborative Bandit for Multivariate Time Series Anomaly Detection. 2023 IEEE/ACM 31st International Symposium on Quality of Service (IWQoS) (1-10). DOI: 10.1109/IWQoS57198.2023.10188761. Online publication date: 19-Jun-2023
    • (2023) Review the role of artificial intelligence in detecting and preventing financial fraud using natural language processing. International Journal of System Assurance Engineering and Management 14:6 (2120-2135). DOI: 10.1007/s13198-023-02043-7. Online publication date: 26-Jul-2023
    • (2022) MentorGNN. Proceedings of the 31st ACM International Conference on Information & Knowledge Management (2721-2731). DOI: 10.1145/3511808.3557393. Online publication date: 17-Oct-2022
