research-article

Dual-MGAN: An Efficient Approach for Semi-supervised Outlier Detection with Few Identified Anomalies

Published: 30 July 2022

    Abstract

    Outlier detection is an important task in data mining, and many techniques for it have been explored in various applications. However, because unsupervised outlier detection assumes by default that outliers are not concentrated, it may fail to identify group anomalies that form dense clusters. Supervised outlier detection can usually achieve high detection rates and optimal parameters, but obtaining a sufficient number of correct labels is time-consuming. To address these problems, we focus on semi-supervised outlier detection with few identified anomalies and a large amount of unlabeled data. The task is first decomposed into the detection of discrete anomalies and the detection of partially identified group anomalies; a distribution construction sub-module and a data augmentation sub-module are then proposed to identify them, respectively. The dual multiple generative adversarial networks (Dual-MGAN) that combine the two sub-modules can thus identify both discrete and partially identified group anomalies. In addition, because it is difficult to determine when to stop training, two evaluation indicators are introduced to assess the training status of the sub-GANs. Extensive experiments on synthetic and real-world data show that the proposed Dual-MGAN can significantly improve the accuracy of outlier detection, and that the proposed evaluation indicators reflect the training status of the sub-GANs.
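
    The limitation of unsupervised scoring noted in the abstract can be illustrated with a toy sketch. This is not the Dual-MGAN method: it uses a plain k-nearest-neighbor distance score as a stand-in for a generic unsupervised detector, and the data layout, the function names (make_data, knn_scores), and all parameters are assumptions chosen only for illustration. Under these assumptions, scattered (discrete) anomalies receive high scores, while a tightly packed anomaly group tends to score lower than typical normal points and so would likely be missed.

```python
import numpy as np


def make_data(seed=0):
    """Build a toy 2-D dataset (hypothetical layout, not from the paper)."""
    rng = np.random.default_rng(seed)
    # Normal data: a standard Gaussian cloud around the origin.
    normal = rng.normal(0.0, 1.0, size=(500, 2))
    # Discrete anomalies: a few isolated points far from everything else.
    discrete = np.array(
        [[-7.0, 5.0], [8.0, -6.0], [-6.0, -7.0], [7.0, -1.0], [0.0, 9.0]]
    )
    # Group anomalies: a small, very dense cluster away from the normal cloud.
    group = rng.normal(0.0, 0.02, size=(30, 2)) + np.array([6.0, 6.0])
    X = np.vstack([normal, discrete, group])
    labels = np.array(["normal"] * 500 + ["discrete"] * 5 + ["group"] * 30)
    return X, labels


def knn_scores(X, k=5):
    """Outlier score = distance to the k-th nearest neighbor (larger = more outlying)."""
    diffs = X[:, None, :] - X[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    dists.sort(axis=1)  # column 0 is the zero distance of each point to itself
    return dists[:, k]


if __name__ == "__main__":
    X, labels = make_data()
    scores = knn_scores(X, k=5)
    for name in ("normal", "discrete", "group"):
        print(name, "median score:", float(np.median(scores[labels == name])))
```

    Because the dense group has many close neighbors of its own, a density- or distance-based score alone gives it little weight; this is the gap that the few identified anomalies are meant to close in the semi-supervised setting.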


    Cited By

    • (2024) Co-occurrence Order-preserving Pattern Mining with Keypoint Alignment for Time Series. ACM Transactions on Management Information Systems 15, 2 (1–27). DOI: 10.1145/3658450. Online publication date: 12-Jun-2024
    • (2024) WAKE: A Weakly Supervised Business Process Anomaly Detection Framework via a Pre-Trained Autoencoder. IEEE Transactions on Knowledge and Data Engineering 36, 6 (2745–2758). DOI: 10.1109/TKDE.2023.3322411. Online publication date: Jun-2024
    • (2023) Spatial–Temporal Traffic Modeling With a Fusion Graph Reconstructed by Tensor Decomposition. IEEE Transactions on Intelligent Transportation Systems 25, 2 (1749–1760). DOI: 10.1109/TITS.2023.3314134. Online publication date: 22-Sep-2023
    • (2023) Measures and Optimization for Robustness and Vulnerability in Disconnected Networks. IEEE Transactions on Information Forensics and Security 18 (3350–3362). DOI: 10.1109/TIFS.2023.3279979. Online publication date: 1-Jan-2023


        Published In

        ACM Transactions on Knowledge Discovery from Data, Volume 16, Issue 6
        December 2022
        631 pages
        ISSN: 1556-4681
        EISSN: 1556-472X
        DOI: 10.1145/3543989

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        Published: 30 July 2022
        Online AM: 15 March 2022
        Accepted: 01 February 2022
        Revised: 01 December 2021
        Received: 01 August 2021
        Published in TKDD Volume 16, Issue 6


        Author Tags

        1. Discrete anomalies
        2. partially identified group anomalies
        3. distribution construction
        4. data augmentation

        Qualifiers

        • Research-article
        • Refereed

        Funding Sources

        • National Natural Science Foundation of China
        • BUCEA Young Scholar Research Capability Improvement Plan
        • National Engineering Laboratory for Big Data Distribution and Exchange Technologies

        Article Metrics

        • Downloads (last 12 months): 257
        • Downloads (last 6 weeks): 13

