research-article

Dual-MGAN: An Efficient Approach for Semi-supervised Outlier Detection with Few Identified Anomalies

Authors:

Yezheng Liu

ACM Transactions on Knowledge Discovery from Data (TKDD), Volume 16, Issue 6

Article No.: 107, Pages 1 - 30

https://doi.org/10.1145/3522690

Published: 30 July 2022

Abstract

Outlier detection is an important task in data mining, and many techniques for it have been explored in various applications. However, because unsupervised outlier detection assumes by default that outliers are not concentrated, it may fail to identify group anomalies that form dense clusters. Supervised outlier detection can usually achieve high detection rates and well-tuned parameters, but obtaining a sufficient number of correct labels is time-consuming. To address these problems, we focus on semi-supervised outlier detection with few identified anomalies and a large amount of unlabeled data. The task is first decomposed into the detection of discrete anomalies and the detection of partially identified group anomalies; a distribution construction sub-module and a data augmentation sub-module are then proposed to identify them, respectively. By combining the two sub-modules, the dual multiple generative adversarial networks (Dual-MGAN) can identify both discrete anomalies and partially identified group anomalies. In addition, because it is difficult to determine when training should stop, two evaluation indicators are introduced to assess the training status of the sub-GANs. Extensive experiments on synthetic and real-world data show that the proposed Dual-MGAN can significantly improve the accuracy of outlier detection, and that the proposed evaluation indicators reflect the training status of the sub-GANs.
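The abstract's motivating claim, that an unsupervised detector can miss dense group anomalies, can be illustrated with a small sketch. This is not the Dual-MGAN method: it swaps in a plain k-nearest-neighbour distance score as a stand-in unsupervised detector, and the synthetic data layout, the `make_data` and `knn_scores` helpers, and all constants are assumptions chosen only for illustration.

```python
import math
import random
import statistics


def knn_scores(points, k=5):
    """Score each point by the distance to its k-th nearest neighbour.

    A larger score means the point looks more isolated. This is a generic
    unsupervised baseline (an assumption for this sketch), not Dual-MGAN.
    """
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


def make_data(seed=0):
    """Build hypothetical 2-D data: normal points, discrete and group anomalies."""
    rng = random.Random(seed)
    normal = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(500)]
    # Discrete anomalies: isolated points on a ring far from the normal data.
    discrete = []
    for _ in range(10):
        angle = rng.uniform(0, 2 * math.pi)
        discrete.append((5 * math.cos(angle), 5 * math.sin(angle)))
    # Group anomalies: a tight, dense cluster away from the normal data.
    group = [(rng.gauss(4, 0.02), rng.gauss(4, 0.02)) for _ in range(30)]
    return normal, discrete, group


normal, discrete, group = make_data()
scores = knn_scores(normal + discrete + group)
normal_med = statistics.median(scores[:500])
discrete_med = statistics.median(scores[500:510])
group_med = statistics.median(scores[510:])
print(f"median k-NN score: normal={normal_med:.3f} "
      f"discrete={discrete_med:.3f} group={group_med:.3f}")
```

In this toy setting the dense cluster receives lower scores than the normal data, so a purely density-based ranking would place it among the least suspicious points. A handful of identified anomalies from such a cluster is the kind of side information the semi-supervised setting described above is meant to exploit.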

Index Terms

Dual-MGAN: An Efficient Approach for Semi-supervised Outlier Detection with Few Identified Anomalies
1. Computing methodologies
  1. Machine learning
    1. Learning paradigms
      1. Unsupervised learning
        Anomaly detection
2. Information systems
  1. Information systems applications
    1. Data mining

Published In

ACM Transactions on Knowledge Discovery from Data Volume 16, Issue 6

December 2022

631 pages

ISSN:1556-4681

EISSN:1556-472X

DOI:10.1145/3543989

Editor:
Charu Aggarwal
IBM T. J. Watson Research, USA

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 30 July 2022

Online AM: 15 March 2022

Accepted: 01 February 2022

Revised: 01 December 2021

Received: 01 August 2021

Published in TKDD Volume 16, Issue 6

Qualifiers

Research-article
Refereed

Funding Sources

National Natural Science Foundation of China
BUCEA Young Scholar Research Capability Improvement Plan
National Engineering Laboratory for Big Data Distribution and Exchange Technologies
