DOI: 10.5555/646111.679466
Article

Outlier Detection Using Replicator Neural Networks

Published: 04 September 2002

Abstract

We consider the problem of finding outliers in large multivariate databases. Outlier detection can be applied during the data cleansing process of data mining to identify problems with the data itself, and to fraud detection where groups of outliers are often of particular interest. We use replicator neural networks (RNNs) to provide a measure of the outlyingness of data records. The performance of the RNNs is assessed using a ranked score measure. The effectiveness of the RNNs for outlier detection is demonstrated on two publicly available databases.
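The abstract does not say how an RNN turns into an outlyingness measure. This sketch assumes the usual reading of replicator (autoencoder) networks: the network is trained to reproduce its own inputs, and each record is scored by how poorly it is reconstructed. For a record x_i with n features and network output o_i, the score is taken to be the mean of (x_ij - o_ij)^2 over the n features. Everything in the code is hypothetical. `TinyReplicator`, `outlier_factor` and `rank_outliers` are illustrative names. The single tanh hidden layer trained by plain gradient descent is a simplification, not the network described in the paper. Ranking records by score is likewise only a stand-in for picking the most outlying ones.

```python
import math
import random


def outlier_factor(x, o):
    """Mean squared reconstruction error between record x and network output o."""
    return sum((xj - oj) ** 2 for xj, oj in zip(x, o)) / len(x)


def rank_outliers(scores, top_n):
    """Indices of the top_n records with the largest outlier factor, highest first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:top_n]


class TinyReplicator:
    """Hypothetical n -> h -> n replicator: tanh hidden layer, linear output layer.

    A simplified stand-in, not the architecture used in the paper.
    """

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_in)]
        self.b2 = [0.0] * n_in

    def forward(self, x):
        """Return (hidden activations, reconstruction) for one record."""
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        o = [sum(w * hk for w, hk in zip(row, h)) + b
             for row, b in zip(self.w2, self.b2)]
        return h, o

    def train(self, data, epochs=500, lr=0.05):
        """Per-record gradient descent; each record's target is the record itself."""
        for _ in range(epochs):
            for x in data:
                h, o = self.forward(x)
                # Gradient of 0.5 * sum((o - x)^2) with respect to the outputs.
                d_out = [oj - xj for oj, xj in zip(o, x)]
                # Backpropagate through the current w2 and tanh' = 1 - h^2.
                d_hid = [(1.0 - h[k] ** 2) * sum(d_out[j] * self.w2[j][k] for j in range(len(o)))
                         for k in range(len(h))]
                for j in range(len(o)):
                    for k in range(len(h)):
                        self.w2[j][k] -= lr * d_out[j] * h[k]
                    self.b2[j] -= lr * d_out[j]
                for k in range(len(h)):
                    for i in range(len(x)):
                        self.w1[k][i] -= lr * d_hid[k] * x[i]
                    self.b1[k] -= lr * d_hid[k]

    def scores(self, data):
        """Outlier factor of every record under the current network."""
        return [outlier_factor(x, self.forward(x)[1]) for x in data]


if __name__ == "__main__":
    # Toy data: 21 records on the line y = x plus one record off that line.
    data = [[t / 10, t / 10] for t in range(-10, 11)] + [[0.9, -0.9]]
    model = TinyReplicator(n_in=2, n_hidden=1, seed=1)
    model.train(data)
    print(rank_outliers(model.scores(data), top_n=3))
```

With a one-unit bottleneck, the network can only reproduce records that lie along a single curve. The off-line record would therefore typically be reconstructed worst and appear near the top of the ranking. The exact order depends on the seed and training settings.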


Published In

DaWaK 2002: Proceedings of the 4th International Conference on Data Warehousing and Knowledge Discovery
September 2002
337 pages

Publisher

Springer-Verlag, Berlin, Heidelberg

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)0
  • Downloads (Last 6 weeks)0
Reflects downloads up to 09 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2024)Increasing Detection Rate for Imbalanced Malicious Traffic using Generative Adversarial NetworksProceedings of the 2024 European Interdisciplinary Cybersecurity Conference10.1145/3655693.3655703(74-81)Online publication date: 5-Jun-2024
  • (2024)Adversarial sample attacks and defenses based on LSTM-ED in industrial control systemsComputers and Security10.1016/j.cose.2024.103750140:COnline publication date: 1-May-2024
  • (2023)GNN-based Advanced Feature Integration for ICS Anomaly DetectionACM Transactions on Intelligent Systems and Technology10.1145/362067614:6(1-32)Online publication date: 14-Nov-2023
  • (2023)Multi-representations Space Separation based Graph-level Anomaly-aware DetectionProceedings of the 35th International Conference on Scientific and Statistical Database Management10.1145/3603719.3603739(1-11)Online publication date: 10-Jul-2023
  • (2023)Scalable Pythagorean Mean-based Incident Detection in Smart Transportation SystemsACM Transactions on Cyber-Physical Systems10.1145/36033818:2(1-25)Online publication date: 5-Jun-2023
  • (2023)DiffPrep: Differentiable Data Preprocessing Pipeline Search for Learning over Tabular DataProceedings of the ACM on Management of Data10.1145/35893281:2(1-26)Online publication date: 20-Jun-2023
  • (2022)Unsupervised time series outlier detection with diversity-driven convolutional ensemblesProceedings of the VLDB Endowment10.14778/3494124.349414215:3(611-623)Online publication date: 4-Feb-2022
  • (2022)Detecting Extreme Traffic Events Via a Context Augmented Graph AutoencoderACM Transactions on Intelligent Systems and Technology10.1145/353973513:6(1-23)Online publication date: 22-Sep-2022
  • (2022)Deep Graph-level Anomaly Detection by Glocal Knowledge DistillationProceedings of the Fifteenth ACM International Conference on Web Search and Data Mining10.1145/3488560.3498473(704-714)Online publication date: 11-Feb-2022
  • (2022)Traffic Anomaly Detection Via Conditional Normalizing Flow2022 IEEE 25th International Conference on Intelligent Transportation Systems (ITSC)10.1109/ITSC55140.2022.9922061(2563-2570)Online publication date: 8-Oct-2022
  • Show More Cited By

View Options

View options

Get Access

Login options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media