Finding centric local outliers in categorical/numerical spaces

Authors:

Aoying Zhou

Knowledge and Information Systems, Volume 9, Issue 3

Pages 309 - 338

Published: 01 March 2006

Abstract

Outlier detection techniques are widely used in applications such as credit-card fraud detection and the monitoring of criminal activities in electronic commerce. These applications attempt to identify outliers as noise, exceptions, or objects near the border. Existing density-based local outlier detection assigns each object a degree of being an outlier in a numerical space. In this paper, we propose a novel mutual-reinforcement-based local outlier detection approach. Instead of detecting local outliers as noise, we attempt to identify local outliers in the center: objects that are similar to some clusters of objects on the one hand and unique on the other. For example, in bank investment our technique can identify a unique entity, similar to many good competitors, in which to invest. We attempt to detect local outliers in categorical and ordinal as well as numerical data. In categorical data, the challenge is that there are many similar but different ways to specify relationships among data items. Our mutual-reinforcement-based approach remains stable under such similar but different user-defined relationships. Our technique can reduce the burden on users of determining the relationships among data items, and it provides explanations of why the outliers are found. We conducted extensive experimental studies using real datasets.
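To make the idea of mutual reinforcement more concrete, the following is a minimal Python sketch of a generic HITS-style iteration (in the spirit of Kleinberg's hub/authority computation) on a bipartite graph that links objects to the categorical attribute values they take. It is an illustration under stated assumptions, not the authors' algorithm. The bipartite graph, the sum-then-normalize update rule, and the names `mutual_reinforcement`, `max_iter`, and `tol` are all hypothetical. Objects that share many frequently occurring values reinforce each other and end up with large weights.

```python
import math
from collections import defaultdict


def mutual_reinforcement(records, max_iter=100, tol=1e-9):
    """Hypothetical sketch of HITS-style mutual reinforcement between
    objects and (attribute index, value) pairs in a categorical table.

    A value's weight is the sum of the weights of the objects that take it;
    an object's weight is the sum of the weights of the values it takes.
    Both weight vectors are L2-normalized after every update.
    """
    if not records:
        return [], {}
    # Link each object to one node per (attribute index, value) pair.
    links = [list(enumerate(rec)) for rec in records]
    obj_w = [1.0] * len(records)
    val_w = {}
    for _ in range(max_iter):
        # Values are reinforced by the objects that carry them.
        acc = defaultdict(float)
        for i, nodes in enumerate(links):
            for node in nodes:
                acc[node] += obj_w[i]
        norm = math.sqrt(sum(w * w for w in acc.values())) or 1.0
        val_w = {node: w / norm for node, w in acc.items()}
        # Objects are reinforced by the values they carry.
        new_w = [sum(val_w[node] for node in nodes) for nodes in links]
        norm = math.sqrt(sum(w * w for w in new_w)) or 1.0
        new_w = [w / norm for w in new_w]
        # Stop once the object weights no longer change noticeably.
        converged = max(abs(a - b) for a, b in zip(new_w, obj_w)) < tol
        obj_w = new_w
        if converged:
            break
    return obj_w, val_w


if __name__ == "__main__":
    # Toy table with two categorical attributes: colour and size.
    table = [("red", "small"), ("red", "small"), ("red", "large"), ("blue", "large")]
    weights, value_weights = mutual_reinforcement(table)
    for rec, w in zip(table, weights):
        print(rec, round(w, 3))
```

On the toy table, the two identical ("red", "small") rows receive the largest and equal weights. The ("blue", "large") row, which shares only one value with one other row, receives the smallest. The abstract does not describe how such weights become outlier degrees for centric local outliers, so the sketch deliberately stops at the reinforcement step.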

Cited By

Taha A, Hadi A (2019) Anomaly Detection Methods for Categorical Data. ACM Computing Surveys 52(2):1–35. doi:10.1145/3312739. Online publication date: 30 May 2019
https://dl.acm.org/doi/10.1145/3312739
Ding J, Liu Y, Zhang L, Wang J, Liu Y (2016) An anomaly detection approach for multiple monitoring data series based on latent correlation probabilistic model. Applied Intelligence 44(2):340–361. doi:10.1007/s10489-015-0713-7. Online publication date: 1 Mar 2016
https://dl.acm.org/doi/10.1007/s10489-015-0713-7
Schubert E, Zimek A, Kriegel H (2014) Local outlier detection reconsidered. Data Mining and Knowledge Discovery 28(1):190–237. doi:10.1007/s10618-012-0300-z. Online publication date: 1 Jan 2014
https://dl.acm.org/doi/10.1007/s10618-012-0300-z

Index Terms

Finding centric local outliers in categorical/numerical spaces
1. Computing methodologies
  1. Machine learning
    1. Learning paradigms
      1. Unsupervised learning
        Cluster analysis
    2. Machine learning algorithms
      1. Feature selection
2. Information systems
  1. Information systems applications
    1. Data mining

Information

Published In

Knowledge and Information Systems Volume 9, Issue 3

March 2006

126 pages

ISSN:0219-1377

Copyright © 2006 Springer-Verlag London Limited.

Publisher

Springer-Verlag

Berlin, Heidelberg

Qualifiers

Article

Bibliometrics & Citations

Article Metrics

Total Citations: 6
Total Downloads: 0
Downloads (Last 12 months): 0
Downloads (Last 6 weeks): 0

Reflects downloads up to 09 Nov 2024

Citations

Cited By

Koufakou A, Georgiopoulos M (2010) A fast outlier detection strategy for distributed high-dimensional data sets with mixed attributes. Data Mining and Knowledge Discovery 20(2):259–289. doi:10.1007/s10618-009-0148-z. Online publication date: 1 Mar 2010
https://dl.acm.org/doi/10.1007/s10618-009-0148-z
Chandola V, Banerjee A, Kumar V (2009) Anomaly detection. ACM Computing Surveys 41(3):1–58. doi:10.1145/1541880.1541882. Online publication date: 30 Jul 2009
https://dl.acm.org/doi/10.1145/1541880.1541882
Chen K, Liu L (2009) “Best K”: critical clustering structures in categorical datasets. Knowledge and Information Systems 20(1):1–33. doi:10.1007/s10115-008-0159-x. Online publication date: 24 Jun 2009
https://dl.acm.org/doi/10.1007/s10115-008-0159-x
