
Finding centric local outliers in categorical/numerical spaces

Published: 01 March 2006

Abstract

Outlier detection techniques are widely used in applications such as credit-card fraud detection and monitoring criminal activities in electronic commerce. These applications typically treat outliers as noise, exceptions, or objects near cluster borders. Existing density-based local outlier detection assigns each object a degree of being an outlier in a numerical space. In this paper, we propose a novel mutual-reinforcement-based local outlier detection approach. Instead of detecting local outliers as noise, we attempt to identify local outliers in the center: objects that are similar to some clusters of objects on the one hand and unique on the other. In bank investment, for example, our technique can identify a unique entity in which to invest, one that is nonetheless similar to many good competitors. We attempt to detect local outliers in categorical and ordinal as well as numerical data. In categorical data, the challenge is that there are many similar but different ways to specify relationships among the data items. Our mutual-reinforcement-based approach remains stable across such similar but different user-defined relationships. It therefore reduces the burden on users of determining the relationships among data items, and it helps explain why particular outliers are found. We conducted extensive experimental studies using real datasets.
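The abstract does not spell out the reinforcement rule. As a rough illustration only, the sketch below assumes a HITS-style scheme (in the spirit of Kleinberg's hubs and authorities) on a bipartite graph that links each object to its categorical attribute values: a value's weight is the sum of the weights of the objects that carry it, an object's weight is the sum of the weights of its values, and both are normalized every round. The function `mutual_reinforcement` and its parameters are hypothetical; the authors' actual weighting, and how they turn such weights into centric-outlier scores, are not described in the abstract.

```python
import math
from collections import defaultdict


def mutual_reinforcement(rows, iterations=50, tol=1e-9):
    """HITS-style mutual reinforcement between objects and categorical values.

    Hypothetical illustration, not the paper's algorithm. `rows` is a list of
    equal-length, non-empty tuples, one per object. Each (column index, value)
    pair is a node linked to every object taking that value. Returns a list of
    object weights and a dict of value weights, each L2-normalised.
    """
    if not rows:
        return [], {}
    obj_w = [1.0] * len(rows)
    val_w = {}
    for _ in range(iterations):
        # A value is as strong as the objects that carry it.
        acc = defaultdict(float)
        for i, row in enumerate(rows):
            for j, v in enumerate(row):
                acc[(j, v)] += obj_w[i]
        norm = math.sqrt(sum(w * w for w in acc.values()))
        val_w = {key: w / norm for key, w in acc.items()}
        # An object is as strong as the values it carries.
        new_w = [sum(val_w[(j, v)] for j, v in enumerate(row)) for row in rows]
        norm = math.sqrt(sum(w * w for w in new_w))
        new_w = [w / norm for w in new_w]
        # Stop once the object weights no longer change noticeably.
        delta = max(abs(a - b) for a, b in zip(new_w, obj_w))
        obj_w = new_w
        if delta < tol:
            break
    return obj_w, val_w


if __name__ == "__main__":
    rows = [
        ("red", "small"),
        ("red", "small"),
        ("red", "large"),
        ("blue", "large"),
    ]
    obj_w, val_w = mutual_reinforcement(rows)
    for row, w in zip(rows, obj_w):
        print(row, round(w, 3))
```

On this toy input, the two identical ("red", "small") rows receive the highest weight, while ("blue", "large"), which shares only one value with one other row, receives the lowest. Treating (column, value) pairs as nodes keeps identical strings in different attributes distinct; a real detector would still need a further step, not shown here, to single out objects that are both well supported and unique.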

Published In

Knowledge and Information Systems, Volume 9, Issue 3
March 2006
126 pages

Publisher

Springer-Verlag

Berlin, Heidelberg

Author Tags

  1. Clustering
  2. Data mining
  3. Outlier detection
