Article
DOI: 10.1007/11731139_67

A fast greedy algorithm for outlier mining

Published: 09 April 2006

Abstract

The task of outlier detection is to find small groups of data objects that are exceptional when compared with the rest of the data. Recently, the problem of outlier detection in categorical data was defined as an optimization problem, and a local-search heuristic-based algorithm (LSA) was presented. However, as is the case with most iterative algorithms, LSA is still very time-consuming on very large datasets. In this paper, we present a very fast greedy algorithm for mining outliers under the same optimization model. Experimental results on real datasets and large synthetic datasets show that (1) our new algorithm performs comparably to state-of-the-art outlier detection algorithms at identifying true outliers, and (2) our algorithm can be an order of magnitude faster than LSA.
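The abstract does not spell out the optimization model, so the following is only a minimal sketch of what a greedy selection of this kind might look like. It assumes the objective is to pick k records whose removal minimizes the entropy of the remaining categorical data, with the dataset entropy taken as the sum of per-attribute Shannon entropies. The function names `dataset_entropy` and `greedy_outliers` are hypothetical, and the entropy is recomputed from scratch at every step for clarity, so this sketch does not reflect whatever speed-ups the paper itself uses.

```python
from collections import Counter
from math import log2


def dataset_entropy(records):
    """Sum of per-attribute Shannon entropies (assumed objective, attributes treated independently)."""
    n = len(records)
    if n == 0:
        return 0.0
    total = 0.0
    for column in zip(*records):  # iterate attribute by attribute
        counts = Counter(column)
        total -= sum((c / n) * log2(c / n) for c in counts.values())
    return total


def greedy_outliers(records, k):
    """Pick k record indices one at a time; each pick minimizes the entropy of what remains."""
    remaining = list(range(len(records)))
    outliers = []
    for _ in range(min(k, len(records))):
        best_idx, best_entropy = None, float("inf")
        for i in remaining:
            # Naive recomputation: entropy of the data with record i left out.
            rest = [records[j] for j in remaining if j != i]
            h = dataset_entropy(rest)
            if h < best_entropy:
                best_idx, best_entropy = i, h
        outliers.append(best_idx)
        remaining.remove(best_idx)
    return outliers
```

Called on a small categorical table where one record disagrees with all the others, `greedy_outliers(records, 1)` selects that record, since removing it leaves a dataset with zero entropy.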

Published In

PAKDD'06: Proceedings of the 10th Pacific-Asia conference on Advances in Knowledge Discovery and Data Mining
April 2006
876 pages
ISBN: 3540332065
Editors: Wee-Keong Ng, Masaru Kitsuregawa, Jianzhong Li, Kuiyu Chang

Sponsors

  • SAS
  • Air Force Office of Scientific Research/Asian Office of Aerospace R&D
  • Lee Foundation
  • US Army ITC-PAC Asian Research Office
  • Infocomm Development Authority of Singapore

Publisher

Springer-Verlag

Berlin, Heidelberg

Qualifiers

  • Article

Cited By

  • (2019) Anomaly Detection Methods for Categorical Data. ACM Computing Surveys 52:2 (1-35). DOI: 10.1145/3312739. Online publication date: 30-May-2019
  • (2018) Automatic Hand Sign Recognition: Identify Unusuality Through Latent Cognizance. Artificial Neural Networks in Pattern Recognition (255-267). DOI: 10.1007/978-3-319-99978-4_20. Online publication date: 19-Sep-2018
  • (2017) Detecting Special Lecturers Using Information theory-based Outlier Detection Method. Proceedings of the International Conference on Compute and Data Analysis (240-244). DOI: 10.1145/3093241.3093274. Online publication date: 19-May-2017
  • (2015) Associating absent frequent itemsets with infrequent items to identify abnormal transactions. Applied Intelligence 42:4 (694-706). DOI: 10.1007/s10489-014-0622-1. Online publication date: 1-Jun-2015
  • (2014) A ranking-based algorithm for detection of outliers in categorical data. International Journal of Hybrid Intelligent Systems 11:1 (1-11). DOI: 10.3233/HIS-130179. Online publication date: 1-Jan-2014
  • (2010) A fast outlier detection strategy for distributed high-dimensional data sets with mixed attributes. Data Mining and Knowledge Discovery 20:2 (259-289). DOI: 10.1007/s10618-009-0148-z. Online publication date: 1-Mar-2010
  • (2009) Outlier detection based on rough sets theory. Intelligent Data Analysis 13:2 (191-206). DOI: 10.5555/1551582.1551584. Online publication date: 1-Apr-2009
  • (2009) Anomaly detection. ACM Computing Surveys 41:3 (1-58). DOI: 10.1145/1541880.1541882. Online publication date: 30-Jul-2009
