A Fast Greedy Algorithm for Outlier Mining

He, Zengyou; Deng, Shengchun; Xu, Xiaofei; Huang, Joshua Zhexue

doi:10.1007/11731139_67

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3918)

Included in the following conference series:

Pacific-Asia Conference on Knowledge Discovery and Data Mining

Abstract

The task of outlier detection is to find small groups of data objects that are exceptional when compared with the remaining large amount of data. Recently, the problem of outlier detection in categorical data was defined as an optimization problem, and a local-search heuristic-based algorithm (LSA) was presented. However, as is the case with most iterative algorithms, LSA is still very time-consuming on very large datasets. In this paper, we present a very fast greedy algorithm for mining outliers under the same optimization model. Experimental results on real datasets and large synthetic datasets show that (1) our new algorithm performs comparably to state-of-the-art outlier detection algorithms in identifying true outliers and (2) our algorithm can be an order of magnitude faster than LSA.
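The abstract names a greedy algorithm but does not spell out the optimization model or the selection rule. The following is only a minimal sketch under a stated assumption: that the objective is to choose k records whose removal minimizes the summed per-attribute Shannon entropy of the remaining categorical data. The function names (`attribute_entropy`, `total_entropy`, `greedy_outliers`) and the naive recomputation of entropy for every candidate are illustrative choices, not the authors' implementation.

```python
from collections import Counter
from math import log2


def attribute_entropy(counts, n):
    """Shannon entropy (in bits) of one attribute, given value counts over n records."""
    return -sum((c / n) * log2(c / n) for c in counts.values() if c > 0)


def total_entropy(counters, n):
    """Sum of per-attribute entropies; defined as 0 for an empty dataset."""
    if n == 0:
        return 0.0
    return sum(attribute_entropy(c, n) for c in counters)


def greedy_outliers(records, k):
    """Greedily pick k record indices (ASSUMED objective: minimize remaining entropy).

    records: list of equal-length tuples of categorical values.
    Each of the k passes removes the record whose removal leaves the
    lowest total entropy; ties go to the smallest index.
    """
    if not records:
        return []
    m = len(records[0])
    # One value-frequency table per attribute, kept in sync with `remaining`.
    counters = [Counter(r[j] for r in records) for j in range(m)]
    remaining = set(range(len(records)))
    outliers = []
    for _ in range(min(k, len(records))):
        n = len(remaining)
        best_idx, best_entropy = None, float("inf")
        for i in sorted(remaining):
            # Tentatively remove record i, score the rest, then restore it.
            for j, v in enumerate(records[i]):
                counters[j][v] -= 1
            e = total_entropy(counters, n - 1)
            for j, v in enumerate(records[i]):
                counters[j][v] += 1
            if e < best_entropy:
                best_idx, best_entropy = i, e
        # Commit the best removal found in this pass.
        for j, v in enumerate(records[best_idx]):
            counters[j][v] -= 1
        remaining.remove(best_idx)
        outliers.append(best_idx)
    return outliers


if __name__ == "__main__":
    data = [("a", "x")] * 5 + [("b", "y")]
    print(greedy_outliers(data, 1))
```

This sketch rescans all remaining records in each of the k passes and recomputes the entropy from the frequency tables every time, so it is meant to show the greedy selection idea rather than to reproduce the running times reported in the paper.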

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Harbin Institute of Technology, China
Zengyou He, Shengchun Deng & Xiaofei Xu
E-Business Technology Institute, The University of Hong Kong, Hong Kong, China
Joshua Zhexue Huang

Editor information

Editors and Affiliations

Nanyang Technological University, Singapore
Wee-Keong Ng
Institute of Industrial Science, The University of Tokyo, 4-6-1 Komaba, Meguro-ku, 153-8505, Tokyo, Japan
Masaru Kitsuregawa
School of Computer Science and Technology, Heilongjiang University, China
Jianzhong Li
School of Computer Engineering, Nanyang Technological University, 639798, Singapore, Singapore
Kuiyu Chang

About this paper

Cite this paper

He, Z., Deng, S., Xu, X., Huang, J.Z. (2006). A Fast Greedy Algorithm for Outlier Mining. In: Ng, WK., Kitsuregawa, M., Li, J., Chang, K. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2006. Lecture Notes in Computer Science, vol 3918. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11731139_67

DOI: https://doi.org/10.1007/11731139_67
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-33206-0
Online ISBN: 978-3-540-33207-7
eBook Packages: Computer Science, Computer Science (R0)
