
A Fast Greedy Algorithm for Outlier Mining

  • Conference paper
Advances in Knowledge Discovery and Data Mining (PAKDD 2006)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3918)

Abstract

The task of outlier detection is to find small groups of data objects that are exceptional when compared with the remaining large amount of data. Recently, the problem of outlier detection in categorical data was defined as an optimization problem, and a local-search heuristic-based algorithm (LSA) was presented. However, as is the case with most iterative algorithms, LSA is still very time-consuming on very large datasets. In this paper, we present a very fast greedy algorithm for mining outliers under the same optimization model. Experimental results on real datasets and large synthetic datasets show that (1) our new algorithm performs comparably to state-of-the-art outlier detection algorithms at identifying true outliers, and (2) our algorithm can be an order of magnitude faster than LSA.
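
The abstract does not spell out the optimization model or the greedy procedure. As a rough illustration only, the sketch below assumes an entropy-based objective (pick k objects whose removal leaves the remaining categorical data with the lowest summed per-attribute Shannon entropy) and a naive greedy loop that removes one object per step. The names `dataset_entropy` and `greedy_outliers` and the toy data are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def dataset_entropy(rows):
    """Sum of per-attribute Shannon entropies (in bits) of categorical rows.

    Assumption: this is the objective being minimized; the abstract itself
    only says "the same optimization model".
    """
    n = len(rows)
    if n == 0:
        return 0.0
    total = 0.0
    for column in zip(*rows):  # iterate attribute by attribute
        for count in Counter(column).values():
            p = count / n
            total -= p * math.log2(p)
    return total


def greedy_outliers(rows, k):
    """Greedily pick k row indices; each step removes the row whose
    removal leaves the remaining data with the lowest entropy."""
    remaining = list(range(len(rows)))
    outliers = []
    for _ in range(min(k, len(rows))):
        best = min(
            remaining,
            key=lambda i: dataset_entropy(
                [rows[j] for j in remaining if j != i]
            ),
        )
        remaining.remove(best)
        outliers.append(best)
    return outliers


# Hypothetical toy data: four identical records and one unusual record.
data = [
    ("red", "small"),
    ("red", "small"),
    ("red", "small"),
    ("red", "small"),
    ("blue", "large"),
]
print(greedy_outliers(data, 1))  # → [4]
```

This version recomputes the entropy from scratch for every candidate, so it shows only the greedy selection idea and not the running time reported in the abstract; a practical implementation would presumably maintain attribute-value counts incrementally.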



Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

He, Z., Deng, S., Xu, X., Huang, J.Z. (2006). A Fast Greedy Algorithm for Outlier Mining. In: Ng, WK., Kitsuregawa, M., Li, J., Chang, K. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2006. Lecture Notes in Computer Science, vol 3918. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11731139_67

  • DOI: https://doi.org/10.1007/11731139_67

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-33206-0

  • Online ISBN: 978-3-540-33207-7

  • eBook Packages: Computer Science, Computer Science (R0)
