research-article

An Effective Approach to Outlier Detection Based on Centrality and Centre-Proximity

Published: 01 January 2020
Abstract

    In data mining research, outliers usually represent extreme values that deviate from the other observations in the data. A significant limitation of existing outlier detection methods is that they consider only the object itself and do not take its neighbouring objects into account when extracting location features. In this paper, we propose an approach that addresses this limitation. First, we propose the notions of centrality and centre-proximity, which determine the degree of outlierness of an object while considering the distribution of all objects. We then propose a novel graph-based algorithm for outlier detection based on these notions. The algorithm addresses three problems of existing methods: the local density problem, the micro-cluster problem, and the fringe object problem. We performed extensive experiments to confirm the effectiveness and efficiency of the proposed method. The experimental results showed that the proposed method detects outliers successfully and outperforms previous outlier detection methods.

          Published In

          Informatica, Volume 31, Issue 3
          2020
          221 pages
          Open access article under the CC BY license.

          Publisher

          IOS Press

          Netherlands

          Author Tags

          1. graph-based outlier detection
          2. centrality
          3. centre-proximity
