Article DOI: 10.1145/502512.502554

Mining top-n local outliers in large databases

Published: 26 August 2001

Abstract

Outlier detection is an important task in data mining with numerous applications, including credit card fraud detection and video surveillance. Recent work on outlier detection introduced the notion of a local outlier, in which the degree to which an object is outlying depends on the density of its local neighborhood, and each object can be assigned a Local Outlier Factor (LOF) that represents the likelihood of that object being an outlier. Although the concept of local outliers is useful, computing LOF values for every data object requires a large number of k-nearest-neighbor searches and can be computationally expensive. Since most objects are usually not outliers, it is useful to give users the option of finding only the n most outstanding local outliers, i.e., the top-n data objects most likely to be local outliers according to their LOFs. However, if the pruning is not done carefully, finding the top-n outliers could require the same amount of computation as finding the LOF for all objects. In this paper, we propose a novel method to efficiently find the top-n local outliers in large databases. The concept of a "micro-cluster" is introduced to compress the data, and an efficient micro-cluster-based local outlier mining algorithm is designed based on this concept. As our algorithm can be adversely affected by overlap among the micro-clusters, we propose a meaningful cut-plane solution for overlapping data. Formal analysis and experiments show that this method achieves good performance in finding the most outstanding local outliers.
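
To make the cost argument concrete, the sketch below computes LOF for every object by brute force and then keeps the n largest scores. It is only the naive baseline that the abstract describes as expensive, not the micro-cluster algorithm proposed in the paper. It assumes the standard LOF definition (k-distance, reachability distance, local reachability density); the names `lof_scores` and `top_n_outliers` and the handling of duplicate points are illustrative assumptions.

```python
import heapq
import math


def lof_scores(points, k):
    """Brute-force LOF for every point: O(N^2) distance computations."""
    n_pts = len(points)
    if not 1 <= k < n_pts:
        raise ValueError("k must satisfy 1 <= k < len(points)")

    k_dist = []     # distance from each point to its k-th nearest neighbor
    neighbors = []  # k-distance neighborhood as (distance, index) pairs
    for i, p in enumerate(points):
        dists = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        kd = dists[k - 1][0]
        k_dist.append(kd)
        # Points tied at the k-distance belong to the neighborhood too.
        neighbors.append([(d, j) for d, j in dists if d <= kd])

    # Local reachability density: inverse of the mean reachability distance,
    # where reach-dist(p, o) = max(k-distance(o), d(p, o)).
    lrd = []
    for i in range(n_pts):
        total = sum(max(k_dist[j], d) for d, j in neighbors[i])
        lrd.append(len(neighbors[i]) / total if total > 0 else math.inf)

    # LOF: mean ratio of the neighbors' density to the point's own density.
    lof = []
    for i in range(n_pts):
        if math.isinf(lrd[i]):
            # Assumption: a point buried in exact duplicates is not outlying.
            lof.append(1.0)
        else:
            ratios = [lrd[j] / lrd[i] for _, j in neighbors[i]]
            lof.append(sum(ratios) / len(ratios))
    return lof


def top_n_outliers(points, k, n):
    """Return the n (index, LOF) pairs with the largest LOF values."""
    scores = lof_scores(points, k)
    return heapq.nlargest(n, enumerate(scores), key=lambda pair: pair[1])


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
    print(top_n_outliers(data, k=2, n=1))
```

Every score is computed before any ranking happens, so asking for only the top n saves nothing here. That is the inefficiency the abstract's pruning discussion is about.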


Published In

KDD '01: Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining
August 2001
493 pages
ISBN: 158113391X
DOI: 10.1145/502512
Publisher

Association for Computing Machinery

New York, NY, United States

Acceptance Rates

KDD '01 Paper Acceptance Rate 31 of 237 submissions, 13%;
Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

