DOI: 10.1145/1150402.1150501

Outlier detection by sampling with accuracy guarantees

Published: 20 August 2006

Abstract

An effective approach to detecting anomalous points in a data set is distance-based outlier detection. This paper describes a simple sampling algorithm to efficiently detect distance-based outliers in domains where each and every distance computation is very expensive. Unlike any existing algorithms, the sampling algorithm requires a fixed number of distance computations and can return good results with accuracy guarantees. The most computationally expensive aspect of estimating the accuracy of the result is sorting all of the distances computed by the sampling algorithm. The experimental study on two expensive domains as well as ten additional real-life datasets demonstrates both the efficiency and effectiveness of the sampling algorithm in comparison with the state-of-the-art algorithm and the reliability of the accuracy guarantees.
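
The abstract does not spell out the algorithm, so the following is only a rough illustration of the general idea of scoring outliers with a fixed budget of distance computations. It assumes a common distance-based outlier score (distance to the k-th nearest neighbor) and estimates it from a random sample of other points. The function name `sampled_outlier_scores` and the parameters `sample_size`, `k`, and `seed` are hypothetical and are not taken from the paper; the paper's actual procedure and its accuracy guarantees may differ.

```python
import random
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def sampled_outlier_scores(
    points: Sequence[T],
    dist: Callable[[T, T], float],
    sample_size: int,
    k: int = 1,
    seed: int = 0,
) -> Tuple[List[float], int]:
    """Estimate a k-th-nearest-neighbor outlier score for every point.

    Illustrative assumption, not the paper's algorithm: each point is
    compared only against `sample_size` randomly chosen other points,
    so the total number of distance calls is len(points) * sample_size.
    """
    n = len(points)
    if not 1 <= k <= sample_size <= n - 1:
        raise ValueError("need 1 <= k <= sample_size <= len(points) - 1")

    rng = random.Random(seed)
    scores: List[float] = []
    calls = 0
    for i in range(n):
        # Sample other points without replacement, excluding point i itself.
        others = [j for j in range(n) if j != i]
        sample = rng.sample(others, sample_size)
        dists = sorted(dist(points[i], points[j]) for j in sample)
        calls += sample_size
        # The k-th smallest sampled distance stands in for the true
        # k-th-nearest-neighbor distance (it can only overestimate it).
        scores.append(dists[k - 1])
    return scores, calls


# Toy usage: five clustered values and one far-away value.
data = [0.0, 0.1, 0.2, 0.3, 0.4, 100.0]
scores, calls = sampled_outlier_scores(
    data, lambda a, b: abs(a - b), sample_size=3, k=1
)
top = max(range(len(data)), key=scores.__getitem__)
print(f"distance calls: {calls}, highest-scoring point: {data[top]}")
```

In this sketch the cost is known in advance (the number of points times `sample_size`), which is the property the abstract emphasizes; how to choose the sample size and how to bound the error of the resulting ranking are the questions the paper itself addresses.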



    Published In

    KDD '06: Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
    August 2006
    986 pages
    ISBN: 1595933395
    DOI: 10.1145/1150402
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. active learning
    2. ensemble method
    3. outlier detection

    Qualifiers

    • Article

    Conference

    KDD06

    Acceptance Rates

    Overall Acceptance Rate 1,133 of 8,635 submissions, 13%


    Cited By

    • (2023) A Probabilistic Transformation of Distance-Based Outliers. Machine Learning and Knowledge Extraction 5:3 (782-802). DOI: 10.3390/make5030042. Online publication date: 18-Jul-2023.
    • (2022) Determinants of the perceived usefulness (PU) in the context of using gamification for classroom-based ESL teaching: A scale development study. Education and Information Technologies 28:4 (4741-4768). DOI: 10.1007/s10639-022-11409-6. Online publication date: 24-Oct-2022.
    • (2022) An Intrusion Detection Model Based on Deep Learning and Multi-layer Perceptron in the Internet of Things (IoT) Network. The 8th International Conference on Advanced Machine Learning and Technologies and Applications (AMLTA2022) (34-46). DOI: 10.1007/978-3-031-03918-8_4. Online publication date: 17-Apr-2022.
    • (2021) SDCOR. Knowledge-Based Systems 228:C. DOI: 10.1016/j.knosys.2021.107256. Online publication date: 23-Aug-2021.
    • (2020) Heterogeneous Univariate Outlier Ensembles in Multidimensional Data. ACM Transactions on Knowledge Discovery from Data 14:6 (1-27). DOI: 10.1145/3403934. Online publication date: 28-Sep-2020.
    • (2020) Outlier Detection. ACM Computing Surveys 53:3 (1-37). DOI: 10.1145/3381028. Online publication date: 12-Jun-2020.
    • (2020) User-driven Error Detection for Time Series with Events. 2020 IEEE 36th International Conference on Data Engineering (ICDE) (745-757). DOI: 10.1109/ICDE48307.2020.00070. Online publication date: Apr-2020.
    • (2020) Value-added tax fraud detection with scalable anomaly detection techniques. Applied Soft Computing 86:C. DOI: 10.1016/j.asoc.2019.105895. Online publication date: 1-Jan-2020.
    • (2020) Comparison of novelty detection methods for multispectral images in rover-based planetary exploration missions. Data Mining and Knowledge Discovery. DOI: 10.1007/s10618-020-00697-6. Online publication date: 16-Jun-2020.
    • (2019) A Parameter-Free Outlier Detection Algorithm Based on Dataset Optimization Method. Information 11:1 (26). DOI: 10.3390/info11010026. Online publication date: 31-Dec-2019.
