
Outlier detection by active learning

Published: 20 August 2006

Abstract

    Most existing approaches to outlier detection are based on density estimation methods. These methods have two notable issues: they offer no explanation for outlier flagging decisions, and their computational requirements are relatively high. In this paper, we present a novel approach to outlier detection based on classification, in an attempt to address both of these issues. Our approach is based on two key ideas. First, we present a simple reduction of outlier detection to classification, via a procedure that applies classification to a labeled data set containing artificially generated examples that play the role of potential outliers. Second, once the task has been reduced to classification, we apply a selective sampling mechanism based on active learning to the reduced classification problem. We empirically evaluate the proposed approach on a number of data sets and find that our method is superior to other methods that are based on the same reduction to classification but use standard classification methods. We also show that it is competitive with state-of-the-art outlier detection methods in the literature based on density estimation, while significantly improving computational complexity and explanatory power.
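
    The two ideas in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not say how the artificial examples are generated, which base classifier is used, or what the sampling rule is. The sketch assumes uniform sampling over the bounding box of the data, a k-nearest-neighbor vote as the base classifier, and a keep-probability that grows with committee uncertainty. All function names and parameters (`make_reduction_dataset`, `selective_sample`, `floor`, and so on) are hypothetical.

```python
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def make_reduction_dataset(real, rng):
    """Reduction step: real points get label 0, and an equal number of
    artificial points drawn uniformly from the data's bounding box get
    label 1 (they play the role of potential outliers)."""
    dims = range(len(real[0]))
    lo = [min(p[d] for p in real) for d in dims]
    hi = [max(p[d] for p in real) for d in dims]
    artificial = [tuple(rng.uniform(lo[d], hi[d]) for d in dims) for _ in real]
    return [(p, 0) for p in real] + [(p, 1) for p in artificial]

def knn_vote(x, train, k=5):
    """Base classifier (an assumption of this sketch): fraction of the
    k nearest training examples that are artificial."""
    nearest = sorted(train, key=lambda ex: sq_dist(x, ex[0]))[:k]
    return sum(label for _, label in nearest) / len(nearest)

def outlier_score(x, committee, k=5):
    """Average 'artificial' vote over all committee members, in [0, 1]."""
    return sum(knn_vote(x, member, k) for member in committee) / len(committee)

def selective_sample(labeled, committee, rng, floor=0.25):
    """Selective sampling (assumed rule): keep an example with a probability
    that grows as the committee score approaches 0.5, but never below `floor`."""
    kept = []
    for point, label in labeled:
        uncertainty = 1.0 - 2.0 * abs(outlier_score(point, committee) - 0.5)
        if rng.random() < max(floor, uncertainty):
            kept.append((point, label))
    return kept

def fit_committee(real, rounds=3, seed=0):
    """Each committee member is the training set of one k-NN classifier."""
    rng = random.Random(seed)
    labeled = make_reduction_dataset(real, rng)
    # First member: a plain random subsample of the labeled data.
    committee = [rng.sample(labeled, len(labeled) // 4)]
    # Later members: examples chosen by the current committee's uncertainty.
    for _ in range(rounds - 1):
        committee.append(selective_sample(labeled, committee, rng))
    return labeled, committee

# Toy data: a Gaussian cluster plus one point far away from it.
data_rng = random.Random(1)
real = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(300)]
real.append((8.0, 8.0))

labeled, committee = fit_committee(real)
center_score = outlier_score((0.0, 0.0), committee)
far_score = outlier_score((8.0, 8.0), committee)
print("score at cluster center:", center_score)
print("score at distant point:", far_score)
```

    In this toy setting the distant point should receive a higher score than the cluster center, because its nearest neighbors in the labeled set are mostly artificial examples. Each committee member trains on only part of the labeled data, which is where the computational saving of selective sampling would come from in a sketch like this.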



      Published In

      KDD '06: Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
      August 2006
      986 pages
      ISBN:1595933395
      DOI:10.1145/1150402
      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. active learning
      2. ensemble method
      3. outlier detection
