DOI: 10.1145/502512.502570

Discovering outlier filtering rules from unlabeled data: combining a supervised learner with an unsupervised learner

Published: 26 August 2001

Abstract

    This paper is concerned with the problem of detecting outliers in unlabeled data. In prior work we developed SmartSifter, an on-line outlier detection algorithm based on unsupervised learning from data. Building on SmartSifter, this paper presents a new framework for outlier filtering that uses supervised and unsupervised learning techniques iteratively in order to make the detection process more effective and more understandable. The outline of the framework is as follows. In the first round, for an initial dataset, we run SmartSifter to give each datum a score, with a high score indicating a high possibility of being an outlier. Next, we create labeled examples by giving positive labels to a number of higher-scored data and negative labels to a number of lower-scored data. Then we construct an outlier filtering rule from them by supervised learning. Here the rule is generated based on the principle of minimizing extended stochastic complexity. In the second round, for a new dataset, we filter the data using the constructed rule; then, among the filtered data, we run SmartSifter again to evaluate the data in order to update the filtering rule. Applying our framework to network intrusion detection, we demonstrate that 1) it can significantly improve the accuracy of SmartSifter, and 2) outlier filtering rules can help the user to discover a general pattern of an outlier group.
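
The two-round loop outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. Two stand-ins are assumptions made only for illustration: `score_outliers` uses a sum of squared z-scores in place of SmartSifter, and `learn_rule` picks a single threshold test by training error in place of rule learning by minimizing extended stochastic complexity. All function names, the toy data, and the values of `n_pos` and `n_neg` are hypothetical. The sketch also assumes that rows flagged by the rule are set aside and only the remaining rows are rescored, which the abstract does not spell out.

```python
from statistics import mean, pstdev


def score_outliers(data):
    """Stand-in for SmartSifter (assumption): sum of squared z-scores per row.
    A higher score means the row looks more like an outlier."""
    columns = list(zip(*data))
    mus = [mean(c) for c in columns]
    sds = [pstdev(c) or 1.0 for c in columns]  # guard against constant columns
    return [sum(((x - m) / s) ** 2 for x, m, s in zip(row, mus, sds))
            for row in data]


def label_by_score(data, scores, n_pos, n_neg):
    """Label the n_pos highest-scored rows 1 and the n_neg lowest-scored rows 0."""
    order = sorted(range(len(data)), key=lambda i: scores[i], reverse=True)
    positives = [(data[i], 1) for i in order[:n_pos]]
    negatives = [(data[i], 0) for i in order[len(order) - n_neg:]]
    return positives + negatives


def apply_rule(rule, row):
    """Return True if the rule flags the row as an outlier."""
    feature, threshold, sign = rule
    return sign * (row[feature] - threshold) >= 0


def learn_rule(examples):
    """Stand-in for ESC-based rule learning (assumption): choose the single
    threshold test (feature, threshold, sign) with the fewest training errors."""
    best = None
    for feature in range(len(examples[0][0])):
        for row, _ in examples:
            for sign in (1, -1):
                rule = (feature, row[feature], sign)
                errors = sum(apply_rule(rule, x) != bool(y) for x, y in examples)
                if best is None or errors < best[0]:
                    best = (errors, rule)
    return best[1]


def run_round(data, n_pos, n_neg, rule=None):
    """One round: set aside rows flagged by the current rule (if any),
    score the rest, relabel by score, and learn a new rule."""
    flagged = [r for r in data if rule is not None and apply_rule(rule, r)]
    rest = [r for r in data if rule is None or not apply_rule(rule, r)]
    scores = score_outliers(rest)
    new_rule = learn_rule(label_by_score(rest, scores, n_pos, n_neg))
    return new_rule, flagged


def toy_data(shift=0.0):
    """Hypothetical data: a 10x10 grid of inliers plus three distant outliers."""
    inliers = [((i % 10) * 0.1, (i // 10) * 0.1) for i in range(100)]
    outliers = [(5.0 + shift, 0.5), (5.2 + shift, 0.4), (5.1 + shift, 0.6)]
    return inliers + outliers


# Round 1: no rule yet, so every row is scored and a first rule is learned.
rule1, _ = run_round(toy_data(), n_pos=3, n_neg=20)
# Round 2: the first rule filters a new dataset before rescoring.
rule2, flagged = run_round(toy_data(shift=1.0), n_pos=3, n_neg=20, rule=rule1)
print(rule1, len(flagged))
```

On this toy data the first-round rule is a readable threshold on the first feature, which is the kind of interpretability the abstract attributes to filtering rules. How the second-round rule should be combined with the first is left to the caller, since the abstract says only that the rule is updated.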

    Published In

    KDD '01: Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining
    August 2001
    493 pages
    ISBN:158113391X
    DOI:10.1145/502512
    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Acceptance Rates

    KDD '01 Paper Acceptance Rate 31 of 237 submissions, 13%;
    Overall Acceptance Rate 1,133 of 8,635 submissions, 13%
