DOI: 10.1109/ICDM.2007.53
Article

Clustering Needles in a Haystack: An Information Theoretic Analysis of Minority and Outlier Detection

Published: 28 October 2007

Abstract

Identifying atypical objects is one of the traditional topics in machine learning. Recently, novel approaches, e.g., Minority Detection and One-class clustering, have gone further and sought to identify clusters of atypical objects that contrast strongly with the rest of the data in terms of their distribution or density. This paper analyzes such tasks from an information theoretic perspective. Under an Information Bottleneck formalization, these tasks can be interpreted as increasing the average atypicalness of the clusters while reducing the complexity of the clustering. This formalization yields a unifying view of the new approaches as well as classic outlier detection. We also present a scalable minimization algorithm that exploits the localized form of the cost function over individual clusters. The proposed algorithm is evaluated on simulated datasets and a text classification benchmark, in comparison with an existing method.


    Published In

    ICDM '07: Proceedings of the 2007 Seventh IEEE International Conference on Data Mining
    October 2007
    767 pages
    ISBN:0769530184

    Publisher

    IEEE Computer Society

    United States


    Qualifiers

    • Article


    Cited By

    • (2018) Detection of variable length anomalous subsequences in data streams. International Journal of Intelligent Information and Database Systems 6:3 (273-288). DOI: 10.1504/IJIIDS.2012.047005. Online publication date: 15-Dec-2018
    • (2015) Incremental multiple instance outlier detection. Neural Computing and Applications 26:4 (957-968). DOI: 10.1007/s00521-014-1750-6. Online publication date: 1-May-2015
    • (2014) Multiple queries with conditional attributes (QCATs) for anomaly detection and visualization. Proceedings of the Eleventh Workshop on Visualization for Cyber Security (17-24). DOI: 10.1145/2671491.2671502. Online publication date: 10-Nov-2014
    • (2011) Anomaly detection in categorical datasets using bayesian networks. Proceedings of the Third international conference on Artificial intelligence and computational intelligence - Volume Part II (610-619). DOI: 10.5555/2045820.2045909. Online publication date: 24-Sep-2011
    • (2011) Latent feature encoding using dyadic and relational data. Proceedings of the 20th ACM international conference on Information and knowledge management (2201-2204). DOI: 10.1145/2063576.2063926. Online publication date: 24-Oct-2011
    • (2009) Detection of unique temporal segments by information theoretic meta-clustering. Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining (59-68). DOI: 10.1145/1557019.1557033. Online publication date: 28-Jun-2009
    • (2009) Anomaly detection. ACM Computing Surveys (CSUR) 41:3 (1-58). DOI: 10.1145/1541880.1541882. Online publication date: 30-Jul-2009
