DOI: 10.1145/1458082.1458227
research-article

EDSC: efficient density-based subspace clustering

Published: 26 October 2008
    Abstract

    Subspace clustering mines clusters hidden in subspaces of high-dimensional data sets. In full-space clustering, density-based approaches have been shown to mine clusters of arbitrary shape successfully, even in the presence of noise. Exhaustive search for all density-based subspace clusters, however, leads to infeasible runtimes on large high-dimensional data sets. This is due to the exponential number of possible subspace projections, combined with the high computational cost of density-based clustering.
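    To make the cost argument concrete, the sketch below counts the axis-parallel subspaces of a d-dimensional data set (2^d - 1 non-empty ones) and applies a DBSCAN-style core-object test inside a single subspace projection. It is a minimal illustration that assumes a fixed radius `eps`, a fixed threshold `min_pts`, and Euclidean distance. The abstract's reference to an "unbiased" result suggests EDSC uses a different density definition, so the function names here (`is_core_in_subspace` and the others) are hypothetical and not the paper's definitions.

```python
from itertools import combinations
import math


def num_subspaces(d: int) -> int:
    """Number of non-empty axis-parallel subspaces of a d-dimensional space."""
    return 2 ** d - 1


def all_subspaces(d: int):
    """Yield every non-empty subspace as a sorted tuple of dimension indices."""
    for k in range(1, d + 1):
        yield from combinations(range(d), k)


def subspace_dist(p, q, dims) -> float:
    """Euclidean distance between p and q projected onto the dimensions in dims."""
    return math.sqrt(sum((p[i] - q[i]) ** 2 for i in dims))


def eps_neighborhood(data, idx, dims, eps):
    """Indices of all points within eps of data[idx] in subspace dims (idx included)."""
    p = data[idx]
    return [j for j, q in enumerate(data) if subspace_dist(p, q, dims) <= eps]


def is_core_in_subspace(data, idx, dims, eps, min_pts) -> bool:
    """DBSCAN-style core test; a fixed eps and min_pts are simplifying assumptions."""
    return len(eps_neighborhood(data, idx, dims, eps)) >= min_pts


if __name__ == "__main__":
    # Three points lie close together in dimensions 0 and 1 but scatter in dimension 2.
    data = [(0.0, 0.0, 5.0), (0.1, 0.0, -5.0), (0.0, 0.1, 9.0), (3.0, 3.0, 0.0)]
    print(num_subspaces(3))                                 # 7
    print(num_subspaces(20))                                # 1048575
    print(is_core_in_subspace(data, 0, (0, 1), 0.5, 3))     # True
    print(is_core_in_subspace(data, 0, (0, 1, 2), 0.5, 3))  # False
```

    Point 0 is a core object in the two-dimensional subspace {0, 1} but not in the full space, which is why clusters must be searched for in projections; with only 20 dimensions there are already more than a million projections to examine.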
    In this paper, we propose the efficient and lossless detection of density-based subspace clusters. Our EDSC (efficient density-based subspace clustering) algorithm reduces the high computational cost of density-based subspace clustering through a complete multistep filter-and-refine approach. The first filter step, a hypercube filter, avoids an exhaustive search of all regions in all subspaces by enclosing potential density-based clusters in hypercubes. The second filter step provides additional pruning based on a density monotonicity property. In the final refinement step, the exact, unbiased density-based subspace clustering result is determined. Since we prove that pruning is lossless in both filter steps, the result is guaranteed to be complete.
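    The general filter-and-refine idea can be illustrated with a small lossless pipeline. The sketch below is not EDSC's hypercube filter or its density monotonicity property; it substitutes two textbook stand-ins under the same assumptions as a fixed `eps`, a fixed `min_pts`, and Euclidean distance. (1) A grid filter bounds each point's eps-neighbourhood size from above by counting the points in the 3^k block of grid cells of side `eps` around it, so any point whose bound is below `min_pts` can be discarded safely. (2) The downward-closure property used by bottom-up subspace search: a point that is not a core object in subspace T cannot be one in any superspace of T, because projected distances can only grow as dimensions are added. (3) An exact refinement step runs only on the surviving candidates. All function names are hypothetical.

```python
from collections import Counter
from itertools import combinations, product
import math


def subspace_dist(p, q, dims) -> float:
    """Euclidean distance between p and q projected onto the dimensions in dims."""
    return math.sqrt(sum((p[i] - q[i]) ** 2 for i in dims))


def grid_upper_bounds(data, dims, eps):
    """Grid filter (stand-in for a hypercube filter): an upper bound on each
    point's eps-neighbourhood size in subspace dims.

    With grid cells of side eps, every eps-neighbour of p lies in the 3^k block
    of cells around p's cell, so counting that block never underestimates.
    It visits 3^k cell offsets per point, so it is only practical for small k.
    """
    def cell(p):
        return tuple(math.floor(p[i] / eps) for i in dims)

    counts = Counter(cell(p) for p in data)
    bounds = []
    for p in data:
        c = cell(p)
        bounds.append(sum(
            counts.get(tuple(a + o for a, o in zip(c, off)), 0)
            for off in product((-1, 0, 1), repeat=len(dims))))
    return bounds


def refine_core(data, candidates, dims, eps, min_pts):
    """Refinement: exact eps-neighbourhood counts, computed only for candidates."""
    return {i for i in candidates
            if sum(subspace_dist(data[i], q, dims) <= eps for q in data) >= min_pts}


def core_objects_per_subspace(data, eps, min_pts):
    """Bottom-up search returning {subspace: core-point indices} for every
    subspace that has at least one core point. Both filters are lossless
    under a fixed eps and min_pts with Euclidean distance."""
    d = len(data[0])
    cores = {}
    level = [(i,) for i in range(d)]
    while level:
        for dims in level:
            # Monotonicity filter: a core point of dims must be a core point in
            # every (k-1)-dimensional subspace of dims.
            cand = set(range(len(data)))
            for sub in combinations(dims, len(dims) - 1):
                if sub:
                    cand &= cores.get(sub, set())
            if not cand:
                continue
            # Grid filter: drop points whose upper bound is below min_pts.
            bounds = grid_upper_bounds(data, dims, eps)
            cand = {i for i in cand if bounds[i] >= min_pts}
            core = refine_core(data, cand, dims, eps, min_pts)
            if core:
                cores[dims] = core
        # Extend only subspaces that kept core points; each (k+1)-tuple is
        # generated exactly once, from its sorted k-prefix.
        level = [dims + (j,) for dims in level if dims in cores
                 for j in range(dims[-1] + 1, d)]
    return cores


if __name__ == "__main__":
    data = [(0.0, 0.0, 5.0), (0.1, 0.0, -5.0), (0.0, 0.1, 9.0),
            (3.0, 3.0, 0.0), (0.05, 0.05, 5.1)]
    print(core_objects_per_subspace(data, eps=0.5, min_pts=3))
```

    On this toy data the search reports core objects {0, 1, 2, 4} in subspaces (0,), (1,), and (0, 1) only. Subspaces containing dimension 2 are pruned by the monotonicity filter before any exact neighbourhood is computed. Because both filters only discard points that provably cannot be core objects, the output matches a brute-force scan over all subspaces, which is the completeness guarantee a lossless filter-and-refine scheme aims for.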
    In thorough experiments on synthetic and real-world data sets, we demonstrate substantial efficiency gains: our lossless EDSC approach outperforms existing density-based subspace clustering algorithms by orders of magnitude.

      Published In

      CIKM '08: Proceedings of the 17th ACM conference on Information and knowledge management
      October 2008
      1562 pages
      ISBN: 9781595939913
      DOI: 10.1145/1458082
      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. data mining
      2. density-based clustering
      3. efficiency
      4. high-dimensional data
      5. subspace clustering

      Qualifiers

      • Research-article

      Conference

      CIKM08
      CIKM08: Conference on Information and Knowledge Management
      October 26 - 30, 2008
      Napa Valley, California, USA

      Acceptance Rates

      Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

      Cited By

      • (2023) Clustering High-Dimensional Data. In Machine Learning for Data Science Handbook, pp. 219-237. DOI: 10.1007/978-3-031-24628-9_11. Online publication date: 26-Feb-2023.
      • (2019) Density-based clustering. WIREs Data Mining and Knowledge Discovery 10:2. DOI: 10.1002/widm.1343. Online publication date: 29-Oct-2019.
      • (2018) Finding density-based subspace clusters in graphs with feature vectors. Data Mining and Knowledge Discovery 25:2, pp. 243-269. DOI: 10.1007/s10618-012-0272-z. Online publication date: 26-Dec-2018.
      • (2018) Projective clustering ensembles. Data Mining and Knowledge Discovery 26:3, pp. 452-511. DOI: 10.1007/s10618-012-0266-x. Online publication date: 26-Dec-2018.
      • (2018) A survey on enhanced subspace clustering. Data Mining and Knowledge Discovery 26:2, pp. 332-397. DOI: 10.1007/s10618-012-0258-x. Online publication date: 26-Dec-2018.
      • (2018) A survey on unsupervised outlier detection in high-dimensional numerical data. Statistical Analysis and Data Mining 5:5, pp. 363-387. DOI: 10.1002/sam.11161. Online publication date: 20-Dec-2018.
      • (2014) Frequent Pattern Mining Algorithms for Data Clustering. In Frequent Pattern Mining, pp. 403-423. DOI: 10.1007/978-3-319-07821-2_16. Online publication date: 30-Aug-2014.
      • (2012) Efficient Density-Based Subspace Clustering in High Dimensions. In Revised Selected Papers of the First International Workshop on Clustering High-Dimensional Data, Volume 7627, pp. 34-49. DOI: 10.1007/978-3-662-48577-4_3. Online publication date: 15-May-2012.
      • (2011) DB-CSC. In Proceedings of the 2011 European Conference on Machine Learning and Knowledge Discovery in Databases, Volume Part I, pp. 565-580. DOI: 10.5555/2034063.2034112. Online publication date: 5-Sep-2011.
      • (2011) Projected Gustafson-Kessel clustering algorithm and its convergence. In Transactions on Rough Sets XIV, pp. 159-182. DOI: 10.5555/2017701.2017710. Online publication date: 1-Jan-2011.
