Ranking Interesting Subspaces for Clustering High Dimensional Data

Kailing, Karin; Kriegel, Hans-Peter; Kröger, Peer; Wanka, Stefanie

doi:10.1007/978-3-540-39804-2_23

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2838)

Included in the following conference series:

European Conference on Principles of Data Mining and Knowledge Discovery

Abstract

Application domains in the life sciences, such as molecular biology, produce a tremendous amount of data that can no longer be managed without the help of efficient and effective data mining methods. One of the primary data mining tasks is clustering. However, traditional clustering algorithms often fail to detect meaningful clusters because of the high-dimensional, inherently sparse feature space of most real-world data sets. Nevertheless, the data sets often contain clusters hidden in various subspaces of the original feature space. We present a pre-processing step for traditional clustering algorithms which detects all interesting subspaces of high-dimensional data that contain clusters. For this purpose, we define a quality criterion for the interestingness of a subspace and propose an efficient algorithm called RIS (Ranking Interesting Subspaces) to examine all such subspaces. A broad evaluation based on synthetic and real-world data sets empirically shows that RIS is suitable for finding all relevant subspaces in large, high-dimensional, sparse data and for ranking them accordingly.
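To make the idea of ranking subspaces concrete, the following is a minimal sketch and not the RIS algorithm itself: the abstract does not specify the quality criterion, so the score used here (the summed neighborhood sizes of all points that have at least `min_pts` neighbors within radius `eps` in the projected subspace) is an assumption chosen for illustration. The function names, the parameters `eps`, `min_pts` and `max_dim`, and the toy data are likewise hypothetical. The sketch enumerates subspaces exhaustively and compares raw counts, which favors low-dimensional subspaces; an efficient method would need pruning and a normalized score.

```python
from itertools import combinations
from math import dist  # Python 3.8+


def neighbor_counts(points, dims, eps):
    """For each point, count the points (itself included) that lie within
    Euclidean distance eps when only the dimensions in `dims` are used."""
    projected = [tuple(p[d] for d in dims) for p in points]
    return [sum(1 for q in projected if dist(p, q) <= eps) for p in projected]


def subspace_quality(points, dims, eps, min_pts):
    """Assumed illustrative score: sum of neighborhood sizes over all points
    with at least min_pts neighbors in the subspace `dims`."""
    counts = neighbor_counts(points, dims, eps)
    return sum(c for c in counts if c >= min_pts)


def rank_subspaces(points, eps, min_pts, max_dim):
    """Score every subspace with up to max_dim dimensions and return the
    subspaces with a non-zero score, best first."""
    n_dims = len(points[0])
    scored = []
    for k in range(1, max_dim + 1):
        for dims in combinations(range(n_dims), k):
            quality = subspace_quality(points, dims, eps, min_pts)
            if quality > 0:
                scored.append((dims, quality))
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy data: four points are close together in dimension 0 only.
points = [
    (0.0, 0.0, 5.0),
    (0.1, 10.0, 1.0),
    (0.2, 20.0, 9.0),
    (0.1, 30.0, 3.0),
    (5.0, 40.0, 7.0),
]
print(rank_subspaces(points, eps=0.5, min_pts=3, max_dim=2))
```

On this toy data only the one-dimensional subspace spanned by dimension 0 receives a non-zero score, so it is the only subspace a subsequent clustering step would need to examine.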

The work is supported in part by the German Ministry for Education, Science, Research and Technology (BMBF) under grant no. 031U112F within the BFAM (Bioinformatics for the Functional Analysis of Mammalian Genomes) project, which is part of the German Genome Analysis Network (NGFN).

Author information

Authors and Affiliations

Institute for Computer Science, University of Munich, Oettingenstr. 67, 80538, Munich, Germany
Karin Kailing, Hans-Peter Kriegel, Peer Kröger & Stefanie Wanka

Editor information

Editors and Affiliations

University of Nova Gorica, Nova Gorica, Slovenia
Nada Lavrač
Rudjer Bošković Institute, Bijenička 54, 10000, Zagreb, Croatia
Dragan Gamberger
Jozef Stefan Institute, Jamova 39, 1000, Ljubljana, Slovenia
Ljupčo Todorovski
Leiden Institute of Advanced Computer Science, Leiden University,
Hendrik Blockeel

About this paper

Cite this paper

Kailing, K., Kriegel, H.-P., Kröger, P., Wanka, S. (2003). Ranking Interesting Subspaces for Clustering High Dimensional Data. In: Lavrač, N., Gamberger, D., Todorovski, L., Blockeel, H. (eds) Knowledge Discovery in Databases: PKDD 2003. Lecture Notes in Computer Science, vol 2838. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39804-2_23

DOI: https://doi.org/10.1007/978-3-540-39804-2_23
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-20085-7
Online ISBN: 978-3-540-39804-2
eBook Packages: Springer Book Archive
