Clustering Based on Compressed Data for Categorical and Mixed Attributes

Rendón, Erendira; Sánchez, José Salvador

doi:10.1007/11815921_90


Part of the book series: Lecture Notes in Computer Science (LNIP, volume 4109)

Included in the following conference series:

Joint IAPR International Workshops on Statistical Techniques in Pattern Recognition (SPR) and Structural and Syntactic Pattern Recognition (SSPR)


Abstract

Clustering in data mining is a discovery process that groups a set of data so as to maximize intra-cluster similarity and minimize inter-cluster similarity. Clustering becomes more challenging when the data are categorical and the available memory is smaller than the data set. In this paper, we introduce CBC (Clustering Based on Compressed Data), an extension of the BIRCH algorithm that is especially suitable for very large databases and that works with both categorical and mixed attributes. The effectiveness and performance of CBC were compared with those of the well-known K-modes clustering algorithm. The results show that the CBC summarization process does not affect the final clustering, while execution times can be drastically reduced.
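The abstract describes CBC only at a high level, so the following is a minimal sketch of the general idea it builds on (compressing records into additive BIRCH-style summaries so the raw data need not stay in memory) applied to categorical attributes. It is not the authors' CBC procedure. The class name `CategoricalCF`, the function `summarize`, the `threshold` parameter, and the flat list of summaries (instead of BIRCH's height-balanced CF-tree) are hypothetical choices made for illustration only. Distances use the simple matching dissimilarity (number of mismatched attributes) that K-modes also uses, and numerical attributes, which CBC also supports, are omitted.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple


@dataclass
class CategoricalCF:
    """Hypothetical clustering feature (CF) for categorical records.

    Keeps the number of absorbed records and, per attribute, a frequency
    count of the observed values. As with a BIRCH CF, two summaries can be
    merged by adding their fields, so the raw records can be discarded.
    """
    n: int = 0
    counts: List[Counter] = field(default_factory=list)

    def add(self, record: Sequence[str]) -> None:
        # Lazily create one Counter per attribute on the first record.
        if not self.counts:
            self.counts = [Counter() for _ in record]
        for counter, value in zip(self.counts, record):
            counter[value] += 1
        self.n += 1

    def merge(self, other: "CategoricalCF") -> "CategoricalCF":
        # Additivity: merged counts are the element-wise sum of both summaries.
        if not self.counts:
            return CategoricalCF(other.n, [c.copy() for c in other.counts])
        if not other.counts:
            return CategoricalCF(self.n, [c.copy() for c in self.counts])
        return CategoricalCF(self.n + other.n,
                             [a + b for a, b in zip(self.counts, other.counts)])

    def mode(self) -> Tuple[str, ...]:
        # Most frequent value per attribute (ties go to the value seen first).
        return tuple(c.most_common(1)[0][0] for c in self.counts)

    def dissimilarity(self, record: Sequence[str]) -> int:
        # Simple matching dissimilarity against the summary's mode:
        # the number of attributes whose values differ.
        return sum(m != v for m, v in zip(self.mode(), record))


def summarize(records: Sequence[Sequence[str]], threshold: int) -> List[CategoricalCF]:
    """Single pass: absorb each record into the closest summary if it is
    within `threshold` mismatches, otherwise open a new summary."""
    summaries: List[CategoricalCF] = []
    for record in records:
        if summaries:
            best = min(summaries, key=lambda cf: cf.dissimilarity(record))
            if best.dissimilarity(record) <= threshold:
                best.add(record)
                continue
        cf = CategoricalCF()
        cf.add(record)
        summaries.append(cf)
    return summaries


if __name__ == "__main__":
    data = [("red", "small", "round"), ("red", "small", "round"),
            ("red", "large", "round"), ("blue", "large", "square"),
            ("blue", "large", "square")]
    for cf in summarize(data, threshold=1):
        print(cf.n, cf.mode())
    # prints:
    # 3 ('red', 'small', 'round')
    # 2 ('blue', 'large', 'square')
```

In a full pipeline, the much smaller set of summaries (weighted by `n`) would then be handed to a clustering step such as K-modes; that final step is not shown here.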

References

Andritsos, P., Tsaparas, P., Miller, R.J., Sevcik, K.C.: LIMBO: scalable clustering of categorical data. In: Bertino, E., Christodoulakis, S., Plexousakis, D., Christophides, V., Koubarakis, M., Böhm, K., Ferrari, E. (eds.) EDBT 2004. LNCS, vol. 2992, pp. 123–146. Springer, Heidelberg (2004)
Barbará, D., Li, Y., Couto, J.: COOLCAT: an entropy-based algorithm for categorical clustering. In: Proc. 11th Intl. Conf. on Information and Knowledge Management, pp. 582–589 (2002)
Ganti, V., Gehrke, J., Ramakrishnan, R.: CACTUS: clustering categorical data using summaries. In: Proc. 5th ACM SIGKDD Intl. Conf. on Knowledge Discovery and Data Mining, pp. 73–83 (1999)
Gowda, K., Diday, E.: Symbolic clustering using a new dissimilarity measure. Pattern Recognition 24, 567–578 (1991)
Guha, S., Rastogi, R., Shim, K.: ROCK: a robust clustering algorithm for categorical attributes. In: Proc. IEEE Intl. Conf. on Data Engineering, pp. 512–521 (1999)
Huang, Z.: A fast clustering algorithm to cluster very large categorical data sets in data mining. In: Proc. SIGMOD Workshop on Research Issues on Data Mining and Knowledge Discovery, Tech. Report 97–07, UBC, Dept. of Computer Science (1997)
Ichino, M., Yaguchi, H.: Generalized Minkowski metrics for mixed feature-type data analysis. IEEE Trans. on Systems, Man and Cybernetics 24, 698–708 (1994)
Kaufman, L., Rousseeuw, P.J.: Finding Groups in Data: An Introduction to Cluster Analysis. John Wiley & Sons, New York (1990)
Milenova, B.L., Campos, M.M.: Clustering large databases with numeric and nominal values using orthogonal projection. In: Proc. 29th Intl. Conf. on Very Large Databases (2003)
Zhang, T., Ramakrishnan, R., Livny, M.: BIRCH: an efficient data clustering method for very large databases. In: Proc. ACM-SIGMOD Intl. Conf. on Management of Data, pp. 103–114 (1996)

Author information

Authors and Affiliations

Lab. Reconocimiento de Patrones, Instituto Tecnológico de Toluca, Av. Tecnológico s/n, 52140, Metepec, Mexico
Erendira Rendón
Dept. Llenguatges i Sistemes Informàtics, Universitat Jaume I, Av. Sos Baynat s/n, E-12071, Castelló de la Plana, Spain
José Salvador Sánchez

Editor information

Editors and Affiliations

Hong Kong University of Science and Technology
Dit-Yan Yeung
Department of Computer Science, Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong, China
James T. Kwok
Instituto de Telecomunicações, Instituto Superior Técnico, Lisbon, Portugal
Ana Fred
Department of Electrical and Electronic Engineering, University of Cagliari, Piazza d’Armi, 09123, Cagliari, Italy
Fabio Roli
Faculty of Electrical Engineering, Mathematics and Computer Science, Information and Communication Theory Group, Delft University of Technology, Delft, The Netherlands
Dick de Ridder

About this paper

Cite this paper

Rendón, E., Sánchez, J.S. (2006). Clustering Based on Compressed Data for Categorical and Mixed Attributes. In: Yeung, DY., Kwok, J.T., Fred, A., Roli, F., de Ridder, D. (eds) Structural, Syntactic, and Statistical Pattern Recognition. SSPR/SPR 2006. Lecture Notes in Computer Science, vol 4109. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11815921_90

DOI: https://doi.org/10.1007/11815921_90
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-37236-3
Online ISBN: 978-3-540-37241-7
eBook Packages: Computer Science, Computer Science (R0)

Societies and partnerships

The International Association for Pattern Recognition