Abstract
In this paper we introduce a general framework for hierarchical clustering that handles both static and dynamic data sets. Different hierarchical agglomerative algorithms can be obtained from this framework by specifying an inter-cluster similarity measure, a subgraph of the β-similarity graph, and a cover algorithm. We present a new clustering algorithm, called the Hierarchical Compact Algorithm, together with its dynamic version; both are specific instances of the proposed framework. Our experiments on several standard document collections show that this algorithm requires less computational time than standard methods on dynamic data sets while achieving comparable or even better clustering quality. We therefore advocate its use for tasks that require dynamic clustering, such as information organization, creation of document taxonomies, and hierarchical topic detection.
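The abstract names three pluggable components: an inter-cluster similarity, a subgraph of the β-similarity graph, and a cover algorithm. The Python sketch below shows how such a framework could be wired together for the static case. It is a minimal illustration under our own assumptions, not the authors' implementation: we assume average-link similarity, a "maximum β-similarity" subgraph (each cluster linked to its most similar neighbours whose similarity is at least β), and connected components as the cover, which yields compact-set-style groups. All function names are hypothetical, and the dynamic version is not sketched.

```python
from typing import Callable, Dict, FrozenSet, List, Sequence, Set

Cluster = FrozenSet[int]                   # a cluster is a set of object indices
SimFn = Callable[[Cluster, Cluster], float]
Graph = Dict[int, Set[int]]                # undirected adjacency over cluster indices


def average_link(obj_sim: Sequence[Sequence[float]]) -> SimFn:
    """Assumed inter-cluster similarity: mean pairwise object similarity."""
    def sim(a: Cluster, b: Cluster) -> float:
        return sum(obj_sim[i][j] for i in a for j in b) / (len(a) * len(b))
    return sim


def max_beta_similarity_graph(level: List[Cluster], sim: SimFn, beta: float) -> Graph:
    """Assumed subgraph: link each cluster to its most similar neighbour(s),
    keeping only pairs whose similarity is at least beta."""
    adj: Graph = {i: set() for i in range(len(level))}
    for i, a in enumerate(level):
        cands = [(sim(a, b), j) for j, b in enumerate(level) if j != i]
        cands = [(s, j) for s, j in cands if s >= beta]
        if not cands:
            continue  # cluster i is isolated at this beta
        best = max(s for s, _ in cands)
        for s, j in cands:
            if s == best:
                adj[i].add(j)
                adj[j].add(i)  # store the edge in both directions
    return adj


def connected_components(adj: Graph) -> List[List[int]]:
    """Assumed cover algorithm: connected components of the subgraph."""
    seen: Set[int] = set()
    comps: List[List[int]] = []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], []
        while stack:  # iterative depth-first search
            v = stack.pop()
            comp.append(v)
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        comps.append(comp)
    return comps


def hierarchical_framework(n: int, sim: SimFn, subgraph, cover,
                           beta: float) -> List[List[Cluster]]:
    """Agglomerate level by level until one cluster remains or no merge is possible."""
    level: List[Cluster] = [frozenset([i]) for i in range(n)]
    hierarchy = [level]
    while len(level) > 1:
        groups = cover(subgraph(level, sim, beta))
        # With connected components as the cover, equal sizes mean the graph has no edges.
        if len(groups) == len(level):
            break
        level = [frozenset().union(*(level[i] for i in g)) for g in groups]
        hierarchy.append(level)
    return hierarchy
```

For example, with four objects whose similarities form two tight pairs ({0, 1} at 0.9 and {2, 3} at 0.8, cross-pair similarities of 0.1 or 0.2) and `beta=0.5`, the first level merges each pair, and the hierarchy stops at two clusters because their average-link similarity (0.15) falls below β. Lowering β to 0.1 lets the two pairs merge into a single top-level cluster.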
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Gil-García, R., Badía-Contelles, J.M., Pons-Porrata, A. (2005). Dynamic Hierarchical Compact Clustering Algorithm. In: Sanfeliu, A., Cortés, M.L. (eds) Progress in Pattern Recognition, Image Analysis and Applications. CIARP 2005. Lecture Notes in Computer Science, vol 3773. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11578079_32
DOI: https://doi.org/10.1007/11578079_32
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29850-2
Online ISBN: 978-3-540-32242-9