Abstract
Clustering problems arise in many domains of science and engineering, and a large number of methods have been developed to address them. The Kohonen self-organizing map (SOM) is a popular tool that maps a high-dimensional space onto a small number of dimensions by placing similar elements close together, forming clusters. Cluster analysis of the resulting map is often left to the user. In this paper we present a method and a set of tools to perform unsupervised SOM cluster analysis, determine cluster confidence, and visualize the result as a tree, which facilitates comparison with existing hierarchical classifiers. We also introduce a distance measure for cluster trees that makes it possible to select the SOM with the most confident clusters.
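The abstract describes the SOM only in outline, so the following is a minimal sketch of generic SOM training rather than the method or tools presented in the paper. It assumes a rectangular grid, a Gaussian neighbourhood, and a linearly decaying learning rate; these choices, the function names `train_som` and `best_matching_unit`, and the toy data are all hypothetical and introduced here for illustration only.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the grid position of the node whose weight vector is closest to x."""
    return min(
        weights,
        key=lambda node: sum((wi - xi) ** 2 for wi, xi in zip(weights[node], x)),
    )


def train_som(data, rows, cols, epochs=50, lr0=0.5, sigma0=None, seed=0):
    """Train a small rectangular SOM (illustrative sketch, not the paper's code).

    Returns a dict mapping grid position (row, col) to a weight vector.
    """
    rng = random.Random(seed)
    dim = len(data[0])
    if sigma0 is None:
        sigma0 = max(rows, cols) / 2.0  # assumed initial neighbourhood radius
    # Initialise every node from a randomly chosen data point.
    weights = {
        (r, c): list(rng.choice(data)) for r in range(rows) for c in range(cols)
    }
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(data)
        rng.shuffle(order)
        for x in order:
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                # decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 1e-3   # shrinking neighbourhood
            bmu = best_matching_unit(weights, x)
            for node, w in weights.items():
                grid_d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma * sigma))
                # Pull the node towards x, more strongly the nearer it is to the BMU.
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])
            step += 1
    return weights


# Toy data (made up): two well-separated groups of 3-dimensional points.
data_rng = random.Random(1)
blob_a = [[data_rng.gauss(0.0, 0.5) for _ in range(3)] for _ in range(10)]
blob_b = [[data_rng.gauss(10.0, 0.5) for _ in range(3)] for _ in range(10)]

weights = train_som(blob_a + blob_b, rows=2, cols=2)
node_a = best_matching_unit(weights, [0.0, 0.0, 0.0])
node_b = best_matching_unit(weights, [10.0, 10.0, 10.0])
print("group A maps to node", node_a, "and group B maps to node", node_b)
```

In this sketch, similar inputs end up at the same or neighbouring grid nodes, which is the property the abstract refers to. Deciding which groups of nodes form a cluster, and how confident that grouping is, is the step the paper addresses and is not attempted here.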
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Samsonova, E.V., Bäck, T., Kok, J.N., IJzerman, A.P. (2005). Reliable Hierarchical Clustering with the Self-organizing Map. In: Famili, A.F., Kok, J.N., Peña, J.M., Siebes, A., Feelders, A. (eds) Advances in Intelligent Data Analysis VI. IDA 2005. Lecture Notes in Computer Science, vol 3646. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11552253_35
DOI: https://doi.org/10.1007/11552253_35
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28795-7
Online ISBN: 978-3-540-31926-9
eBook Packages: Computer Science, Computer Science (R0)