Abstract
Probabilistic Distance (PD) clustering is a nonparametric probabilistic method for finding homogeneous groups in multivariate datasets with J variables and n units. PD-clustering is an iterative algorithm that looks for a set of K group centers, maximising the empirical probabilities that the n statistical units belong to a cluster. As J becomes large, the solution tends to become unstable. This paper extends PD-clustering to the context of factorial clustering methods and shows that the Tucker3 decomposition is a consistent transformation for projecting the original data onto a subspace defined according to the same PD-clustering criterion. The method consists of a two-step iterative procedure: a linear transformation of the initial data, followed by PD-clustering on the transformed data. Integrating PD-clustering with the Tucker3 factorial step makes the clustering more stable, allows datasets with large J to be considered, and allows the method to be used when the clusters are not elliptical in form.
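To make the clustering step concrete, the following is a minimal Python sketch of a basic PD-clustering iteration on its own; the Tucker3 factorial step is not shown. It rests on assumptions that are not stated in the abstract: Euclidean distances, membership probabilities inversely proportional to the distance from each center, and centers updated as weighted means with weights p^2 / d. The function name `pd_clustering` and its parameters (`init`, `n_iter`, `tol`, `seed`) are hypothetical names chosen for illustration.

```python
import numpy as np


def _memberships(X, centers):
    """Distances (n, K) and membership probabilities (n, K) for given centers."""
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    d = np.maximum(d, 1e-12)  # guard against division by zero
    # Assumption: p_ik * d_ik is constant over k, so p_ik is proportional to 1 / d_ik.
    inv = 1.0 / d
    p = inv / inv.sum(axis=1, keepdims=True)
    return d, p


def pd_clustering(X, K, init=None, n_iter=100, tol=1e-6, seed=0):
    """Hypothetical sketch of basic PD-clustering (no factorial step)."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    if init is None:
        # Assumption: start from K distinct, randomly chosen units.
        rng = np.random.default_rng(seed)
        centers = X[rng.choice(n, size=K, replace=False)].copy()
    else:
        centers = np.asarray(init, dtype=float).copy()
    for _ in range(n_iter):
        d, p = _memberships(X, centers)
        # Assumed center update: weighted mean with weights p_ik^2 / d_ik.
        w = p ** 2 / d
        new_centers = (w.T @ X) / w.sum(axis=0)[:, None]
        shift = np.linalg.norm(new_centers - centers)
        centers = new_centers
        if shift < tol:
            break
    # Recompute memberships so they match the returned centers.
    _, p = _memberships(X, centers)
    return centers, p
```

A call such as `centers, p = pd_clustering(X, K=2)` returns a K x J array of centers and an n x K array of membership probabilities whose rows sum to one; a hard assignment can be read off with `p.argmax(axis=1)`.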
Copyright information
© 2013 Springer International Publishing Switzerland
About this paper
Cite this paper
Tortora, C., Summa, M.G., Palumbo, F. (2013). Factor PD-Clustering. In: Lausen, B., Van den Poel, D., Ultsch, A. (eds) Algorithms from and for Nature and Life. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Cham. https://doi.org/10.1007/978-3-319-00035-0_11
DOI: https://doi.org/10.1007/978-3-319-00035-0_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-00034-3
Online ISBN: 978-3-319-00035-0
eBook Packages: Mathematics and Statistics (R0)