Abstract
In fuzzy k-modes clustering, each individual has only one membership degree per class, which may not be sufficient to model the ambiguity of the data precisely. Multivariate thinking is known to expose the inherent structure and meaning within a set of classified variables. In this paper, a multivariate approach to membership degrees is presented to better handle ambiguous data that share properties of different clusters. The method is compared with other fuzzy k-modes methods from the literature using a multivariate internal index that is also proposed in this paper. Synthetic and real categorical data sets are considered in this study.
Acknowledgments
The authors would like to thank the Brazilian agencies CNPq (National Council for Scientific and Technological Development) and CAPES (Coordination for the Improvement of Higher Education Personnel) for financial support.
Cite this article
Maciel, D.B.M., Amaral, G.J.A., de Souza, R.M.C.R. et al. Multivariate fuzzy k-modes algorithm. Pattern Anal Applic 20, 59–71 (2017). https://doi.org/10.1007/s10044-015-0465-3
DOI: https://doi.org/10.1007/s10044-015-0465-3