Abstract
Normal mixture models are increasingly used to cluster sets of continuous multivariate data. They provide a probabilistic (soft) clustering of the data in terms of the fitted posterior probabilities of membership of the mixture components corresponding to the clusters. An outright (hard) clustering can subsequently be obtained by assigning each observation to the component for which its fitted posterior probability of membership is highest. However, outliers in the data can affect the estimates of the parameters in the normal component densities, and hence the implied clustering. A more robust approach is to fit mixtures of multivariate t-distributions, whose components have longer tails than normal components. The expectation-maximization (EM) algorithm can be used to fit mixtures of t-distributions by maximum likelihood. The application of this model as a robust approach to clustering is illustrated on a real data set, where the use of t-components is shown to give less extreme estimates of the posterior probabilities of cluster membership.
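The soft and hard clusterings described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it skips the EM fitting step and simply evaluates posterior probabilities for fixed, hypothetical parameters (`weights`, `means`, `variances`, `nu` are all invented for illustration). It also assumes diagonal component scale matrices so that it needs only the Python standard library.

```python
import math

def sq_mahalanobis(y, mean, var):
    # Squared Mahalanobis distance; assumes a diagonal scale matrix
    # whose diagonal entries are given in `var` (a simplification).
    return sum((yj - mj) ** 2 / vj for yj, mj, vj in zip(y, mean, var))

def log_normal_pdf(y, mean, var):
    # Log density of a multivariate normal with diagonal covariance.
    p = len(y)
    log_det = sum(math.log(v) for v in var)
    return -0.5 * (p * math.log(2 * math.pi) + log_det
                   + sq_mahalanobis(y, mean, var))

def log_t_pdf(y, mean, var, nu):
    # Log density of a multivariate t with location `mean`, diagonal
    # scale matrix `var`, and `nu` degrees of freedom.
    p = len(y)
    log_det = sum(math.log(v) for v in var)
    delta = sq_mahalanobis(y, mean, var)
    return (math.lgamma((nu + p) / 2) - math.lgamma(nu / 2)
            - 0.5 * p * math.log(math.pi * nu) - 0.5 * log_det
            - 0.5 * (nu + p) * math.log1p(delta / nu))

def posteriors(log_dens, weights):
    # Posterior membership probabilities: weight_i * f_i, normalised
    # over components (computed on the log scale for stability).
    logs = [math.log(w) + ld for w, ld in zip(weights, log_dens)]
    m = max(logs)
    unnorm = [math.exp(v - m) for v in logs]
    total = sum(unnorm)
    return [u / total for u in unnorm]

def hard_assign(post):
    # Hard clustering: index of the component with the highest posterior.
    return max(range(len(post)), key=post.__getitem__)

# Hypothetical two-component model (values invented for illustration).
weights = [0.5, 0.5]
means = [(0.0, 0.0), (3.0, 3.0)]
variances = [(1.0, 1.0), (1.0, 1.0)]
nu = 4.0  # assumed degrees of freedom for the t components

y = (8.0, 8.0)  # an outlying point, far from both component means

post_normal = posteriors(
    [log_normal_pdf(y, m, v) for m, v in zip(means, variances)], weights)
post_t = posteriors(
    [log_t_pdf(y, m, v, nu) for m, v in zip(means, variances)], weights)

print("normal posteriors:", post_normal, "-> cluster", hard_assign(post_normal))
print("t posteriors:     ", post_t, "-> cluster", hard_assign(post_t))
```

For this outlying point both models make the same hard assignment, but the normal components give a posterior probability that is numerically indistinguishable from 1, whereas the longer-tailed t components give a value noticeably below 1. In a full analysis the parameters would be estimated by EM rather than fixed by hand.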
Keywords
- Posterior Probability
- Normal Mixture
- Finite Mixture Model
- Normal Mixture Model
- Minimum Covariance Determinant
Copyright information
© 1998 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
McLachlan, G.J., Peel, D. (1998). Robust cluster analysis via mixtures of multivariate t-distributions. In: Amin, A., Dori, D., Pudil, P., Freeman, H. (eds) Advances in Pattern Recognition. SSPR/SPR 1998. Lecture Notes in Computer Science, vol 1451. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0033290
DOI: https://doi.org/10.1007/BFb0033290
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-64858-1
Online ISBN: 978-3-540-68526-5
eBook Packages: Springer Book Archive