Inference in model-based cluster analysis

Statistics and Computing

Abstract

A new approach to cluster analysis has been introduced based on parsimonious geometric modelling of the within-group covariance matrices in a mixture of multivariate normal distributions, using hierarchical agglomeration and iterative relocation. It works well and is widely used via the MCLUST software available in S-PLUS and StatLib. However, it has several limitations: there is no assessment of the uncertainty about the classification, the partition can be suboptimal, parameter estimates are biased, the shape matrix has to be specified by the user, prior group probabilities are assumed to be equal, the method for choosing the number of groups is based on a crude approximation, and no formal way of choosing between the various possible models is included. Here, we propose a new approach which overcomes all these difficulties. It consists of exact Bayesian inference via Gibbs sampling, and the calculation of Bayes factors (for choosing the model and the number of groups) from the output using the Laplace–Metropolis estimator. It works well in several real and simulated examples.
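As a rough illustration of the Bayes factor step, the sketch below applies a Laplace–Metropolis style estimate to a toy one-parameter normal model, not to the mixture models of the paper. It assumes the estimator has the commonly stated form log p(y) ≈ (d/2) log 2π + ½ log|Ψ| + log f(y|θ*) + log π(θ*), where θ* is the posterior draw with the highest unnormalised posterior density and Ψ is the sample covariance of the draws. The toy model, the function names, `tau2`, and the seed are all hypothetical, and the draws are taken directly from a known conjugate posterior as a stand-in for Gibbs sampler output.

```python
import math
import random
import statistics


def log_normal_pdf(x, mean, var):
    """Log density of N(mean, var) evaluated at x."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def laplace_metropolis_1d(draws, log_lik, log_prior):
    """Laplace-Metropolis estimate of log p(y) for a scalar parameter (d = 1)."""
    # theta*: the posterior draw with the highest unnormalised log posterior.
    theta_star = max(draws, key=lambda t: log_lik(t) + log_prior(t))
    # Psi: sample variance of the draws, standing in for the posterior covariance.
    psi = statistics.variance(draws)
    return (0.5 * math.log(2 * math.pi) + 0.5 * math.log(psi)
            + log_lik(theta_star) + log_prior(theta_star))


# Hypothetical toy data: y_i ~ N(mu, 1) with prior mu ~ N(0, tau2).
rng = random.Random(0)
y = [rng.gauss(1.0, 1.0) for _ in range(20)]
tau2 = 10.0


def log_lik(mu):
    return sum(log_normal_pdf(yi, mu, 1.0) for yi in y)


def log_prior(mu):
    return log_normal_pdf(mu, 0.0, tau2)


# Conjugate posterior mu | y ~ N(post_mean, post_var); sampled directly here
# as a stand-in for the output of a Gibbs sampler.
post_var = 1.0 / (len(y) + 1.0 / tau2)
post_mean = post_var * sum(y)
draws = [rng.gauss(post_mean, math.sqrt(post_var)) for _ in range(5000)]

estimate = laplace_metropolis_1d(draws, log_lik, log_prior)

# Exact log p(y) from Bayes' theorem: p(y) = f(y|mu) pi(mu) / p(mu|y) at any mu.
exact = (log_lik(post_mean) + log_prior(post_mean)
         - log_normal_pdf(post_mean, post_mean, post_var))

# Log Bayes factor against the fixed-mean model mu = 0 (no free parameter).
log_m0 = sum(log_normal_pdf(yi, 0.0, 1.0) for yi in y)
log_bayes_factor = estimate - log_m0

print(f"Laplace-Metropolis: {estimate:.3f}  exact: {exact:.3f}")
print(f"log Bayes factor (M1 vs M0): {log_bayes_factor:.3f}")
```

Because the toy posterior is exactly normal, the estimate should land close to the exact value, which makes the sketch easy to check. In a clustering setting the parameter would be multivariate, so Ψ would be a covariance matrix and ½ log Ψ would become ½ log|Ψ|; the same comparison of log marginal likelihoods would then be made across candidate models and numbers of groups.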

Cite this article

Bensmail, H., Celeux, G., Raftery, A.E. et al. Inference in model-based cluster analysis. Statistics and Computing 7, 1–10 (1997). https://doi.org/10.1023/A:1018510926151

  • DOI: https://doi.org/10.1023/A:1018510926151