MML clustering of multi-state, Poisson, von Mises circular and Gaussian distributions

Published: 01 January 2000

Abstract

    Minimum Message Length (MML) is an invariant Bayesian point estimation technique which is also statistically consistent and efficient. We provide a brief overview of MML inductive inference (Wallace C.S. and Boulton D.M. 1968. Computer Journal, 11: 185-194; Wallace C.S. and Freeman P.R. 1987. J. Royal Statistical Society (Series B), 49: 240-252; Wallace C.S. and Dowe D.L. 1999. Computer Journal), and how it has both an information-theoretic and a Bayesian interpretation. We then outline how MML is used for statistical parameter estimation, and how the MML mixture modelling program, Snob (Wallace C.S. and Boulton D.M. 1968. Computer Journal, 11: 185-194; Wallace C.S. 1986. In: Proceedings of the Nineteenth Australian Computer Science Conference (ACSC-9), Vol. 8, Monash University, Australia, pp. 357-366; Wallace C.S. and Dowe D.L. 1994b. In: Zhang C. et al. (Eds.), Proc. 7th Australian Joint Conf. on Artif. Intelligence. World Scientific, Singapore, pp. 37-44. See http://www.csse.monash.edu.au/~dld/Snob.html), uses the message lengths from various parameter estimates to combine parameter estimation with selection of the number of components and estimation of the relative abundances of the components. The message length is (to within a constant) the negative logarithm of the posterior probability (not a posterior density) of the theory, so the MML theory can also be regarded as the theory with the highest posterior probability. Snob currently assumes that variables are uncorrelated within each component, and permits multivariate data from Gaussian, discrete multi-category (or multi-state or multinomial), Poisson and von Mises circular distributions, as well as missing data. Additionally, Snob can do fully-parameterised mixture modelling, estimating the latent class assignments in addition to the number of components, the relative abundances of the components and the component parameters.
We also report on extensions of Snob for data which has sequential or spatial correlations between observations, or correlations between attributes.
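    The two-part message-length trade-off described in the abstract can be sketched in a few lines of Python. This is not Snob itself: the parameter cost below is a BIC-style 0.5·log(n) nits per parameter, a crude stand-in for the exact MML Fisher-information term, and the two-component fit is a simple split at the sample mean rather than Snob's iterative class reassignment. It only illustrates how a richer hypothesis can win by making the second part of the message (the data, encoded under the model) much cheaper.

```python
import math
import random

def neg_log_likelihood(data, components):
    """Second part of the message: code length of the data (in nits)
    under a one-dimensional Gaussian mixture.

    `components` is a list of (weight, mean, std) triples.
    """
    total = 0.0
    for x in data:
        density = sum(
            w * math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))
            for w, mu, sd in components
        )
        total += -math.log(density)
    return total

def two_part_length(data, components):
    """Crude two-part message length: parameter cost plus data cost.

    Each component contributes 3 parameters (weight, mean, std); we charge
    0.5 * log(n) nits per parameter, a BIC-style approximation standing in
    for the exact MML parameter cost.
    """
    n = len(data)
    n_params = 3 * len(components)
    return 0.5 * n_params * math.log(n) + neg_log_likelihood(data, components)

# Data drawn from two well-separated Gaussians.
random.seed(0)
data = [random.gauss(-5, 1) for _ in range(200)] + \
       [random.gauss(5, 1) for _ in range(200)]

# One-component hypothesis: sample mean and standard deviation.
mu = sum(data) / len(data)
sd = math.sqrt(sum((x - mu) ** 2 for x in data) / len(data))
len_1 = two_part_length(data, [(1.0, mu, sd)])

# Two-component hypothesis: split at the overall mean (a crude stand-in
# for the latent class assignments Snob estimates).
left = [x for x in data if x < mu]
right = [x for x in data if x >= mu]
comps = []
for part in (left, right):
    m = sum(part) / len(part)
    s = math.sqrt(sum((x - m) ** 2 for x in part) / len(part))
    comps.append((len(part) / len(data), m, s))
len_2 = two_part_length(data, comps)

# The two-component hypothesis pays more for its parameters but encodes
# the data far more cheaply, so its total message length is shorter.
print(len_2 < len_1)  # True for this well-separated sample
```

Under the abstract's interpretation, the shorter total message corresponds (to within a constant) to the higher posterior probability, so the two-component model would be the one selected here.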

    References

    [1]
    Barron A.R. and Cover T.M. 1991. Minimum complexity density estimation. IEEE Transactions on Information Theory 37: 1034-1054.
    [2]
    Baxter R.A. and Oliver J.J. 1997. Finding overlapping distributions with MML. Statistics and Computing 10(1): 5-16.
    [3]
    Boulton D.M. 1975. The information criterion for intrinsic classification. Ph.D. Thesis, Dept. Computer Science, Monash University, Australia.
    [4]
    Boulton D.M. and Wallace C.S. 1969. The information content of a multistate distribution. Journal of Theoretical Biology 23: 269-278.
    [5]
    Boulton D.M. and Wallace C.S. 1970. A program for numerical classification. Computer Journal 13: 63-69.
    [6]
    Boulton D.M. and Wallace C.S. 1973a. An information measure for hierarchic classification. The Computer Journal 16: 254-261.
    [7]
    Boulton D.M. and Wallace C.S. 1973b. A comparison between information measure classification. In: Proceedings of ANZAAS Congress, Perth.
    [8]
    Boulton D.M. and Wallace C.S. 1975. An information measure for single-link classification. The Computer Journal 18(3): 236-238.
    [9]
    Chaitin G.J. 1966. On the length of programs for computing finite sequences. Journal of the Association for Computing Machinery 13: 547-549.
    [10]
    Cheeseman P., Self M., Kelly J., Taylor W., Freeman D., and Stutz J. 1988. Bayesian classification. In: Seventh National Conference on Artificial Intelligence, Saint Paul, Minnesota, pp. 607-611.
    [11]
    Conway J.H. and Sloane N.J.A. 1988. Sphere Packings, Lattices and Groups. London, Springer Verlag.
    [12]
    Dellaportas P., Karlis D., and Xekalaki E. 1997. Bayesian Analysis of Finite Poisson Mixtures. Technical Report No. 32. Department of Statistics, Athens University of Economics and Business, Greece.
    [13]
    Dowe D.L., Allison L., Dix T.I., Hunter L., Wallace C.S., and Edgoose T. 1996. Circular clustering of protein dihedral angles by minimum message length. In: Proc. 1st Pacific Symp. Biocomp., HI, U.S.A., pp. 242-255.
    [14]
    Dowe D.L., Baxter R.A., Oliver J.J., and Wallace C.S. 1998. Point estimation using the Kullback-Leibler loss function and MML. In: Proc. 2nd Pacific Asian Conference on Knowledge Discovery and Data Mining (PAKDD'98), Melbourne, Australia. Springer Verlag, pp. 87-95.
    [15]
    Dowe D.L. and Korb K.B. 1996. Conceptual difficulties with the efficient market hypothesis: towards a naturalized economics. In: Dowe D.L., Korb K.B., and Oliver J.J. (Eds.), Proceedings of the Information, Statistics and Induction in Science (ISIS) Conference, Melbourne, Australia. World Scientific, pp. 212-223.
    [16]
    Dowe D.L., Oliver J.J., Baxter R.A., and Wallace C.S. 1995. Bayesian estimation of the von Mises concentration parameter. In: Proc. 15th Maximum Entropy Conference, Santa Fe, New Mexico.
    [17]
    Dowe D.L., Oliver J.J., and Wallace C.S. 1996. MML estimation of the parameters of the spherical Fisher distribution. In: Sharma A. et al. (Eds.), Proc. 7th Conf. Algorithmic Learning Theory (ALT'96), LNAI 1160, Sydney, Australia, pp. 213-227.
    [18]
    Dowe D.L. and Wallace C.S. 1997. Resolving the Neyman-Scott problem by minimum message length. In: Proc. Computing Science and Statistics - 28th Symposium on the Interface, Vol. 28, pp. 614-618.
    [19]
    Dowe D.L. and Wallace C.S. 1998. Kolmogorov complexity, minimum message length and inverse learning. In: Proc. 14th Australian Statistical Conference (ASC-14), Gold Coast, Qld., Australia, p. 144.
    [20]
    Edgoose T.C. and Allison L. 1998. Unsupervised Markov classification of sequenced data using MML. In: McDonald C. (Ed.), Proc. 21st Australasian Computer Science Conference (ACSC'98), Singapore. Springer-Verlag, ISBN: 981-3083-90-5, pp. 81-94.
    [21]
    Edgoose T.C., Allison L., and Dowe D.L. 1998. An MML classification of protein structure that knows about angles and sequences. In: Proc. 3rd Pacific Symp. Biocomp. (PSB-98) HI, U.S.A., pp. 585-596.
    [22]
    Edwards R.T. and Dowe D.L. 1998. Single factor analysis in MML mixture modelling. In: Proc. 2nd Pacific Asian Conference on Knowledge Discovery and Data Mining (PAKDD'98), Melbourne, Australia. Springer Verlag, pp. 96-109.
    [23]
    Everitt B.S. and Hand D.J. 1981. Finite Mixture Distributions. London, Chapman and Hall.
    [24]
    Fisher D.H. 1987. Conceptual clustering, learning from examples, and inference. In: Machine Learning: Proceedings of the Fourth International Workshop. Morgan Kaufmann, pp. 38-49.
    [25]
    Fisher N.I. 1993. Statistical Analysis of Circular Data. Cambridge University Press.
    [26]
    Fraley C. and Raftery A.E. 1998. Mclust: software for model-based clustering and discriminant analysis. Technical Report TR 342, Department of Statistics, University of Washington, U.S.A. Journal of Classification, to appear.
    [27]
    Georgeff M.P. and Wallace C.S. 1984. A general criterion for inductive inference. In: O'Shea T. (Ed.), Advances in Artificial Intelligence: Proc. Sixth European Conference on Artificial Intelligence, Amsterdam. North Holland, pp. 473-482.
    [28]
    Hunt L.A. and Jorgensen M.A. 1999. Mixture model clustering using the multimix program. Australian and New Zealand Journal of Statistics 41(2): 153-171.
    [29]
    Jorgensen M.A. and Hunt L.A. 1996. Mixture modelling clustering of data sets with categorical and continuous variables. In: Dowe D.L., Korb K.B., and Oliver J.J. (Eds.), Proceedings of the Information, Statistics and Induction in Science (ISIS) Conference, Melbourne, Australia. World Scientific, pp. 375-384.
    [30]
    Kearns M., Mansour Y., Ng A.Y., and Ron D. 1997. An experimental and theoretical comparison of model selection methods. Machine Learning 27: 7-50.
    [31]
    Kissane D.W., Bloch S., Dowe D.L., Snyder R.D., Onghena P., McKenzie D.P., and Wallace C.S. 1996. The Melbourne family grief study, I: Perceptions of family functioning in bereavement. American Journal of Psychiatry 153: 650-658.
    [32]
    Mardia K.V. 1972. Statistics of Directional Data. Academic Press.
    [33]
    McLachlan G.J. 1992. Discriminant Analysis and Statistical Pattern Recognition. New York, Wiley.
    [34]
    McLachlan G.J. and Basford K.E. 1988. Mixture Models. New York, Marcel Dekker.
    [35]
    McLachlan G.J. and Krishnan T. 1996. The EM Algorithm and Extensions. New York, Wiley.
    [36]
    McLachlan G.J., Peel D., Basford K.E., and Adams P. 1999. The EMMIX software for the fitting of mixtures of Normal and t-components. Journal of Statistical Software 4, 1999.
    [37]
    Neal R.M. 1998. Markov chain sampling methods for Dirichlet process mixture models. Technical Report 9815, Dept. of Statistics and Dept. of Computer Science, University of Toronto, Canada, pp. 17.
    [38]
    Neyman J. and Scott E.L. 1948. Consistent estimates based on partially consistent observations. Econometrica 16: 1-32.
    [39]
    Oliver J., Baxter R., and Wallace C. 1996. Unsupervised learning using MML. In: Proc. 13th International Conf. Machine Learning (ICML 96), San Francisco, CA. Morgan Kaufmann, pp. 364-372.
    [40]
    Oliver J.J. and Dowe D.L. 1996. Minimum message length mixture modelling of spherical von Mises-Fisher distributions. In: Proc. Sydney International Statistical Congress (SISC-96), Sydney, Australia, p. 198.
    [41]
    Patrick J.D. 1991. Snob: A program for discriminating between classes. Technical report TR 151, Dept. of Computer Science, Monash University, Clayton, Victoria 3168, Australia.
    [42]
    Prior M., Eisenmajer R., Leekam S., Wing L., Gould J., Ong B., and Dowe D.L. 1998. Are there subgroups within the autistic spectrum? A cluster analysis of a group of children with autistic spectrum disorders. J. Child Psychol. Psychiat. 39(6): 893-902.
    [43]
    Rissanen J.J. 1978. Modeling by shortest data description. Automatica, 14: 465-471.
    [44]
    Rissanen J.J. 1989. Stochastic Complexity in Statistical Inquiry. Singapore, World Scientific.
    [45]
    Rissanen J.J. and Ristad E.S. 1994. Unsupervised classification with stochastic complexity. In: Bozdogan H. et al. (Eds.), Proc. of the First US/Japan Conf. on the Frontiers of Statistical Modeling: An Informational Approach. Kluwer Academic Publishers, pp. 171-182.
    [46]
    Roeder K. 1994. A graphical technique for determining the number of components in a mixture of normals. Journal of the American Statistical Association 89(426): 487-495.
    [47]
    Schou G. 1978. Estimation of the concentration parameter in von Mises-Fisher distributions. Biometrika 65: 369-377.
    [48]
    Solomonoff R.J. 1964. A formal theory of inductive inference. Information and Control 7: 1-22, 224-254.
    [49]
    Solomonoff R.J. 1995. The discovery of algorithmic probability: A guide for the programming of true creativity. In: Vitanyi P. (Ed.), Computational Learning Theory: EuroCOLT'95. Springer-Verlag, pp. 1-22.
    [50]
    Stutz J. and Cheeseman P. 1994. Autoclass: A Bayesian approach to classification. In: Skilling J. and Subuiso S. (Eds.), Maximum Entropy and Bayesian Methods. Dordrecht, Kluwer Academic.
    [51]
    Titterington D.M., Smith A.F.M., and Makov U.E. 1985. Statistical Analysis of Finite Mixture Distributions. John Wiley and Sons, Inc.
    [52]
    Vapnik V.N. 1995. The Nature of Statistical Learning Theory. Springer.
    [53]
    Viswanathan M. and Wallace C.S. 1999. A note on the comparison of polynomial selection methods. In: Proc. 7th Int. Workshop on Artif. Intelligence and Statistics. Morgan Kaufmann, pp. 169-177.
    [54]
    Viswanathan M., Wallace C.S., Dowe D.L., and Korb K.B. 1999. Finding cutpoints in Noisy Binary Sequences. In: Proc. 12th Australian Joint Conf. on Artif. Intelligence.
    [55]
    Wahba G. 1990. Spline Models for Observational Data. SIAM.
    [56]
    Wallace C.S. 1986. An improved program for classification. In: Proceedings of the Nineteenth Australian Computer Science Conference (ACSC-9), Vol. 8, Monash University, Australia, pp. 357-366.
    [57]
    Wallace C.S. 1990. Classification by Minimum Message Length inference. In: Goos G. and Hartmanis J. (Eds.), Advances in Computing and Information - ICCI'90. Berlin, Springer-Verlag, pp. 72-81.
    [58]
    Wallace C.S. 1995. Multiple factor analysis by MML estimation. Technical Report 95/218, Dept. of Computer Science, Monash University, Clayton, Victoria 3168, Australia. J. Multiv. Analysis, (to appear).
    [59]
    Wallace C.S. 1989. False Oracles and SMML Estimators. In: Dowe D.L., Korb K.B., and Oliver J.J. (Eds.), Proceedings of the Information, Statistics and Induction in Science (ISIS) Conference, Melbourne, Australia. World Scientific, pp. 304-316, Tech Rept 89/128, Dept. Comp. Sci., Monash Univ., Australia.
    [60]
    Wallace C.S. 1998. Intrinsic Classification of Spatially-Correlated Data. Computer Journal 41(8): 602-611.
    [61]
    Wallace C.S. and Boulton D.M. 1968. An information measure for classification. Computer Journal 11: 185-194.
    [62]
    Wallace C.S. and Boulton D.M. 1975. An invariant Bayes method for point estimation. Classification Society Bulletin 3(3): 11-34.
    [63]
    Wallace C.S. and Dowe D.L. 1993. MML estimation of the von Mises concentration parameter. Technical Report TR 93/193, Dept. of Comp. Sci., Monash Univ., Clayton 3168, Australia. Aust. and N.Z. J. Stat, prov. accepted.
    [64]
    Wallace C.S. and Dowe D.L. 1994. Estimation of the von Mises concentration parameter using minimum message length. In: Proc. 12th Australian Statistical Soc. Conf., Monash University, Australia.
    [65]
    Wallace C.S. and Dowe D.L. 1994. Intrinsic classification by MML - the Snob program. In: Zhang C. et al. (Eds.), Proc. 7th Australian Joint Conf. on Artif. Intelligence. World Scientific, Singapore, pp. 37-44. See http://www.csse.monash.edu.au/~dld/Snob.html.
    [66]
    Wallace C.S. and Dowe D.L. 1996. MML mixture modelling of Multistate, Poisson, von Mises circular and Gaussian distributions. In: Proc. Sydney International Statistical Congress (SISC-96), Sydney, Australia, p. 197.
    [67]
    Wallace C.S. and Dowe D.L. 1997. MML mixture modelling of Multistate, Poisson, von Mises circular and Gaussian distributions. In: Proc. 6th Int. Workshop on Artif. Intelligence and Statistics, pp. 529-536.
    [68]
    Wallace C.S. and Dowe D.L. 1999. Minimum Message Length and Kolmogorov Complexity. Computer Journal (Special issue on Kolmogorov Complexity) 42(4): 270-283.
    [69]
    Wallace C.S. and Freeman P.R. 1987. Estimation and inference by compact coding. J. Royal Statistical Society (Series B), 49: 240-252.
    [70]
    Wallace C.S. and Freeman P.R. 1992. Single factor analysis by MML estimation. Journal of the Royal Statistical Society (Series B) 54: 195-209.



        Published In

        Statistics and Computing, Volume 10, Issue 1
        January 2000
        80 pages

        Publisher

        Kluwer Academic Publishers

        United States


        Author Tags

        1. MML
        2. Snob
        3. classification
        4. clustering
        5. coding
        6. induction
        7. information theory
        8. intrinsic classification
        9. machine learning
        10. minimum message length
        11. mixture modelling
        12. numerical taxonomy
        13. statistical inference
        14. unsupervised learning


        Cited By

        • (2024) Explainable finite mixture of mixtures of bounded asymmetric generalized Gaussian and Uniform distributions learning for energy demand management. ACM Transactions on Intelligent Systems and Technology 15(4): 1-26. DOI: 10.1145/3653980.
        • (2023) Finite Multivariate McDonald's Beta Mixture Model Learning Approach in Medical Applications. Proceedings of the 38th ACM/SIGAPP Symposium on Applied Computing, pp. 1143-1150. DOI: 10.1145/3555776.3577650.
        • (2023) Minimum Message Length Inference of the Weibull Distribution with Complete and Censored Data. AI 2023: Advances in Artificial Intelligence, pp. 291-303. DOI: 10.1007/978-981-99-8388-9_24.
        • (2022) Deep Learning for Time Series Forecasting: Tutorial and Literature Survey. ACM Computing Surveys 55(6): 1-36. DOI: 10.1145/3533382.
        • (2022) A novel minorization–maximization framework for simultaneous feature selection and clustering of high-dimensional count data. Pattern Analysis & Applications 26(1): 91-106. DOI: 10.1007/s10044-022-01094-z.
        • (2021) Mixture-Based Unsupervised Learning for Positively Correlated Count Data. Intelligent Information and Database Systems, pp. 144-154. DOI: 10.1007/978-3-030-73280-6_12.
        • (2020) Probabilistic Modeling for Frequency Vectors Using a Flexible Shifted-Scaled Dirichlet Distribution Prior. ACM Transactions on Knowledge Discovery from Data 14(6): 1-35. DOI: 10.1145/3406242.
        • (2019) Model selection and application to high-dimensional count data clustering. Applied Intelligence 49(4): 1467-1488. DOI: 10.1007/s10489-018-1333-9.
        • (2019) Peak-Hour Rail Demand Shifting with Discrete Optimisation. Principles and Practice of Constraint Programming, pp. 748-763. DOI: 10.1007/978-3-030-30048-7_43.
        • (2015) Bayesian versus data driven model selection for microarray data. Natural Computing 14(3): 393-402. DOI: 10.1007/s11047-014-9446-5.
