DOI: 10.1145/3097983.3098110

Decomposed Normalized Maximum Likelihood Codelength Criterion for Selecting Hierarchical Latent Variable Models

Published: 04 August 2017

Abstract

We propose a new model selection criterion based on the minimum description length (MDL) principle, which we call the decomposed normalized maximum likelihood (DNML) criterion. Our criterion can be applied to a large class of hierarchical latent variable models, such as naive Bayes models, stochastic block models, and latent Dirichlet allocation, to which many conventional information criteria cannot be straightforwardly applied because of the irregularity of latent variable models. Our method has the further advantage that it can be evaluated exactly, without asymptotic approximation, at small computational cost. Our experiments on synthetic and real data demonstrated the validity of our method in terms of both computational efficiency and model selection accuracy; in particular, our criterion dominated the other criteria when the sample size was small and the data were noisy.
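
To make the criterion concrete, the sketch below illustrates NML-codelength-based selection of the number of clusters for categorical data. The particular decomposition, encoding the hard latent assignments z with an exact multinomial NML codelength and then the data within each cluster and feature given z, is our illustrative reading of the decomposed criterion, not code from the paper, and every function name here is hypothetical. The parametric complexity term is computed exactly with the linear-time recurrence of Kontkanen and Myllymäki (2007), the kind of computation that lets such criteria be evaluated without asymptotic approximation.

```python
# A minimal sketch of NML-codelength-based selection of the number of clusters
# for categorical data. The decomposition L(X | z) + L(z) and all names below
# are illustrative assumptions, not code from the paper.
import math

import numpy as np


def multinomial_complexity(m: int, n: int) -> float:
    """Parametric complexity C(m, n) of an m-category multinomial over n
    samples, via the linear-time recurrence of Kontkanen & Myllymaki (2007)."""
    if m == 1 or n == 0:
        return 1.0
    # C(2, n) by direct summation; Python's 0**0 == 1 handles boundary terms.
    c_prev = 1.0  # C(1, n)
    c_curr = sum(
        math.comb(n, k) * (k / n) ** k * ((n - k) / n) ** (n - k)
        for k in range(n + 1)
    )
    for j in range(1, m - 1):  # C(j + 2, n) = C(j + 1, n) + (n / j) * C(j, n)
        c_prev, c_curr = c_curr, c_curr + (n / j) * c_prev
    return c_curr


def nml_codelength(counts: np.ndarray) -> float:
    """Exact NML codelength (in nats) of a count vector under the multinomial
    model: negative maximized log-likelihood plus log parametric complexity."""
    n = int(counts.sum())
    nll = -sum(c * math.log(c / n) for c in counts if c > 0)
    return nll + math.log(multinomial_complexity(len(counts), n))


def dnml_codelength(X: np.ndarray, z: np.ndarray, K: int, V: int) -> float:
    """Decomposed codelength L(X | z) + L(z) for an n x d categorical matrix X
    (values in {0, ..., V-1}) under hard assignments z to K clusters."""
    total = nml_codelength(np.bincount(z, minlength=K))  # L(z)
    for k in range(K):  # L(X | z): one multinomial per cluster and feature
        for j in range(X.shape[1]):
            total += nml_codelength(np.bincount(X[z == k, j], minlength=V))
    return total


# Toy usage: two planted clusters with skewed per-feature multinomials.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.choice(3, size=(60, 5), p=[0.8, 0.1, 0.1]),
    rng.choice(3, size=(60, 5), p=[0.1, 0.1, 0.8]),
])
for K in (1, 2, 3):
    z = (np.arange(120) * K) // 120  # block assignment; in practice, fit per K
    print(K, round(dnml_codelength(X, z, K, 3), 1))  # K = 2 should be smallest
```

In practice the assignments z would be fitted for each candidate K (for instance by hard EM) before comparing codelengths, and the model minimizing the total codelength is selected.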

Published In

KDD '17: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
August 2017
2240 pages
ISBN: 9781450348874
DOI: 10.1145/3097983

Publisher

Association for Computing Machinery, New York, NY, United States

Author Tags

  1. clustering
  2. model selection
  3. topic and latent variable models

Qualifiers

  • Research-article

Funding Sources

  • JST CREST

Conference

KDD '17

Acceptance Rates

KDD '17 Paper Acceptance Rate: 64 of 748 submissions, 9%
Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%

Cited By
  • (2023) Latent Variable Model Selection. In: Learning with the Minimum Description Length Principle, 137-183. DOI: 10.1007/978-981-99-1790-7_4
  • (2022) Mixture Complexity and Its Application to Gradual Clustering Change Detection. Entropy 24(10), 1407. DOI: 10.3390/e24101407
  • (2022) Graph Summarization with Latent Variable Probabilistic Models. In: Complex Networks & Their Applications X, 428-440. DOI: 10.1007/978-3-030-93413-2_36
  • (2021) Summarizing Finite Mixture Model with Overlapping Quantification. Entropy 23(11), 1503. DOI: 10.3390/e23111503
  • (2020) Minimum description length revisited. International Journal of Mathematics for Industry 11(1). DOI: 10.1142/S2661335219300018
  • (2020) Detecting Hierarchical Changes in Latent Variable Models. In: 2020 IEEE International Conference on Data Mining (ICDM), 1028-1033. DOI: 10.1109/ICDM50108.2020.00120
  • (2020) Long-tailed distributions of inter-event times as mixtures of exponential distributions. Royal Society Open Science 7(2), 191643. DOI: 10.1098/rsos.191643
  • (2019) Modern MDL meets Data Mining: Insights, Theory, and Practice. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 3229-3230. DOI: 10.1145/3292500.3332284
  • (2019) The decomposed normalized maximum likelihood code-length criterion for selecting hierarchical latent variable models. Data Mining and Knowledge Discovery 33(4), 1017-1058. DOI: 10.1007/s10618-019-00624-4
  • (2017) Latent Dimensionality Estimation for Probabilistic Canonical Correlation Analysis Using Normalized Maximum Likelihood Code-Length. In: 2017 IEEE International Conference on Data Science and Advanced Analytics (DSAA), 716-725. DOI: 10.1109/DSAA.2017.39
