DOI: 10.1145/3097983.3098110

Decomposed Normalized Maximum Likelihood Codelength Criterion for Selecting Hierarchical Latent Variable Models

Published: 04 August 2017

Abstract

We propose a new model selection criterion based on the minimum description length (MDL) principle, which we call the decomposed normalized maximum likelihood (DNML) criterion. Our criterion can be applied to a large class of hierarchical latent variable models, such as naive Bayes models, stochastic block models, and latent Dirichlet allocation, to which many conventional information criteria cannot be straightforwardly applied because of the irregularity of latent variable models. Our method has the further advantage that it can be evaluated exactly, without asymptotic approximation, at small computational cost. Our experiments on synthetic and real data demonstrated the validity of our method in terms of both computational efficiency and model selection accuracy; in particular, our criterion dominated the other criteria when the sample size was small and the data were noisy.
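
To make the criterion concrete, the sketch below illustrates NML-codelength-based selection of the number of clusters for categorical data. The particular decomposition, encoding the hard latent assignments z with an exact multinomial NML codelength and then the data within each cluster and feature given z, is our illustrative reading of the decomposed criterion, not code from the paper, and every function name here is hypothetical. The parametric complexity term is computed exactly with the linear-time recurrence of Kontkanen and Myllymäki (2007), the kind of computation that lets such criteria be evaluated without asymptotic approximation.

```python
# A minimal sketch of NML-codelength-based selection of the number of clusters
# for categorical data. The decomposition L(X | z) + L(z) and all names below
# are illustrative assumptions, not code from the paper.
import math

import numpy as np


def multinomial_complexity(m: int, n: int) -> float:
    """Parametric complexity C(m, n) of an m-category multinomial over n
    samples, via the linear-time recurrence of Kontkanen & Myllymaki (2007)."""
    if m == 1 or n == 0:
        return 1.0
    # C(2, n) by direct summation; Python's 0**0 == 1 handles boundary terms.
    c_prev = 1.0  # C(1, n)
    c_curr = sum(
        math.comb(n, k) * (k / n) ** k * ((n - k) / n) ** (n - k)
        for k in range(n + 1)
    )
    for j in range(1, m - 1):  # C(j + 2, n) = C(j + 1, n) + (n / j) * C(j, n)
        c_prev, c_curr = c_curr, c_curr + (n / j) * c_prev
    return c_curr


def nml_codelength(counts: np.ndarray) -> float:
    """Exact NML codelength (in nats) of a count vector under the multinomial
    model: negative maximized log-likelihood plus log parametric complexity."""
    n = int(counts.sum())
    nll = -sum(c * math.log(c / n) for c in counts if c > 0)
    return nll + math.log(multinomial_complexity(len(counts), n))


def dnml_codelength(X: np.ndarray, z: np.ndarray, K: int, V: int) -> float:
    """Decomposed codelength L(X | z) + L(z) for an n x d categorical matrix X
    (values in {0, ..., V-1}) under hard assignments z to K clusters."""
    total = nml_codelength(np.bincount(z, minlength=K))  # L(z)
    for k in range(K):  # L(X | z): one multinomial per cluster and feature
        for j in range(X.shape[1]):
            total += nml_codelength(np.bincount(X[z == k, j], minlength=V))
    return total


# Toy usage: two planted clusters with skewed per-feature multinomials.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.choice(3, size=(60, 5), p=[0.8, 0.1, 0.1]),
    rng.choice(3, size=(60, 5), p=[0.1, 0.1, 0.8]),
])
for K in (1, 2, 3):
    z = (np.arange(120) * K) // 120  # block assignment; in practice, fit per K
    print(K, round(dnml_codelength(X, z, K, 3), 1))  # K = 2 should be smallest
```

In practice the assignments z would be fitted for each candidate K (for instance by hard EM) before comparing codelengths, and the model minimizing the total codelength is selected.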

Published In

KDD '17: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
August 2017
2240 pages
ISBN: 9781450348874
DOI: 10.1145/3097983

Publisher

Association for Computing Machinery, New York, NY, United States

Author Tags

  1. clustering
  2. model selection
  3. topic and latent variable models

Qualifiers

  • Research-article

Funding Sources

  • JST CREST

Conference

KDD '17

Acceptance Rates

KDD '17 Paper Acceptance Rate: 64 of 748 submissions, 9%
Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%

Cited By
  • (2023) Latent Variable Model Selection. In: Learning with the Minimum Description Length Principle, 137-183. DOI: 10.1007/978-981-99-1790-7_4
  • (2022) Mixture Complexity and Its Application to Gradual Clustering Change Detection. Entropy 24(10), 1407. DOI: 10.3390/e24101407
  • (2022) Graph Summarization with Latent Variable Probabilistic Models. In: Complex Networks & Their Applications X, 428-440. DOI: 10.1007/978-3-030-93413-2_36
  • (2021) Summarizing Finite Mixture Model with Overlapping Quantification. Entropy 23(11), 1503. DOI: 10.3390/e23111503
  • (2020) Minimum description length revisited. International Journal of Mathematics for Industry 11(1). DOI: 10.1142/S2661335219300018
  • (2020) Detecting Hierarchical Changes in Latent Variable Models. In: 2020 IEEE International Conference on Data Mining (ICDM), 1028-1033. DOI: 10.1109/ICDM50108.2020.00120
  • (2020) Long-tailed distributions of inter-event times as mixtures of exponential distributions. Royal Society Open Science 7(2), 191643. DOI: 10.1098/rsos.191643
  • (2019) Modern MDL meets Data Mining: Insights, Theory, and Practice. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 3229-3230. DOI: 10.1145/3292500.3332284
  • (2019) The decomposed normalized maximum likelihood code-length criterion for selecting hierarchical latent variable models. Data Mining and Knowledge Discovery 33(4), 1017-1058. DOI: 10.1007/s10618-019-00624-4
  • (2017) Latent Dimensionality Estimation for Probabilistic Canonical Correlation Analysis Using Normalized Maximum Likelihood Code-Length. In: 2017 IEEE International Conference on Data Science and Advanced Analytics (DSAA), 716-725. DOI: 10.1109/DSAA.2017.39
