Abstract
The Bayesian Information Criterion (BIC) and the Minimum Description Length (MDL) principle have been widely proposed as good metrics for model selection. Both scores essentially consist of two terms: one for accuracy and one for complexity. Their aim is to find a model that strikes the right balance between these terms. Surprisingly, however, both metrics often do not work well in practice because they overfit the data. In this paper, we present an analysis of the BIC and MDL scores, using the framework of Bayesian networks, that supports this claim. To this end, we carry out different tests, including the recovery of gold-standard network structures and the construction and evaluation of Bayesian network classifiers. Finally, based on these results, we discuss the disadvantages of both metrics and propose future work to examine these limitations more deeply.
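To make the two-term structure concrete, the following is a minimal sketch of a BIC-style score for a discrete Bayesian network. It assumes the commonly used form, maximized log-likelihood minus (k/2) log n, where k is the number of free parameters and n the sample size; the abstract does not state the exact formulation used in the paper, so this form, the function name `bic_score`, and the data layout are illustrative assumptions only.

```python
import math
from collections import Counter


def bic_score(data, parents, arity):
    """Return (score, log_likelihood, penalty) for a discrete network.

    Assumed (hypothetical) layout:
      data    : list of dicts mapping variable name -> observed value
      parents : dict mapping variable name -> tuple of parent names
      arity   : dict mapping variable name -> number of states
    """
    n = len(data)
    log_lik = 0.0   # accuracy term: maximized log-likelihood
    k = 0           # number of free parameters in the network
    for var, pa in parents.items():
        # Counts of (parent configuration, value) and of parent configuration.
        joint = Counter((tuple(row[p] for p in pa), row[var]) for row in data)
        marg = Counter(tuple(row[p] for p in pa) for row in data)
        for (pa_cfg, _value), count in joint.items():
            log_lik += count * math.log(count / marg[pa_cfg])
        # Free parameters: (states - 1) per parent configuration.
        q = 1
        for p in pa:
            q *= arity[p]
        k += (arity[var] - 1) * q
    penalty = 0.5 * k * math.log(n)   # complexity term
    return log_lik - penalty, log_lik, penalty


# Toy data: Y copies X exactly, X is balanced.
rows = [{"X": v, "Y": v} for v in (0, 1)] * 4
sizes = {"X": 2, "Y": 2}
empty_graph = {"X": (), "Y": ()}
x_to_y = {"X": (), "Y": ("X",)}

for name, graph in (("empty", empty_graph), ("X->Y", x_to_y)):
    score, ll, pen = bic_score(rows, graph, sizes)
    print(f"{name}: score={score:.3f} log-lik={ll:.3f} penalty={pen:.3f}")
```

Adding the arc X->Y raises the accuracy term and the complexity penalty together; the score prefers the denser graph only when the gain in log-likelihood exceeds the extra penalty.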
© 2006 Springer-Verlag Berlin Heidelberg
Cite this paper
Cruz-Ramírez, N., Acosta-Mesa, H.G., Barrientos-Martínez, R.E., Nava-Fernández, L.A. (2006). How Good Are the Bayesian Information Criterion and the Minimum Description Length Principle for Model Selection? A Bayesian Network Analysis. In: Gelbukh, A., Reyes-Garcia, C.A. (eds) MICAI 2006: Advances in Artificial Intelligence. MICAI 2006. Lecture Notes in Computer Science, vol 4293. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11925231_46
DOI: https://doi.org/10.1007/11925231_46
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-49026-5
Online ISBN: 978-3-540-49058-6
eBook Packages: Computer Science, Computer Science (R0)