Abstract
Ensemble classifiers combine the classification results of several classifiers. Simple ensemble methods such as uniform averaging over a set of models usually provide an improvement over selecting the single best model. Probabilistic classifiers usually restrict the set of models that can be learnt in order to lower computational cost. In these restricted spaces, where incorrect modeling assumptions may be made, uniform averaging sometimes performs even better than Bayesian model averaging. Linear mixtures over sets of models provide a space that includes uniform averaging as a particular case. We develop two algorithms for learning maximum a posteriori weights for linear mixtures, one based on expectation maximization and one on constrained optimization. We provide a nontrivial example of the utility of these two algorithms by applying them to one-dependence estimators. We develop the conjugate distribution for one-dependence estimators and empirically show that uniform averaging is clearly superior to Bayesian model averaging for this family of models. We then empirically show that the maximum a posteriori linear mixture weights improve accuracy significantly over uniform aggregation.
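To make the idea of a linear mixture with learnt weights concrete, the following is a minimal sketch and not the algorithm developed in the paper. It assumes the base models are already fitted and fixed, that `lik[i, k]` holds the probability model `k` assigns to training example `i`, and that the weights have a symmetric Dirichlet prior with parameter `alpha >= 1`. The function names, the prior, and the stopping rule are all illustrative assumptions. Under those assumptions, a textbook expectation maximization loop for the maximum a posteriori weights might look like this:

```python
import numpy as np


def em_map_mixture_weights(lik, alpha=1.0, n_iter=200, tol=1e-10):
    """Generic EM for MAP weights of a mixture with FIXED components.

    lik[i, k]: probability that (hypothetical) model k assigns to example i.
    alpha:     symmetric Dirichlet prior parameter (assumed, must be >= 1);
               alpha = 1 reduces to maximum likelihood.
    """
    lik = np.asarray(lik, dtype=float)
    n_examples, n_models = lik.shape
    w = np.full(n_models, 1.0 / n_models)  # start at uniform averaging
    for _ in range(n_iter):
        # E-step: responsibility of each model for each example.
        joint = lik * w
        resp = joint / joint.sum(axis=1, keepdims=True)
        # M-step: posterior mode of the weights under the Dirichlet prior.
        new_w = resp.sum(axis=0) + (alpha - 1.0)
        new_w /= new_w.sum()
        converged = np.abs(new_w - w).max() < tol
        w = new_w
        if converged:
            break
    return w


def mixture_predict(class_probs, w):
    """Linear mixture of per-model class distributions.

    class_probs[k, c]: probability model k gives to class c for one instance.
    Uniform w recovers uniform averaging.
    """
    return np.dot(w, np.asarray(class_probs, dtype=float))
```

With `w` fixed at `1 / n_models` for every model, `mixture_predict` is exactly uniform averaging, which is the sense in which the linear mixture space contains uniform averaging as a particular case. Larger values of `alpha` pull the learnt weights back towards that uniform point.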
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Cerquides, J., de Mántaras, R.L. (2005). Robust Bayesian Linear Classifier Ensembles. In: Gama, J., Camacho, R., Brazdil, P.B., Jorge, A.M., Torgo, L. (eds) Machine Learning: ECML 2005. ECML 2005. Lecture Notes in Computer Science, vol 3720. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11564096_12
DOI: https://doi.org/10.1007/11564096_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29243-2
Online ISBN: 978-3-540-31692-3
eBook Packages: Computer Science, Computer Science (R0)