Abstract
In a naive Bayesian classifier, discrete variables as well as discretized continuous variables are assumed to have Dirichlet priors. This paper describes the implications and applications of this modeling choice. We start by reviewing key properties of Dirichlet distributions, the most important of which is "perfect aggregation," which allows us to explain why discretization works for a naive Bayesian classifier. Because perfect aggregation holds for Dirichlet distributions, we can explain why, in general, discretization can outperform parameter estimation that assumes a normal distribution. We can also explain why a wide variety of well-known discretization methods, such as entropy-based, ten-bin, and bin-log l, perform well with no significant differences among them. We designed experiments using synthesized and real data sets to verify this explanation, and showed that, beyond the well-known methods, a wide variety of discretization methods all perform similarly. Our analysis leads to a lazy discretization method, which discretizes continuous variables according to the test data at classification time. The Dirichlet assumption implies that lazy methods can perform as well as eager discretization methods. We empirically confirmed this implication and extended the lazy method to classify set-valued and multi-interval data with a naive Bayesian classifier.
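As background for the abstract's central property: if cell probabilities follow a Dirichlet distribution Dir(α1, …, αk), then merging cells yields another Dirichlet whose parameters are the sums of the merged cells' parameters; this "perfect aggregation" property is why the aggregated probability estimates do not depend on how finely the cells were partitioned. The sketch below is a minimal illustration of the kind of classifier the abstract discusses: a naive Bayesian classifier over ten-bin (equal-width) discretized features with symmetric Dirichlet smoothing of the cell counts. The names ten_bin_edges, discretize, and DirichletNB, the symmetric prior, and the NumPy-based layout are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def ten_bin_edges(col, n_bins=10):
    # Equal-width bin edges over the observed range (the "ten-bin" heuristic).
    return np.linspace(col.min(), col.max(), n_bins + 1)

def discretize(X, edges_per_col):
    # Map each continuous column to bin indices 0..n_bins-1 using its edges.
    cols = []
    for j, edges in enumerate(edges_per_col):
        idx = np.digitize(X[:, j], edges[1:-1])       # interior edges only
        cols.append(np.clip(idx, 0, len(edges) - 2))
    return np.column_stack(cols)

class DirichletNB:
    """Naive Bayes over discretized features; each class-conditional
    distribution gets a symmetric Dirichlet prior, i.e. every cell count
    is smoothed by alpha (an illustrative assumption)."""
    def __init__(self, n_bins=10, alpha=1.0):
        self.n_bins, self.alpha = n_bins, alpha

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        n = X.shape[0]
        counts = np.array([(y == c).sum() for c in self.classes_], dtype=float)
        self.log_prior_ = np.log(counts + self.alpha) - np.log(n + self.alpha * len(self.classes_))
        self.log_lik_ = []
        for j in range(X.shape[1]):
            tab = np.full((len(self.classes_), self.n_bins), self.alpha)
            for ci, c in enumerate(self.classes_):
                vals, cnt = np.unique(X[y == c, j], return_counts=True)
                tab[ci, vals.astype(int)] += cnt      # posterior Dirichlet cell counts
            self.log_lik_.append(np.log(tab / tab.sum(axis=1, keepdims=True)))
        return self

    def predict(self, X):
        scores = np.tile(self.log_prior_, (X.shape[0], 1))
        for j, tab in enumerate(self.log_lik_):
            scores += tab[:, X[:, j].astype(int)].T   # add log P(bin | class)
        return self.classes_[scores.argmax(axis=1)]
```

A hypothetical usage, binning training data and reusing the same edges for test instances (an eager scheme; the lazy variant discussed in the paper would instead choose intervals around each test value):

```python
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(2, 1, (100, 2))])
y = np.repeat([0, 1], 100)
edges = [ten_bin_edges(X[:, j]) for j in range(X.shape[1])]
clf = DirichletNB().fit(discretize(X, edges), y)
print(clf.predict(discretize(X[:5], edges)))
```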
Cite this article
Hsu, C.-N., Huang, H.-J., & Wong, T.-T. Implications of the Dirichlet Assumption for Discretization of Continuous Variables in Naive Bayesian Classifiers. Machine Learning 53, 235–263 (2003). https://doi.org/10.1023/A:1026367023636