DOI: 10.5555/2074158.2074196

Estimating continuous distributions in Bayesian classifiers

Published: 18 August 1995

Abstract

When modeling a probability distribution with a Bayesian network, we are faced with the problem of how to handle continuous variables. Most previous work has either solved the problem by discretizing, or assumed that the data are generated by a single Gaussian. In this paper we abandon the normality assumption and instead use statistical methods for nonparametric density estimation. For a naive Bayesian classifier, we present experimental results on a variety of natural and artificial domains, comparing two methods of density estimation: assuming normality and modeling each conditional distribution with a single Gaussian; and using nonparametric kernel density estimation. We observe large reductions in error on several natural and artificial data sets, which suggests that kernel estimation is a useful tool for learning Bayesian models.
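
The contrast drawn in the abstract can be illustrated with a minimal sketch of a naive Bayesian classifier whose per-class, per-feature density is pluggable: either a single fitted Gaussian or a Gaussian kernel density estimate. The function names, the toy data, and the default bandwidth of 1/sqrt(n) are assumptions chosen for illustration; none of them are taken from the abstract, and this is not presented as the authors' implementation.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of the normal distribution N(mean, std^2) at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def single_gaussian_density(x, samples):
    """Parametric estimate: fit one Gaussian to the samples."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((s - mean) ** 2 for s in samples) / n
    std = max(math.sqrt(var), 1e-6)  # guard against zero variance
    return gaussian_pdf(x, mean, std)

def kernel_density(x, samples, bandwidth=None):
    """Nonparametric estimate: average of Gaussian kernels, one per sample."""
    n = len(samples)
    if bandwidth is None:
        bandwidth = 1.0 / math.sqrt(n)  # illustrative default (an assumption)
    return sum(gaussian_pdf(x, s, bandwidth) for s in samples) / n

def classify(x, data_by_class, density):
    """Naive Bayes: pick the class maximizing log prior + sum of log densities.

    x is a list of continuous feature values; data_by_class maps each label
    to a list of training feature vectors; density is one of the two
    estimators above.
    """
    total = sum(len(rows) for rows in data_by_class.values())
    best_label, best_score = None, -math.inf
    for label, rows in data_by_class.items():
        score = math.log(len(rows) / total)
        for j, xj in enumerate(x):
            column = [row[j] for row in rows]
            # Floor the density so log() never receives zero.
            score += math.log(max(density(xj, column), 1e-300))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical one-feature toy data: class "A" is bimodal around -2 and +2,
# class "B" clusters near 0 with two outliers. Both have mean 0 and a
# similar variance, so a single Gaussian cannot tell their shapes apart.
TOY = {
    "A": [[-2.2], [-2.0], [-1.8], [1.8], [2.0], [2.2]],
    "B": [[-3.5], [-0.3], [0.0], [0.3], [3.5]],
}
```

On this toy data, `classify([0.0], TOY, kernel_density)` selects class "B", whose samples cluster at 0, whereas `classify([0.0], TOY, single_gaussian_density)` selects "A", because the single Gaussian fitted to the bimodal class places its peak at the empty region between the two modes. This only illustrates the kind of failure the normality assumption can cause; it does not reproduce the paper's experiments.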


Published In

UAI'95: Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence
August 1995
590 pages
ISBN:1558603859

Sponsors

  • Rockwell Science Center
  • Lumina Decision Systems, Inc.

Publisher

Morgan Kaufmann Publishers Inc.

San Francisco, CA, United States
