The Bayesian structural EM algorithm

Author:

Nir Friedman

UAI'98: Proceedings of the Fourteenth conference on Uncertainty in artificial intelligence

Pages 129 - 138

Published: 24 July 1998

Abstract

In recent years there has been a flurry of work on learning Bayesian networks from data. One of the hard problems in this area is how to effectively learn the structure of a belief network from incomplete data, that is, in the presence of missing values or hidden variables. In a recent paper, I introduced an algorithm called Structural EM that combines the standard Expectation Maximization (EM) algorithm, which optimizes parameters, with structure search for model selection. That algorithm learns networks based on penalized likelihood scores, which include the BIC/MDL score and various approximations to the Bayesian score. In this paper, I extend Structural EM to deal directly with Bayesian model selection. I prove the convergence of the resulting algorithm and show how to apply it for learning a large class of probabilistic models, including Bayesian networks and some variants thereof.
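
To make the idea of combining EM with structure search concrete, here is a minimal sketch of the earlier penalized-likelihood variant (BIC score), not the Bayesian extension this paper contributes. Everything in it is an illustrative assumption rather than the author's implementation: the two binary variables, the two candidate structures, the toy data set, and the function names `expected_counts`, `fit_and_score`, and `structural_em`.

```python
import math

# Two candidate structures over binary X, Y (illustrative assumption):
# "empty": Y is independent of X; "X->Y": Y depends on X.
STRUCTURES = ("empty", "X->Y")


def expected_counts(data, theta):
    """E-step: fill in missing Y values (None) in expectation.

    theta[x] is the current estimate of P(Y=1 | X=x); for the empty
    structure both entries are equal.
    """
    counts = [[0.0, 0.0], [0.0, 0.0]]  # counts[x][y]
    for x, y in data:
        if y is None:
            counts[x][0] += 1.0 - theta[x]
            counts[x][1] += theta[x]
        else:
            counts[x][y] += 1.0
    return counts


def fit_and_score(counts, structure, n):
    """M-step for one candidate: ML parameters from the expected counts
    and the BIC-penalized expected log-likelihood of the Y family.

    The P(X) term is identical for both candidates, so it is omitted.
    Assumes both values of X occur in the data.
    """
    if structure == "empty":
        p = (counts[0][1] + counts[1][1]) / n
        theta, n_params = [p, p], 1
    else:
        theta = [counts[x][1] / (counts[x][0] + counts[x][1]) for x in (0, 1)]
        n_params = 2
    loglik = 0.0
    for x in (0, 1):
        for y in (0, 1):
            if counts[x][y] > 0:
                p = theta[x] if y == 1 else 1.0 - theta[x]
                loglik += counts[x][y] * math.log(p)
    return theta, loglik - 0.5 * math.log(n) * n_params


def structural_em(data, max_iters=50, tol=1e-9):
    """Alternate the E-step with a search over structures AND parameters."""
    structure, theta = "empty", [0.5, 0.5]
    n = len(data)
    prev_score = -math.inf
    for _ in range(max_iters):
        counts = expected_counts(data, theta)
        # Score every candidate against the same expected counts.
        candidates = [(s,) + fit_and_score(counts, s, n) for s in STRUCTURES]
        new_structure, theta, score = max(candidates, key=lambda t: t[2])
        if new_structure == structure and abs(score - prev_score) < tol:
            break
        structure, prev_score = new_structure, score
    return structure, theta


# Toy data: Y mostly equals X, and 20 records have Y missing.
dependent = ([(0, 0)] * 40 + [(1, 1)] * 40 + [(0, 1)] * 5 + [(1, 0)] * 5
             + [(0, None)] * 10 + [(1, None)] * 10)
print(structural_em(dependent)[0])  # prints X->Y
```

The point of the sketch is that the structure choice happens inside the EM loop: each iteration completes the data once under the current model and then compares all candidates on those expected counts, instead of running a full parametric EM for every candidate structure.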

Index Terms

The Bayesian structural EM algorithm
1. Computing methodologies
  1. Artificial intelligence
    1. Knowledge representation and reasoning
      1. Probabilistic reasoning
      2. Vagueness and fuzzy logic
2. Mathematics of computing
  1. Discrete mathematics
    1. Graph theory

Published In

UAI'98: Proceedings of the Fourteenth conference on Uncertainty in artificial intelligence

July 1998

538 pages

ISBN: 155860555X

Editors:
Gregory F. Cooper (University of Pittsburgh, Pittsburgh, Pennsylvania)
Serafín Moral (Universidad de Granada, Granada, Spain)

Sponsors

NEC
Hugin Expert A/S
Information Extraction and Transportation
Microsoft Research
AT&T Labs Research

Publisher

Morgan Kaufmann Publishers Inc.

San Francisco, CA, United States
