Abstract
Automated statistical learning of graphical models from data has attracted considerable interest in machine learning and related literature. Many authors have discussed or demonstrated the need for consistent stochastic search methods that are less prone than simple greedy methods to yielding locally optimal model structures. However, most stochastic search methods are based on standard Metropolis–Hastings theory, which necessitates relatively simple random proposals and prevents the use of intelligent and efficient search operators. Here we derive an algorithm for learning topologies of graphical models from samples of a finite set of discrete variables by utilizing and further enhancing a recently introduced theory for non-reversible parallel interacting Markov chain Monte Carlo-style computation. In particular, we illustrate how the non-reversible approach allows for a novel type of creativity in the design of search operators. The parallel aspect of our method also illustrates the advantages of adaptive search operators in avoiding trapping states in the vicinity of locally optimal network topologies.
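For orientation, the following is a minimal sketch of the standard baseline that the abstract contrasts against: a Metropolis–Hastings search over undirected graph topologies with a symmetric single-edge toggle proposal. It is not the algorithm derived in the article. The function names, the toy score, and all parameter values are hypothetical and chosen only for illustration; a real application would replace the toy score with a log marginal likelihood computed from data.

```python
import math
import random
from itertools import combinations


def toy_log_score(edges, target):
    """Hypothetical stand-in for a log model score (assumption, not from the
    article): each edge that differs from a fixed target graph costs 2.0."""
    return -2.0 * len(edges ^ target)


def mh_structure_search(n_nodes, log_score, n_iter=2000, seed=0):
    """Metropolis-Hastings over undirected graphs on n_nodes vertices.

    The proposal toggles one uniformly chosen vertex pair. Because this
    proposal is symmetric, the proposal probabilities cancel and the
    acceptance probability reduces to min(1, exp(score' - score)).
    Returns the best-scoring edge set visited and its score.
    """
    rng = random.Random(seed)
    pairs = list(combinations(range(n_nodes), 2))
    current = frozenset()              # start from the empty graph
    current_score = log_score(current)
    best, best_score = current, current_score
    for _ in range(n_iter):
        pair = rng.choice(pairs)
        proposal = current ^ {pair}    # add the edge if absent, else remove it
        proposal_score = log_score(proposal)
        accept_prob = math.exp(min(0.0, proposal_score - current_score))
        if rng.random() < accept_prob:
            current, current_score = proposal, proposal_score
            if current_score > best_score:
                best, best_score = current, current_score
    return best, best_score


if __name__ == "__main__":
    target = frozenset({(0, 1), (1, 2)})
    best, best_score = mh_structure_search(
        4, lambda edges: toy_log_score(edges, target)
    )
    print(sorted(best), best_score)
```

The symmetric toggle keeps the acceptance ratio trivial to compute, which is the kind of restriction to simple random proposals that the abstract describes.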
Additional information
Responsible editor: Charu Aggarwal.
Cite this article
Corander, J., Ekdahl, M. & Koski, T. Parallell interacting MCMC for learning of topologies of graphical models. Data Min Knowl Disc 17, 431–456 (2008). https://doi.org/10.1007/s10618-008-0099-9
DOI: https://doi.org/10.1007/s10618-008-0099-9