-
The scalable Birth-Death MCMC Algorithm for Mixed Graphical Model Learning with Application to Genomic Data Integration
Authors:
Nanwei Wang,
Laurent Briollais,
Helene Massam
Abstract:
Recent advances in biological research have seen the emergence of high-throughput technologies with numerous applications that allow the study of biological mechanisms at an unprecedented depth and scale. A large amount of genomic data is now distributed through consortia like The Cancer Genome Atlas (TCGA), where specific types of biological information on specific types of tissue or cell are available. In cancer research, the challenge is now to perform integrative analyses of high-dimensional multi-omic data with the goal of better understanding genomic processes that correlate with cancer outcomes, e.g. elucidating gene networks that discriminate specific cancer subgroups (cancer sub-typing) or discovering gene networks that overlap across different cancer types (pan-cancer studies). In this paper, we propose a novel mixed graphical model approach to analyze multi-omic data of different types (continuous, discrete and count) and perform model selection by extending the Birth-Death MCMC (BDMCMC) algorithm initially proposed by \citet{stephens2000bayesian} and later developed by \cite{mohammadi2015bayesian}. We compare the performance of our method to the LASSO method and the standard BDMCMC method using simulations and find that our method is superior in terms of both computational efficiency and the accuracy of the model selection results. Finally, an application to the TCGA breast cancer data shows that integrating genomic information at different levels (mutation and expression data) leads to better subtyping of breast cancers.
Submitted 8 May, 2020;
originally announced May 2020.
-
Bayesian model selection approach for colored graphical Gaussian models
Authors:
Qiong Li,
Xin Gao,
Helene Massam
Abstract:
We consider a class of colored graphical Gaussian models obtained by placing symmetry constraints on the precision matrix in a Bayesian framework. The prior distribution on the precision matrix is the colored $G$-Wishart prior, which is the Diaconis-Ylvisaker conjugate prior. In this paper, we develop a computationally efficient model search algorithm which combines linear regression with a double reversible jump Markov chain Monte Carlo (MCMC) method. The latter is used to estimate the Bayes factors expressed as the ratio of posterior probabilities of two competing models. We also establish the asymptotic consistency property of the model selection procedure based on the Bayes factors. Our procedure avoids an exhaustive search, which is computationally impossible. Our method is illustrated with simulations and a real-world application with a protein signalling data set.
Submitted 1 April, 2020;
originally announced April 2020.
-
Accelerating Bayesian Structure Learning in Sparse Gaussian Graphical Models
Authors:
Reza Mohammadi,
Helene Massam,
Gerard Letac
Abstract:
Gaussian graphical models are relevant tools to learn conditional independence structure between variables. In this class of models, Bayesian structure learning is often done by search algorithms over the graph space. The conjugate prior for the precision matrix satisfying graphical constraints is the well-known G-Wishart. With this prior, the transition probabilities in the search algorithms necessitate evaluating the ratios of the prior normalizing constants of G-Wishart. In moderate to high dimensions, this ratio is often approximated using sampling-based methods, which are computationally expensive updates in the search algorithm. Calculating this ratio so far has been a major computational bottleneck. We overcome this issue by presenting a search algorithm in which the ratio of normalizing constants is computed by an explicit closed-form approximation. Using this approximation within our search algorithm yields significant improvement in the scalability of structure learning without sacrificing structure learning accuracy. We study the conditions under which the approximation is valid. We also evaluate the efficacy of our method with simulation studies. We show that the new search algorithm with our approximation outperforms state-of-the-art methods in both computational efficiency and accuracy. The implementation of our work is available in the R package BDgraph.
Submitted 16 July, 2021; v1 submitted 14 June, 2017;
originally announced June 2017.
-
Analyzing Genome-wide Association Study Data with the R Package genMOSS
Authors:
Matthew Friedlander,
Adrian Dobra,
Helene Massam,
Laurent Briollais
Abstract:
The R package (R Core Team (2016)) genMOSS is specifically designed for the Bayesian analysis of genome-wide association study data. The package implements the mode oriented stochastic search (MOSS) procedure as well as a simple moving window approach to identify combinations of single nucleotide polymorphisms associated with a response. The prior used in Bayesian computations is the generalized hyper Dirichlet.
Submitted 22 November, 2016;
originally announced November 2016.
-
Approximate Bayesian estimation in large coloured graphical Gaussian models
Authors:
Qiong Li,
Xin Gao,
Helene Massam
Abstract:
Distributed estimation methods have recently been used to compute the maximum likelihood estimate of the precision matrix for large graphical Gaussian models. Our aim, in this paper, is to give a Bayesian estimate of the precision matrix for large graphical Gaussian models with, additionally, symmetry constraints imposed by an underlying graph which is coloured. We take the sample posterior mean of the precision matrix as our estimate. We study its asymptotic behaviour under the regular asymptotic regime when the number of variables p is fixed and under the double asymptotic regime when both p and n grow to infinity. We show in particular, that when the number of parameters of the local models is uniformly bounded, the standard convergence rate we obtain for the asymptotic consistency, in the Frobenius norm, of our estimate of the precision matrix compares well with the rates in the current literature for the maximum likelihood estimate.
Submitted 26 May, 2016;
originally announced May 2016.
-
Bayesian precision matrix estimation for graphical Gaussian models with edge and vertex symmetries
Authors:
Helene Massam,
Qiong Li,
Xin Gao
Abstract:
Graphical Gaussian models with edge and vertex symmetries were introduced by \citet{HojLaur:2008} who also gave an algorithm to compute the maximum likelihood estimate of the precision matrix for such models. In this paper, we take a Bayesian approach to the estimation of the precision matrix. We consider only those models where the symmetry constraints are imposed on the precision matrix and which thus form a natural exponential family with the precision matrix as the canonical parameter.
We first identify the Diaconis-Ylvisaker conjugate prior for these models and develop a scheme to sample from the prior and posterior distributions. We thus obtain estimates of the posterior mean of the precision matrix.
Second, in order to verify the precision of our estimate, we derive the explicit analytic expression of the expected value of the precision matrix when the graph underlying our model is a tree, a complete graph on three vertices and a decomposable graph on four vertices with various symmetries. In those cases, we compare our estimates with the exact value of the mean of the prior distribution. We also verify the accuracy of our estimates of the posterior mean on simulated data for graphs with up to thirty vertices and various symmetries.
Submitted 13 June, 2015;
originally announced June 2015.
-
A local approach to estimation in discrete loglinear models
Authors:
Helene Massam,
Nanwei Wang
Abstract:
We consider two connected aspects of maximum likelihood estimation of the parameter for high-dimensional discrete graphical models: the existence of the maximum likelihood estimate (mle) and its computation.
When the data is sparse, there are many zeros in the contingency table and the maximum likelihood estimate of the parameter may not exist. Fienberg and Rinaldo (2012) have shown that the mle does not exist if and only if the data vector belongs to a face of the so-called marginal cone spanned by the rows of the design matrix of the model. Identifying these faces in high dimensions is challenging. In this paper, we take a local approach: we show that one such face, albeit possibly not the smallest one, can be identified by looking at a collection of marginal graphical models generated by induced subgraphs $G_i,i=1,\ldots,k$ of $G$. This is our first contribution.
Our second contribution concerns the composite maximum likelihood estimate. When the dimension of the problem is large, estimating the parameters of a given graphical model through maximum likelihood is challenging, if not impossible. The traditional approach to this problem has been local with the use of composite likelihood based on local conditional likelihoods.
A more recent development is to have the components of the composite likelihood be marginal likelihoods centred around each $v$. We first show that the estimates obtained by consensus through local conditional and marginal likelihoods are identical. We then study the asymptotic properties of the composite maximum likelihood estimate when both the dimension of the model and the sample size $N$ go to infinity.
Submitted 21 April, 2015;
originally announced April 2015.
-
Distributed parameter estimation of discrete hierarchical models via marginal likelihoods
Authors:
Helene Massam,
Nanwei Wang
Abstract:
We consider discrete graphical models Markov with respect to a graph $G$ and propose two distributed marginal methods to estimate the maximum likelihood estimate of the canonical parameter of the model. Both methods are based on a relaxation of the marginal likelihood obtained by considering the density of the variables represented by a vertex $v$ of $G$ and a neighborhood. The two methods differ by the size of the neighborhood of $v$. We show that the estimates are consistent and that those obtained with the larger neighborhood have smaller asymptotic variance than the ones obtained through the smaller neighborhood.
Submitted 21 October, 2013;
originally announced October 2013.
-
Composite likelihood estimation of sparse Gaussian graphical models with symmetry
Authors:
Xin Gao,
Helene Massam
Abstract:
In this article, we discuss the composite likelihood estimation of sparse Gaussian graphical models. When there are symmetry constraints on the concentration matrix or partial correlation matrix, the likelihood estimation can be computationally intensive. The composite likelihood offers an alternative formulation of the objective function and yields consistent estimators. When a sparse model is considered, the penalized composite likelihood estimation can yield estimates that satisfy both the symmetry and sparsity constraints and possess the oracle property. Application of the proposed method is demonstrated through simulation studies and a network analysis of a biological data set.
Submitted 21 August, 2012;
originally announced August 2012.
-
High dimensional Bayesian inference for Gaussian directed acyclic graph models
Authors:
Emanuel Ben-David,
Tianxi Li,
Helene Massam,
Bala Rajaratnam
Abstract:
In this paper, we consider Gaussian models Markov with respect to an arbitrary DAG. We first construct a family of conjugate priors for the Cholesky parametrization of the covariance matrix of such models. This family has as many shape parameters as the DAG has vertices, and naturally extends the work of Geiger and Heckerman [8]. From these distributions, we derive prior distributions for the covariance and precision parameters of the Gaussian DAG Markov models. Our work thus extends the work of Dawid and Lauritzen [5] and Letac and Massam [16] for Gaussian models Markov with respect to a decomposable graph to arbitrary DAGs. For this reason, we call our distributions DAG-Wishart distributions. An advantage of these distributions is that they possess strong hyper Markov properties and thus allow for explicit estimation of the covariance and precision parameters, regardless of the dimension of the problem. They also allow us to develop methodology for model selection and covariance estimation in the space of DAG-Markov models. We demonstrate via several numerical examples that the proposed method scales well to high dimensions.
Submitted 5 March, 2015; v1 submitted 20 September, 2011;
originally announced September 2011.