
    Anirban Bhattacharya

    We consider Markov chain Monte Carlo (MCMC) algorithms for Bayesian high-dimensional regression with continuous shrinkage priors. A common challenge with these algorithms is the choice of the number of iterations to perform. This is critical when each iteration is expensive, as is the case when dealing with modern data sets, such as genome-wide association studies with thousands of rows and up to hundreds of thousands of columns. We develop coupling techniques tailored to the setting of high-dimensional regression with shrinkage priors, which enable practical, non-asymptotic diagnostics of convergence without relying on traceplots or long-run asymptotics. By establishing geometric drift and minorization conditions for the algorithm under consideration, we prove that the proposed couplings have finite expected meeting time. Focusing on a class of shrinkage priors which includes the ‘Horseshoe’, we empirically demonstrate the scalability of the proposed couplings. A highlight of our f...
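The diagnostics above rest on two chains driven by a shared source of randomness and on the time at which they meet. The sketch below illustrates that idea only; it is not the coupled Gibbs sampler developed in the paper. It couples two Gaussian AR(1) chains (a toy kernel chosen for illustration) through a maximal coupling of their transition kernels and records the first iteration at which they coincide. The AR(1) kernel, `rho`, and all function names are assumptions made for the example.

```python
import math
import random


def normal_logpdf(x: float, mu: float, sd: float) -> float:
    """Log density of N(mu, sd^2) at x."""
    return -0.5 * ((x - mu) / sd) ** 2 - math.log(sd) - 0.5 * math.log(2 * math.pi)


def maximal_coupling(mu1: float, mu2: float, sd: float, rng: random.Random):
    """Draw (X, Y), X ~ N(mu1, sd^2), Y ~ N(mu2, sd^2), maximizing P(X == Y)."""
    x = rng.gauss(mu1, sd)
    # Keep X as a common draw with probability min(1, q(X) / p(X)).
    # 1 - random() lies in (0, 1], so the logarithm is always defined.
    if math.log(1.0 - rng.random()) + normal_logpdf(x, mu1, sd) <= normal_logpdf(x, mu2, sd):
        return x, x
    # Otherwise draw Y from the residual part of q by rejection.
    while True:
        y = rng.gauss(mu2, sd)
        if math.log(1.0 - rng.random()) + normal_logpdf(y, mu2, sd) > normal_logpdf(y, mu1, sd):
            return x, y


def meeting_time(x0: float, y0: float, rho: float, rng: random.Random, max_iter: int = 10_000) -> int:
    """Run two coupled AR(1) chains x' ~ N(rho * x, 1 - rho^2) until they meet."""
    sd = math.sqrt(1.0 - rho ** 2)
    x, y = x0, y0
    for t in range(1, max_iter + 1):
        x, y = maximal_coupling(rho * x, rho * y, sd, rng)
        if x == y:  # once equal, the chains share every later transition
            return t
    raise RuntimeError("chains did not meet within max_iter")


rng = random.Random(2024)
taus = [meeting_time(5.0, -5.0, 0.5, rng) for _ in range(1000)]
print("average meeting time:", sum(taus) / len(taus))
```

Once the two chains share a state, both transition means coincide and the coupling returns identical draws, so they stay together. The same property is what makes meeting times usable as a convergence diagnostic.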
    We consider the fractional posterior distribution that is obtained by updating a prior distribution via Bayes' theorem with a fractional likelihood function, that is, a usual likelihood function raised to a fractional power. First, we analyze the contraction property of the fractional posterior in a general misspecified framework. Our contraction results only require a prior mass condition on a certain Kullback-Leibler (KL) neighborhood of the true parameter (or the KL divergence minimizer in the misspecified case), and obviate the construction of test functions and sieves commonly used in the literature for analyzing the contraction property of a regular posterior. We show through a counterexample that some condition controlling the complexity of the parameter space is necessary for the regular posterior to contract, which gives the fractional posterior additional flexibility in the choice of prior. Second, we derive a novel Bayesian oracle inequality based on a PAC-Bayes inequality in miss...
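In a conjugate model the fractional posterior has a closed form, which makes the effect of the power visible. As a minimal worked example, assume (for illustration only) Bernoulli data with $s$ successes in $n$ trials, a $\mathrm{Beta}(a, b)$ prior, and power $\alpha \in (0, 1]$:

```latex
\pi_{n,\alpha}(\theta \mid y) \;\propto\; \bigl\{\theta^{s}(1-\theta)^{n-s}\bigr\}^{\alpha}\,\theta^{a-1}(1-\theta)^{b-1}
\quad\Longrightarrow\quad
\theta \mid y \;\sim\; \mathrm{Beta}\bigl(a + \alpha s,\; b + \alpha (n - s)\bigr).
```

Setting $\alpha = 1$ recovers the usual posterior, while $\alpha < 1$ gives the posterior one would obtain from $\alpha n$ effective observations with the same success proportion.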
    We present non-asymptotic two-sided bounds on the log-marginal likelihood in Bayesian inference. The classical Laplace approximation is recovered as the leading term. Our derivation permits model misspecification and allows the parameter dimension to grow with the sample size. We do not make any assumptions about the asymptotic shape of the posterior; instead, we require certain regularity conditions on the likelihood ratio and that the posterior be sufficiently concentrated.
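To make the "leading term" concrete, the sketch below compares the classical Laplace approximation with the exact log-marginal likelihood in a Beta-Bernoulli model, where the exact value is available. The model, the uniform Beta(1, 1) default prior, and the function names are assumptions for illustration. The Laplace step expands the log of likelihood times prior around its mode and integrates the resulting Gaussian.

```python
import math


def log_beta(a: float, b: float) -> float:
    """log B(a, b) via log-gamma."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def exact_log_marginal(s: int, n: int, a: float = 1.0, b: float = 1.0) -> float:
    """log of the integral of theta^s (1 - theta)^(n - s) under a Beta(a, b) prior."""
    return log_beta(a + s, b + n - s) - log_beta(a, b)


def laplace_log_marginal(s: int, n: int, a: float = 1.0, b: float = 1.0) -> float:
    """Laplace approximation: expand log(likelihood x prior) to second order at its mode."""
    p, q = a + s - 1.0, b + n - s - 1.0  # exponents of theta and (1 - theta)
    if p <= 0 or q <= 0:
        raise ValueError("need an interior mode: a + s > 1 and b + n - s > 1")
    mode = p / (p + q)
    f_mode = p * math.log(mode) + q * math.log(1.0 - mode) - log_beta(a, b)
    curvature = p / mode ** 2 + q / (1.0 - mode) ** 2  # equals -f''(mode)
    return f_mode + 0.5 * math.log(2 * math.pi) - 0.5 * math.log(curvature)


for n in (10, 100, 1000):
    s = int(0.3 * n)
    exact, approx = exact_log_marginal(s, n), laplace_log_marginal(s, n)
    print(n, round(exact, 4), round(approx, 4), round(abs(exact - approx), 5))
```

In this one-dimensional example the gap shrinks as $n$ grows. The bounds in the abstract address the harder setting where the dimension also grows and the model may be misspecified.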
    The article addresses a long-standing open problem on the justification of using variational Bayes methods for parameter estimation. We provide general conditions for obtaining optimal risk bounds for point estimates acquired from mean-field variational Bayesian inference. The conditions pertain to the existence of certain test functions for the distance metric on the parameter space and minimal assumptions on the prior. A general recipe for verification of the conditions is outlined which is broadly applicable to existing Bayesian models with or without latent variables. As illustrations, specific applications to Latent Dirichlet Allocation and Gaussian mixture models are discussed.
    The horseshoe prior is frequently employed in Bayesian analysis of high-dimensional models, and has been shown to achieve minimax optimal risk properties when the truth is sparse. While optimization-based algorithms for the extremely popular Lasso and elastic net procedures can scale to dimension in the hundreds of thousands, algorithms for the horseshoe that use Markov chain Monte Carlo (MCMC) for computation are limited to problems an order of magnitude smaller. This is due to high computational cost per step and growth of the variance of time-averaging estimators as a function of dimension. We propose two new MCMC algorithms for computation in these models that have significantly improved performance compared to existing alternatives. One of the algorithms also approximates an expensive matrix product to give orders of magnitude speedup in high-dimensional applications. We prove guarantees for the accuracy of the approximate algorithm, and show that gradually decreasing the appro...
    The horseshoe prior is frequently employed in Bayesian analysis of high-dimensional models, and has been shown to achieve minimax optimal risk properties when the truth is sparse. While optimization-based algorithms for the extremely popular Lasso and elastic net procedures can scale to dimension in the hundreds of thousands, algorithms for the horseshoe that use Markov chain Monte Carlo (MCMC) for computation are limited to problems an order of magnitude smaller. This is due to high computational cost per step and growth of the variance of time-averaging estimators as a function of dimension. We propose two new MCMC algorithms for computation in these models that have improved performance compared to existing alternatives. One of the algorithms also approximates an expensive matrix product to give orders of magnitude speedup in high-dimensional applications. We prove that the exact algorithm is geometrically ergodic, and give guarantees for the accuracy of the approximate algorithm...
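The abstracts mention approximating an expensive matrix product. As a hedged sketch only, assume that the product is the $n \times n$ matrix $I_n + X D X^\top$, where $D$ is diagonal with horseshoe-type weights $d_j = (\tau \lambda_j)^2$, and that the approximation drops the columns whose weight falls below a threshold $\delta$. The threshold values, the function name `weighted_gram`, and the simulated data are hypothetical. The code shows how the number of retained columns, and hence the cost, trades off against approximation error.

```python
import numpy as np


def weighted_gram(X: np.ndarray, d: np.ndarray, delta: float = 0.0) -> np.ndarray:
    """Return I_n + X diag(d) X^T, keeping only columns with weight d_j > delta."""
    keep = d > delta  # delta = 0 keeps every strictly positive weight
    Xs = X[:, keep]
    # (Xs * d[keep]) scales column j by d_j, i.e. forms X diag(d) on the kept columns.
    return np.eye(X.shape[0]) + (Xs * d[keep]) @ Xs.T


rng = np.random.default_rng(0)
n, p = 100, 5000
X = rng.standard_normal((n, p))
# Horseshoe-like weights: global scale tau times half-Cauchy local scales, squared.
tau = 1e-3
d = (tau * np.abs(rng.standard_cauchy(p))) ** 2

exact = weighted_gram(X, d)
for delta in (1e-10, 1e-8, 1e-6):
    approx = weighted_gram(X, d, delta)
    rel_err = np.linalg.norm(approx - exact) / np.linalg.norm(exact)
    kept = int((d > delta).sum())
    print(f"delta={delta:.0e}: kept {kept}/{p} columns, relative error {rel_err:.2e}")
```

The discarded part is a positive semidefinite sum over dropped columns, so the error can only grow as $\delta$ increases. Because most horseshoe weights are tiny, many columns can be dropped.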
    Continuous shrinkage priors are commonly used in Bayesian analysis of high-dimensional data, due to both their computational advantages and favorable statistical properties. We develop coupled Markov chain Monte Carlo (MCMC) algorithms for Bayesian shrinkage regression in high dimensions. Following Glynn & Rhee (2014), these couplings can then be used in parallel computation strategies and practical diagnostics of convergence. Focusing on a class of shrinkage priors which includes the Horseshoe, we demonstrate the scalability of the proposed couplings with high-dimensional simulations and data from a genome-wide association study with 2000 rows and 100,000 covariates. The results highlight the impact of the shrinkage prior on the computational efficiency of the coupling procedure, and motivate priors where the local precisions are Half-t distributions with degrees of freedom larger than one, which are statistically justifiable in terms of posterior concentration, and lead to practica...
    We show that any lower-dimensional marginal density obtained from truncating dependent multivariate normal distributions to the positive orthant exhibits a mass-shifting phenomenon. Despite the truncated multivariate normal having a mode at the origin, the marginal density assigns increasingly small mass near the origin as the dimension increases. The phenomenon is accentuated as the correlation between the random variables increases; in particular, we show that the univariate marginal assigns vanishingly small mass near zero as the dimension increases provided the correlation between any two variables is greater than 0.8. En route, we develop precise comparison inequalities to estimate the probability near the origin under the marginal distribution of the truncated multivariate normal. This surprising behavior has serious repercussions in the context of Bayesian constrained estimation and inference, where the prior, in addition to having full support, is required to assign a subst...
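The mass-shifting effect can be checked by simulation. The sketch below assumes an equicorrelated covariance (unit variances, common correlation `rho`), which is an illustrative choice. It samples via the one-factor representation $X_i = \sqrt{\rho}\,W + \sqrt{1-\rho}\,E_i$ and truncates to the positive orthant by plain rejection. This is a Monte Carlo check only, not the comparison inequalities developed in the paper.

```python
import numpy as np


def mass_near_origin(dim: int, rho: float, eps: float = 0.1, n_draws: int = 200_000, seed: int = 0):
    """Estimate P(X_1 < eps) for X ~ N(0, Sigma) truncated to the positive orthant.

    Sigma has unit variances and common correlation rho >= 0. Returns the estimate
    and the number of accepted draws.
    """
    rng = np.random.default_rng(seed)
    w = rng.standard_normal((n_draws, 1))      # shared factor
    e = rng.standard_normal((n_draws, dim))    # idiosyncratic noise
    x = np.sqrt(rho) * w + np.sqrt(1.0 - rho) * e
    accepted = x[(x > 0).all(axis=1)]          # rejection: keep draws in the orthant
    return float((accepted[:, 0] < eps).mean()), len(accepted)


for dim in (1, 5, 20):
    prob, n_acc = mass_near_origin(dim, rho=0.9)
    print(f"dim={dim:2d}: P(X_1 < 0.1) ~ {prob:.4f} from {n_acc} accepted draws")
```

For `dim=1` the marginal is half-normal, so the estimate should be close to $2\{\Phi(0.1) - 0.5\} \approx 0.08$. With `rho=0.9`, the estimate falls as the dimension grows.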
    We investigate the maximum neutron star mass based on constraints from low-energy nuclear physics, neutron star tidal deformabilities from GW170817, and simultaneous mass-radius measurements of PSR J0030+0451 from NICER. Our prior distribution is based on a combination of nuclear modeling valid in the vicinity of normal nuclear densities together with the assumption of a maximally stiff equation of state at high densities. The transition density is treated as a model parameter with a uniform prior. Bayesian likelihood functions involving measured neutron star tidal deformabilities and radii are subsequently used to generate equation of state posteriors. We demonstrate that a modification of the highly uncertain supra-saturation density equation of state allows for the support of $2.5-2.6\,M_\odot$ neutron stars without strongly modifying the properties (radius, tidal deformability, and moment of inertia) of $\sim 1.4\,M_\odot$ neutron stars. In our analysis, only the softest equations ...
    Gaussian process (GP) regression is a powerful interpolation technique due to its flexibility in capturing non-linearity. In this paper, we provide a general framework for understanding the frequentist coverage of point-wise and simultaneous Bayesian credible sets in GP regression. As an intermediate result, we develop a Bernstein-von Mises type result under the supremum norm in random design GP regression. Identifying both the mean and covariance function of the posterior distribution of the Gaussian process as regularized $M$-estimators, we show that the sampling distribution of the posterior mean function and the centered posterior distribution can be approximated, respectively, by two population-level GPs. By developing a comparison inequality between two GPs, we provide an exact characterization of the frequentist coverage probabilities of Bayesian point-wise credible intervals and simultaneous credible bands of the regression function. Our results show that inference based on GP regression...
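For readers who want the objects being analyzed in concrete form, here is a minimal numpy sketch of GP regression posterior means and point-wise 95% credible intervals. It assumes a zero-mean prior, a squared-exponential kernel, and fixed hyperparameters (`length`, `amp`, `noise`), all chosen for illustration. It only computes the intervals and does not reproduce the coverage analysis.

```python
import numpy as np


def se_kernel(a: np.ndarray, b: np.ndarray, length: float = 0.2, amp: float = 1.0) -> np.ndarray:
    """Squared-exponential covariance k(s, t) = amp^2 exp(-(s - t)^2 / (2 length^2))."""
    return amp ** 2 * np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length ** 2)


def gp_posterior(x, y, x_new, noise=0.1, **kern):
    """Posterior mean and point-wise standard deviation of f(x_new) under a zero-mean GP."""
    K = se_kernel(x, x, **kern) + noise ** 2 * np.eye(len(x))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # K^{-1} y via the Cholesky factor
    K_star = se_kernel(x_new, x, **kern)
    mean = K_star @ alpha
    v = np.linalg.solve(L, K_star.T)
    var = np.diag(se_kernel(x_new, x_new, **kern)) - np.sum(v ** 2, axis=0)
    return mean, np.sqrt(np.maximum(var, 0.0))  # clip tiny negative round-off


rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0, 1, 50))
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(50)
grid = np.linspace(0, 1, 200)
mean, sd = gp_posterior(x, y, grid)
lower, upper = mean - 1.96 * sd, mean + 1.96 * sd  # point-wise 95% credible intervals
truth = np.sin(2 * np.pi * grid)
print("fraction of grid points covered:", np.mean((lower <= truth) & (truth <= upper)))
```

The printed fraction is an empirical proportion for one data set. It is not the frequentist coverage probability studied above, which requires averaging over repeated samples.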
    Variational algorithms have gained prominence over the past two decades as a scalable computational environment for Bayesian inference. In this article, we explore tools from the dynamical systems literature to study the convergence of coordinate ascent algorithms for mean field variational inference. Focusing on the Ising model defined on two nodes, we fully characterize the dynamics of the sequential coordinate ascent algorithm and its parallel version. We observe that in the regime where the objective function is convex, both algorithms are stable and converge to the unique fixed point. Our analyses reveal interesting discordances between these two versions of the algorithm in the region where the objective function is non-convex. In fact, the parallel version exhibits a periodic oscillatory behavior which is absent in the sequential version. Drawing intuition from the Markov chain Monte Carlo literature, we empirically show that a parameter expansion of the Ising m...
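The sequential/parallel contrast can be reproduced in a few lines. The sketch assumes a two-node Ising model $p(x_1, x_2) \propto \exp(\beta x_1 x_2)$ with $x_i \in \{-1, +1\}$ and no external field, which may differ from the paper's parameterization. Under a mean-field factorization, the optimal factor for one node given the other's mean $m_j$ is $q(x_i) \propto \exp(\beta x_i m_j)$, whose mean is $\tanh(\beta m_j)$. The value `beta=2.0` and the starting point are illustrative.

```python
import math


def cavi_ising(beta: float, m1: float, m2: float, n_iter: int = 50, parallel: bool = False):
    """Mean-field coordinate ascent on a two-node Ising model; returns the (m1, m2) path."""
    path = [(m1, m2)]
    for _ in range(n_iter):
        if parallel:
            # Both coordinates are updated from the previous iterate.
            m1, m2 = math.tanh(beta * m2), math.tanh(beta * m1)
        else:
            m1 = math.tanh(beta * m2)
            m2 = math.tanh(beta * m1)  # uses the freshly updated m1
        path.append((m1, m2))
    return path


seq = cavi_ising(beta=2.0, m1=0.5, m2=-0.5)
par = cavi_ising(beta=2.0, m1=0.5, m2=-0.5, parallel=True)
print("sequential, last two iterates:", seq[-2:])
print("parallel,   last two iterates:", par[-2:])
```

Starting from opposite signs, the sequential sweep settles on a fixed point with $m_1 = m_2$. The parallel update swaps signs at every step and keeps cycling between two states, a period-two oscillation of the kind described in the abstract.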
    We develop a Bayesian methodology aimed at simultaneously estimating low-rank and row-sparse matrices in a high-dimensional multiple-response linear regression model. We consider a carefully devised shrinkage prior on the matrix of regression coefficients which obviates the need to specify a prior on the rank, and shrinks the regression matrix towards low-rank and row-sparse structures. We provide theoretical support for the proposed methodology by proving minimax optimality of the posterior mean under the prediction risk in ultra-high-dimensional settings where the number of predictors can grow subexponentially relative to the sample size. A one-step post-processing scheme induced by group lasso penalties on the rows of the estimated coefficient matrix is proposed for variable selection, with default choices of tuning parameters. We additionally provide an estimate of the rank using a novel optimization function achieving dimension reduction in the covariate space. We exhibi...
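The post-processing step uses group lasso penalties on the rows of the estimated coefficient matrix. In the special case of an orthonormal design, an assumption made only for this sketch, the group-lasso problem has a closed-form solution: each row is shrunk toward zero by group soft-thresholding, and rows with small norm are set exactly to zero. The function name and the penalty value `mu=1.0` are hypothetical and are not the paper's default tuning choices.

```python
import numpy as np


def group_soft_threshold(B, mu):
    """Row-wise group soft-thresholding: row_j <- max(0, 1 - mu_j / ||row_j||_2) * row_j.

    This is the proximal map of sum_j mu_j * ||row_j||_2, which coincides with the
    group-lasso solution when the design is orthonormal.
    """
    B = np.asarray(B, dtype=float)
    mu = np.broadcast_to(np.asarray(mu, dtype=float), (B.shape[0],))
    norms = np.linalg.norm(B, axis=1)
    safe = np.where(norms > 0, norms, 1.0)  # avoid dividing by zero; zero rows stay zero
    shrink = np.maximum(0.0, 1.0 - mu / safe)
    return shrink[:, None] * B


B_hat = np.array([[3.0, 4.0], [0.1, -0.2], [0.0, 0.0]])  # e.g. a posterior-mean estimate
B_sparse = group_soft_threshold(B_hat, mu=1.0)
selected = np.flatnonzero(np.linalg.norm(B_sparse, axis=1) > 0)
print(B_sparse)
print("selected rows:", selected)
```

Only the first row, whose norm 5 exceeds the penalty, survives, scaled by $1 - 1/5$. The selected rows index the retained predictors.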
    It is common in biomedical research to run case-control studies involving high-dimensional predictors, with the main goal being detection of the sparse subset of predictors having a significant association with disease. Usual analyses rely on independent screening, considering each predictor one at a time, or in some cases on logistic regression assuming no interactions. We propose a fundamentally different approach based on a nonparametric Bayesian low rank tensor factorization model for the retrospective likelihood. Our model characterizes the joint distribution of the multivariate predictors flexibly, treating it as unknown and imposing none of the linearity assumptions of logistic regression. Predictors are excluded only if they have no impact on disease risk, either directly or through interactions with other predictors. Hence, we obtain an omnibus approach for screening for important predictors. Computation relies on an efficient Gibbs sampler. The methods are shown to have high po...
    We study the Bernstein-von Mises (BvM) phenomenon in Gaussian process regression models by retaining the leading terms of the induced Karhunen-Loève expansion. A recent related result by Bontemps (2011) in a sieve prior context requires the prior to be flat, ruling out commonly used Gaussian process models where the prior flatness is determined by the decay rate of the eigenvalues of the covariance kernel. We establish the BvM phenomenon in the $L_2$-Wasserstein distance instead of the commonly used total variation distance, thereby encompassing a wide class of practically useful Gaussian process priors. We also develop a general technique to derive posterior rates of convergence from Wasserstein BvMs and apply it to Gaussian process priors with unbounded covariate domain. Specific illustrations are provided for the squared-exponential covariance kernel.
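After truncating a Karhunen-Loève expansion to finitely many terms, the posterior and its Gaussian approximation are finite-dimensional Gaussian measures. For Gaussians, the $L_2$-Wasserstein distance has a well-known closed form, $W_2^2 = \|m_1 - m_2\|^2 + \operatorname{tr}\bigl(S_1 + S_2 - 2(S_2^{1/2} S_1 S_2^{1/2})^{1/2}\bigr)$. The numpy sketch below implements that formula; the function names are hypothetical.

```python
import numpy as np


def psd_sqrt(A: np.ndarray) -> np.ndarray:
    """Symmetric square root of a symmetric positive semidefinite matrix."""
    vals, vecs = np.linalg.eigh(A)
    return (vecs * np.sqrt(np.clip(vals, 0.0, None))) @ vecs.T


def wasserstein2_gaussian(m1, S1, m2, S2) -> float:
    """L2-Wasserstein distance between N(m1, S1) and N(m2, S2)."""
    r2 = psd_sqrt(S2)
    cross = psd_sqrt(r2 @ S1 @ r2)
    sq = np.sum((m1 - m2) ** 2) + np.trace(S1 + S2 - 2.0 * cross)
    return float(np.sqrt(max(sq, 0.0)))  # guard against tiny negative round-off


m1, S1 = np.zeros(2), np.array([[1.0, 0.3], [0.3, 1.0]])
m2, S2 = np.array([0.5, 0.0]), np.eye(2)
print(wasserstein2_gaussian(m1, S1, m2, S2))
```

In one dimension the formula reduces to $\sqrt{(m_1 - m_2)^2 + (s_1 - s_2)^2}$ with standard deviations $s_1, s_2$. This makes it a convenient check on the implementation.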
    Identifying a lower-dimensional latent space for representation of high-dimensional observations is of significant importance in numerous biomedical and machine learning applications. In many such applications, it is now routine to collect data where the ...
    A systematic approach to finding a variational approximation in an otherwise intractable non-conjugate model is to exploit the general principle of convex duality, minorizing the marginal likelihood in a way that renders the problem tractable. While such approaches are popular in the context of variational inference in non-conjugate Bayesian models, theoretical guarantees on statistical optimality and algorithmic convergence are lacking. Focusing on logistic regression models, we provide mild conditions on the data-generating process to derive non-asymptotic upper bounds on the risk incurred by the variational optima. We demonstrate that these assumptions can be completely relaxed if one considers a slight variation of the algorithm that raises the likelihood to a fractional power. Next, we utilize the theory of dynamical systems to provide convergence guarantees for such algorithms in logistic and multinomial logit regression. In particular, we establish local asymptotic stability of the alg...
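A classical instance of this minorization for logistic regression is the Jaakkola-Jordan bound, $\log\sigma(x) \ge \log\sigma(\xi) + (x - \xi)/2 - \lambda(\xi)(x^2 - \xi^2)$ with $\lambda(\xi) = \tanh(\xi/2)/(4\xi)$. The bound is quadratic in $x$, which gives Gaussian variational updates. The sketch below is this standard textbook construction and not necessarily the exact algorithm analyzed in the paper. It assumes a zero-mean isotropic Gaussian prior and labels in $\{0, 1\}$. The `alpha` argument illustrates the fractional-power variant by scaling every likelihood term.

```python
import numpy as np


def jj_lambda(xi):
    """lambda(xi) = tanh(xi / 2) / (4 xi), with its limit 1/8 at xi = 0."""
    xi = np.abs(np.asarray(xi, dtype=float))
    safe = np.where(xi > 1e-8, xi, 1.0)
    return np.where(xi > 1e-8, np.tanh(safe / 2.0) / (4.0 * safe), 0.125)


def jj_lower_bound(x, xi):
    """Jaakkola-Jordan lower bound on log sigmoid(x); tight at x = xi."""
    log_sig_xi = -np.logaddexp(0.0, -xi)
    return log_sig_xi + (x - xi) / 2.0 - jj_lambda(xi) * (x ** 2 - xi ** 2)


def vb_logistic(X, y, prior_var=10.0, alpha=1.0, n_iter=100):
    """Gaussian variational posterior N(m, S) for logistic regression with y in {0, 1}.

    alpha < 1 raises the likelihood to a fractional power; alpha = 1 is the usual case.
    """
    n, p = X.shape
    xi = np.ones(n)
    prior_prec = np.eye(p) / prior_var
    for _ in range(n_iter):
        lam = jj_lambda(xi)
        S = np.linalg.inv(prior_prec + 2.0 * alpha * (X.T * lam) @ X)
        m = S @ (alpha * X.T @ (y - 0.5))
        # Optimal variational parameters: xi_i^2 = x_i^T (S + m m^T) x_i.
        xi = np.sqrt(np.einsum("ij,jk,ik->i", X, S + np.outer(m, m), X))
    return m, S


rng = np.random.default_rng(0)
X = rng.standard_normal((200, 3))
beta_true = np.array([1.0, -2.0, 0.5])
y = (rng.uniform(size=200) < 1.0 / (1.0 + np.exp(-X @ beta_true))).astype(float)
m, S = vb_logistic(X, y)
print("variational mean:", m)
```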
    We propose a new distribution, called the soft tMVN distribution, which provides a smooth approximation to the truncated multivariate normal (tMVN) distribution with linear constraints. An efficient blocked Gibbs sampler is developed to sample from the soft tMVN distribution in high dimensions. We provide theoretical support for the approximation capability of the soft tMVN and further empirical evidence thereof. The soft tMVN distribution can be used to approximate simulations from a multivariate truncated normal distribution with linear constraints, or itself as a prior in shape-constrained problems.
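To illustrate what "smooth approximation" can mean here, the one-dimensional sketch below replaces the hard constraint $1\{x \ge 0\}$ by a logistic sigmoid $\sigma(\eta x)$ with sharpness $\eta$. The choice of a logistic surrogate is an assumption of this sketch and may not be the exact form used to define the soft tMVN. The code evaluates the normalized density on a grid and reports how much mass remains on the infeasible side as $\eta$ grows.

```python
import numpy as np


def soft_truncated_normal(grid: np.ndarray, eta: float, mu: float = 0.0, sd: float = 1.0) -> np.ndarray:
    """Density on a uniform grid, proportional to N(x; mu, sd^2) * sigmoid(eta * x).

    Normalization uses a simple Riemann sum on the grid.
    """
    # log sigmoid(eta * x) = -log(1 + exp(-eta * x)), computed stably with logaddexp.
    log_unnorm = -0.5 * ((grid - mu) / sd) ** 2 - np.logaddexp(0.0, -eta * grid)
    dens = np.exp(log_unnorm - log_unnorm.max())  # subtract the max for stability
    dx = grid[1] - grid[0]
    return dens / (dens.sum() * dx)


grid = np.linspace(-6.0, 6.0, 20_001)
dx = grid[1] - grid[0]
for eta in (1.0, 5.0, 25.0, 100.0):
    dens = soft_truncated_normal(grid, eta)
    print(f"eta={eta:6.1f}: mass on the infeasible side x < 0 is {dens[grid < 0].sum() * dx:.4f}")
```

Unlike the hard indicator, the smooth surrogate keeps the log density differentiable everywhere. This is one reason smooth constraint surrogates are attractive inside samplers.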
    Penalized regression methods, such as $L_1$ regularization, are routinely used in high-dimensional applications, and there is a rich literature on optimality properties under sparsity assumptions. In the Bayesian paradigm, sparsity is routinely induced through two-component mixture priors having a probability mass at zero, but such priors encounter daunting computational problems in high dimensions. This has motivated a wide variety of continuous shrinkage priors, which can be expressed as global-local scale mixtures of Gaussians, facilitating computation. In sharp contrast to the corresponding frequentist literature, very little is known about the properties of such priors. Focusing on a broad class of shrinkage priors, we provide precise results on prior and posterior concentration. Interestingly, we demonstrate that most commonly used shrinkage priors, including the Bayesian Lasso, are suboptimal in high-dimensional settings. A new class of Dirichlet Laplace (DL) priors are p...
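Two familiar priors show what a global-local scale mixture of Gaussians looks like. The Laplace prior behind the Bayesian Lasso is a normal whose variance has an exponential mixing distribution, and the horseshoe mixes over a half-Cauchy local scale. The simulation below draws from both, with illustrative values for $\lambda$, $\tau$, and the thresholds, and compares the mass each puts near zero and in the tails. It does not implement the Dirichlet Laplace prior itself.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Bayesian-Lasso-type prior: theta | s ~ N(0, s), s ~ Exponential(rate = lam^2 / 2),
# which gives the Laplace density (lam / 2) exp(-lam |theta|).
lam = 1.0
s = rng.exponential(scale=2.0 / lam ** 2, size=n)  # numpy's scale is 1 / rate
theta_laplace = rng.standard_normal(n) * np.sqrt(s)

# Horseshoe: theta | lambda_j ~ N(0, lambda_j^2 tau^2), lambda_j ~ half-Cauchy(0, 1).
tau = 1.0
local = np.abs(rng.standard_cauchy(n))
theta_hs = rng.standard_normal(n) * local * tau

for name, th in [("Laplace", theta_laplace), ("horseshoe", theta_hs)]:
    print(f"{name:9s}: P(|theta| < 0.01) = {np.mean(np.abs(th) < 0.01):.4f}, "
          f"P(|theta| > 10) = {np.mean(np.abs(th) > 10):.4f}")
```

The horseshoe places noticeably more mass both near zero and far in the tails than the Laplace prior. This combination of a spike at zero and heavy tails is the behavior that global-local priors aim for.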
    Alterations in anxiety-related processing are observed across many neuropsychiatric disorders, including bipolar disorder. Though polymorphisms in a number of circadian genes confer risk for this disorder, little is known about how changes in circadian gene function disrupt brain circuits critical for anxiety-related processing. Here we characterize neurophysiological activity simultaneously across five limbic brain areas (nucleus accumbens, amygdala, prelimbic cortex, ventral hippocampus, and ventral tegmental area) as wild-type (WT) mice and mice with a mutation in the circadian gene CLOCK (Clock-Δ19 mice) perform an elevated zero maze task. In WT mice, basal limbic gamma oscillatory synchrony observed before task performance predicted future anxiety-related behaviors. Additionally, dynamic changes in limbic gamma oscillatory synchrony were observed based on the position of WT mice in the zero maze. Clock-Δ19 mice, which displayed an increased propensity to enter the open section ...
    Contingency table analysis routinely relies on log-linear models, with latent structure analysis providing a common alternative. Latent structure models lead to a reduced rank tensor factorization of the probability mass function for multivariate categorical data, while log-linear models achieve dimensionality reduction through sparsity. Little is known about the relationship between these notions of dimensionality reduction in the two paradigms. We derive several results relating the support of a log-linear model to nonnegative ranks of the associated probability tensor. Motivated by these findings, we propose a new collapsed Tucker class of tensor decompositions, which bridge existing PARAFAC and Tucker decompositions, providing a more flexible framework for parsimoniously characterizing multivariate categorical data. Taking a Bayesian approach to inference, we illustrate empirical advantages of the new decompositions.
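For reference, the sketch below builds the probability tensor of a latent class (PARAFAC) model, one of the two decompositions that the proposed class bridges: $P(c_1, \dots, c_p) = \sum_h \nu_h \prod_j \lambda^{(j)}_{h c_j}$. It covers only the PARAFAC case, not the collapsed Tucker decomposition proposed in the paper. The function name and the randomly generated parameters are hypothetical.

```python
import numpy as np


def parafac_probability_tensor(nu: np.ndarray, lambdas: list) -> np.ndarray:
    """P[c1, ..., cp] = sum_h nu[h] * prod_j lambdas[j][h, c_j].

    nu: latent class weights, shape (k,), summing to one.
    lambdas: list of p arrays, each of shape (k, d_j), with rows summing to one.
    """
    p = len(lambdas)
    idx = "abcdefghijklmnopqrstuvwxy"[:p]  # one index letter per variable (p <= 25)
    spec = "Z," + ",".join("Z" + c for c in idx) + "->" + idx  # 'Z' indexes latent classes
    return np.einsum(spec, nu, *lambdas)


rng = np.random.default_rng(0)
k, dims = 2, (3, 4, 2)
nu = rng.dirichlet(np.ones(k))
lambdas = [rng.dirichlet(np.ones(d), size=k) for d in dims]
P = parafac_probability_tensor(nu, lambdas)
print(P.shape, P.sum())
```

With `k = 1` the tensor is an outer product of marginals (independence). Larger `k` gives a nonnegative rank of at most `k`, which is the notion of dimension the abstract relates to log-linear sparsity.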