-
A PAC-Bayesian Perspective on the Interpolating Information Criterion
Authors:
Liam Hodgkinson,
Chris van der Heide,
Robert Salomone,
Fred Roosta,
Michael W. Mahoney
Abstract:
Deep learning is renowned for its theory-practice gap, whereby principled theory typically fails to provide much beneficial guidance for implementation in practice. This has been highlighted recently by the benign overfitting phenomenon: when neural networks become sufficiently large to interpolate the dataset perfectly, model performance appears to improve with increasing model size, in apparent contradiction with the well-known bias-variance tradeoff. While such phenomena have proven challenging to theoretically study for general models, the recently proposed Interpolating Information Criterion (IIC) provides a valuable theoretical framework to examine performance for overparameterized models. Using the IIC, a PAC-Bayes bound is obtained for a general class of models, characterizing factors which influence generalization performance in the interpolating regime. From the provided bound, we quantify how the test error for overparameterized models achieving effectively zero training error depends on the quality of the implicit regularization imposed by e.g. the combination of model, optimizer, and parameter-initialization scheme; the spectrum of the empirical neural tangent kernel; curvature of the loss landscape; and noise present in the data.
Submitted 12 November, 2023;
originally announced November 2023.
-
The Interpolating Information Criterion for Overparameterized Models
Authors:
Liam Hodgkinson,
Chris van der Heide,
Robert Salomone,
Fred Roosta,
Michael W. Mahoney
Abstract:
The problem of model selection is considered for the setting of interpolating estimators, where the number of model parameters exceeds the size of the dataset. Classical information criteria typically consider the large-data limit, penalizing model size. However, these criteria are not appropriate in modern settings where overparameterized models tend to perform well. For any overparameterized model, we show that there exists a dual underparameterized model that possesses the same marginal likelihood, thus establishing a form of Bayesian duality. This enables more classical methods to be used in the overparameterized setting, revealing the Interpolating Information Criterion, a measure of model quality that naturally incorporates the choice of prior into the model selection. Our new information criterion accounts for prior misspecification, geometric and spectral properties of the model, and is numerically consistent with known empirical and theoretical behavior in this regime.
Submitted 15 July, 2023;
originally announced July 2023.
-
Generalization Guarantees via Algorithm-dependent Rademacher Complexity
Authors:
Sarah Sachs,
Tim van Erven,
Liam Hodgkinson,
Rajiv Khanna,
Umut Simsekli
Abstract:
Algorithm- and data-dependent generalization bounds are required to explain the generalization behavior of modern machine learning algorithms. In this context, there exist information-theoretic generalization bounds that involve (various forms of) mutual information, as well as bounds based on hypothesis set stability. We propose a conceptually related, but technically distinct complexity measure to control generalization error, which is the empirical Rademacher complexity of an algorithm- and data-dependent hypothesis class. Combining standard properties of Rademacher complexity with the convenient structure of this class, we are able to (i) obtain novel bounds based on the finite fractal dimension, which (a) extend previous fractal dimension-type bounds from continuous to finite hypothesis classes, and (b) avoid a mutual information term that was required in prior work; (ii) greatly simplify the proof of a recent dimension-independent generalization bound for stochastic gradient descent; and (iii) easily recover results for VC classes and compression schemes, similar to approaches based on conditional mutual information.
Submitted 4 July, 2023;
originally announced July 2023.
-
A Heavy-Tailed Algebra for Probabilistic Programming
Authors:
Feynman Liang,
Liam Hodgkinson,
Michael W. Mahoney
Abstract:
Despite the successes of probabilistic models based on passing noise through neural networks, recent work has identified that such methods often fail to capture tail behavior accurately, unless the tails of the base distribution are appropriately calibrated. To overcome this deficiency, we propose a systematic approach for analyzing the tails of random variables, and we illustrate how this approach can be used during the static analysis (before drawing samples) pass of a probabilistic programming language compiler. To characterize how the tails change under various operations, we develop an algebra which acts on a three-parameter family of tail asymptotics and which is based on the generalized Gamma distribution. Our algebraic operations are closed under addition and multiplication; they are capable of distinguishing sub-Gaussians with differing scales; and they handle ratios sufficiently well to reproduce the tails of most important statistical distributions directly from their definitions. Our empirical results confirm that inference algorithms that leverage our heavy-tailed algebra attain superior performance across a number of density modeling and variational inference tasks.
Submitted 15 June, 2023;
originally announced June 2023.
-
When are ensembles really effective?
Authors:
Ryan Theisen,
Hyunsuk Kim,
Yaoqing Yang,
Liam Hodgkinson,
Michael W. Mahoney
Abstract:
Ensembling has a long history in statistical data analysis, with many impactful applications. However, in many modern machine learning settings, the benefits of ensembling are less ubiquitous and less obvious. We study, both theoretically and empirically, the fundamental question of when ensembling yields significant performance improvements in classification tasks. Theoretically, we prove new results relating the ensemble improvement rate (a measure of how much ensembling decreases the error rate versus a single model, on a relative scale) to the disagreement-error ratio. We show that ensembling improves performance significantly whenever the disagreement rate is large relative to the average error rate; and that, conversely, one classifier is often enough whenever the disagreement rate is low relative to the average error rate. On the way to proving these results, we derive, under a mild condition called competence, improved upper and lower bounds on the average test error rate of the majority vote classifier. To complement this theory, we study ensembling empirically in a variety of settings, verifying the predictions made by our theory, and identifying practical scenarios where ensembling does and does not result in large performance improvements. Perhaps most notably, we demonstrate a distinct difference in behavior between interpolating models (popular in current practice) and non-interpolating models (such as tree-based methods, where ensembling is popular), demonstrating that ensembling helps considerably more in the latter case than in the former.
Submitted 20 May, 2023;
originally announced May 2023.
-
Monotonicity and Double Descent in Uncertainty Estimation with Gaussian Processes
Authors:
Liam Hodgkinson,
Chris van der Heide,
Fred Roosta,
Michael W. Mahoney
Abstract:
Despite their importance for assessing reliability of predictions, uncertainty quantification (UQ) measures for machine learning models have only recently begun to be rigorously characterized. One prominent issue is the curse of dimensionality: it is commonly believed that the marginal likelihood should be reminiscent of cross-validation metrics and that both should deteriorate with larger input dimensions. We prove that by tuning hyperparameters to maximize marginal likelihood (the empirical Bayes procedure), the performance, as measured by the marginal likelihood, improves monotonically with the input dimension. On the other hand, we prove that cross-validation metrics exhibit qualitatively different behavior that is characteristic of double descent. Cold posteriors, which have recently attracted interest due to their improved performance in certain settings, appear to exacerbate these phenomena. We verify empirically that our results hold for real data, beyond our considered assumptions, and we explore consequences involving synthetic covariates.
Submitted 25 July, 2023; v1 submitted 14 October, 2022;
originally announced October 2022.
-
Fat-Tailed Variational Inference with Anisotropic Tail Adaptive Flows
Authors:
Feynman Liang,
Liam Hodgkinson,
Michael W. Mahoney
Abstract:
While fat-tailed densities commonly arise as posterior and marginal distributions in robust models and scale mixtures, they present challenges when Gaussian-based variational inference fails to capture tail decay accurately. We first improve previous theory on tails of Lipschitz flows by quantifying how the tails affect the rate of tail decay and by expanding the theory to non-Lipschitz polynomial flows. Then, we develop an alternative theory for multivariate tail parameters which is sensitive to tail-anisotropy. In doing so, we unveil a fundamental problem which plagues many existing flow-based methods: they can only model tail-isotropic distributions (i.e., distributions having the same tail parameter in every direction). To mitigate this and enable modeling of tail-anisotropic targets, we propose anisotropic tail-adaptive flows (ATAF). Experimental results on both synthetic and real-world targets confirm that ATAF is competitive with prior work while also exhibiting appropriate tail-anisotropy.
Submitted 16 May, 2022;
originally announced May 2022.
-
Evaluating natural language processing models with generalization metrics that do not need access to any training or testing data
Authors:
Yaoqing Yang,
Ryan Theisen,
Liam Hodgkinson,
Joseph E. Gonzalez,
Kannan Ramchandran,
Charles H. Martin,
Michael W. Mahoney
Abstract:
Selecting suitable architecture parameters and training hyperparameters is essential for enhancing machine learning (ML) model performance. Several recent empirical studies conduct large-scale correlational analysis on neural networks (NNs) to search for effective generalization metrics that can guide this type of model selection. Effective metrics are typically expected to correlate strongly with test performance. In this paper, we expand on prior analyses by examining generalization-metric-based model selection with the following objectives: (i) focusing on natural language processing (NLP) tasks, as prior work primarily concentrates on computer vision (CV) tasks; (ii) considering metrics that directly predict test error instead of the generalization gap; (iii) exploring metrics that do not need access to data to compute. From these objectives, we are able to provide the first model selection results on large pretrained Transformers from Huggingface using generalization metrics. Our analyses consider (I) hundreds of Transformers trained in different settings, in which we systematically vary the amount of data, the model size and the optimization hyperparameters, (II) a total of 51 pretrained Transformers from eight families of Huggingface NLP models, including GPT2, BERT, etc., and (III) a total of 28 existing and novel generalization metrics. Despite their niche status, we find that metrics derived from the heavy-tail (HT) perspective are particularly useful in NLP tasks, exhibiting stronger correlations than other, more popular metrics. To further examine these metrics, we extend prior formulations relying on power law (PL) spectral distributions to exponential (EXP) and exponentially-truncated power law (E-TPL) families.
Submitted 4 June, 2023; v1 submitted 6 February, 2022;
originally announced February 2022.
-
Generalization Bounds using Lower Tail Exponents in Stochastic Optimizers
Authors:
Liam Hodgkinson,
Umut Şimşekli,
Rajiv Khanna,
Michael W. Mahoney
Abstract:
Despite the ubiquitous use of stochastic optimization algorithms in machine learning, the precise impact of these algorithms and their dynamics on generalization performance in realistic non-convex settings is still poorly understood. While recent work has revealed connections between generalization and heavy-tailed behavior in stochastic optimization, this work mainly relied on continuous-time approximations; and a rigorous treatment for the original discrete-time iterations is yet to be performed. To bridge this gap, we present novel bounds linking generalization to the lower tail exponent of the transition kernel associated with the optimizer around a local minimum, in both discrete- and continuous-time settings. To achieve this, we first prove a data- and algorithm-dependent generalization bound in terms of the celebrated Fernique-Talagrand functional applied to the trajectory of the optimizer. Then, we specialize this result by exploiting the Markovian structure of stochastic optimizers, and derive bounds in terms of their (data-dependent) transition kernels. We support our theory with empirical results from a variety of neural networks, showing correlations between generalization error and lower tail exponents.
Submitted 11 July, 2022; v1 submitted 2 August, 2021;
originally announced August 2021.
-
Taxonomizing local versus global structure in neural network loss landscapes
Authors:
Yaoqing Yang,
Liam Hodgkinson,
Ryan Theisen,
Joe Zou,
Joseph E. Gonzalez,
Kannan Ramchandran,
Michael W. Mahoney
Abstract:
Viewing neural network models in terms of their loss landscapes has a long history in the statistical mechanics approach to learning, and in recent years it has received attention within machine learning proper. Among other things, local metrics (such as the smoothness of the loss landscape) have been shown to correlate with global properties of the model (such as good generalization performance). Here, we perform a detailed empirical analysis of the loss landscape structure of thousands of neural network models, systematically varying learning tasks, model architectures, and/or quantity/quality of data. By considering a range of metrics that attempt to capture different aspects of the loss landscape, we demonstrate that the best test accuracy is obtained when: the loss landscape is globally well-connected; ensembles of trained models are more similar to each other; and models converge to locally smooth regions. We also show that globally poorly-connected landscapes can arise when models are small or when they are trained on lower-quality data; and that, if the loss landscape is globally poorly-connected, then training to zero loss can actually lead to worse test accuracy. Our detailed empirical results shed light on phases of learning (and consequent double descent behavior), fundamental versus incidental determinants of good generalization, the role of load-like and temperature-like parameters in the learning process, different influences on the loss landscape from model and data, and the relationships between local and global metrics, all topics of recent interest.
Submitted 12 December, 2021; v1 submitted 23 July, 2021;
originally announced July 2021.
-
Stateful ODE-Nets using Basis Function Expansions
Authors:
Alejandro Queiruga,
N. Benjamin Erichson,
Liam Hodgkinson,
Michael W. Mahoney
Abstract:
The recently-introduced class of ordinary differential equation networks (ODE-Nets) establishes a fruitful connection between deep learning and dynamical systems. In this work, we reconsider formulations of the weights as continuous-in-depth functions using linear combinations of basis functions which enables us to leverage parameter transformations such as function projections. In turn, this view allows us to formulate a novel stateful ODE-Block that handles stateful layers. The benefits of this new ODE-Block are twofold: first, it enables incorporating meaningful continuous-in-depth batch normalization layers to achieve state-of-the-art performance; second, it enables compressing the weights through a change of basis, without retraining, while maintaining near state-of-the-art performance and reducing both inference time and memory footprint. Performance is demonstrated by applying our stateful ODE-Block to (a) image classification tasks using convolutional units and (b) sentence-tagging tasks using transformer encoder units.
Submitted 6 November, 2021; v1 submitted 20 June, 2021;
originally announced June 2021.
-
Noisy Recurrent Neural Networks
Authors:
Soon Hoe Lim,
N. Benjamin Erichson,
Liam Hodgkinson,
Michael W. Mahoney
Abstract:
We provide a general framework for studying recurrent neural networks (RNNs) trained by injecting noise into hidden states. Specifically, we consider RNNs that can be viewed as discretizations of stochastic differential equations driven by input data. This framework allows us to study the implicit regularization effect of general noise injection schemes by deriving an approximate explicit regularizer in the small noise regime. We find that, under reasonable assumptions, this implicit regularization promotes flatter minima; it biases towards models with more stable dynamics; and, in classification tasks, it favors models with larger classification margin. Sufficient conditions for global stability are obtained, highlighting the phenomenon of stochastic stabilization, where noise injection can improve stability during training. Our theory is supported by empirical results which demonstrate that the RNNs have improved robustness with respect to various input perturbations.
Submitted 1 December, 2021; v1 submitted 9 February, 2021;
originally announced February 2021.
-
Lipschitz Recurrent Neural Networks
Authors:
N. Benjamin Erichson,
Omri Azencot,
Alejandro Queiruga,
Liam Hodgkinson,
Michael W. Mahoney
Abstract:
Viewing recurrent neural networks (RNNs) as continuous-time dynamical systems, we propose a recurrent unit that describes the hidden state's evolution with two parts: a well-understood linear component plus a Lipschitz nonlinearity. This particular functional form facilitates stability analysis of the long-term behavior of the recurrent unit using tools from nonlinear systems theory. In turn, this enables architectural design decisions before experimentation. Sufficient conditions for global stability of the recurrent unit are obtained, motivating a novel scheme for constructing hidden-to-hidden matrices. Our experiments demonstrate that the Lipschitz RNN can outperform existing recurrent units on a range of benchmark tasks, including computer vision, language modeling and speech prediction tasks. Finally, through Hessian-based analysis we demonstrate that our Lipschitz recurrent unit is more robust with respect to input and parameter perturbations as compared to other continuous-time RNNs.
Submitted 23 April, 2021; v1 submitted 22 June, 2020;
originally announced June 2020.
-
Multiplicative noise and heavy tails in stochastic optimization
Authors:
Liam Hodgkinson,
Michael W. Mahoney
Abstract:
Although stochastic optimization is central to modern machine learning, the precise mechanisms underlying its success, and in particular, the precise role of the stochasticity, still remain unclear. Modelling stochastic optimization algorithms as discrete random recurrence relations, we show that multiplicative noise, as it commonly arises due to variance in local rates of convergence, results in heavy-tailed stationary behaviour in the parameters. A detailed analysis is conducted for SGD applied to a simple linear regression problem, followed by theoretical results for a much larger class of models (including non-linear and non-convex) and optimizers (including momentum, Adam, and stochastic Newton), demonstrating that our qualitative results hold much more generally. In each case, we describe dependence on key factors, including step size, batch size, and data variability, all of which exhibit similar qualitative behavior to recent empirical results on state-of-the-art neural network models from computer vision and natural language processing. Furthermore, we empirically demonstrate how multiplicative noise and heavy-tailed structure improve capacity for basin hopping and exploration of non-convex loss surfaces, over commonly-considered stochastic dynamics with only additive noise and light-tailed structure.
Submitted 11 June, 2020;
originally announced June 2020.
-
Stochastic Normalizing Flows
Authors:
Liam Hodgkinson,
Chris van der Heide,
Fred Roosta,
Michael W. Mahoney
Abstract:
We introduce stochastic normalizing flows, an extension of continuous normalizing flows for maximum likelihood estimation and variational inference (VI) using stochastic differential equations (SDEs). Using the theory of rough paths, the underlying Brownian motion is treated as a latent variable and approximated, enabling efficient training of neural SDEs as random neural ordinary differential equations. These SDEs can be used for constructing efficient Markov chains to sample from the underlying distribution of a given dataset. Furthermore, by considering families of targeted SDEs with prescribed stationary distribution, we can apply VI to the optimization of hyperparameters in stochastic MCMC.
Submitted 25 February, 2020; v1 submitted 21 February, 2020;
originally announced February 2020.
-
The reproducing Stein kernel approach for post-hoc corrected sampling
Authors:
Liam Hodgkinson,
Robert Salomone,
Fred Roosta
Abstract:
Stein importance sampling is a widely applicable technique based on kernelized Stein discrepancy, which corrects the output of approximate sampling algorithms by reweighting the empirical distribution of the samples. A general analysis of this technique is conducted for the previously unconsidered setting where samples are obtained via the simulation of a Markov chain, and applies to an arbitrary underlying Polish space. We prove that Stein importance sampling yields consistent estimators for quantities related to a target distribution of interest by using samples obtained from a geometrically ergodic Markov chain with a possibly unknown invariant measure that differs from the desired target. The approach is shown to be valid under conditions that are satisfied for a large number of unadjusted samplers, and is capable of retaining consistency when data subsampling is used. Along the way, a universal theory of reproducing Stein kernels is established, which enables the construction of kernelized Stein discrepancy on general Polish spaces, and provides sufficient conditions for kernels to be convergence-determining on such spaces. These results are of independent interest for the development of future methodology based on kernelized Stein discrepancies.
Submitted 13 September, 2021; v1 submitted 25 January, 2020;
originally announced January 2020.
-
Fast approximate simulation of finite long-range spin systems
Authors:
Ross McVinish,
Liam Hodgkinson
Abstract:
Tau leaping is a popular method for performing fast approximate simulation of certain continuous time Markov chain models typically found in chemistry and biochemistry. This method is known to perform well when the transition rates satisfy some form of scaling behaviour. In a similar spirit to tau leaping, we propose a new method for approximate simulation of spin systems which approximates the evolution of spin at each site between sampling epochs as an independent two-state Markov chain. When combined with fast summation methods, our method offers considerable improvement in speed over the standard Doob-Gillespie algorithm. We provide a detailed analysis of the error incurred for both the number of sites incorrectly labelled and for linear functions of the state.
Submitted 8 October, 2019;
originally announced October 2019.
-
Geometric Rates of Convergence for Kernel-based Sampling Algorithms
Authors:
Rajiv Khanna,
Liam Hodgkinson,
Michael W. Mahoney
Abstract:
The rate of convergence of weighted kernel herding (WKH) and sequential Bayesian quadrature (SBQ), two kernel-based sampling algorithms for estimating integrals with respect to some target probability measure, is investigated. Under verifiable conditions on the chosen kernel and target measure, we establish a near-geometric rate of convergence for target measures that are nearly atomic. Furthermore, we show these algorithms perform comparably to the theoretical best possible sampling algorithm under the maximum mean discrepancy. An analysis is also conducted in a distributed setting. Our theoretical developments are supported by empirical observations on simulated data as well as a real world application.
Submitted 31 October, 2021; v1 submitted 19 July, 2019;
originally announced July 2019.
-
Implicit Langevin Algorithms for Sampling From Log-concave Densities
Authors:
Liam Hodgkinson,
Robert Salomone,
Fred Roosta
Abstract:
For sampling from a log-concave density, we study implicit integrators resulting from $θ$-method discretization of the overdamped Langevin diffusion stochastic differential equation. Theoretical and algorithmic properties of the resulting sampling methods for $ θ\in [0,1] $ and a range of step sizes are established. Our results generalize and extend prior works in several directions. In particular, for $θ\ge1/2$, we prove geometric ergodicity and stability of the resulting methods for all step sizes. We show that obtaining subsequent samples amounts to solving a strongly-convex optimization problem, which is readily achievable using one of numerous existing methods. Numerical examples supporting our theoretical analysis are also presented.
Submitted 10 July, 2021; v1 submitted 28 March, 2019;
originally announced March 2019.
-
Normal approximations for discrete-time occupancy processes
Authors:
Liam Hodgkinson,
Ross McVinish,
Philip K. Pollett
Abstract:
We study normal approximations for a class of discrete-time occupancy processes, namely, Markov chains with transition kernels of product Bernoulli form. This class encompasses numerous models which appear in the complex networks literature, including stochastic patch occupancy models in ecology, network models in epidemiology, and a variety of dynamic random graph models. Bounds on the rate of convergence for a central limit theorem are obtained using Stein's method and moment inequalities on the deviation from an analogous deterministic model. As a consequence, our work also implies a uniform law of large numbers for a subclass of these processes.
Submitted 10 November, 2018; v1 submitted 1 January, 2018;
originally announced January 2018.
-
Unruh-DeWitt detector response along static and circular geodesic trajectories for Schwarzschild-AdS black holes
Authors:
Keith K. Ng,
Lee Hodgkinson,
Jorma Louko,
Robert B. Mann,
Eduardo Martin-Martinez
Abstract:
We present novel methods to numerically address the problem of characterizing the response of particle detectors in curved spacetimes. These methods allow for the integration of the Wightman function, at least in principle, in rather general backgrounds. In particular we will use this tool to further understand the nature of conformal massless scalar Hawking radiation from a Schwarzschild black hole in anti-de Sitter space. We do that by studying an Unruh-DeWitt detector at rest above the horizon and in circular geodesic orbit. The method allows us to see that the response rate shows peaks at certain characteristic frequencies, which correspond to the quasinormal modes (QNMs) of the space-time. It is in principle possible to apply these techniques to more complicated and interesting physical scenarios, e.g. geodesic infall or multiple detector entanglement evolution, or the study of the behaviour of quantum correlations in spacetimes with black hole horizons.
Submitted 15 September, 2014; v1 submitted 10 June, 2014;
originally announced June 2014.
-
Static detectors and circular-geodesic detectors on the Schwarzschild black hole
Authors:
Lee Hodgkinson,
Jorma Louko,
Adrian C. Ottewill
Abstract:
We examine the response of an Unruh-DeWitt particle detector coupled to a massless scalar field on the (3+1)-dimensional Schwarzschild spacetime, in the Boulware, Hartle-Hawking and Unruh states, for static detectors and detectors on circular geodesics, by primarily numerical methods. For the static detector, the response in the Hartle-Hawking state exhibits the known thermality at the local Hawking temperature, and the response in the Unruh state is thermal at the local Hawking temperature in the limit of a large detector energy gap. For the circular-geodesic detector, we find evidence of thermality in the limit of a large energy gap for the Hartle-Hawking and Unruh states, at a temperature that exceeds the Doppler-shifted local Hawking temperature. Detailed quantitative comparisons between the three states are given. The response in the Hartle-Hawking state is compared with the response in the Minkowski vacuum and in the Minkowski thermal state for the corresponding Rindler, drifted Rindler, and circularly accelerated trajectories. The analysis takes place within first-order perturbation theory and relies in an essential way on stationarity.
Submitted 5 May, 2014; v1 submitted 12 January, 2014;
originally announced January 2014.
-
Particle detectors in curved spacetime quantum field theory
Authors:
Lee Hodgkinson
Abstract:
Unruh-DeWitt particle detector models are studied in a variety of time-dependent and time-independent settings. We work within the framework of first-order perturbation theory and couple the detector to a massless scalar field. The necessity of switching on (off) the detector smoothly is emphasised throughout, and the transition rate is found by taking the sharp-switching limit of the regulator-free and finite response function. The detector is analysed on a variety of spacetimes: $d$-dimensional Minkowski, the Bañados-Teitelboim-Zanelli (BTZ) black hole, the two-dimensional Minkowski half-plane, two-dimensional Minkowski with a receding mirror, and the two- and four-dimensional Schwarzschild black holes. In $d$-dimensional Minkowski spacetime, the transition rate is found to be finite up to dimension five. In dimension six, the transition rate diverges unless the detector is on a trajectory of constant proper acceleration, and the implications of this divergence to the global embedding spacetime (GEMS) methods are studied. In three-dimensional curved spacetime, the transition rate for the scalar field in an arbitrary Hadamard state is found to be finite and regulator-free. Then on the Bañados-Teitelboim-Zanelli (BTZ) black hole spacetime, we analyse the detector coupled to the field in the Hartle-Hawking vacua, under both transparent and reflective boundary conditions at infinity. Results are presented for the co-rotating detector, which responds thermally, and for the radially-infalling detector. In four-dimensional Schwarzschild spacetime, we proceed numerically, and the Hartle-Hawking, Boulware and Unruh vacua rates are compared. Results are presented for the case of the static detectors, which respond thermally, and also for the case of co-rotating detectors.
Submitted 15 October, 2013; v1 submitted 27 September, 2013;
originally announced September 2013.
-
Unruh-DeWitt detector on the BTZ black hole
Authors:
Lee Hodgkinson,
Jorma Louko
Abstract:
We examine an Unruh-DeWitt particle detector coupled to a scalar field in three-dimensional curved spacetime, within first-order perturbation theory. We first obtain a causal and manifestly regular expression for the instantaneous transition rate in an arbitrary Hadamard state. We then specialise to the Bañados-Teitelboim-Zanelli black hole and to a massless conformally coupled field in the Hartle-Hawking vacuum. A co-rotating detector responds thermally in the expected local Hawking temperature, while a freely-falling detector shows no evidence of thermality in regimes that we are able to probe, not even far from the horizon. The boundary condition at the asymptotically anti-de Sitter infinity has a significant effect on the transition rate.
Submitted 15 August, 2012;
originally announced August 2012.
-
Static, stationary and inertial Unruh-DeWitt detectors on the BTZ black hole
Authors:
Lee Hodgkinson,
Jorma Louko
Abstract:
We examine an Unruh-DeWitt particle detector coupled to a scalar field in three-dimensional curved spacetime. We first obtain a regulator-free expression for the transition probability in an arbitrary Hadamard state, working within first-order perturbation theory and assuming smooth switching, and we show that both the transition probability and the instantaneous transition rate remain well defined in the sharp switching limit. We then analyse a detector coupled to a massless conformally coupled field in the Hartle-Hawking vacua on the Bañados-Teitelboim-Zanelli black hole, under both transparent and reflective boundary conditions at infinity. A selection of stationary and freely-falling detector trajectories is examined, including the co-rotating trajectories, for which the response is shown to be thermal. Analytic results in a number of asymptotic regimes, including those of large and small mass, are complemented by numerical results in the interpolating regimes. The boundary condition at infinity is seen to have a significant effect on the transition rate.
Submitted 4 October, 2012; v1 submitted 10 June, 2012;
originally announced June 2012.
-
How often does the Unruh-DeWitt detector click beyond four dimensions?
Authors:
Lee Hodgkinson,
Jorma Louko
Abstract:
We analyse the response of an arbitrarily-accelerated Unruh-DeWitt detector coupled to a massless scalar field in Minkowski spacetimes of dimensions up to six, working within first-order perturbation theory and assuming a smooth switch-on and switch-off. We express the total transition probability as a manifestly finite and regulator-free integral formula. In the sharp switching limit, the transition probability diverges in dimensions greater than three but the transition rate remains finite up to dimension five. In dimension six, the transition rate remains finite in the sharp switching limit for trajectories of constant scalar proper acceleration, including all stationary trajectories, but it diverges for generic trajectories. The divergence of the transition rate in six dimensions suggests that global embedding spacetime (GEMS) methods for investigating detector response in curved spacetime may have limited validity for generic trajectories when the embedding spacetime has dimension higher than five.
Submitted 6 August, 2012; v1 submitted 20 September, 2011;
originally announced September 2011.
-
Reinstating the 'no-lose' theorem for NMSSM Higgs discovery at the LHC
Authors:
J. R. Forshaw,
J. F. Gunion,
L. Hodgkinson,
A. Papaefstathiou,
A. D. Pilkington
Abstract:
The simplest supersymmetric model that solves the mu problem and in which the GUT-scale parameters need not be finely tuned in order to predict the correct value of the Z boson mass at low scales is the Next-to-Minimal Supersymmetric Standard Model (NMSSM). However, in order that fine tuning be absent, the lightest CP-even Higgs boson h should have mass ~100 GeV and SM couplings to gauge bosons and fermions. The only way that this can be consistent with LEP limits is if h decays primarily via h->aa->4 tau or 4j but not 4b, where a is the lighter of the two pseudo-scalar Higgses that are present in the NMSSM. Interestingly, m_a < 2 m_b is natural in the NMSSM with m_a > 2 m_tau somewhat preferred. Thus, h -> 4 tau becomes a key mode of interest. Meanwhile, all other Higgs bosons of the NMSSM are typically quite heavy. Detection of any of the NMSSM Higgs bosons at the LHC in this preferred scenario will be very challenging using conventional channels. In this paper, we demonstrate that the h -> aa -> 4 tau decay chain should be visible if the Higgs is produced in the process pp -> p+h+p with the final state protons being measured using suitably installed forward detectors. Moreover, we show that the mass of both the h and the a can be determined on an event-by-event basis.
Submitted 27 March, 2008; v1 submitted 20 December, 2007;
originally announced December 2007.