Search | arXiv e-print repository

Conditioning diffusion models by explicit forward-backward bridging

Authors: Adrien Corenflos, Zheng Zhao, Simo Särkkä, Jens Sjölund, Thomas B. Schön

Abstract: Given an unconditional diffusion model $π(x, y)$, using it to perform conditional simulation $π(x \mid y)$ is still largely an open question and is typically achieved by learning conditional drifts to the denoising SDE after the fact. In this work, we express conditional simulation as an inference problem on an augmented space corresponding to a partial SDE bridge. This perspective allows us to im… ▽ More Given an unconditional diffusion model $π(x, y)$, using it to perform conditional simulation $π(x \mid y)$ is still largely an open question and is typically achieved by learning conditional drifts to the denoising SDE after the fact. In this work, we express conditional simulation as an inference problem on an augmented space corresponding to a partial SDE bridge. This perspective allows us to implement efficient and principled particle Gibbs and pseudo-marginal samplers marginally targeting the conditional distribution $π(x \mid y)$. Contrary to existing methodology, our methods do not introduce any additional approximation to the unconditional diffusion model aside from the Monte Carlo error. We showcase the benefits and drawbacks of our approach on a series of synthetic and real data examples. △ Less

Submitted 22 May, 2024; originally announced May 2024.

Comments: 24 pages, 12 figures

arXiv:2311.12566 [pdf, other]

Variational Elliptical Processes

Authors: Maria Bånkestad, Jens Sjölund, Jalil Taghia, Thomas B. Schöon

Abstract: We present elliptical processes, a family of non-parametric probabilistic models that subsume Gaussian processes and Student's t processes. This generalization includes a range of new heavy-tailed behaviors while retaining computational tractability. Elliptical processes are based on a representation of elliptical distributions as a continuous mixture of Gaussian distributions. We parameterize thi… ▽ More We present elliptical processes, a family of non-parametric probabilistic models that subsume Gaussian processes and Student's t processes. This generalization includes a range of new heavy-tailed behaviors while retaining computational tractability. Elliptical processes are based on a representation of elliptical distributions as a continuous mixture of Gaussian distributions. We parameterize this mixture distribution as a spline normalizing flow, which we train using variational inference. The proposed form of the variational posterior enables a sparse variational elliptical process applicable to large-scale problems. We highlight advantages compared to Gaussian processes through regression and classification experiments. Elliptical processes can supersede Gaussian processes in several settings, including cases where the likelihood is non-Gaussian or when accurate tail modeling is essential. △ Less

Submitted 21 November, 2023; originally announced November 2023.

Comments: 14 pages, 15 figures, appendix 9 pages

Journal ref: Transactions on Machine Learning Research, September 2023

arXiv:2310.19608 [pdf, other]

On Feynman--Kac training of partial Bayesian neural networks

Authors: Zheng Zhao, Sebastian Mair, Thomas B. Schön, Jens Sjölund

Abstract: Recently, partial Bayesian neural networks (pBNNs), which only consider a subset of the parameters to be stochastic, were shown to perform competitively with full Bayesian neural networks. However, pBNNs are often multi-modal in the latent variable space and thus challenging to approximate with parametric models. To address this problem, we propose an efficient sampling-based training strategy, wh… ▽ More Recently, partial Bayesian neural networks (pBNNs), which only consider a subset of the parameters to be stochastic, were shown to perform competitively with full Bayesian neural networks. However, pBNNs are often multi-modal in the latent variable space and thus challenging to approximate with parametric models. To address this problem, we propose an efficient sampling-based training strategy, wherein the training of a pBNN is formulated as simulating a Feynman--Kac model. We then describe variations of sequential Monte Carlo samplers that allow us to simultaneously estimate the parameters and the latent posterior distribution of this model at a tractable computational cost. Using various synthetic and real-world datasets we show that our proposed training scheme outperforms the state of the art in terms of predictive performance. △ Less

Submitted 27 February, 2024; v1 submitted 30 October, 2023; originally announced October 2023.

Comments: In AISTATS 2024

arXiv:2310.10807 [pdf, other]

Regularization properties of adversarially-trained linear regression

Authors: Antônio H. Ribeiro, Dave Zachariah, Francis Bach, Thomas B. Schön

Abstract: State-of-the-art machine learning models can be vulnerable to very small input perturbations that are adversarially constructed. Adversarial training is an effective approach to defend against it. Formulated as a min-max problem, it searches for the best solution when the training data were corrupted by the worst-case attacks. Linear models are among the simple models where vulnerabilities can be… ▽ More State-of-the-art machine learning models can be vulnerable to very small input perturbations that are adversarially constructed. Adversarial training is an effective approach to defend against it. Formulated as a min-max problem, it searches for the best solution when the training data were corrupted by the worst-case attacks. Linear models are among the simple models where vulnerabilities can be observed and are the focus of our study. In this case, adversarial training leads to a convex optimization problem which can be formulated as the minimization of a finite sum. We provide a comparative analysis between the solution of adversarial training in linear regression and other regularization methods. Our main findings are that: (A) Adversarial training yields the minimum-norm interpolating solution in the overparameterized regime (more parameters than data), as long as the maximum disturbance radius is smaller than a threshold. And, conversely, the minimum-norm interpolator is the solution to adversarial training with a given radius. (B) Adversarial training can be equivalent to parameter shrinking methods (ridge regression and Lasso). This happens in the underparametrized region, for an appropriate choice of adversarial radius and zero-mean symmetrically distributed covariates. (C) For $\ell_\infty$-adversarial training -- as in square-root Lasso -- the choice of adversarial radius for optimal bounds does not depend on the additive noise variance. We confirm our theoretical findings with numerical examples. △ Less

Submitted 16 October, 2023; originally announced October 2023.

Comments: Accepted (spotlight) NeurIPS 2023; A preliminary version of this work titled: "Surprises in adversarially-trained linear regression" was made available under a different identifier: arXiv:2205.12695

arXiv:2309.16335 [pdf, other]

doi 10.1016/j.jelectrocard.2023.09.011

End-to-end Risk Prediction of Atrial Fibrillation from the 12-Lead ECG by Deep Neural Networks

Authors: Theogene Habineza, Antônio H. Ribeiro, Daniel Gedon, Joachim A. Behar, Antonio Luiz P. Ribeiro, Thomas B. Schön

Abstract: Background: Atrial fibrillation (AF) is one of the most common cardiac arrhythmias that affects millions of people each year worldwide and it is closely linked to increased risk of cardiovascular diseases such as stroke and heart failure. Machine learning methods have shown promising results in evaluating the risk of developing atrial fibrillation from the electrocardiogram. We aim to develop and… ▽ More Background: Atrial fibrillation (AF) is one of the most common cardiac arrhythmias that affects millions of people each year worldwide and it is closely linked to increased risk of cardiovascular diseases such as stroke and heart failure. Machine learning methods have shown promising results in evaluating the risk of developing atrial fibrillation from the electrocardiogram. We aim to develop and evaluate one such algorithm on a large CODE dataset collected in Brazil. Results: The deep neural network model identified patients without indication of AF in the presented ECG but who will develop AF in the future with an AUC score of 0.845. From our survival model, we obtain that patients in the high-risk group (i.e. with the probability of a future AF case being greater than 0.7) are 50% more likely to develop AF within 40 weeks, while patients belonging to the minimal-risk group (i.e. with the probability of a future AF case being less than or equal to 0.1) have more than 85% chance of remaining AF free up until after seven years. Conclusion: We developed and validated a model for AF risk prediction. If applied in clinical practice, the model possesses the potential of providing valuable and useful information in decision-making and patient management processes. △ Less

Submitted 28 September, 2023; originally announced September 2023.

Comments: 16 pages with 7 figures

Journal ref: @article{HABINEZA2023193, journal = {Journal of Electrocardiology}, volume = {81}, pages = {193-200}, year = {2023}, issn = {0022-0736}}

arXiv:2210.14684 [pdf, other]

doi 10.1109/MCS.2021.3122269

Nonlinear System Identification: Learning while respecting physical models using a sequential Monte Carlo method

Authors: Anna Wigren, Johan Wågberg, Fredrik Lindsten, Adrian Wills, Thomas B. Schön

Abstract: Identification of nonlinear systems is a challenging problem. Physical knowledge of the system can be used in the identification process to significantly improve the predictive performance by restricting the space of possible mappings from the input to the output. Typically, the physical models contain unknown parameters that must be learned from data. Classical methods often restrict the possible… ▽ More Identification of nonlinear systems is a challenging problem. Physical knowledge of the system can be used in the identification process to significantly improve the predictive performance by restricting the space of possible mappings from the input to the output. Typically, the physical models contain unknown parameters that must be learned from data. Classical methods often restrict the possible models or have to resort to approximations of the model that introduce biases. Sequential Monte Carlo methods enable learning without introducing any bias for a more general class of models. In addition, they can also be used to approximate a posterior distribution of the model parameters in a Bayesian setting. This article provides a general introduction to sequential Monte Carlo and shows how it naturally fits in system identification by giving examples of specific algorithms. The methods are illustrated on two systems: a system with two cascaded water tanks with possible overflow in both tanks and a compartmental model for the spreading of a disease. △ Less

Submitted 26 October, 2022; originally announced October 2022.

Comments: 52 pages, 13 figures

Journal ref: IEEE Control Systems Magazine, Volume 42, Issue 1, pages 75 - 102, February 2022

arXiv:2205.12695 [pdf, other]

Surprises in adversarially-trained linear regression

Authors: Antônio H. Ribeiro, Dave Zachariah, Thomas B. Schön

Abstract: State-of-the-art machine learning models can be vulnerable to very small input perturbations that are adversarially constructed. Adversarial training is an effective approach to defend against such examples. It is formulated as a min-max problem, searching for the best solution when the training data was corrupted by the worst-case attacks. For linear regression problems, adversarial training can… ▽ More State-of-the-art machine learning models can be vulnerable to very small input perturbations that are adversarially constructed. Adversarial training is an effective approach to defend against such examples. It is formulated as a min-max problem, searching for the best solution when the training data was corrupted by the worst-case attacks. For linear regression problems, adversarial training can be formulated as a convex problem. We use this reformulation to make two technical contributions: First, we formulate the training problem as an instance of robust regression to reveal its connection to parameter-shrinking methods, specifically that $\ell_\infty$-adversarial training produces sparse solutions. Secondly, we study adversarial training in the overparameterized regime, i.e. when there are more parameters than data. We prove that adversarial training with small disturbances gives the solution with the minimum-norm that interpolates the training data. Ridge regression and lasso approximate such interpolating solutions as their regularization parameter vanishes. By contrast, for adversarial training, the transition into the interpolation regime is abrupt and for non-zero values of disturbance. This result is proved and illustrated with numerical examples. △ Less

Submitted 20 October, 2022; v1 submitted 25 May, 2022; originally announced May 2022.

arXiv:2205.06306 [pdf, other]

doi 10.1109/TSP.2023.3245720

Probabilistic Estimation of Instantaneous Frequencies of Chirp Signals

Authors: Zheng Zhao, Simo Särkkä, Jens Sjölund, Thomas B. Schön

Abstract: We present a continuous-time probabilistic approach for estimating the chirp signal and its instantaneous frequency function when the true forms of these functions are not accessible. Our model represents these functions by non-linearly cascaded Gaussian processes represented as non-linear stochastic differential equations. The posterior distribution of the functions is then estimated with stochas… ▽ More We present a continuous-time probabilistic approach for estimating the chirp signal and its instantaneous frequency function when the true forms of these functions are not accessible. Our model represents these functions by non-linearly cascaded Gaussian processes represented as non-linear stochastic differential equations. The posterior distribution of the functions is then estimated with stochastic filters and smoothers. We compute a (posterior) Cramér--Rao lower bound for the Gaussian process model, and derive a theoretical upper bound for the estimation error in the mean squared sense. The experiments show that the proposed method outperforms a number of state-of-the-art methods on a synthetic data. We also show that the method works out-of-the-box for two real-world datasets. △ Less

Submitted 13 February, 2023; v1 submitted 12 May, 2022; originally announced May 2022.

Comments: Accepted for publication in IEEE Transactions on Signal Processing

arXiv:2204.06274 [pdf, other]

doi 10.1109/TSP.2023.3246228

Overparameterized Linear Regression under Adversarial Attacks

Authors: Antônio H. Ribeiro, Thomas B. Schön

Abstract: We study the error of linear regression in the face of adversarial attacks. In this framework, an adversary changes the input to the regression model in order to maximize the prediction error. We provide bounds on the prediction error in the presence of an adversary as a function of the parameter norm and the error in the absence of such an adversary. We show how these bounds make it possible to s… ▽ More We study the error of linear regression in the face of adversarial attacks. In this framework, an adversary changes the input to the regression model in order to maximize the prediction error. We provide bounds on the prediction error in the presence of an adversary as a function of the parameter norm and the error in the absence of such an adversary. We show how these bounds make it possible to study the adversarial error using analysis from non-adversarial setups. The obtained results shed light on the robustness of overparameterized linear models to adversarial attacks. Adding features might be either a source of additional robustness or brittleness. On the one hand, we use asymptotic results to illustrate how double-descent curves can be obtained for the adversarial error. On the other hand, we derive conditions under which the adversarial error can grow to infinity as more features are added, while at the same time, the test error goes to zero. We show this behavior is caused by the fact that the norm of the parameter vector grows with the number of features. It is also established that $\ell_\infty$ and $\ell_2$-adversarial attacks might behave fundamentally differently due to how the $\ell_1$ and $\ell_2$-norms of random projections concentrate. We also show how our reformulation allows for solving adversarial training as a convex optimization problem. This fact is then exploited to establish similarities between adversarial training and parameter-shrinking methods and to study how the training might affect the robustness of the estimated models. △ Less

Submitted 27 January, 2023; v1 submitted 13 April, 2022; originally announced April 2022.

arXiv:2202.01793 [pdf, other]

Incorporating Sum Constraints into Multitask Gaussian Processes

Authors: Philipp Pilar, Carl Jidling, Thomas B. Schön, Niklas Wahlström

Abstract: Machine learning models can be improved by adapting them to respect existing background knowledge. In this paper we consider multitask Gaussian processes, with background knowledge in the form of constraints that require a specific sum of the outputs to be constant. This is achieved by conditioning the prior distribution on the constraint fulfillment. The approach allows for both linear and nonlin… ▽ More Machine learning models can be improved by adapting them to respect existing background knowledge. In this paper we consider multitask Gaussian processes, with background knowledge in the form of constraints that require a specific sum of the outputs to be constant. This is achieved by conditioning the prior distribution on the constraint fulfillment. The approach allows for both linear and nonlinear constraints. We demonstrate that the constraints are fulfilled with high precision and that the construction can improve the overall prediction accuracy as compared to the standard Gaussian process. △ Less

Submitted 1 February, 2023; v1 submitted 3 February, 2022; originally announced February 2022.

Journal ref: Transactions on Machine Learning Research, 2022

arXiv:2111.01409 [pdf, other]

doi 10.1109/TSP.2022.3187868

Efficient Learning of the Parameters of Non-Linear Models using Differentiable Resampling in Particle Filters

Authors: Conor Rosato, Vincent Beraud, Paul Horridge, Thomas B. Schön, Simon Maskell

Abstract: It has been widely documented that the sampling and resampling steps in particle filters cannot be differentiated. The {\itshape reparameterisation trick} was introduced to allow the sampling step to be reformulated into a differentiable function. We extend the {\itshape reparameterisation trick} to include the stochastic input to resampling therefore limiting the discontinuities in the gradient c… ▽ More It has been widely documented that the sampling and resampling steps in particle filters cannot be differentiated. The {\itshape reparameterisation trick} was introduced to allow the sampling step to be reformulated into a differentiable function. We extend the {\itshape reparameterisation trick} to include the stochastic input to resampling therefore limiting the discontinuities in the gradient calculation after this step. Knowing the gradients of the prior and likelihood allows us to run particle Markov Chain Monte Carlo (p-MCMC) and use the No-U-Turn Sampler (NUTS) as the proposal when estimating parameters. We compare the Metropolis-adjusted Langevin algorithm (MALA), Hamiltonian Monte Carlo with different number of steps and NUTS. We consider two state-space models and show that NUTS improves the mixing of the Markov chain and can produce more accurate results in less computational time. △ Less

Submitted 27 April, 2022; v1 submitted 2 November, 2021; originally announced November 2021.

Comments: 35 pages, 10 figures

arXiv:2110.11948 [pdf, other]

Learning Proposals for Practical Energy-Based Regression

Authors: Fredrik K. Gustafsson, Martin Danelljan, Thomas B. Schön

Abstract: Energy-based models (EBMs) have experienced a resurgence within machine learning in recent years, including as a promising alternative for probabilistic regression. However, energy-based regression requires a proposal distribution to be manually designed for training, and an initial estimate has to be provided at test-time. We address both of these issues by introducing a conceptually simple metho… ▽ More Energy-based models (EBMs) have experienced a resurgence within machine learning in recent years, including as a promising alternative for probabilistic regression. However, energy-based regression requires a proposal distribution to be manually designed for training, and an initial estimate has to be provided at test-time. We address both of these issues by introducing a conceptually simple method to automatically learn an effective proposal distribution, which is parameterized by a separate network head. To this end, we derive a surprising result, leading to a unified training objective that jointly minimizes the KL divergence from the proposal to the EBM, and the negative log-likelihood of the EBM. At test-time, we can then employ importance sampling with the trained proposal to efficiently evaluate the learned EBM and produce stand-alone predictions. Furthermore, we utilize our derived training objective to learn mixture density networks (MDNs) with a jointly trained energy-based teacher, consistently outperforming conventional MDN training on four real-world regression tasks within computer vision. Code is available at https://github.com/fregu856/ebms_proposals. △ Less

Submitted 7 November, 2023; v1 submitted 22 October, 2021; originally announced October 2021.

Comments: AISTATS 2022. Code is available at https://github.com/fregu856/ebms_proposals

arXiv:2012.07269 [pdf, ps, other]

Variational State and Parameter Estimation

Authors: Jarrad Courts, Johannes Hendriks, Adrian Wills, Thomas Schön, Brett Ninness

Abstract: This paper considers the problem of computing Bayesian estimates of both states and model parameters for nonlinear state-space models. Generally, this problem does not have a tractable solution and approximations must be utilised. In this work, a variational approach is used to provide an assumed density which approximates the desired, intractable, distribution. The approach is deterministic and r… ▽ More This paper considers the problem of computing Bayesian estimates of both states and model parameters for nonlinear state-space models. Generally, this problem does not have a tractable solution and approximations must be utilised. In this work, a variational approach is used to provide an assumed density which approximates the desired, intractable, distribution. The approach is deterministic and results in an optimisation problem of a standard form. Due to the parametrisation of the assumed density selected first- and second-order derivatives are readily available which allows for efficient solutions. The proposed method is compared against state-of-the-art Hamiltonian Monte Carlo in two numerical examples. △ Less

Submitted 14 December, 2020; originally announced December 2020.

arXiv:2012.06341 [pdf, other]

Beyond Occam's Razor in System Identification: Double-Descent when Modeling Dynamics

Authors: Antônio H. Ribeiro, Johannes N. Hendriks, Adrian G. Wills, Thomas B. Schön

Abstract: System identification aims to build models of dynamical systems from data. Traditionally, choosing the model requires the designer to balance between two goals of conflicting nature; the model must be rich enough to capture the system dynamics, but not so flexible that it learns spurious random effects from the dataset. It is typically observed that the model validation performance follows a U-sha… ▽ More System identification aims to build models of dynamical systems from data. Traditionally, choosing the model requires the designer to balance between two goals of conflicting nature; the model must be rich enough to capture the system dynamics, but not so flexible that it learns spurious random effects from the dataset. It is typically observed that the model validation performance follows a U-shaped curve as the model complexity increases. Recent developments in machine learning and statistics, however, have observed situations where a "double-descent" curve subsumes this U-shaped model-performance curve. With a second decrease in performance occurring beyond the point where the model has reached the capacity of interpolating - i.e., (near) perfectly fitting - the training data. To the best of our knowledge, such phenomena have not been studied within the context of dynamic systems. The present paper aims to answer the question: "Can such a phenomenon also be observed when estimating parameters of dynamic systems?" We show that the answer is yes, verifying such behavior experimentally both for artificially generated and real-world datasets. △ Less

Submitted 6 August, 2021; v1 submitted 11 December, 2020; originally announced December 2020.

Comments: To appear in the Proceedings of the 19th IFAC Symposium in System Identification (2021)

arXiv:2012.05072 [pdf, ps, other]

Variational System Identification for Nonlinear State-Space Models

Authors: Jarrad Courts, Adrian Wills, Thomas Schön, Brett Ninness

Abstract: This paper considers parameter estimation for nonlinear state-space models, which is an important but challenging problem. We address this challenge by employing a variational inference (VI) approach, which is a principled method that has deep connections to maximum likelihood estimation. This VI approach ultimately provides estimates of the model as solutions to an optimisation problem, which is… ▽ More This paper considers parameter estimation for nonlinear state-space models, which is an important but challenging problem. We address this challenge by employing a variational inference (VI) approach, which is a principled method that has deep connections to maximum likelihood estimation. This VI approach ultimately provides estimates of the model as solutions to an optimisation problem, which is deterministic, tractable and can be solved using standard optimisation tools. A specialisation of this approach for systems with additive Gaussian noise is also detailed. The proposed method is examined numerically on a range of simulated and real examples focusing on the robustness to parameter initialisation; additionally, favourable comparisons are performed against state-of-the-art alternatives. △ Less

Submitted 14 September, 2022; v1 submitted 8 December, 2020; originally announced December 2020.

arXiv:2012.04634 [pdf, other]

Accurate 3D Object Detection using Energy-Based Models

Authors: Fredrik K. Gustafsson, Martin Danelljan, Thomas B. Schön

Abstract: Accurate 3D object detection (3DOD) is crucial for safe navigation of complex environments by autonomous robots. Regressing accurate 3D bounding boxes in cluttered environments based on sparse LiDAR data is however a highly challenging problem. We address this task by exploring recent advances in conditional energy-based models (EBMs) for probabilistic regression. While methods employing EBMs for… ▽ More Accurate 3D object detection (3DOD) is crucial for safe navigation of complex environments by autonomous robots. Regressing accurate 3D bounding boxes in cluttered environments based on sparse LiDAR data is however a highly challenging problem. We address this task by exploring recent advances in conditional energy-based models (EBMs) for probabilistic regression. While methods employing EBMs for regression have demonstrated impressive performance on 2D object detection in images, these techniques are not directly applicable to 3D bounding boxes. In this work, we therefore design a differentiable pooling operator for 3D bounding boxes, serving as the core module of our EBM network. We further integrate this general approach into the state-of-the-art 3D object detector SA-SSD. On the KITTI dataset, our proposed approach consistently outperforms the SA-SSD baseline across all 3DOD metrics, demonstrating the potential of EBM-based regression for highly accurate 3DOD. Code is available at https://github.com/fregu856/ebms_3dod. △ Less

Submitted 7 November, 2023; v1 submitted 8 December, 2020; originally announced December 2020.

Comments: CVPR Workshops 2021. Code is available at https://github.com/fregu856/ebms_3dod

arXiv:2005.01698 [pdf, other]

How to Train Your Energy-Based Model for Regression

Authors: Fredrik K. Gustafsson, Martin Danelljan, Radu Timofte, Thomas B. Schön

Abstract: Energy-based models (EBMs) have become increasingly popular within computer vision in recent years. While they are commonly employed for generative image modeling, recent work has applied EBMs also for regression tasks, achieving state-of-the-art performance on object detection and visual tracking. Training EBMs is however known to be challenging. While a variety of different techniques have been… ▽ More Energy-based models (EBMs) have become increasingly popular within computer vision in recent years. While they are commonly employed for generative image modeling, recent work has applied EBMs also for regression tasks, achieving state-of-the-art performance on object detection and visual tracking. Training EBMs is however known to be challenging. While a variety of different techniques have been explored for generative modeling, the application of EBMs to regression is not a well-studied problem. How EBMs should be trained for best possible regression performance is thus currently unclear. We therefore accept the task of providing the first detailed study of this problem. To that end, we propose a simple yet highly effective extension of noise contrastive estimation, and carefully compare its performance to six popular methods from literature on the tasks of 1D regression and object detection. The results of this comparison suggest that our training method should be considered the go-to approach. We also apply our method to the visual tracking task, achieving state-of-the-art performance on five datasets. Notably, our tracker achieves 63.7% AUC on LaSOT and 78.7% Success on TrackingNet. Code is available at https://github.com/fregu856/ebms_regression. △ Less

Submitted 14 August, 2020; v1 submitted 4 May, 2020; originally announced May 2020.

Comments: BMVC 2020. Code is available at https://github.com/fregu856/ebms_regression

arXiv:2003.14162 [pdf, other]

Deep State Space Models for Nonlinear System Identification

Authors: Daniel Gedon, Niklas Wahlström, Thomas B. Schön, Lennart Ljung

Abstract: Deep state space models (SSMs) are an actively researched model class for temporal models developed in the deep learning community which have a close connection to classic SSMs. The use of deep SSMs as a black-box identification model can describe a wide range of dynamics due to the flexibility of deep neural networks. Additionally, the probabilistic nature of the model class allows the uncertaint… ▽ More Deep state space models (SSMs) are an actively researched model class for temporal models developed in the deep learning community which have a close connection to classic SSMs. The use of deep SSMs as a black-box identification model can describe a wide range of dynamics due to the flexibility of deep neural networks. Additionally, the probabilistic nature of the model class allows the uncertainty of the system to be modelled. In this work a deep SSM class and its parameter learning algorithm are explained in an effort to extend the toolbox of nonlinear identification methods with a deep learning based method. Six recent deep SSMs are evaluated in a first unified implementation on nonlinear system identification benchmarks. △ Less

Submitted 18 June, 2021; v1 submitted 31 March, 2020; originally announced March 2020.

arXiv:2003.07201 [pdf, ps, other]

The Elliptical Processes: a Family of Fat-tailed Stochastic Processes

Authors: Maria Bånkestad, Jens Sjölund, Jalil Taghia, Thomas Schön

Abstract: We present the elliptical processes -- a family of non-parametric probabilistic models that subsumes the Gaussian process and the Student-t process. This generalization includes a range of new fat-tailed behaviors yet retains computational tractability. We base the elliptical processes on a representation of elliptical distributions as a continuous mixture of Gaussian distributions and derive clos… ▽ More We present the elliptical processes -- a family of non-parametric probabilistic models that subsumes the Gaussian process and the Student-t process. This generalization includes a range of new fat-tailed behaviors yet retains computational tractability. We base the elliptical processes on a representation of elliptical distributions as a continuous mixture of Gaussian distributions and derive closed-form expressions for the marginal and conditional distributions. We perform numerical experiments on robust regression using an elliptical process defined by a piecewise constant mixing distribution, and show advantages compared with a Gaussian process. The elliptical processes may become a replacement for Gaussian processes in several settings, including when the likelihood is not Gaussian or when accurate tail modeling is critical. △ Less

Submitted 2 December, 2020; v1 submitted 13 March, 2020; originally announced March 2020.

arXiv:2002.02620 [pdf, ps, other]

doi 10.1109/TSP.2021.3122296

Gaussian Variational State Estimation for Nonlinear State-Space Models

Authors: Jarrad Courts, Adrian Wills, Thomas B. Schön

Abstract: In this paper, the problem of state estimation, in the context of both filtering and smoothing, for nonlinear state-space models is considered. Due to the nonlinear nature of the models, the state estimation problem is generally intractable as it involves integrals of general nonlinear functions and the filtered and smoothed state distributions lack closed-form solutions. As such, it is common to… ▽ More In this paper, the problem of state estimation, in the context of both filtering and smoothing, for nonlinear state-space models is considered. Due to the nonlinear nature of the models, the state estimation problem is generally intractable as it involves integrals of general nonlinear functions and the filtered and smoothed state distributions lack closed-form solutions. As such, it is common to approximate the state estimation problem. In this paper, we develop an assumed Gaussian solution based on variational inference, which offers the key advantage of a flexible, but principled, mechanism for approximating the required distributions. Our main contribution lies in a new formulation of the state estimation problem as an optimisation problem, which can then be solved using standard optimisation routines that employ exact first- and second-order derivatives. The resulting state estimation approach involves a minimal number of assumptions and applies directly to nonlinear systems with both Gaussian and non-Gaussian probabilistic models. The performance of our approach is demonstrated on several examples; a challenging scalar system, a model of a simple robotic system, and a target tracking problem using a von Mises-Fisher distribution and outperforms alternative assumed Gaussian approaches to state estimation. △ Less

Submitted 1 October, 2021; v1 submitted 6 February, 2020; originally announced February 2020.

arXiv:2002.01600 [pdf, other]

Linearly Constrained Neural Networks

Authors: Johannes Hendriks, Carl Jidling, Adrian Wills, Thomas Schön

Abstract: We present a novel approach to modelling and learning vector fields from physical systems using neural networks that explicitly satisfy known linear operator constraints. To achieve this, the target function is modelled as a linear transformation of an underlying potential field, which is in turn modelled by a neural network. This transformation is chosen such that any prediction of the target fun… ▽ More We present a novel approach to modelling and learning vector fields from physical systems using neural networks that explicitly satisfy known linear operator constraints. To achieve this, the target function is modelled as a linear transformation of an underlying potential field, which is in turn modelled by a neural network. This transformation is chosen such that any prediction of the target function is guaranteed to satisfy the constraints. The approach is demonstrated on both simulated and real data examples. △ Less

Submitted 27 April, 2021; v1 submitted 4 February, 2020; originally announced February 2020.

arXiv:1910.09527 [pdf, ps, other]

Particle filter with rejection control and unbiased estimator of the marginal likelihood

Authors: Jan Kudlicka, Lawrence M. Murray, Thomas B. Schön, Fredrik Lindsten

Abstract: We consider the combined use of resampling and partial rejection control in sequential Monte Carlo methods, also known as particle filters. While the variance reducing properties of rejection control are known, there has not been (to the best of our knowledge) any work on unbiased estimation of the marginal likelihood (also known as the model evidence or the normalizing constant) in this type of p… ▽ More We consider the combined use of resampling and partial rejection control in sequential Monte Carlo methods, also known as particle filters. While the variance reducing properties of rejection control are known, there has not been (to the best of our knowledge) any work on unbiased estimation of the marginal likelihood (also known as the model evidence or the normalizing constant) in this type of particle filter. Being able to estimate the marginal likelihood without bias is highly relevant for model comparison, computation of interpretable and reliable confidence intervals, and in exact approximation methods, such as particle Markov chain Monte Carlo. In the paper we present a particle filter with rejection control that enables unbiased estimation of the marginal likelihood. △ Less

Submitted 4 March, 2020; v1 submitted 21 October, 2019; originally announced October 2019.

arXiv:1909.12297 [pdf, other]

Energy-Based Models for Deep Probabilistic Regression

Authors: Fredrik K. Gustafsson, Martin Danelljan, Goutam Bhat, Thomas B. Schön

Abstract: While deep learning-based classification is generally tackled using standardized approaches, a wide variety of techniques are employed for regression. In computer vision, one particularly popular such technique is that of confidence-based regression, which entails predicting a confidence value for each input-target pair (x,y). While this approach has demonstrated impressive results, it requires im… ▽ More While deep learning-based classification is generally tackled using standardized approaches, a wide variety of techniques are employed for regression. In computer vision, one particularly popular such technique is that of confidence-based regression, which entails predicting a confidence value for each input-target pair (x,y). While this approach has demonstrated impressive results, it requires important task-dependent design choices, and the predicted confidences lack a natural probabilistic meaning. We address these issues by proposing a general and conceptually simple regression method with a clear probabilistic interpretation. In our proposed approach, we create an energy-based model of the conditional target density p(y|x), using a deep neural network to predict the un-normalized density from (x,y). This model of p(y|x) is trained by directly minimizing the associated negative log-likelihood, approximated using Monte Carlo sampling. We perform comprehensive experiments on four computer vision regression tasks. Our approach outperforms direct regression, as well as other probabilistic and confidence-based methods. Notably, our model achieves a 2.2% AP improvement over Faster-RCNN for object detection on the COCO dataset, and sets a new state-of-the-art on visual tracking when applied for bounding box estimation. In contrast to confidence-based methods, our approach is also shown to be directly applicable to more general tasks such as age and head-pose estimation. Code is available at https://github.com/fregu856/ebms_regression. △ Less

Submitted 19 July, 2020; v1 submitted 26 September, 2019; originally announced September 2019.

Comments: ECCV 2020. Code is available at https://github.com/fregu856/ebms_regression

arXiv:1909.01844 [pdf, other]

Deep kernel learning for integral measurements

Authors: Carl Jidling, Johannes Hendriks, Thomas B. Schön, Adrian Wills

Abstract: Deep kernel learning refers to a Gaussian process that incorporates neural networks to improve the modelling of complex functions. We present a method that makes this approach feasible for problems where the data consists of line integral measurements of the target function. The performance is illustrated on computed tomography reconstruction examples. Deep kernel learning refers to a Gaussian process that incorporates neural networks to improve the modelling of complex functions. We present a method that makes this approach feasible for problems where the data consists of line integral measurements of the target function. The performance is illustrated on computed tomography reconstruction examples. △ Less

Submitted 4 September, 2019; originally announced September 2019.

arXiv:1909.01730 [pdf, other]

Deep Convolutional Networks in System Identification

Authors: Carl Andersson, Antônio H. Ribeiro, Koen Tiels, Niklas Wahlström, Thomas B. Schön

Abstract: Recent developments within deep learning are relevant for nonlinear system identification problems. In this paper, we establish connections between the deep learning and the system identification communities. It has recently been shown that convolutional architectures are at least as capable as recurrent architectures when it comes to sequence modeling tasks. Inspired by these results we explore t… ▽ More Recent developments within deep learning are relevant for nonlinear system identification problems. In this paper, we establish connections between the deep learning and the system identification communities. It has recently been shown that convolutional architectures are at least as capable as recurrent architectures when it comes to sequence modeling tasks. Inspired by these results we explore the explicit relationships between the recently proposed temporal convolutional network (TCN) and two classic system identification model structures; Volterra series and block-oriented models. We end the paper with an experimental study where we provide results on two real-world problems, the well-known Silverbox dataset and a newer dataset originating from ground vibration experiments on an F-16 fighter aircraft. △ Less

Submitted 19 November, 2019; v1 submitted 4 September, 2019; originally announced September 2019.

Comments: Accepted to Conference on Decision and Control, The first two authors contributed equally

arXiv:1909.01238 [pdf, other]

Stochastic quasi-Newton with line-search regularization

Authors: Adrian Wills, Thomas Schön

Abstract: In this paper we present a novel quasi-Newton algorithm for use in stochastic optimisation. Quasi-Newton methods have had an enormous impact on deterministic optimisation problems because they afford rapid convergence and computationally attractive algorithms. In essence, this is achieved by learning the second-order (Hessian) information based on observing first-order gradients. We extend these i… ▽ More In this paper we present a novel quasi-Newton algorithm for use in stochastic optimisation. Quasi-Newton methods have had an enormous impact on deterministic optimisation problems because they afford rapid convergence and computationally attractive algorithms. In essence, this is achieved by learning the second-order (Hessian) information based on observing first-order gradients. We extend these ideas to the stochastic setting by employing a highly flexible model for the Hessian and infer its value based on observing noisy gradients. In addition, we propose a stochastic counterpart to standard line-search procedures and demonstrate the utility of this combination on maximum likelihood identification for general nonlinear state space models. △ Less

Submitted 3 September, 2019; originally announced September 2019.

arXiv:1907.04615 [pdf, other]

Probabilistic programming for birth-death models of evolution using an alive particle filter with delayed sampling

Authors: Jan Kudlicka, Lawrence M. Murray, Fredrik Ronquist, Thomas B. Schön

Abstract: We consider probabilistic programming for birth-death models of evolution and introduce a new widely-applicable inference method that combines an extension of the alive particle filter (APF) with automatic Rao-Blackwellization via delayed sampling. Birth-death models of evolution are an important family of phylogenetic models of the diversification processes that lead to evolutionary trees. Probab… ▽ More We consider probabilistic programming for birth-death models of evolution and introduce a new widely-applicable inference method that combines an extension of the alive particle filter (APF) with automatic Rao-Blackwellization via delayed sampling. Birth-death models of evolution are an important family of phylogenetic models of the diversification processes that lead to evolutionary trees. Probabilistic programming languages (PPLs) give phylogeneticists a new and exciting tool: their models can be implemented as probabilistic programs with just a basic knowledge of programming. The general inference methods in PPLs reduce the need for external experts, allow quick prototyping and testing, and accelerate the development and deployment of new models. We show how these birth-death models can be implemented as simple programs in existing PPLs, and demonstrate the usefulness of the proposed inference method for such models. For the popular BiSSE model the method yields an increase of the effective sample size and the conditional acceptance rate by a factor of 30 in comparison with a standard bootstrap particle filter. Although concentrating on phylogenetics, the extended APF is a general inference method that shows its strength in situations where particles are often assigned zero weight. In the case when the weights are always positive, the extra cost of using the APF rather than the bootstrap particle filter is negligible, making our method a suitable drop-in replacement for the bootstrap particle filter in probabilistic programming inference. △ Less

Submitted 14 February, 2021; v1 submitted 10 July, 2019; originally announced July 2019.

Journal ref: Conference on Uncertainty in Artificial Intelligence (UAI) 2019

arXiv:1906.08482 [pdf, other]

Beyond exploding and vanishing gradients: analysing RNN training using attractors and smoothness

Authors: Antônio H. Ribeiro, Koen Tiels, Luis A. Aguirre, Thomas B. Schön

Abstract: The exploding and vanishing gradient problem has been the major conceptual principle behind most architecture and training improvements in recurrent neural networks (RNNs) during the last decade. In this paper, we argue that this principle, while powerful, might need some refinement to explain recent developments. We refine the concept of exploding gradients by reformulating the problem in terms o… ▽ More The exploding and vanishing gradient problem has been the major conceptual principle behind most architecture and training improvements in recurrent neural networks (RNNs) during the last decade. In this paper, we argue that this principle, while powerful, might need some refinement to explain recent developments. We refine the concept of exploding gradients by reformulating the problem in terms of the cost function smoothness, which gives insight into higher-order derivatives and the existence of regions with many close local minima. We also clarify the distinction between vanishing gradients and the need for the RNN to learn attractors to fully use its expressive power. Through the lens of these refinements, we shed new light on recent developments in the RNN field, namely stable RNN and unitary (or orthogonal) RNNs. △ Less

Submitted 5 March, 2020; v1 submitted 20 June, 2019; originally announced June 2019.

Comments: To appear in the Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics (AISTATS), 2020. PMLR: Volume 108. This paper was previously titled "The trade-off between long-term memory and smoothness for recurrent networks". The current version subsumes all previous versions

arXiv:1906.01620 [pdf, other]

Evaluating Scalable Bayesian Deep Learning Methods for Robust Computer Vision

Authors: Fredrik K. Gustafsson, Martin Danelljan, Thomas B. Schön

Abstract: While deep neural networks have become the go-to approach in computer vision, the vast majority of these models fail to properly capture the uncertainty inherent in their predictions. Estimating this predictive uncertainty can be crucial, for example in automotive applications. In Bayesian deep learning, predictive uncertainty is commonly decomposed into the distinct types of aleatoric and epistem… ▽ More While deep neural networks have become the go-to approach in computer vision, the vast majority of these models fail to properly capture the uncertainty inherent in their predictions. Estimating this predictive uncertainty can be crucial, for example in automotive applications. In Bayesian deep learning, predictive uncertainty is commonly decomposed into the distinct types of aleatoric and epistemic uncertainty. The former can be estimated by letting a neural network output the parameters of a certain probability distribution. Epistemic uncertainty estimation is a more challenging problem, and while different scalable methods recently have emerged, no extensive comparison has been performed in a real-world setting. We therefore accept this task and propose a comprehensive evaluation framework for scalable epistemic uncertainty estimation methods in deep learning. Our proposed framework is specifically designed to test the robustness required in real-world computer vision applications. We also apply this framework to provide the first properly extensive and conclusive comparison of the two current state-of-the-art scalable methods: ensembling and MC-dropout. Our comparison demonstrates that ensembling consistently provides more reliable and practically useful uncertainty estimates. Code is available at https://github.com/fregu856/evaluating_bdl. △ Less

Submitted 7 April, 2020; v1 submitted 4 June, 2019; originally announced June 2019.

Comments: CVPR Workshops 2020. Code is available at https://github.com/fregu856/evaluating_bdl

arXiv:1906.01584 [pdf, other]

Robust exploration in linear quadratic reinforcement learning

Authors: Jack Umenberger, Mina Ferizbegovic, Thomas B. Schön, Håkan Hjalmarsson

Abstract: This paper concerns the problem of learning control policies for an unknown linear dynamical system to minimize a quadratic cost function. We present a method, based on convex optimization, that accomplishes this task robustly: i.e., we minimize the worst-case cost, accounting for system uncertainty given the observed data. The method balances exploitation and exploration, exciting the system in s… ▽ More This paper concerns the problem of learning control policies for an unknown linear dynamical system to minimize a quadratic cost function. We present a method, based on convex optimization, that accomplishes this task robustly: i.e., we minimize the worst-case cost, accounting for system uncertainty given the observed data. The method balances exploitation and exploration, exciting the system in such a way so as to reduce uncertainty in the model parameters to which the worst-case cost is most sensitive. Numerical simulations and application to a hardware-in-the-loop servo-mechanism demonstrate the approach, with appreciable performance and robustness gains over alternative methods observed in both. △ Less

Submitted 4 June, 2019; originally announced June 2019.

arXiv:1905.06854 [pdf, other]

doi 10.1016/j.nimb.2019.07.005

Neutron Transmission Strain Tomography for Non-Constant Stress-Free Lattice Spacing

Authors: J. N. Hendriks, C. Jidling, T. B. Schön, A. Wills, C. M. Wensrich, E. H. Kisi

Abstract: Recently, several algorithms for strain tomography from energy-resolved neutron transmission measurements have been proposed. These methods assume that the stress-free lattice spacing $d_0$ is a known constant limiting their application to the study of stresses generated by manufacturing and loading methods that do not alter this parameter. In this paper, we consider the more general problem of jo… ▽ More Recently, several algorithms for strain tomography from energy-resolved neutron transmission measurements have been proposed. These methods assume that the stress-free lattice spacing $d_0$ is a known constant limiting their application to the study of stresses generated by manufacturing and loading methods that do not alter this parameter. In this paper, we consider the more general problem of jointly reconstructing the strain and $d_0$ fields. A method for solving this inherently non-linear problem is presented that ensures the estimated strain field satisfies equilibrium and can include knowledge of boundary conditions. This method is tested on a simulated data set with realistic noise levels, demonstrating that it is possible to jointly reconstruct $d_0$ and the strain field. △ Less

Submitted 18 July, 2019; v1 submitted 15 May, 2019; originally announced May 2019.

Comments: Journal article, 14 pages, 5 figures

Journal ref: Nuclear instruments and methods in physics research section B, 456:64-73, 2019

arXiv:1904.01949 [pdf, other]

doi 10.1038/s41467-020-15432-4

Automatic diagnosis of the 12-lead ECG using a deep neural network

Authors: Antônio H. Ribeiro, Manoel Horta Ribeiro, Gabriela M. M. Paixão, Derick M. Oliveira, Paulo R. Gomes, Jéssica A. Canazart, Milton P. S. Ferreira, Carl R. Andersson, Peter W. Macfarlane, Wagner Meira Jr., Thomas B. Schön, Antonio Luiz P. Ribeiro

Abstract: The role of automatic electrocardiogram (ECG) analysis in clinical practice is limited by the accuracy of existing models. Deep Neural Networks (DNNs) are models composed of stacked transformations that learn tasks by examples. This technology has recently achieved striking success in a variety of task and there are great expectations on how it might improve clinical practice. Here we present a DN… ▽ More The role of automatic electrocardiogram (ECG) analysis in clinical practice is limited by the accuracy of existing models. Deep Neural Networks (DNNs) are models composed of stacked transformations that learn tasks by examples. This technology has recently achieved striking success in a variety of task and there are great expectations on how it might improve clinical practice. Here we present a DNN model trained in a dataset with more than 2 million labeled exams analyzed by the Telehealth Network of Minas Gerais and collected under the scope of the CODE (Clinical Outcomes in Digital Electrocardiology) study. The DNN outperform cardiology resident medical doctors in recognizing 6 types of abnormalities in 12-lead ECG recordings, with F1 scores above 80% and specificity over 99%. These results indicate ECG analysis based on DNNs, previously studied in a single-lead setup, generalizes well to 12-lead exams, taking the technology closer to the standard clinical practice. △ Less

Submitted 14 April, 2020; v1 submitted 1 April, 2019; originally announced April 2019.

Comments: A preliminary version of this work titled: "Automatic Diagnosis of Short-Duration 12-Lead ECG using a Deep Convolutional Network " was presented in the Machine Learning for Health Workshop at NeurIPS 2018 and was made available under a different identifier: arXiv:1811.12194. The current version subsumes all previous versions

Journal ref: Nature Communications 11, article number: 1760 (2020)

arXiv:1903.04797 [pdf, other]

Elements of Sequential Monte Carlo

Authors: Christian A. Naesseth, Fredrik Lindsten, Thomas B. Schön

Abstract: A core problem in statistics and probabilistic machine learning is to compute probability distributions and expectations. This is the fundamental problem of Bayesian statistics and machine learning, which frames all inference as expectations with respect to the posterior distribution. The key challenge is to approximate these intractable expectations. In this tutorial, we review sequential Monte C… ▽ More A core problem in statistics and probabilistic machine learning is to compute probability distributions and expectations. This is the fundamental problem of Bayesian statistics and machine learning, which frames all inference as expectations with respect to the posterior distribution. The key challenge is to approximate these intractable expectations. In this tutorial, we review sequential Monte Carlo (SMC), a random-sampling-based class of methods for approximate inference. First, we explain the basics of SMC, discuss practical issues, and review theoretical results. We then examine two of the main user design choices: the proposal distributions and the so called intermediate target distributions. We review recent results on how variational inference and amortization can be used to learn efficient proposals and target distributions. Next, we discuss the SMC estimate of the normalizing constant, how this can be used for pseudo-marginal inference and inference evaluation. Throughout the tutorial we illustrate the use of SMC on various models commonly used in machine learning, such as stochastic recurrent neural networks, probabilistic graphical models, and probabilistic programs. △ Less

Submitted 4 March, 2022; v1 submitted 12 March, 2019; originally announced March 2019.

Comments: Foundations and Trends in Machine Learning

arXiv:1903.02250 [pdf, other]

Nonlinear input design as optimal control of a Hamiltonian system

Authors: Jack Umenberger, Thomas B. Schön

Abstract: We propose an input design method for a general class of parametric probabilistic models, including nonlinear dynamical systems with process noise. The goal of the procedure is to select inputs such that the parameter posterior distribution concentrates about the true value of the parameters; however, exact computation of the posterior is intractable. By representing (samples from) the posterior a… ▽ More We propose an input design method for a general class of parametric probabilistic models, including nonlinear dynamical systems with process noise. The goal of the procedure is to select inputs such that the parameter posterior distribution concentrates about the true value of the parameters; however, exact computation of the posterior is intractable. By representing (samples from) the posterior as trajectories from a certain Hamiltonian system, we transform the input design task into an optimal control problem. The method is illustrated via numerical examples, including MRI pulse sequence design. △ Less

Submitted 6 March, 2019; originally announced March 2019.

arXiv:1902.06977 [pdf]

Evaluating model calibration in classification

Authors: Juozas Vaicenavicius, David Widmann, Carl Andersson, Fredrik Lindsten, Jacob Roll, Thomas B. Schön

Abstract: Probabilistic classifiers output a probability distribution on target classes rather than just a class prediction. Besides providing a clear separation of prediction and decision making, the main advantage of probabilistic models is their ability to represent uncertainty about predictions. In safety-critical applications, it is pivotal for a model to possess an adequate sense of uncertainty, which… ▽ More Probabilistic classifiers output a probability distribution on target classes rather than just a class prediction. Besides providing a clear separation of prediction and decision making, the main advantage of probabilistic models is their ability to represent uncertainty about predictions. In safety-critical applications, it is pivotal for a model to possess an adequate sense of uncertainty, which for probabilistic classifiers translates into outputting probability distributions that are consistent with the empirical frequencies observed from realized outcomes. A classifier with such a property is called calibrated. In this work, we develop a general theoretical calibration evaluation framework grounded in probability theory, and point out subtleties present in model calibration evaluation that lead to refined interpretations of existing evaluation techniques. Lastly, we propose new ways to quantify and visualize miscalibration in probabilistic classification, including novel multidimensional reliability diagrams. △ Less

Submitted 19 February, 2019; originally announced February 2019.

arXiv:1902.04272 [pdf, other]

Towards Self-Supervised High Level Sensor Fusion

Authors: Qadeer Khan, Torsten Schön, Patrick Wenzel

Abstract: In this paper, we present a framework to control a self-driving car by fusing raw information from RGB images and depth maps. A deep neural network architecture is used for mapping the vision and depth information, respectively, to steering commands. This fusion of information from two sensor sources allows to provide redundancy and fault tolerance in the presence of sensor failures. Even if one o… ▽ More In this paper, we present a framework to control a self-driving car by fusing raw information from RGB images and depth maps. A deep neural network architecture is used for mapping the vision and depth information, respectively, to steering commands. This fusion of information from two sensor sources allows to provide redundancy and fault tolerance in the presence of sensor failures. Even if one of the input sensors fails to produce the correct output, the other functioning sensor would still be able to maneuver the car. Such redundancy is crucial in the critical application of self-driving cars. The experimental results have showed that our method is capable of learning to use the relevant sensor information even when one of the sensors fail without any explicit signal. △ Less

Submitted 12 February, 2019; originally announced February 2019.

arXiv:1902.03777 [pdf, other]

Semantic Label Reduction Techniques for Autonomous Driving

Authors: Qadeer Khan, Torsten Schön, Patrick Wenzel

Abstract: Semantic segmentation maps can be used as input to models for maneuvering the controls of a car. However, not all labels may be necessary for making the control decision. One would expect that certain labels such as road lanes or sidewalks would be more critical in comparison with labels for vegetation or buildings which may not have a direct influence on the car's driving decision. In this append… ▽ More Semantic segmentation maps can be used as input to models for maneuvering the controls of a car. However, not all labels may be necessary for making the control decision. One would expect that certain labels such as road lanes or sidewalks would be more critical in comparison with labels for vegetation or buildings which may not have a direct influence on the car's driving decision. In this appendix, we evaluate and quantify how sensitive and important the different semantic labels are for controlling the car. Labels that do not influence the driving decision are remapped to other classes, thereby simplifying the task by reducing to only labels critical for driving of the vehicle. △ Less

Submitted 11 February, 2019; originally announced February 2019.

arXiv:1902.03765 [pdf, other]

Latent Space Reinforcement Learning for Steering Angle Prediction

Authors: Qadeer Khan, Torsten Schön, Patrick Wenzel

Abstract: Model-free reinforcement learning has recently been shown to successfully learn navigation policies from raw sensor data. In this work, we address the problem of learning driving policies for an autonomous agent in a high-fidelity simulator. Building upon recent research that applies deep reinforcement learning to navigation problems, we present a modular deep reinforcement learning approach to pr… ▽ More Model-free reinforcement learning has recently been shown to successfully learn navigation policies from raw sensor data. In this work, we address the problem of learning driving policies for an autonomous agent in a high-fidelity simulator. Building upon recent research that applies deep reinforcement learning to navigation problems, we present a modular deep reinforcement learning approach to predict the steering angle of the car from raw images. The first module extracts a low-dimensional latent semantic representation of the image. The control module trained with reinforcement learning takes the latent vector as input to predict the correct steering angle. The experimental results have showed that our method is capable of learning to maneuver the car without any human control signals. △ Less

Submitted 11 February, 2019; originally announced February 2019.

arXiv:1902.01182 [pdf, other]

Constructing the Matrix Multilayer Perceptron and its Application to the VAE

Authors: Jalil Taghia, Maria Bånkestad, Fredrik Lindsten, Thomas B. Schön

Abstract: Like most learning algorithms, the multilayer perceptrons (MLP) is designed to learn a vector of parameters from data. However, in certain scenarios we are interested in learning structured parameters (predictions) in the form of symmetric positive definite matrices. Here, we introduce a variant of the MLP, referred to as the matrix MLP, that is specialized at learning symmetric positive definite… ▽ More Like most learning algorithms, the multilayer perceptrons (MLP) is designed to learn a vector of parameters from data. However, in certain scenarios we are interested in learning structured parameters (predictions) in the form of symmetric positive definite matrices. Here, we introduce a variant of the MLP, referred to as the matrix MLP, that is specialized at learning symmetric positive definite matrices. We also present an application of the model within the context of the variational autoencoder (VAE). Our formulation of the VAE extends the vanilla formulation to the cases where the recognition and the generative networks can be from the parametric family of distributions with dense covariance matrices. Two specific examples are discussed in more detail: the dense covariance Gaussian and its generalization, the power exponential distribution. Our new developments are illustrated using both synthetic and real data. △ Less

Submitted 4 February, 2019; originally announced February 2019.

arXiv:1901.09919 [pdf, other]

Inferring Heterogeneous Causal Effects in Presence of Spatial Confounding

Authors: Muhammad Osama, Dave Zachariah, Thomas B. Schön

Abstract: We address the problem of inferring the causal effect of an exposure on an outcome across space, using observational data. The data is possibly subject to unmeasured confounding variables which, in a standard approach, must be adjusted for by estimating a nuisance function. Here we develop a method that eliminates the nuisance function, while mitigating the resulting errors-in-variables. The resul… ▽ More We address the problem of inferring the causal effect of an exposure on an outcome across space, using observational data. The data is possibly subject to unmeasured confounding variables which, in a standard approach, must be adjusted for by estimating a nuisance function. Here we develop a method that eliminates the nuisance function, while mitigating the resulting errors-in-variables. The result is a robust and accurate inference method for spatially varying heterogeneous causal effects. The properties of the method are demonstrated on synthetic as well as real data from Germany and the US. △ Less

Submitted 3 June, 2019; v1 submitted 28 January, 2019; originally announced January 2019.

Comments: 10 pages, 10 figures

arXiv:1812.07319 [pdf, other]

Evaluating the squared-exponential covariance function in Gaussian processes with integral observations

Authors: J. N. Hendriks, C. Jidling, A. Wills, T. B. Schön

Abstract: This paper deals with the evaluation of double line integrals of the squared exponential covariance function. We propose a new approach in which the double integral is reduced to a single integral using the error function. This single integral is then computed with efficiently implemented numerical techniques. The performance is compared against existing state of the art methods and the results sh… ▽ More This paper deals with the evaluation of double line integrals of the squared exponential covariance function. We propose a new approach in which the double integral is reduced to a single integral using the error function. This single integral is then computed with efficiently implemented numerical techniques. The performance is compared against existing state of the art methods and the results show superior properties in numerical robustness and accuracy per computation time. △ Less

Submitted 18 December, 2018; originally announced December 2018.

arXiv:1811.12194 [pdf, other]

Automatic Diagnosis of Short-Duration 12-Lead ECG using a Deep Convolutional Network

Authors: Antônio H. Ribeiro, Manoel Horta Ribeiro, Gabriela Paixão, Derick Oliveira, Paulo R. Gomes, Jéssica A. Canazart, Milton Pifano, Wagner Meira Jr., Thomas B. Schön, Antonio Luiz Ribeiro

Abstract: We present a model for predicting electrocardiogram (ECG) abnormalities in short-duration 12-lead ECG signals which outperformed medical doctors on the 4th year of their cardiology residency. Such exams can provide a full evaluation of heart activity and have not been studied in previous end-to-end machine learning papers. Using the database of a large telehealth network, we built a novel dataset… ▽ More We present a model for predicting electrocardiogram (ECG) abnormalities in short-duration 12-lead ECG signals which outperformed medical doctors on the 4th year of their cardiology residency. Such exams can provide a full evaluation of heart activity and have not been studied in previous end-to-end machine learning papers. Using the database of a large telehealth network, we built a novel dataset with more than 2 million ECG tracings, orders of magnitude larger than those used in previous studies. Moreover, our dataset is more realistic, as it consist of 12-lead ECGs recorded during standard in-clinics exams. Using this data, we trained a residual neural network with 9 convolutional layers to map 7 to 10 second ECG signals to 6 classes of ECG abnormalities. Future work should extend these results to cover a large range of ECG abnormalities, which could improve the accessibility of this diagnostic tool and avoid wrong diagnosis from medical doctors. △ Less

Submitted 17 February, 2019; v1 submitted 28 November, 2018; originally announced November 2018.

Comments: Machine Learning for Health (ML4H) Workshop at NeurIPS 2018 arXiv:1811.07216

Report number: ML4H/2018/82

arXiv:1811.10947 [pdf, other]

Reliable Semi-Supervised Learning when Labels are Missing at Random

Authors: Xiuming Liu, Dave Zachariah, Johan Wågberg, Thomas B. Schön

Abstract: Semi-supervised learning methods are motivated by the availability of large datasets with unlabeled features in addition to labeled data. Unlabeled data is, however, not guaranteed to improve classification performance and has in fact been reported to impair the performance in certain cases. A fundamental source of error arises from restrictive assumptions about the unlabeled features, which resul… ▽ More Semi-supervised learning methods are motivated by the availability of large datasets with unlabeled features in addition to labeled data. Unlabeled data is, however, not guaranteed to improve classification performance and has in fact been reported to impair the performance in certain cases. A fundamental source of error arises from restrictive assumptions about the unlabeled features, which result in unreliable classifiers that underestimate their prediction error probabilities. In this paper, we develop a semi-supervised learning approach that relaxes such assumptions and is capable of providing classifiers that reliably quantify the label uncertainty. The approach is applicable using any generative model with a supervised learning algorithm. We illustrate the approach using both handwritten digit and cloth classification data where the labels are missing at random. △ Less

Submitted 24 October, 2019; v1 submitted 27 November, 2018; originally announced November 2018.

arXiv:1810.01539 [pdf, other]

Automated learning with a probabilistic programming language: Birch

Authors: Lawrence M. Murray, Thomas B. Schön

Abstract: This work offers a broad perspective on probabilistic modeling and inference in light of recent advances in probabilistic programming, in which models are formally expressed in Turing-complete programming languages. We consider a typical workflow and how probabilistic programming languages can help to automate this workflow, especially in the matching of models with inference methods. We focus on… ▽ More This work offers a broad perspective on probabilistic modeling and inference in light of recent advances in probabilistic programming, in which models are formally expressed in Turing-complete programming languages. We consider a typical workflow and how probabilistic programming languages can help to automate this workflow, especially in the matching of models with inference methods. We focus on two properties of a model that are critical in this matching: its structure---the conditional dependencies between random variables---and its form---the precise mathematical definition of those dependencies. While the structure and form of a probabilistic model are often fixed a priori, it is a curiosity of probabilistic programming that they need not be, and may instead vary according to random choices made during program execution. We introduce a formal description of models expressed as programs, and discuss some of the ways in which probabilistic programming languages can reveal the structure and form of these, in order to tailor inference methods. We demonstrate the ideas with a new probabilistic programming language called Birch, with a multiple object tracking example. △ Less

Submitted 16 April, 2020; v1 submitted 2 October, 2018; originally announced October 2018.

arXiv:1810.01269 [pdf, other]

A fast quasi-Newton-type method for large-scale stochastic optimisation

Authors: Adrian Wills, Carl Jidling, Thomas Schon

Abstract: During recent years there has been an increased interest in stochastic adaptations of limited memory quasi-Newton methods, which compared to pure gradient-based routines can improve the convergence by incorporating second order information. In this work we propose a direct least-squares approach conceptually similar to the limited memory quasi-Newton methods, but that computes the search direction… ▽ More During recent years there has been an increased interest in stochastic adaptations of limited memory quasi-Newton methods, which compared to pure gradient-based routines can improve the convergence by incorporating second order information. In this work we propose a direct least-squares approach conceptually similar to the limited memory quasi-Newton methods, but that computes the search direction in a slightly different way. This is achieved in a fast and numerically robust manner by maintaining a Cholesky factor of low dimension. This is combined with a stochastic line search relying upon fulfilment of the Wolfe condition in a backtracking manner, where the step length is adaptively modified with respect to the optimisation progress. We support our new algorithm by providing several theoretical results guaranteeing its performance. The performance is demonstrated on real-world benchmark problems which shows improved results in comparison with already established methods. △ Less

Submitted 29 September, 2018; originally announced October 2018.

Comments: arXiv admin note: substantial text overlap with arXiv:1802.04310

arXiv:1809.03779 [pdf, other]

doi 10.1088/1361-6420/ab2e2a

Probabilistic approach to limited-data computed tomography reconstruction

Authors: Zenith Purisha, Carl Jidling, Niklas Wahlström, Simo Särkkä, Thomas B. Schön

Abstract: In this work, we consider the inverse problem of reconstructing the internal structure of an object from limited x-ray projections. We use a Gaussian process prior to model the target function and estimate its (hyper)parameters from measured data. In contrast to other established methods, this comes with the advantage of not requiring any manual parameter tuning, which usually arises in classical… ▽ More In this work, we consider the inverse problem of reconstructing the internal structure of an object from limited x-ray projections. We use a Gaussian process prior to model the target function and estimate its (hyper)parameters from measured data. In contrast to other established methods, this comes with the advantage of not requiring any manual parameter tuning, which usually arises in classical regularization strategies. Our method uses a basis function expansion technique for the Gaussian process which significantly reduces the computational complexity and avoids the need for numerical integration. The approach also allows for reformulation of come classical regularization methods as Laplacian and Tikhonov regularization as Gaussian process regression, and hence provides an efficient algorithm and principled means for their parameter tuning. Results from simulated and real data indicate that this approach is less sensitive to streak artifacts as compared to the commonly used method of filtered backprojection. △ Less

Submitted 3 July, 2019; v1 submitted 11 September, 2018; originally announced September 2018.

arXiv:1808.05889 [pdf, ps, other]

Data Consistency Approach to Model Validation

Authors: Andreas Svensson, Dave Zachariah, Petre Stoica, Thomas B. Schön

Abstract: In scientific inference problems, the underlying statistical modeling assumptions have a crucial impact on the end results. There exist, however, only a few automatic means for validating these fundamental modelling assumptions. The contribution in this paper is a general criterion to evaluate the consistency of a set of statistical models with respect to observed data. This is achieved by automat… ▽ More In scientific inference problems, the underlying statistical modeling assumptions have a crucial impact on the end results. There exist, however, only a few automatic means for validating these fundamental modelling assumptions. The contribution in this paper is a general criterion to evaluate the consistency of a set of statistical models with respect to observed data. This is achieved by automatically gauging the models' ability to generate data that is similar to the observed data. Importantly, the criterion follows from the model class itself and is therefore directly applicable to a broad range of inference problems with varying data types, ranging from independent univariate data to high-dimensional time-series. The proposed data consistency criterion is illustrated, evaluated and compared to several well-established methods using three synthetic and two real data sets. △ Less

Submitted 20 May, 2019; v1 submitted 17 August, 2018; originally announced August 2018.

Journal ref: IEEE Access, 7(1):59788-59796, 2019

arXiv:1806.00319 [pdf, other]

Learning convex bounds for linear quadratic control policy synthesis

Authors: Jack Umenberger, Thomas B. Schön

Abstract: Learning to make decisions from observed data in dynamic environments remains a problem of fundamental importance in a number of fields, from artificial intelligence and robotics, to medicine and finance. This paper concerns the problem of learning control policies for unknown linear dynamical systems so as to maximize a quadratic reward function. We present a method to optimize the expected value… ▽ More Learning to make decisions from observed data in dynamic environments remains a problem of fundamental importance in a number of fields, from artificial intelligence and robotics, to medicine and finance. This paper concerns the problem of learning control policies for unknown linear dynamical systems so as to maximize a quadratic reward function. We present a method to optimize the expected value of the reward over the posterior distribution of the unknown system parameters, given data. The algorithm involves sequential convex programing, and enjoys reliable local convergence and robust stability guarantees. Numerical simulations and stabilization of a real-world inverted pendulum are used to demonstrate the approach, with strong performance and robustness properties observed in both. △ Less

Submitted 1 June, 2018; originally announced June 2018.

arXiv:1802.09086 [pdf, other]

Conditionally Independent Multiresolution Gaussian Processes

Authors: Jalil Taghia, Thomas B. Schön

Abstract: The multiresolution Gaussian process (GP) has gained increasing attention as a viable approach towards improving the quality of approximations in GPs that scale well to large-scale data. Most of the current constructions assume full independence across resolutions. This assumption simplifies the inference, but it underestimates the uncertainties in transitioning from one resolution to another. Thi… ▽ More The multiresolution Gaussian process (GP) has gained increasing attention as a viable approach towards improving the quality of approximations in GPs that scale well to large-scale data. Most of the current constructions assume full independence across resolutions. This assumption simplifies the inference, but it underestimates the uncertainties in transitioning from one resolution to another. This in turn results in models which are prone to overfitting in the sense of excessive sensitivity to the chosen resolution, and predictions which are non-smooth at the boundaries. Our contribution is a new construction which instead assumes conditional independence among GPs across resolutions. We show that relaxing the full independence assumption enables robustness against overfitting, and that it delivers predictions that are smooth at the boundaries. Our new model is compared against current state of the art on 2 synthetic and 9 real-world datasets. In most cases, our new conditionally independent construction performed favorably when compared against models based on the full independence assumption. In particular, it exhibits little to no signs of overfitting. △ Less

Submitted 24 February, 2019; v1 submitted 25 February, 2018; originally announced February 2018.

arXiv:1802.04310 [pdf, other]

Stochastic quasi-Newton with adaptive step lengths for large-scale problems

Authors: Adrian Wills, Thomas Schön

Abstract: We provide a numerically robust and fast method capable of exploiting the local geometry when solving large-scale stochastic optimisation problems. Our key innovation is an auxiliary variable construction coupled with an inverse Hessian approximation computed using a receding history of iterates and gradients. It is the Markov chain nature of the classic stochastic gradient algorithm that enables… ▽ More We provide a numerically robust and fast method capable of exploiting the local geometry when solving large-scale stochastic optimisation problems. Our key innovation is an auxiliary variable construction coupled with an inverse Hessian approximation computed using a receding history of iterates and gradients. It is the Markov chain nature of the classic stochastic gradient algorithm that enables this development. The construction offers a mechanism for stochastic line search adapting the step length. We numerically evaluate and compare against current state-of-the-art with encouraging performance on real-world benchmark problems where the number of observations and unknowns is in the order of millions. △ Less

Submitted 12 February, 2018; originally announced February 2018.

Showing 1–50 of 97 results for author: Schön, T