Search | arXiv e-print repository

A survey and benchmark of high-dimensional Bayesian optimization of discrete sequences

Authors: Miguel González-Duque, Richard Michael, Simon Bartels, Yevgen Zainchkovskyy, Søren Hauberg, Wouter Boomsma

Abstract: Optimizing discrete black-box functions is key in several domains, e.g. protein engineering and drug design. Due to the lack of gradient information and the need for sample efficiency, Bayesian optimization is an ideal candidate for these tasks. Several methods for high-dimensional continuous and categorical Bayesian optimization have been proposed recently. However, our survey of the field reveal… ▽ More Optimizing discrete black-box functions is key in several domains, e.g. protein engineering and drug design. Due to the lack of gradient information and the need for sample efficiency, Bayesian optimization is an ideal candidate for these tasks. Several methods for high-dimensional continuous and categorical Bayesian optimization have been proposed recently. However, our survey of the field reveals highly heterogeneous experimental set-ups across methods and technical barriers for the replicability and application of published algorithms to real-world tasks. To address these issues, we develop a unified framework to test a vast array of high-dimensional Bayesian optimization methods and a collection of standardized black-box functions representing real-world application domains in chemistry and biology. These two components of the benchmark are each supported by flexible, scalable, and easily extendable software libraries (poli and poli-baselines), allowing practitioners to readily incorporate new optimization objectives or discrete optimizers. Project website: https://machinelearninglifescience.github.io/hdbo_benchmark △ Less

Submitted 7 June, 2024; originally announced June 2024.

arXiv:2406.03334 [pdf, other]

Reparameterization invariance in approximate Bayesian inference

Authors: Hrittik Roy, Marco Miani, Carl Henrik Ek, Philipp Hennig, Marvin Pförtner, Lukas Tatzel, Søren Hauberg

Abstract: Current approximate posteriors in Bayesian neural networks (BNNs) exhibit a crucial limitation: they fail to maintain invariance under reparameterization, i.e. BNNs assign different posterior densities to different parametrizations of identical functions. This creates a fundamental flaw in the application of Bayesian principles as it breaks the correspondence between uncertainty over the parameter… ▽ More Current approximate posteriors in Bayesian neural networks (BNNs) exhibit a crucial limitation: they fail to maintain invariance under reparameterization, i.e. BNNs assign different posterior densities to different parametrizations of identical functions. This creates a fundamental flaw in the application of Bayesian principles as it breaks the correspondence between uncertainty over the parameters with uncertainty over the parametrized function. In this paper, we investigate this issue in the context of the increasingly popular linearized Laplace approximation. Specifically, it has been observed that linearized predictives alleviate the common underfitting problems of the Laplace approximation. We develop a new geometric view of reparametrizations from which we explain the success of linearization. Moreover, we demonstrate that these reparameterization invariance properties can be extended to the original neural network predictive using a Riemannian diffusion process giving a straightforward algorithm for approximate posterior sampling, which empirically improves posterior fit. △ Less

Submitted 5 June, 2024; originally announced June 2024.

arXiv:2405.17277 [pdf, other]

Gradients of Functions of Large Matrices

Authors: Nicholas Krämer, Pablo Moreno-Muñoz, Hrittik Roy, Søren Hauberg

Abstract: Tuning scientific and probabilistic machine learning models -- for example, partial differential equations, Gaussian processes, or Bayesian neural networks -- often relies on evaluating functions of matrices whose size grows with the data set or the number of parameters. While the state-of-the-art for evaluating these quantities is almost always based on Lanczos and Arnoldi iterations, the present… ▽ More Tuning scientific and probabilistic machine learning models -- for example, partial differential equations, Gaussian processes, or Bayesian neural networks -- often relies on evaluating functions of matrices whose size grows with the data set or the number of parameters. While the state-of-the-art for evaluating these quantities is almost always based on Lanczos and Arnoldi iterations, the present work is the first to explain how to differentiate these workhorses of numerical linear algebra efficiently. To get there, we derive previously unknown adjoint systems for Lanczos and Arnoldi iterations, implement them in JAX, and show that the resulting code can compete with Diffrax when it comes to differentiating PDEs, GPyTorch for selecting Gaussian process models and beats standard factorisation methods for calibrating Bayesian neural networks. All this is achieved without any problem-specific code optimisation. Find the code at https://github.com/pnkraemer/experiments-lanczos-adjoints and install the library with pip install matfree. △ Less

Submitted 27 May, 2024; originally announced May 2024.

arXiv:2404.17452 [pdf, other]

A Continuous Relaxation for Discrete Bayesian Optimization

Authors: Richard Michael, Simon Bartels, Miguel González-Duque, Yevgen Zainchkovskyy, Jes Frellsen, Søren Hauberg, Wouter Boomsma

Abstract: To optimize efficiently over discrete data and with only few available target observations is a challenge in Bayesian optimization. We propose a continuous relaxation of the objective function and show that inference and optimization can be computationally tractable. We consider in particular the optimization domain where very few observations and strict budgets exist; motivated by optimizing prot… ▽ More To optimize efficiently over discrete data and with only few available target observations is a challenge in Bayesian optimization. We propose a continuous relaxation of the objective function and show that inference and optimization can be computationally tractable. We consider in particular the optimization domain where very few observations and strict budgets exist; motivated by optimizing protein sequences for expensive to evaluate bio-chemical properties. The advantages of our approach are two-fold: the problem is treated in the continuous setting, and available prior knowledge over sequences can be incorporated directly. More specifically, we utilize available and learned distributions over the problem domain for a weighting of the Hellinger distance which yields a covariance function. We show that the resulting acquisition function can be optimized with both continuous or discrete optimization algorithms and empirically assess our method on two bio-chemical sequence optimization tasks. △ Less

Submitted 26 April, 2024; originally announced April 2024.

arXiv:2403.01666 [pdf, other]

Improving Adversarial Energy-Based Model via Diffusion Process

Authors: Cong Geng, Tian Han, Peng-Tao Jiang, Hao Zhang, Jinwei Chen, Søren Hauberg, Bo Li

Abstract: Generative models have shown strong generation ability while efficient likelihood estimation is less explored. Energy-based models~(EBMs) define a flexible energy function to parameterize unnormalized densities efficiently but are notorious for being difficult to train. Adversarial EBMs introduce a generator to form a minimax training game to avoid expensive MCMC sampling used in traditional EBMs,… ▽ More Generative models have shown strong generation ability while efficient likelihood estimation is less explored. Energy-based models~(EBMs) define a flexible energy function to parameterize unnormalized densities efficiently but are notorious for being difficult to train. Adversarial EBMs introduce a generator to form a minimax training game to avoid expensive MCMC sampling used in traditional EBMs, but a noticeable gap between adversarial EBMs and other strong generative models still exists. Inspired by diffusion-based models, we embedded EBMs into each denoising step to split a long-generated process into several smaller steps. Besides, we employ a symmetric Jeffrey divergence and introduce a variational posterior distribution for the generator's training to address the main challenges that exist in adversarial EBMs. Our experiments show significant improvement in generation compared to existing adversarial EBMs, while also providing a useful energy function for efficient density estimation. △ Less

Submitted 8 June, 2024; v1 submitted 3 March, 2024; originally announced March 2024.

arXiv:2401.09352 [pdf, other]

Neural Contractive Dynamical Systems

Authors: Hadi Beik-Mohammadi, Søren Hauberg, Georgios Arvanitidis, Nadia Figueroa, Gerhard Neumann, Leonel Rozo

Abstract: Stability guarantees are crucial when ensuring a fully autonomous robot does not take undesirable or potentially harmful actions. Unfortunately, global stability guarantees are hard to provide in dynamical systems learned from data, especially when the learned dynamics are governed by neural networks. We propose a novel methodology to learn neural contractive dynamical systems, where our neural ar… ▽ More Stability guarantees are crucial when ensuring a fully autonomous robot does not take undesirable or potentially harmful actions. Unfortunately, global stability guarantees are hard to provide in dynamical systems learned from data, especially when the learned dynamics are governed by neural networks. We propose a novel methodology to learn neural contractive dynamical systems, where our neural architecture ensures contraction, and hence, global stability. To efficiently scale the method to high-dimensional dynamical systems, we develop a variant of the variational autoencoder that learns dynamics in a low-dimensional latent representation space while retaining contractive stability after decoding. We further extend our approach to learning contractive systems on the Lie group of rotations to account for full-pose end-effector dynamic motions. The result is the first highly flexible learning architecture that provides contractive stability guarantees with capability to perform obstacle avoidance. Empirically, we demonstrate that our approach encodes the desired dynamics more accurately than the current state-of-the-art, which provides less strong stability guarantees. △ Less

Submitted 17 January, 2024; originally announced January 2024.

arXiv:2308.16900 [pdf, other]

Learning to Taste: A Multimodal Wine Dataset

Authors: Thoranna Bender, Simon Moe Sørensen, Alireza Kashani, K. Eldjarn Hjorleifsson, Grethe Hyldig, Søren Hauberg, Serge Belongie, Frederik Warburg

Abstract: We present WineSensed, a large multimodal wine dataset for studying the relations between visual perception, language, and flavor. The dataset encompasses 897k images of wine labels and 824k reviews of wines curated from the Vivino platform. It has over 350k unique bottlings, annotated with year, region, rating, alcohol percentage, price, and grape composition. We obtained fine-grained flavor anno… ▽ More We present WineSensed, a large multimodal wine dataset for studying the relations between visual perception, language, and flavor. The dataset encompasses 897k images of wine labels and 824k reviews of wines curated from the Vivino platform. It has over 350k unique bottlings, annotated with year, region, rating, alcohol percentage, price, and grape composition. We obtained fine-grained flavor annotations on a subset by conducting a wine-tasting experiment with 256 participants who were asked to rank wines based on their similarity in flavor, resulting in more than 5k pairwise flavor distances. We propose a low-dimensional concept embedding algorithm that combines human experience with automatic machine similarity kernels. We demonstrate that this shared concept embedding space improves upon separate embedding spaces for coarse flavor classification (alcohol percentage, country, grape, price, rating) and aligns with the intricate human perception of flavor. △ Less

Submitted 15 January, 2024; v1 submitted 31 August, 2023; originally announced August 2023.

Comments: Accepted to NeurIPS 2023. See project page: https://thoranna.github.io/learning_to_taste/

arXiv:2307.10895 [pdf, other]

Variational Autoencoding of Dental Point Clouds

Authors: Johan Ziruo Ye, Thomas Ørkild, Peter Lempel Søndergaard, Søren Hauberg

Abstract: Digital dentistry has made significant advancements, yet numerous challenges remain. This paper introduces the FDI 16 dataset, an extensive collection of tooth meshes and point clouds. Additionally, we present a novel approach: Variational FoldingNet (VF-Net), a fully probabilistic variational autoencoder designed for point clouds. Notably, prior latent variable models for point clouds lack a one-… ▽ More Digital dentistry has made significant advancements, yet numerous challenges remain. This paper introduces the FDI 16 dataset, an extensive collection of tooth meshes and point clouds. Additionally, we present a novel approach: Variational FoldingNet (VF-Net), a fully probabilistic variational autoencoder designed for point clouds. Notably, prior latent variable models for point clouds lack a one-to-one correspondence between input and output points. Instead, they rely on optimizing Chamfer distances, a metric that lacks a normalized distributional counterpart, rendering it unsuitable for probabilistic modeling. We replace the explicit minimization of Chamfer distances with a suitable encoder, increasing computational efficiency while simplifying the probabilistic extension. This allows for straightforward application in various tasks, including mesh generation, shape completion, and representation learning. Empirically, we provide evidence of lower reconstruction error in dental reconstruction and interpolation, showcasing state-of-the-art performance in dental sample generation while identifying valuable latent representations. △ Less

Submitted 31 January, 2024; v1 submitted 20 July, 2023; originally announced July 2023.

arXiv:2306.07158 [pdf, other]

Riemannian Laplace approximations for Bayesian neural networks

Authors: Federico Bergamin, Pablo Moreno-Muñoz, Søren Hauberg, Georgios Arvanitidis

Abstract: Bayesian neural networks often approximate the weight-posterior with a Gaussian distribution. However, practical posteriors are often, even locally, highly non-Gaussian, and empirical performance deteriorates. We propose a simple parametric approximate posterior that adapts to the shape of the true posterior through a Riemannian metric that is determined by the log-posterior gradient. We develop a… ▽ More Bayesian neural networks often approximate the weight-posterior with a Gaussian distribution. However, practical posteriors are often, even locally, highly non-Gaussian, and empirical performance deteriorates. We propose a simple parametric approximate posterior that adapts to the shape of the true posterior through a Riemannian metric that is determined by the log-posterior gradient. We develop a Riemannian Laplace approximation where samples naturally fall into weight-regions with low negative log-posterior. We show that these samples can be drawn by solving a system of ordinary differential equations, which can be done efficiently by leveraging the structure of the Riemannian metric and automatic differentiation. Empirically, we demonstrate that our approach consistently improves over the conventional Laplace approximation across tasks. We further show that, unlike the conventional Laplace approximation, our method is not overly sensitive to the choice of prior, which alleviates a practical pitfall of current approaches. △ Less

Submitted 12 June, 2023; originally announced June 2023.

Comments: 28 pages, 12 figures. Under submission

arXiv:2306.00520 [pdf, other]

On Masked Pre-training and the Marginal Likelihood

Authors: Pablo Moreno-Muñoz, Pol G. Recasens, Søren Hauberg

Abstract: Masked pre-training removes random input dimensions and learns a model that can predict the missing values. Empirical results indicate that this intuitive form of self-supervised learning yields models that generalize very well to new domains. A theoretical understanding is, however, lacking. This paper shows that masked pre-training with a suitable cumulative scoring function corresponds to maxim… ▽ More Masked pre-training removes random input dimensions and learns a model that can predict the missing values. Empirical results indicate that this intuitive form of self-supervised learning yields models that generalize very well to new domains. A theoretical understanding is, however, lacking. This paper shows that masked pre-training with a suitable cumulative scoring function corresponds to maximizing the model's marginal likelihood, which is de facto the Bayesian model selection measure of generalization. Beyond shedding light on the success of masked pre-training, this insight also suggests that Bayesian models can be trained with appropriately designed self-supervision. Empirically, we confirm the developed theory and explore the main learning principles of masked pre-training in large language models. △ Less

Submitted 1 June, 2023; originally announced June 2023.

arXiv:2303.13123 [pdf, other]

Laplacian Segmentation Networks Improve Epistemic Uncertainty Quantification

Authors: Kilian Zepf, Selma Wanna, Marco Miani, Juston Moore, Jes Frellsen, Søren Hauberg, Frederik Warburg, Aasa Feragen

Abstract: Image segmentation relies heavily on neural networks which are known to be overconfident, especially when making predictions on out-of-distribution (OOD) images. This is a common scenario in the medical domain due to variations in equipment, acquisition sites, or image corruptions. This work addresses the challenge of OOD detection by proposing Laplacian Segmentation Networks (LSN): methods which… ▽ More Image segmentation relies heavily on neural networks which are known to be overconfident, especially when making predictions on out-of-distribution (OOD) images. This is a common scenario in the medical domain due to variations in equipment, acquisition sites, or image corruptions. This work addresses the challenge of OOD detection by proposing Laplacian Segmentation Networks (LSN): methods which jointly model epistemic (model) and aleatoric (data) uncertainty for OOD detection. In doing so, we propose the first Laplace approximation of the weight posterior that scales to large neural networks with skip connections that have high-dimensional outputs. We demonstrate on three datasets that the LSN-modeled parameter distributions, in combination with suitable uncertainty measures, gives superior OOD detection. △ Less

Submitted 23 July, 2024; v1 submitted 23 March, 2023; originally announced March 2023.

Comments: Published in the Conference Proceedings of the 27th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI)

arXiv:2302.01332 [pdf, other]

Bayesian Metric Learning for Uncertainty Quantification in Image Retrieval

Authors: Frederik Warburg, Marco Miani, Silas Brack, Soren Hauberg

Abstract: We propose the first Bayesian encoder for metric learning. Rather than relying on neural amortization as done in prior works, we learn a distribution over the network weights with the Laplace Approximation. We actualize this by first proving that the contrastive loss is a valid log-posterior. We then propose three methods that ensure a positive definite Hessian. Lastly, we present a novel decompos… ▽ More We propose the first Bayesian encoder for metric learning. Rather than relying on neural amortization as done in prior works, we learn a distribution over the network weights with the Laplace Approximation. We actualize this by first proving that the contrastive loss is a valid log-posterior. We then propose three methods that ensure a positive definite Hessian. Lastly, we present a novel decomposition of the Generalized Gauss-Newton approximation. Empirically, we show that our Laplacian Metric Learner (LAM) estimates well-calibrated uncertainties, reliably detects out-of-distribution examples, and yields state-of-the-art predictive performance. △ Less

Submitted 4 February, 2023; v1 submitted 2 February, 2023; originally announced February 2023.

Comments: Code: https://github.com/FrederikWarburg/bayesian-metric-learning

arXiv:2212.10010 [pdf, other]

Identifying latent distances with Finslerian geometry

Authors: Alison Pouplin, David Eklund, Carl Henrik Ek, Søren Hauberg

Abstract: Riemannian geometry provides us with powerful tools to explore the latent space of generative models while preserving the underlying structure of the data. The latent space can be equipped it with a Riemannian metric, pulled back from the data manifold. With this metric, we can systematically navigate the space relying on geodesics defined as the shortest curves between two points. Generative mode… ▽ More Riemannian geometry provides us with powerful tools to explore the latent space of generative models while preserving the underlying structure of the data. The latent space can be equipped it with a Riemannian metric, pulled back from the data manifold. With this metric, we can systematically navigate the space relying on geodesics defined as the shortest curves between two points. Generative models are often stochastic, causing the data space, the Riemannian metric, and the geodesics, to be stochastic as well. Stochastic objects are at best impractical, and at worst impossible, to manipulate. A common solution is to approximate the stochastic pullback metric by its expectation. But the geodesics derived from this expected Riemannian metric do not correspond to the expected length-minimising curves. In this work, we propose another metric whose geodesics explicitly minimise the expected length of the pullback metric. We show this metric defines a Finsler metric, and we compare it with the expected Riemannian metric. In high dimensions, we prove that both metrics converge to each other at a rate of $O\left(\frac{1}{D}\right)$. This convergence implies that the established expected Riemannian metric is an accurate approximation of the theoretically more grounded Finsler metric. This provides justification for using the expected Riemannian metric for practical implementations. △ Less

Submitted 11 October, 2023; v1 submitted 20 December, 2022; originally announced December 2022.

Comments: 36 pages, 12 figures, accepted at TMLR (October 2023)

arXiv:2211.05698 [pdf, other]

Probabilistic thermal stability prediction through sparsity promoting transformer representation

Authors: Yevgen Zainchkovskyy, Jesper Ferkinghoff-Borg, Anja Bennett, Thomas Egebjerg, Nikolai Lorenzen, Per Jr. Greisen, Søren Hauberg, Carsten Stahlhut

Abstract: Pre-trained protein language models have demonstrated significant applicability in different protein engineering task. A general usage of these pre-trained transformer models latent representation is to use a mean pool across residue positions to reduce the feature dimensions to further downstream tasks such as predicting bio-physics properties or other functional behaviours. In this paper we prov… ▽ More Pre-trained protein language models have demonstrated significant applicability in different protein engineering task. A general usage of these pre-trained transformer models latent representation is to use a mean pool across residue positions to reduce the feature dimensions to further downstream tasks such as predicting bio-physics properties or other functional behaviours. In this paper we provide a two-fold contribution to machine learning (ML) driven drug design. Firstly, we demonstrate the power of sparsity by promoting penalization of pre-trained transformer models to secure more robust and accurate melting temperature (Tm) prediction of single-chain variable fragments with a mean absolute error of 0.23C. Secondly, we demonstrate the power of framing our prediction problem in a probabilistic framework. Specifically, we advocate for the need of adopting probabilistic frameworks especially in the context of ML driven drug design. △ Less

Submitted 10 November, 2022; originally announced November 2022.

arXiv:2209.04636 [pdf, other]

Revisiting Active Sets for Gaussian Process Decoders

Authors: Pablo Moreno-Muñoz, Cilie W Feldager, Søren Hauberg

Abstract: Decoders built on Gaussian processes (GPs) are enticing due to the marginalisation over the non-linear function space. Such models (also known as GP-LVMs) are often expensive and notoriously difficult to train in practice, but can be scaled using variational inference and inducing points. In this paper, we revisit active set approximations. We develop a new stochastic estimate of the log-marginal… ▽ More Decoders built on Gaussian processes (GPs) are enticing due to the marginalisation over the non-linear function space. Such models (also known as GP-LVMs) are often expensive and notoriously difficult to train in practice, but can be scaled using variational inference and inducing points. In this paper, we revisit active set approximations. We develop a new stochastic estimate of the log-marginal likelihood based on recently discovered links to cross-validation, and propose a computationally efficient approximation thereof. We demonstrate that the resulting stochastic active sets (SAS) approximation significantly improves the robustness of GP decoder training while reducing computational cost. The SAS-GP obtains more structure in the latent space, scales to many datapoints and learns better representations than variational autoencoders, which is rarely the case for GP decoders. △ Less

Submitted 24 November, 2022; v1 submitted 10 September, 2022; originally announced September 2022.

Comments: Accepted at Advances in Neural Information Processing Systems (NeurIPS) 2022

arXiv:2206.15078 [pdf, other]

Laplacian Autoencoders for Learning Stochastic Representations

Authors: Marco Miani, Frederik Warburg, Pablo Moreno-Muñoz, Nicke Skafte Detlefsen, Søren Hauberg

Abstract: Established methods for unsupervised representation learning such as variational autoencoders produce none or poorly calibrated uncertainty estimates making it difficult to evaluate if learned representations are stable and reliable. In this work, we present a Bayesian autoencoder for unsupervised representation learning, which is trained using a novel variational lower-bound of the autoencoder ev… ▽ More Established methods for unsupervised representation learning such as variational autoencoders produce none or poorly calibrated uncertainty estimates making it difficult to evaluate if learned representations are stable and reliable. In this work, we present a Bayesian autoencoder for unsupervised representation learning, which is trained using a novel variational lower-bound of the autoencoder evidence. This is maximized using Monte Carlo EM with a variational distribution that takes the shape of a Laplace approximation. We develop a new Hessian approximation that scales linearly with data size allowing us to model high-dimensional data. Empirically, we show that our Laplacian autoencoder estimates well-calibrated uncertainties in both latent and output space. We demonstrate that this results in improved performance across a multitude of downstream tasks. △ Less

Submitted 23 August, 2022; v1 submitted 30 June, 2022; originally announced June 2022.

arXiv:2206.09048 [pdf, ps, other]

doi 10.5281/zenodo.6554616

ICLR 2022 Challenge for Computational Geometry and Topology: Design and Results

Authors: Adele Myers, Saiteja Utpala, Shubham Talbar, Sophia Sanborn, Christian Shewmake, Claire Donnat, Johan Mathe, Umberto Lupo, Rishi Sonthalia, Xinyue Cui, Tom Szwagier, Arthur Pignet, Andri Bergsson, Soren Hauberg, Dmitriy Nielsen, Stefan Sommer, David Klindt, Erik Hermansen, Melvin Vaupel, Benjamin Dunn, Jeffrey Xiong, Noga Aharony, Itsik Pe'er, Felix Ambellan, Martin Hanik , et al. (3 additional authors not shown)

Abstract: This paper presents the computational challenge on differential geometry and topology that was hosted within the ICLR 2022 workshop ``Geometric and Topological Representation Learning". The competition asked participants to provide implementations of machine learning algorithms on manifolds that would respect the API of the open-source software Geomstats (manifold part) and Scikit-Learn (machine l… ▽ More This paper presents the computational challenge on differential geometry and topology that was hosted within the ICLR 2022 workshop ``Geometric and Topological Representation Learning". The competition asked participants to provide implementations of machine learning algorithms on manifolds that would respect the API of the open-source software Geomstats (manifold part) and Scikit-Learn (machine learning part) or PyTorch. The challenge attracted seven teams in its two month duration. This paper describes the design of the challenge and summarizes its main findings. △ Less

Submitted 26 June, 2022; v1 submitted 17 June, 2022; originally announced June 2022.

arXiv:2206.01552 [pdf, other]

Is an encoder within reach?

Authors: Helene Hauschultz, Rasmus Berg Palm. Pablo Moreno-Muños, Nicki Skafte Detlefsen, Andrew Allan du Plessis, Søren Hauberg

Abstract: The encoder network of an autoencoder is an approximation of the nearest point projection onto the manifold spanned by the decoder. A concern with this approximation is that, while the output of the encoder is always unique, the projection can possibly have infinitely many values. This implies that the latent representations learned by the autoencoder can be misleading. Borrowing from geometric me… ▽ More The encoder network of an autoencoder is an approximation of the nearest point projection onto the manifold spanned by the decoder. A concern with this approximation is that, while the output of the encoder is always unique, the projection can possibly have infinitely many values. This implies that the latent representations learned by the autoencoder can be misleading. Borrowing from geometric measure theory, we introduce the idea of using the reach of the manifold spanned by the decoder to determine if an optimal encoder exists for a given dataset and decoder. We develop a local generalization of this reach and propose a numerical estimator thereof. We demonstrate that this allows us to determine which observations can be expected to have a unique, and thereby trustworthy, latent representation. As our local reach estimator is differentiable, we investigate its usage as a regularizer and show that this leads to learned manifolds for which projections are more often unique than without regularization. △ Less

Submitted 25 October, 2022; v1 submitted 3 June, 2022; originally announced June 2022.

Comments: 11 pages, 10 figures

MSC Class: 68T07 (Primary) 57Z25 (Secondary)

arXiv:2206.00106 [pdf, other]

Mario Plays on a Manifold: Generating Functional Content in Latent Space through Differential Geometry

Authors: Miguel González-Duque, Rasmus Berg Palm, Søren Hauberg, Sebastian Risi

Abstract: Deep generative models can automatically create content of diverse types. However, there are no guarantees that such content will satisfy the criteria necessary to present it to end-users and be functional, e.g. the generated levels could be unsolvable or incoherent. In this paper we study this problem from a geometric perspective, and provide a method for reliable interpolation and random walks i… ▽ More Deep generative models can automatically create content of diverse types. However, there are no guarantees that such content will satisfy the criteria necessary to present it to end-users and be functional, e.g. the generated levels could be unsolvable or incoherent. In this paper we study this problem from a geometric perspective, and provide a method for reliable interpolation and random walks in the latent spaces of Categorical VAEs based on Riemannian geometry. We test our method with "Super Mario Bros" and "The Legend of Zelda" levels, and against simpler baselines inspired by current practice. Results show that the geometry we propose is better able to interpolate and sample, reliably staying closer to parts of the latent space that decode to playable content. △ Less

Submitted 31 May, 2022; originally announced June 2022.

Comments: Accepted at CoG 2022

arXiv:2203.09253 [pdf, other]

Visualizing Riemannian data with Rie-SNE

Authors: Andri Bergsson, Søren Hauberg

Abstract: Faithful visualizations of data residing on manifolds must take the underlying geometry into account when producing a flat planar view of the data. In this paper, we extend the classic stochastic neighbor embedding (SNE) algorithm to data on general Riemannian manifolds. We replace standard Gaussian assumptions with Riemannian diffusion counterparts and propose an efficient approximation that only… ▽ More Faithful visualizations of data residing on manifolds must take the underlying geometry into account when producing a flat planar view of the data. In this paper, we extend the classic stochastic neighbor embedding (SNE) algorithm to data on general Riemannian manifolds. We replace standard Gaussian assumptions with Riemannian diffusion counterparts and propose an efficient approximation that only requires access to calculations of Riemannian distances and volumes. We demonstrate that the approach also allows for mapping data from one manifold to another, e.g. from a high-dimensional sphere to a low-dimensional one. △ Less

Submitted 17 March, 2022; originally announced March 2022.

Comments: 7 pages, 4 figures

arXiv:2203.07761 [pdf, other]

Reactive Motion Generation on Learned Riemannian Manifolds

Authors: Hadi Beik-Mohammadi, Søren Hauberg, Georgios Arvanitidis, Gerhard Neumann, Leonel Rozo

Abstract: In recent decades, advancements in motion learning have enabled robots to acquire new skills and adapt to unseen conditions in both structured and unstructured environments. In practice, motion learning methods capture relevant patterns and adjust them to new conditions such as dynamic obstacle avoidance or variable targets. In this paper, we investigate the robot motion learning paradigm from a R… ▽ More In recent decades, advancements in motion learning have enabled robots to acquire new skills and adapt to unseen conditions in both structured and unstructured environments. In practice, motion learning methods capture relevant patterns and adjust them to new conditions such as dynamic obstacle avoidance or variable targets. In this paper, we investigate the robot motion learning paradigm from a Riemannian manifold perspective. We argue that Riemannian manifolds may be learned via human demonstrations in which geodesics are natural motion skills. The geodesics are generated using a learned Riemannian metric produced by our novel variational autoencoder (VAE), which is especially intended to recover full-pose end-effector states and joint space configurations. In addition, we propose a technique for facilitating on-the-fly end-effector/multiple-limb obstacle avoidance by reshaping the learned manifold using an obstacle-aware ambient metric. The motion generated using these geodesics may naturally result in multiple-solution tasks that have not been explicitly demonstrated previously. We extensively tested our approach in task space and joint space scenarios using a 7-DoF robotic manipulator. We demonstrate that our method is capable of learning and generating motion skills based on complicated motion patterns demonstrated by a human operator. Additionally, we assess several obstacle avoidance strategies and generate trajectories in multiple-mode settings. △ Less

Submitted 17 August, 2023; v1 submitted 15 March, 2022; originally announced March 2022.

arXiv:2203.01097 [pdf, other]

Model-agnostic out-of-distribution detection using combined statistical tests

Authors: Federico Bergamin, Pierre-Alexandre Mattei, Jakob D. Havtorn, Hugo Senetaire, Hugo Schmutz, Lars Maaløe, Søren Hauberg, Jes Frellsen

Abstract: We present simple methods for out-of-distribution detection using a trained generative model. These techniques, based on classical statistical tests, are model-agnostic in the sense that they can be applied to any differentiable generative model. The idea is to combine a classical parametric test (Rao's score test) with the recently introduced typicality test. These two test statistics are both th… ▽ More We present simple methods for out-of-distribution detection using a trained generative model. These techniques, based on classical statistical tests, are model-agnostic in the sense that they can be applied to any differentiable generative model. The idea is to combine a classical parametric test (Rao's score test) with the recently introduced typicality test. These two test statistics are both theoretically well-founded and exploit different sources of information based on the likelihood for the typicality test and its gradient for the score test. We show that combining them using Fisher's method overall leads to a more accurate out-of-distribution test. We also discuss the benefits of casting out-of-distribution detection as a statistical testing problem, noting in particular that false positive rate control can be valuable for practical out-of-distribution detection. Despite their simplicity and generality, these methods can be competitive with model-specific out-of-distribution detection algorithms without any assumptions on the out-distribution. △ Less

Submitted 2 March, 2022; originally announced March 2022.

Comments: Accepted at the 25th International Conference on Artificial Intelligence and Statistics (AISTATS), 2022

arXiv:2202.12707 [pdf, other]

Benchmarking Generative Latent Variable Models for Speech

Authors: Jakob D. Havtorn, Lasse Borgholt, Søren Hauberg, Jes Frellsen, Lars Maaløe

Abstract: Stochastic latent variable models (LVMs) achieve state-of-the-art performance on natural image generation but are still inferior to deterministic models on speech. In this paper, we develop a speech benchmark of popular temporal LVMs and compare them against state-of-the-art deterministic models. We report the likelihood, which is a much used metric in the image domain, but rarely, or incomparably… ▽ More Stochastic latent variable models (LVMs) achieve state-of-the-art performance on natural image generation but are still inferior to deterministic models on speech. In this paper, we develop a speech benchmark of popular temporal LVMs and compare them against state-of-the-art deterministic models. We report the likelihood, which is a much used metric in the image domain, but rarely, or incomparably, reported for speech models. To assess the quality of the learned representations, we also compare their usefulness for phoneme recognition. Finally, we adapt the Clockwork VAE, a state-of-the-art temporal LVM for video generation, to the speech domain. Despite being autoregressive only in latent space, we find that the Clockwork VAE can outperform previous LVMs and reduce the gap to deterministic models by using a hierarchy of latent variables. △ Less

Submitted 5 April, 2022; v1 submitted 22 February, 2022; originally announced February 2022.

Comments: Accepted at the 2022 ICLR workshop on Deep Generative Models for Highly Structured Data (https://deep-gen-struct.github.io)

arXiv:2202.10769 [pdf, other]

Adaptive Cholesky Gaussian Processes

Authors: Simon Bartels, Kristoffer Stensbo-Smidt, Pablo Moreno-Muñoz, Wouter Boomsma, Jes Frellsen, Søren Hauberg

Abstract: We present a method to approximate Gaussian process regression models for large datasets by considering only a subset of the data. Our approach is novel in that the size of the subset is selected on the fly during exact inference with little computational overhead. From an empirical observation that the log-marginal likelihood often exhibits a linear trend once a sufficient subset of a dataset has… ▽ More We present a method to approximate Gaussian process regression models for large datasets by considering only a subset of the data. Our approach is novel in that the size of the subset is selected on the fly during exact inference with little computational overhead. From an empirical observation that the log-marginal likelihood often exhibits a linear trend once a sufficient subset of a dataset has been observed, we conclude that many large datasets contain redundant information that only slightly affects the posterior. Based on this, we provide probabilistic bounds on the full model evidence that can identify such subsets. Remarkably, these bounds are largely composed of terms that appear in intermediate steps of the standard Cholesky decomposition, allowing us to modify the algorithm to adaptively stop the decomposition once enough data have been observed. △ Less

Submitted 23 February, 2023; v1 submitted 22 February, 2022; originally announced February 2022.

Journal ref: Proceedings of The 26th International Conference on Artificial Intelligence and Statistics (2023)

arXiv:2202.01821 [pdf, other]

Danish Airs and Grounds: A Dataset for Aerial-to-Street-Level Place Recognition and Localization

Authors: Andrea Vallone, Frederik Warburg, Hans Hansen, Søren Hauberg, Javier Civera

Abstract: Place recognition and visual localization are particularly challenging in wide baseline configurations. In this paper, we contribute with the \emph{Danish Airs and Grounds} (DAG) dataset, a large collection of street-level and aerial images targeting such cases. Its main challenge lies in the extreme viewing-angle difference between query and reference images with consequent changes in illuminatio… ▽ More Place recognition and visual localization are particularly challenging in wide baseline configurations. In this paper, we contribute with the \emph{Danish Airs and Grounds} (DAG) dataset, a large collection of street-level and aerial images targeting such cases. Its main challenge lies in the extreme viewing-angle difference between query and reference images with consequent changes in illumination and perspective. The dataset is larger and more diverse than current publicly available data, including more than 50 km of road in urban, suburban and rural areas. All images are associated with accurate 6-DoF metadata that allows the benchmarking of visual localization methods. We also propose a map-to-image re-localization pipeline, that first estimates a dense 3D reconstruction from the aerial images and then matches query street-level images to street-level renderings of the 3D model. The dataset can be downloaded at: https://frederikwarburg.github.io/DAG △ Less

Submitted 3 February, 2022; originally announced February 2022.

Comments: Submitted to RA-L (IROS)

arXiv:2201.05890 [pdf, other]

Robust uncertainty estimates with out-of-distribution pseudo-inputs training

Authors: Pierre Segonne, Yevgen Zainchkovskyy, Søren Hauberg

Abstract: Probabilistic models often use neural networks to control their predictive uncertainty. However, when making out-of-distribution (OOD)} predictions, the often-uncontrollable extrapolation properties of neural networks yield poor uncertainty predictions. Such models then don't know what they don't know, which directly limits their robustness w.r.t unexpected inputs. To counter this, we propose to e… ▽ More Probabilistic models often use neural networks to control their predictive uncertainty. However, when making out-of-distribution (OOD)} predictions, the often-uncontrollable extrapolation properties of neural networks yield poor uncertainty predictions. Such models then don't know what they don't know, which directly limits their robustness w.r.t unexpected inputs. To counter this, we propose to explicitly train the uncertainty predictor where we are not given data to make it reliable. As one cannot train without data, we provide mechanisms for generating pseudo-inputs in informative low-density regions of the input space, and show how to leverage these in a practical Bayesian framework that casts a prior distribution over the model uncertainty. With a holistic evaluation, we demonstrate that this yields robust and interpretable predictions of uncertainty while retaining state-of-the-art performance on diverse tasks such as regression and generative modelling △ Less

Submitted 15 January, 2022; originally announced January 2022.

arXiv:2111.00929 [pdf, other]

Bounds all around: training energy-based models with bidirectional bounds

Authors: Cong Geng, Jia Wang, Zhiyong Gao, Jes Frellsen, Søren Hauberg

Abstract: Energy-based models (EBMs) provide an elegant framework for density estimation, but they are notoriously difficult to train. Recent work has established links to generative adversarial networks, where the EBM is trained through a minimax game with a variational value function. We propose a bidirectional bound on the EBM log-likelihood, such that we maximize a lower bound and minimize an upper boun… ▽ More Energy-based models (EBMs) provide an elegant framework for density estimation, but they are notoriously difficult to train. Recent work has established links to generative adversarial networks, where the EBM is trained through a minimax game with a variational value function. We propose a bidirectional bound on the EBM log-likelihood, such that we maximize a lower bound and minimize an upper bound when solving the minimax game. We link one bound to a gradient penalty that stabilizes training, thereby providing grounding for best engineering practice. To evaluate the bounds we develop a new and efficient estimator of the Jacobi-determinant of the EBM generator. We demonstrate that these developments significantly stabilize training and yield high-quality density estimation and sample generation. △ Less

Submitted 2 November, 2021; v1 submitted 1 November, 2021; originally announced November 2021.

Comments: This paper has been accepted by NeurIPS 2021

arXiv:2106.05367 [pdf, other]

Pulling back information geometry

Authors: Georgios Arvanitidis, Miguel González-Duque, Alison Pouplin, Dimitris Kalatzis, Søren Hauberg

Abstract: Latent space geometry has shown itself to provide a rich and rigorous framework for interacting with the latent variables of deep generative models. The existing theory, however, relies on the decoder being a Gaussian distribution as its simple reparametrization allows us to interpret the generating process as a random projection of a deterministic manifold. Consequently, this approach breaks down… ▽ More Latent space geometry has shown itself to provide a rich and rigorous framework for interacting with the latent variables of deep generative models. The existing theory, however, relies on the decoder being a Gaussian distribution as its simple reparametrization allows us to interpret the generating process as a random projection of a deterministic manifold. Consequently, this approach breaks down when applied to decoders that are not as easily reparametrized. We here propose to use the Fisher-Rao metric associated with the space of decoder distributions as a reference metric, which we pull back to the latent space. We show that we can achieve meaningful latent geometries for a wide range of decoder distributions for which the previous theory was not applicable, opening the door to `black box' latent geometries. △ Less

Submitted 23 April, 2022; v1 submitted 9 June, 2021; originally announced June 2021.

Comments: Presented at AISTATS 2022

arXiv:2106.04315 [pdf, other]

Learning Riemannian Manifolds for Geodesic Motion Skills

Authors: Hadi Beik-Mohammadi, Søren Hauberg, Georgios Arvanitidis, Gerhard Neumann, Leonel Rozo

Abstract: For robots to work alongside humans and perform in unstructured environments, they must learn new motion skills and adapt them to unseen situations on the fly. This demands learning models that capture relevant motion patterns, while offering enough flexibility to adapt the encoded skills to new requirements, such as dynamic obstacle avoidance. We introduce a Riemannian manifold perspective on thi… ▽ More For robots to work alongside humans and perform in unstructured environments, they must learn new motion skills and adapt them to unseen situations on the fly. This demands learning models that capture relevant motion patterns, while offering enough flexibility to adapt the encoded skills to new requirements, such as dynamic obstacle avoidance. We introduce a Riemannian manifold perspective on this problem, and propose to learn a Riemannian manifold from human demonstrations on which geodesics are natural motion skills. We realize this with a variational autoencoder (VAE) over the space of position and orientations of the robot end-effector. Geodesic motion skills let a robot plan movements from and to arbitrary points on the data manifold. They also provide a straightforward method to avoid obstacles by redefining the ambient metric in an online fashion. Moreover, geodesics naturally exploit the manifold resulting from multiple--mode tasks to design motions that were not explicitly demonstrated previously. We test our learning framework using a 7-DoF robotic manipulator, where the robot satisfactorily learns and reproduces realistic skills featuring elaborated motion patterns, avoids previously unseen obstacles, and generates novel movements in multiple-mode settings. △ Less

Submitted 1 July, 2021; v1 submitted 8 June, 2021; originally announced June 2021.

arXiv:2106.03500 [pdf, other]

Density estimation on smooth manifolds with normalizing flows

Authors: Dimitris Kalatzis, Johan Ziruo Ye, Alison Pouplin, Jesper Wohlert, Søren Hauberg

Abstract: We present a framework for learning probability distributions on topologically non-trivial manifolds, utilizing normalizing flows. Current methods focus on manifolds that are homeomorphic to Euclidean space, enforce strong structural priors on the learned models or use operations that do not easily scale to high dimensions. In contrast, our method learns distributions on a data manifold by "gluing… ▽ More We present a framework for learning probability distributions on topologically non-trivial manifolds, utilizing normalizing flows. Current methods focus on manifolds that are homeomorphic to Euclidean space, enforce strong structural priors on the learned models or use operations that do not easily scale to high dimensions. In contrast, our method learns distributions on a data manifold by "gluing" together multiple local models, thus defining an open cover of the data manifold. We demonstrate the efficiency of our approach on synthetic data of known manifolds, as well as higher dimensional manifolds of unknown topology, where our method exhibits better sample efficiency and competitive or superior performance against baselines in a number of tasks. △ Less

Submitted 9 July, 2022; v1 submitted 7 June, 2021; originally announced June 2021.

arXiv:2102.08248 [pdf, other]

Hierarchical VAEs Know What They Don't Know

Authors: Jakob D. Havtorn, Jes Frellsen, Søren Hauberg, Lars Maaløe

Abstract: Deep generative models have been demonstrated as state-of-the-art density estimators. Yet, recent work has found that they often assign a higher likelihood to data from outside the training distribution. This seemingly paradoxical behavior has caused concerns over the quality of the attained density estimates. In the context of hierarchical variational autoencoders, we provide evidence to explain… ▽ More Deep generative models have been demonstrated as state-of-the-art density estimators. Yet, recent work has found that they often assign a higher likelihood to data from outside the training distribution. This seemingly paradoxical behavior has caused concerns over the quality of the attained density estimates. In the context of hierarchical variational autoencoders, we provide evidence to explain this behavior by out-of-distribution data having in-distribution low-level features. We argue that this is both expected and desirable behavior. With this insight in hand, we develop a fast, scalable and fully unsupervised likelihood-ratio score for OOD detection that requires data to be in-distribution across all feature-levels. We benchmark the method on a vast set of data and model combinations and achieve state-of-the-art results on out-of-distribution detection. △ Less

Submitted 18 January, 2022; v1 submitted 16 February, 2021; originally announced February 2021.

Comments: Appeared in Proceedings of the 38th International Conference on Machine Learning (ICML 2021). 18 pages, source code available at https://github.com/JakobHavtorn/hvae-oodd, https://github.com/vlievin/biva-pytorch and https://github.com/larsmaaloee/BIVA

arXiv:2012.02679 [pdf, other]

doi 10.1038/s41467-022-29443-w

What is a meaningful representation of protein sequences?

Authors: Nicki Skafte Detlefsen, Søren Hauberg, Wouter Boomsma

Abstract: How we choose to represent our data has a fundamental impact on our ability to subsequently extract information from them. Machine learning promises to automatically determine efficient representations from large unstructured datasets, such as those arising in biology. However, empirical evidence suggests that seemingly minor changes to these machine learning models yield drastically different dat… ▽ More How we choose to represent our data has a fundamental impact on our ability to subsequently extract information from them. Machine learning promises to automatically determine efficient representations from large unstructured datasets, such as those arising in biology. However, empirical evidence suggests that seemingly minor changes to these machine learning models yield drastically different data representations that result in different biological interpretations of data. This begs the question of what even constitutes the most meaningful representation. Here, we approach this question for representations of protein sequences, which have received considerable attention in the recent literature. We explore two key contexts in which representations naturally arise: transfer learning and interpretable learning. In the first context, we demonstrate that several contemporary practices yield suboptimal performance, and in the latter we demonstrate that taking representation geometry into account significantly improves interpretability and lets the models reveal biological information that is otherwise obscured. △ Less

Submitted 7 March, 2022; v1 submitted 28 November, 2020; originally announced December 2020.

Comments: 17 pages, 8 figures, 2 tables

Journal ref: Nature Communications 13, 1914 (2022)

arXiv:2011.12663 [pdf, other]

Bayesian Triplet Loss: Uncertainty Quantification in Image Retrieval

Authors: Frederik Warburg, Martin Jørgensen, Javier Civera, Søren Hauberg

Abstract: Uncertainty quantification in image retrieval is crucial for downstream decisions, yet it remains a challenging and largely unexplored problem. Current methods for estimating uncertainties are poorly calibrated, computationally expensive, or based on heuristics. We present a new method that views image embeddings as stochastic features rather than deterministic features. Our two main contributions… ▽ More Uncertainty quantification in image retrieval is crucial for downstream decisions, yet it remains a challenging and largely unexplored problem. Current methods for estimating uncertainties are poorly calibrated, computationally expensive, or based on heuristics. We present a new method that views image embeddings as stochastic features rather than deterministic features. Our two main contributions are (1) a likelihood that matches the triplet constraint and that evaluates the probability of an anchor being closer to a positive than a negative; and (2) a prior over the feature space that justifies the conventional l2 normalization. To ensure computational efficiency, we derive a variational approximation of the posterior, called the Bayesian triplet loss, that produces state-of-the-art uncertainty estimates and matches the predictive performance of current state-of-the-art methods. △ Less

Submitted 17 September, 2021; v1 submitted 25 November, 2020; originally announced November 2020.

Journal ref: 2021 ICCV

arXiv:2008.05552 [pdf, other]

Reparametrization Invariance in non-parametric Causal Discovery

Authors: Martin Jørgensen, Søren Hauberg

Abstract: Causal discovery estimates the underlying physical process that generates the observed data: does X cause Y or does Y cause X? Current methodologies use structural conditions to turn the causal query into a statistical query, when only observational data is available. But what if these statistical queries are sensitive to causal invariants? This study investigates one such invariant: the causal re… ▽ More Causal discovery estimates the underlying physical process that generates the observed data: does X cause Y or does Y cause X? Current methodologies use structural conditions to turn the causal query into a statistical query, when only observational data is available. But what if these statistical queries are sensitive to causal invariants? This study investigates one such invariant: the causal relationship between X and Y is invariant to the marginal distributions of X and Y. We propose an algorithm that uses a non-parametric estimator that is robust to changes in the marginal distributions. This way we may marginalize the marginals, and inspect what relationship is intrinsically there. The resulting causal estimator is competitive with current methodologies and has high emphasis on the uncertainty in the causal query; an aspect just as important as the query itself. △ Less

Submitted 12 August, 2020; originally announced August 2020.

arXiv:2008.00565 [pdf, other]

Geometrically Enriched Latent Spaces

Authors: Georgios Arvanitidis, Søren Hauberg, Bernhard Schölkopf

Abstract: A common assumption in generative models is that the generator immerses the latent space into a Euclidean ambient space. Instead, we consider the ambient space to be a Riemannian manifold, which allows for encoding domain knowledge through the associated Riemannian metric. Shortest paths can then be defined accordingly in the latent space to both follow the learned manifold and respect the ambient… ▽ More A common assumption in generative models is that the generator immerses the latent space into a Euclidean ambient space. Instead, we consider the ambient space to be a Riemannian manifold, which allows for encoding domain knowledge through the associated Riemannian metric. Shortest paths can then be defined accordingly in the latent space to both follow the learned manifold and respect the ambient geometry. Through careful design of the ambient metric we can ensure that shortest paths are well-behaved even for deterministic generators that otherwise would exhibit a misleading bias. Experimentally we show that our approach improves interpretability of learned representations both using stochastic and deterministic generators. △ Less

Submitted 2 August, 2020; originally announced August 2020.

arXiv:2006.11741 [pdf, other]

Isometric Gaussian Process Latent Variable Model for Dissimilarity Data

Authors: Martin Jørgensen, Søren Hauberg

Abstract: We present a probabilistic model where the latent variable respects both the distances and the topology of the modeled data. The model leverages the Riemannian geometry of the generated manifold to endow the latent space with a well-defined stochastic distance measure, which is modeled locally as Nakagami distributions. These stochastic distances are sought to be as similar as possible to observed… ▽ More We present a probabilistic model where the latent variable respects both the distances and the topology of the modeled data. The model leverages the Riemannian geometry of the generated manifold to endow the latent space with a well-defined stochastic distance measure, which is modeled locally as Nakagami distributions. These stochastic distances are sought to be as similar as possible to observed distances along a neighborhood graph through a censoring process. The model is inferred by variational inference based on observations of pairwise distances. We demonstrate how the new model can encode invariances in the learned manifolds. △ Less

Submitted 8 June, 2021; v1 submitted 21 June, 2020; originally announced June 2020.

Comments: ICML 2021

arXiv:2004.03637 [pdf, other]

Probabilistic Spatial Transformer Networks

Authors: Pola Schwöbel, Frederik Warburg, Martin Jørgensen, Kristoffer H. Madsen, Søren Hauberg

Abstract: Spatial Transformer Networks (STNs) estimate image transformations that can improve downstream tasks by `zooming in' on relevant regions in an image. However, STNs are hard to train and sensitive to mis-predictions of transformations. To circumvent these limitations, we propose a probabilistic extension that estimates a stochastic transformation rather than a deterministic one. Marginalizing trans… ▽ More Spatial Transformer Networks (STNs) estimate image transformations that can improve downstream tasks by `zooming in' on relevant regions in an image. However, STNs are hard to train and sensitive to mis-predictions of transformations. To circumvent these limitations, we propose a probabilistic extension that estimates a stochastic transformation rather than a deterministic one. Marginalizing transformations allows us to consider each image at multiple poses, which makes the localization task easier and the training more robust. As an additional benefit, the stochastic transformations act as a localized, learned data augmentation that improves the downstream tasks. We show across standard imaging benchmarks and on a challenging real-world dataset that these two properties lead to improved classification performance, robustness and model calibration. We further demonstrate that the approach generalizes to non-visual domains by improving model performance on time-series data. △ Less

Submitted 15 June, 2022; v1 submitted 7 April, 2020; originally announced April 2020.

Comments: UAI 2022

arXiv:2002.05227 [pdf, other]

Variational Autoencoders with Riemannian Brownian Motion Priors

Authors: Dimitris Kalatzis, David Eklund, Georgios Arvanitidis, Søren Hauberg

Abstract: Variational Autoencoders (VAEs) represent the given data in a low-dimensional latent space, which is generally assumed to be Euclidean. This assumption naturally leads to the common choice of a standard Gaussian prior over continuous latent variables. Recent work has, however, shown that this prior has a detrimental effect on model capacity, leading to subpar performance. We propose that the Eucli… ▽ More Variational Autoencoders (VAEs) represent the given data in a low-dimensional latent space, which is generally assumed to be Euclidean. This assumption naturally leads to the common choice of a standard Gaussian prior over continuous latent variables. Recent work has, however, shown that this prior has a detrimental effect on model capacity, leading to subpar performance. We propose that the Euclidean assumption lies at the heart of this failure mode. To counter this, we assume a Riemannian structure over the latent space, which constitutes a more principled geometric view of the latent codes, and replace the standard Gaussian prior with a Riemannian Brownian motion prior. We propose an efficient inference scheme that does not rely on the unknown normalizing factor of this prior. Finally, we demonstrate that this prior significantly increases model capacity using only one additional scalar parameter. △ Less

Submitted 7 August, 2020; v1 submitted 12 February, 2020; originally announced February 2020.

Comments: Published in ICML 2020

Journal ref: Proceedings of the 37th International Conference on Machine Learning, Vienna, Austria, PMLR 119, 2020

arXiv:1908.07377 [pdf, other]

Expected path length on random manifolds

Authors: David Eklund, Søren Hauberg

Abstract: Manifold learning seeks a low dimensional representation that faithfully captures the essence of data. Current methods can successfully learn such representations, but do not provide a meaningful set of operations that are associated with the representation. Working towards operational representation learning, we endow the latent space of a large class of generative models with a random Riemannian… ▽ More Manifold learning seeks a low dimensional representation that faithfully captures the essence of data. Current methods can successfully learn such representations, but do not provide a meaningful set of operations that are associated with the representation. Working towards operational representation learning, we endow the latent space of a large class of generative models with a random Riemannian metric, which provides us with elementary operators. As computational tools are unavailable for random Riemannian manifolds, we study deterministic approximations and derive tight error bounds on expected distances. △ Less

Submitted 20 August, 2019; originally announced August 2019.

arXiv:1906.11881 [pdf, other]

Explicit Disentanglement of Appearance and Perspective in Generative Models

Authors: Nicki Skafte Detlefsen, Søren Hauberg

Abstract: Disentangled representation learning finds compact, independent and easy-to-interpret factors of the data. Learning such has been shown to require an inductive bias, which we explicitly encode in a generative model of images. Specifically, we propose a model with two latent spaces: one that represents spatial transformations of the input data, and another that represents the transformed data. We f… ▽ More Disentangled representation learning finds compact, independent and easy-to-interpret factors of the data. Learning such has been shown to require an inductive bias, which we explicitly encode in a generative model of images. Specifically, we propose a model with two latent spaces: one that represents spatial transformations of the input data, and another that represents the transformed data. We find that the latter naturally captures the intrinsic appearance of the data. To realize the generative model, we propose a Variationally Inferred Transformational Autoencoder (VITAE) that incorporates a spatial transformer into a variational autoencoder. We show how to perform inference in the model efficiently by carefully designing the encoders and restricting the transformation class to be diffeomorphic. Empirically, our model separates the visual style from digit type on MNIST, separates shape and pose in images of human bodies and facial features from facial shape on CelebA. △ Less

Submitted 13 November, 2019; v1 submitted 11 June, 2019; originally announced June 2019.

Comments: 9 main pages + 2 pages references + 8 pages of supplementary material

arXiv:1906.03260 [pdf, other]

Reliable training and estimation of variance networks

Authors: Nicki S. Detlefsen, Martin Jørgensen, Søren Hauberg

Abstract: We propose and investigate new complementary methodologies for estimating predictive variance networks in regression neural networks. We derive a locally aware mini-batching scheme that result in sparse robust gradients, and show how to make unbiased weight updates to a variance network. Further, we formulate a heuristic for robustly fitting both the mean and variance networks post hoc. Finally, w… ▽ More We propose and investigate new complementary methodologies for estimating predictive variance networks in regression neural networks. We derive a locally aware mini-batching scheme that result in sparse robust gradients, and show how to make unbiased weight updates to a variance network. Further, we formulate a heuristic for robustly fitting both the mean and variance networks post hoc. Finally, we take inspiration from posterior Gaussian processes and propose a network architecture with similar extrapolation properties to Gaussian processes. The proposed methodologies are complementary, and improve upon baseline methods individually. Experimentally, we investigate the impact on predictive uncertainty on multiple datasets and tasks ranging from regression, active learning and generative modeling. Experiments consistently show significant improvements in predictive uncertainty estimation over state-of-the-art methods across tasks and datasets. △ Less

Submitted 4 November, 2019; v1 submitted 4 June, 2019; originally announced June 2019.

Comments: Appeared at NeurIPS 2019

arXiv:1901.07229 [pdf, other]

Fast and Robust Shortest Paths on Manifolds Learned from Data

Authors: Georgios Arvanitidis, Søren Hauberg, Philipp Hennig, Michael Schober

Abstract: We propose a fast, simple and robust algorithm for computing shortest paths and distances on Riemannian manifolds learned from data. This amounts to solving a system of ordinary differential equations (ODEs) subject to boundary conditions. Here standard solvers perform poorly because they require well-behaved Jacobians of the ODE, and usually, manifolds learned from data imply unstable and ill-con… ▽ More We propose a fast, simple and robust algorithm for computing shortest paths and distances on Riemannian manifolds learned from data. This amounts to solving a system of ordinary differential equations (ODEs) subject to boundary conditions. Here standard solvers perform poorly because they require well-behaved Jacobians of the ODE, and usually, manifolds learned from data imply unstable and ill-conditioned Jacobians. Instead, we propose a fixed-point iteration scheme for solving the ODE that avoids Jacobians. This enhances the stability of the solver, while reduces the computational cost. In experiments involving both Riemannian metric learning and deep generative models we demonstrate significant improvements in speed and stability over both general-purpose state-of-the-art solvers as well as over specialized solvers. △ Less

Submitted 22 January, 2019; originally announced January 2019.

Comments: Accepted at Artificial Intelligence and Statistics (AISTATS) 2019

arXiv:1809.04747 [pdf, other]

Geodesic Clustering in Deep Generative Models

Authors: Tao Yang, Georgios Arvanitidis, Dongmei Fu, Xiaogang Li, Søren Hauberg

Abstract: Deep generative models are tremendously successful in learning low-dimensional latent representations that well-describe the data. These representations, however, tend to much distort relationships between points, i.e. pairwise distances tend to not reflect semantic similarities well. This renders unsupervised tasks, such as clustering, difficult when working with the latent representations. We de… ▽ More Deep generative models are tremendously successful in learning low-dimensional latent representations that well-describe the data. These representations, however, tend to much distort relationships between points, i.e. pairwise distances tend to not reflect semantic similarities well. This renders unsupervised tasks, such as clustering, difficult when working with the latent representations. We demonstrate that taking the geometry of the generative model into account is sufficient to make simple clustering algorithms work well over latent representations. Leaning on the recent finding that deep generative models constitute stochastically immersed Riemannian manifolds, we propose an efficient algorithm for computing geodesics (shortest paths) and computing distances in the latent space, while taking its distortion into account. We further propose a new architecture for modeling uncertainty in variational autoencoders, which is essential for understanding the geometry of deep generative models. Experiments show that the geodesic distance is very likely to reflect the internal structure of the data. △ Less

Submitted 12 September, 2018; originally announced September 2018.

arXiv:1806.04994 [pdf, other]

Only Bayes should learn a manifold (on the estimation of differential geometric structure from data)

Authors: Søren Hauberg

Abstract: We investigate learning of the differential geometric structure of a data manifold embedded in a high-dimensional Euclidean space. We first analyze kernel-based algorithms and show that under the usual regularizations, non-probabilistic methods cannot recover the differential geometric structure, but instead find mostly linear manifolds or spaces equipped with teleports. To properly learn the diff… ▽ More We investigate learning of the differential geometric structure of a data manifold embedded in a high-dimensional Euclidean space. We first analyze kernel-based algorithms and show that under the usual regularizations, non-probabilistic methods cannot recover the differential geometric structure, but instead find mostly linear manifolds or spaces equipped with teleports. To properly learn the differential geometric structure, non-probabilistic methods must apply regularizations that enforce large gradients, which go against common wisdom. We repeat the analysis for probabilistic methods and find that under reasonable priors, the geometric structure can be recovered. Fully exploiting the recovered structure, however, requires the development of stochastic extensions to classic Riemannian geometry. We take early steps in that regard. Finally, we partly extend the analysis to modern models based on neural networks, thereby highlighting geometric and probabilistic shortcomings of current deep generative models. △ Less

Submitted 26 September, 2019; v1 submitted 13 June, 2018; originally announced June 2018.

arXiv:1805.09122 [pdf, other]

Probabilistic Riemannian submanifold learning with wrapped Gaussian process latent variable models

Authors: Anton Mallasto, Søren Hauberg, Aasa Feragen

Abstract: Latent variable models (LVMs) learn probabilistic models of data manifolds lying in an \emph{ambient} Euclidean space. In a number of applications, a priori known spatial constraints can shrink the ambient space into a considerably smaller manifold. Additionally, in these applications the Euclidean geometry might induce a suboptimal similarity measure, which could be improved by choosing a differe… ▽ More Latent variable models (LVMs) learn probabilistic models of data manifolds lying in an \emph{ambient} Euclidean space. In a number of applications, a priori known spatial constraints can shrink the ambient space into a considerably smaller manifold. Additionally, in these applications the Euclidean geometry might induce a suboptimal similarity measure, which could be improved by choosing a different metric. Euclidean models ignore such information and assign probability mass to data points that can never appear as data, and vastly different likelihoods to points that are similar under the desired metric. We propose the wrapped Gaussian process latent variable model (WGPLVM), that extends Gaussian process latent variable models to take values strictly on a given ambient Riemannian manifold, making the model blind to impossible data points. This allows non-linear, probabilistic inference of low-dimensional Riemannian submanifolds from data. Our evaluation on diverse datasets show that we improve performance on several tasks, including encoding, visualization and uncertainty quantification. △ Less

Submitted 24 February, 2019; v1 submitted 23 May, 2018; originally announced May 2018.

arXiv:1710.11379 [pdf, other]

Latent Space Oddity: on the Curvature of Deep Generative Models

Authors: Georgios Arvanitidis, Lars Kai Hansen, Søren Hauberg

Abstract: Deep generative models provide a systematic way to learn nonlinear data distributions, through a set of latent variables and a nonlinear "generator" function that maps latent points into the input space. The nonlinearity of the generator imply that the latent space gives a distorted view of the input space. Under mild conditions, we show that this distortion can be characterized by a stochastic Ri… ▽ More Deep generative models provide a systematic way to learn nonlinear data distributions, through a set of latent variables and a nonlinear "generator" function that maps latent points into the input space. The nonlinearity of the generator imply that the latent space gives a distorted view of the input space. Under mild conditions, we show that this distortion can be characterized by a stochastic Riemannian metric, and demonstrate that distances and interpolants are significantly improved under this metric. This in turn improves probability distributions, sampling algorithms and clustering in the latent space. Our geometric analysis further reveals that current generators provide poor variance estimates and we propose a new generator architecture with vastly improved variance estimates. Results are demonstrated on convolutional and fully connected variational autoencoders, but the formalism easily generalize to other deep generative models. △ Less

Submitted 13 December, 2021; v1 submitted 31 October, 2017; originally announced October 2017.

Comments: Published at International Conference on Learning Representations (ICLR) 2018

arXiv:1702.01005 [pdf, other]

Intrinsic Grassmann Averages for Online Linear, Robust and Nonlinear Subspace Learning

Authors: Rudrasis Chakraborty, Søren Hauberg, Baba C. Vemuri

Abstract: Principal Component Analysis (PCA) and Kernel Principal Component Analysis (KPCA) are fundamental methods in machine learning for dimensionality reduction. The former is a technique for finding this approximation in finite dimensions and the latter is often in an infinite dimensional Reproducing Kernel Hilbert-space (RKHS). In this paper, we present a geometric framework for computing the principa… ▽ More Principal Component Analysis (PCA) and Kernel Principal Component Analysis (KPCA) are fundamental methods in machine learning for dimensionality reduction. The former is a technique for finding this approximation in finite dimensions and the latter is often in an infinite dimensional Reproducing Kernel Hilbert-space (RKHS). In this paper, we present a geometric framework for computing the principal linear subspaces in both situations as well as for the robust PCA case, that amounts to computing the intrinsic average on the space of all subspaces: the Grassmann manifold. Points on this manifold are defined as the subspaces spanned by $K$-tuples of observations. The intrinsic Grassmann average of these subspaces are shown to coincide with the principal components of the observations when they are drawn from a Gaussian distribution. We show similar results in the RKHS case and provide an efficient algorithm for computing the projection onto the this average subspace. The result is a method akin to KPCA which is substantially faster. Further, we present a novel online version of the KPCA using our geometric framework. Competitive performance of all our algorithms are demonstrated on a variety of real and synthetic data sets. △ Less

Submitted 9 July, 2018; v1 submitted 3 February, 2017; originally announced February 2017.

arXiv:1606.02518 [pdf, other]

A Locally Adaptive Normal Distribution

Authors: Georgios Arvanitidis, Lars Kai Hansen, Søren Hauberg

Abstract: The multivariate normal density is a monotonic function of the distance to the mean, and its ellipsoidal shape is due to the underlying Euclidean metric. We suggest to replace this metric with a locally adaptive, smoothly changing (Riemannian) metric that favors regions of high local density. The resulting locally adaptive normal distribution (LAND) is a generalization of the normal distribution t… ▽ More The multivariate normal density is a monotonic function of the distance to the mean, and its ellipsoidal shape is due to the underlying Euclidean metric. We suggest to replace this metric with a locally adaptive, smoothly changing (Riemannian) metric that favors regions of high local density. The resulting locally adaptive normal distribution (LAND) is a generalization of the normal distribution to the "manifold" setting, where data is assumed to lie near a potentially low-dimensional manifold embedded in $\mathbb{R}^D$. The LAND is parametric, depending only on a mean and a covariance, and is the maximum entropy distribution under the given metric. The underlying metric is, however, non-parametric. We develop a maximum likelihood algorithm to infer the distribution parameters that relies on a combination of gradient descent and Monte Carlo integration. We further extend the LAND to mixture models, and provide the corresponding EM algorithm. We demonstrate the efficiency of the LAND to fit non-trivial probability distributions over both synthetic data, and EEG measurements of human sleep. △ Less

Submitted 23 September, 2016; v1 submitted 8 June, 2016; originally announced June 2016.

arXiv:1510.02795 [pdf, other]

Dreaming More Data: Class-dependent Distributions over Diffeomorphisms for Learned Data Augmentation

Authors: Søren Hauberg, Oren Freifeld, Anders Boesen Lindbo Larsen, John W. Fisher III, Lars Kai Hansen

Abstract: Data augmentation is a key element in training high-dimensional models. In this approach, one synthesizes new observations by applying pre-specified transformations to the original training data; e.g.~new images are formed by rotating old ones. Current augmentation schemes, however, rely on manual specification of the applied transformations, making data augmentation an implicit form of feature en… ▽ More Data augmentation is a key element in training high-dimensional models. In this approach, one synthesizes new observations by applying pre-specified transformations to the original training data; e.g.~new images are formed by rotating old ones. Current augmentation schemes, however, rely on manual specification of the applied transformations, making data augmentation an implicit form of feature engineering. With an eye towards true end-to-end learning, we suggest learning the applied transformations on a per-class basis. Particularly, we align image pairs within each class under the assumption that the spatial transformation between images belongs to a large class of diffeomorphisms. We then learn a class-specific probabilistic generative models of the transformations in a Riemannian submanifold of the Lie group of diffeomorphisms. We demonstrate significant performance improvements in training deep neural nets over manually-specified augmentation schemes. Our code and augmented datasets are available online. △ Less

Submitted 30 June, 2016; v1 submitted 9 October, 2015; originally announced October 2015.

Journal ref: Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, pp. 342-350, 2016

arXiv:1411.7432 [pdf, other]

Metrics for Probabilistic Geometries

Authors: Alessandra Tosi, Søren Hauberg, Alfredo Vellido, Neil D. Lawrence

Abstract: We investigate the geometrical structure of probabilistic generative dimensionality reduction models using the tools of Riemannian geometry. We explicitly define a distribution over the natural metric given by the models. We provide the necessary algorithms to compute expected metric tensors where the distribution over mappings is given by a Gaussian process. We treat the corresponding latent vari… ▽ More We investigate the geometrical structure of probabilistic generative dimensionality reduction models using the tools of Riemannian geometry. We explicitly define a distribution over the natural metric given by the models. We provide the necessary algorithms to compute expected metric tensors where the distribution over mappings is given by a Gaussian process. We treat the corresponding latent variable model as a Riemannian manifold and we use the expectation of the metric under the Gaussian process prior to define interpolating paths and measure distance between latent points. We show how distances that respect the expected metric lead to more appropriate generation of new data. △ Less

Submitted 26 November, 2014; originally announced November 2014.

Comments: UAI 2014

Showing 1–50 of 52 results for author: Hauberg, S