Search | arXiv e-print repository

A Quadrature Approach for General-Purpose Batch Bayesian Optimization via Probabilistic Lifting

Authors: Masaki Adachi, Satoshi Hayakawa, Martin Jørgensen, Saad Hamid, Harald Oberhauser, Michael A. Osborne

Abstract: Parallelisation in Bayesian optimisation is a common strategy but faces several challenges: the need for flexibility in acquisition functions and kernel choices, flexibility dealing with discrete and continuous variables simultaneously, model misspecification, and lastly fast massive parallelisation. To address these challenges, we introduce a versatile and modular framework for batch Bayesian opt… ▽ More Parallelisation in Bayesian optimisation is a common strategy but faces several challenges: the need for flexibility in acquisition functions and kernel choices, flexibility dealing with discrete and continuous variables simultaneously, model misspecification, and lastly fast massive parallelisation. To address these challenges, we introduce a versatile and modular framework for batch Bayesian optimisation via probabilistic lifting with kernel quadrature, called SOBER, which we present as a Python library based on GPyTorch/BoTorch. Our framework offers the following unique benefits: (1) Versatility in downstream tasks under a unified approach. (2) A gradient-free sampler, which does not require the gradient of acquisition functions, offering domain-agnostic sampling (e.g., discrete and mixed variables, non-Euclidean space). (3) Flexibility in domain prior distribution. (4) Adaptive batch size (autonomous determination of the optimal batch size). (5) Robustness against a misspecified reproducing kernel Hilbert space. (6) Natural stopping criterion. △ Less

Submitted 19 April, 2024; v1 submitted 18 April, 2024; originally announced April 2024.

Comments: This work is the journal extension of the workshop paper (arXiv:2301.11832) and AISTATS paper (arXiv:2306.05843). 48 pages, 11 figures

MSC Class: 62C10; 62F15

arXiv:2311.12214 [pdf, other]

Random Fourier Signature Features

Authors: Csaba Toth, Harald Oberhauser, Zoltan Szabo

Abstract: Tensor algebras give rise to one of the most powerful measures of similarity for sequences of arbitrary length called the signature kernel accompanied with attractive theoretical guarantees from stochastic analysis. Previous algorithms to compute the signature kernel scale quadratically in terms of the length and the number of the sequences. To mitigate this severe computational bottleneck, we dev… ▽ More Tensor algebras give rise to one of the most powerful measures of similarity for sequences of arbitrary length called the signature kernel accompanied with attractive theoretical guarantees from stochastic analysis. Previous algorithms to compute the signature kernel scale quadratically in terms of the length and the number of the sequences. To mitigate this severe computational bottleneck, we develop a random Fourier feature-based acceleration of the signature kernel acting on the inherently non-Euclidean domain of sequences. We show uniform approximation guarantees for the proposed unbiased estimator of the signature kernel, while keeping its computation linear in the sequence length and number. In addition, combined with recent advances on tensor projections, we derive two even more scalable time series features with favourable concentration properties and computational complexity both in time and memory. Our empirical results show that the reduction in computational cost comes at a negligible price in terms of accuracy on moderate-sized datasets, and it enables one to scale to large datasets up to a million time series. △ Less

Submitted 20 November, 2023; originally announced November 2023.

arXiv:2311.04171 [pdf, other]

HADES: Fast Singularity Detection with Local Measure Comparison

Authors: Uzu Lim, Harald Oberhauser, Vidit Nanda

Abstract: We introduce Hades, an unsupervised algorithm to detect singularities in data. This algorithm employs a kernel goodness-of-fit test, and as a consequence it is much faster and far more scaleable than the existing topology-based alternatives. Using tools from differential geometry and optimal transport theory, we prove that Hades correctly detects singularities with high probability when the data s… ▽ More We introduce Hades, an unsupervised algorithm to detect singularities in data. This algorithm employs a kernel goodness-of-fit test, and as a consequence it is much faster and far more scaleable than the existing topology-based alternatives. Using tools from differential geometry and optimal transport theory, we prove that Hades correctly detects singularities with high probability when the data sample lives on a transverse intersection of equidimensional manifolds. In computational experiments, Hades recovers singularities in synthetically generated data, branching points in road network data, intersection rings in molecular conformation space, and anomalies in image data. △ Less

Submitted 7 November, 2023; originally announced November 2023.

MSC Class: 55N31; 32S50

arXiv:2306.05843 [pdf, other]

Adaptive Batch Sizes for Active Learning A Probabilistic Numerics Approach

Authors: Masaki Adachi, Satoshi Hayakawa, Martin Jørgensen, Xingchen Wan, Vu Nguyen, Harald Oberhauser, Michael A. Osborne

Abstract: Active learning parallelization is widely used, but typically relies on fixing the batch size throughout experimentation. This fixed approach is inefficient because of a dynamic trade-off between cost and speed -- larger batches are more costly, smaller batches lead to slower wall-clock run-times -- and the trade-off may change over the run (larger batches are often preferable earlier). To address… ▽ More Active learning parallelization is widely used, but typically relies on fixing the batch size throughout experimentation. This fixed approach is inefficient because of a dynamic trade-off between cost and speed -- larger batches are more costly, smaller batches lead to slower wall-clock run-times -- and the trade-off may change over the run (larger batches are often preferable earlier). To address this trade-off, we propose a novel Probabilistic Numerics framework that adaptively changes batch sizes. By framing batch selection as a quadrature task, our integration-error-aware algorithm facilitates the automatic tuning of batch sizes to meet predefined quadrature precision objectives, akin to how typical optimizers terminate based on convergence thresholds. This approach obviates the necessity for exhaustive searches across all potential batch sizes. We also extend this to scenarios with constrained active learning and constrained optimization, interpreting constraint violations as reductions in the precision requirement, to subsequently adapt batch construction. Through extensive experiments, we demonstrate that our approach significantly enhances learning efficiency and flexibility in diverse Bayesian batch active learning and Bayesian optimization applications. △ Less

Submitted 21 February, 2024; v1 submitted 9 June, 2023; originally announced June 2023.

Comments: Accepted at AISTATS 2024. 33 pages, 6 figures

MSC Class: 62C10; 62F15

arXiv:2305.04625 [pdf, other]

The Signature Kernel

Authors: Darrick Lee, Harald Oberhauser

Abstract: The signature kernel is a positive definite kernel for sequential data. It inherits theoretical guarantees from stochastic analysis, has efficient algorithms for computation, and shows strong empirical performance. In this short survey paper for a forthcoming Springer handbook, we give an elementary introduction to the signature kernel and highlight these theoretical and computational properties. The signature kernel is a positive definite kernel for sequential data. It inherits theoretical guarantees from stochastic analysis, has efficient algorithms for computation, and shows strong empirical performance. In this short survey paper for a forthcoming Springer handbook, we give an elementary introduction to the signature kernel and highlight these theoretical and computational properties. △ Less

Submitted 8 May, 2023; originally announced May 2023.

Comments: 31 pages, 2 figures

arXiv:2301.12466 [pdf, other]

Kernelized Cumulants: Beyond Kernel Mean Embeddings

Authors: Patric Bonnier, Harald Oberhauser, Zoltán Szabó

Abstract: In $\mathbb R^d$, it is well-known that cumulants provide an alternative to moments that can achieve the same goals with numerous benefits such as lower variance estimators. In this paper we extend cumulants to reproducing kernel Hilbert spaces (RKHS) using tools from tensor algebras and show that they are computationally tractable by a kernel trick. These kernelized cumulants provide a new set of… ▽ More In $\mathbb R^d$, it is well-known that cumulants provide an alternative to moments that can achieve the same goals with numerous benefits such as lower variance estimators. In this paper we extend cumulants to reproducing kernel Hilbert spaces (RKHS) using tools from tensor algebras and show that they are computationally tractable by a kernel trick. These kernelized cumulants provide a new set of all-purpose statistics; the classical maximum mean discrepancy and Hilbert-Schmidt independence criterion arise as the degree one objects in our general construction. We argue both theoretically and empirically (on synthetic, environmental, and traffic data analysis) that going beyond degree one has several advantages and can be achieved with the same computational complexity and minimal overhead in our experiments. △ Less

Submitted 29 October, 2023; v1 submitted 29 January, 2023; originally announced January 2023.

Comments: 19 pages, 8 figures

arXiv:2301.11832 [pdf, other]

SOBER: Highly Parallel Bayesian Optimization and Bayesian Quadrature over Discrete and Mixed Spaces

Authors: Masaki Adachi, Satoshi Hayakawa, Saad Hamid, Martin Jørgensen, Harald Oberhauser, Micheal A. Osborne

Abstract: Batch Bayesian optimisation and Bayesian quadrature have been shown to be sample-efficient methods of performing optimisation and quadrature where expensive-to-evaluate objective functions can be queried in parallel. However, current methods do not scale to large batch sizes -- a frequent desideratum in practice (e.g. drug discovery or simulation-based inference). We present a novel algorithm, SOB… ▽ More Batch Bayesian optimisation and Bayesian quadrature have been shown to be sample-efficient methods of performing optimisation and quadrature where expensive-to-evaluate objective functions can be queried in parallel. However, current methods do not scale to large batch sizes -- a frequent desideratum in practice (e.g. drug discovery or simulation-based inference). We present a novel algorithm, SOBER, which permits scalable and diversified batch global optimisation and quadrature with arbitrary acquisition functions and kernels over discrete and mixed spaces. The key to our approach is to reformulate batch selection for global optimisation as a quadrature problem, which relaxes acquisition function maximisation (non-convex) to kernel recombination (convex). Bridging global optimisation and quadrature can efficiently solve both tasks by balancing the merits of exploitative Bayesian optimisation and explorative Bayesian quadrature. We show that SOBER outperforms 11 competitive baselines on 12 synthetic and diverse real-world tasks. △ Less

Submitted 5 July, 2023; v1 submitted 27 January, 2023; originally announced January 2023.

Comments: 34 pages, 12 figures

MSC Class: 62C10; 62F15

arXiv:2301.09517 [pdf, other]

Sampling-based Nyström Approximation and Kernel Quadrature

Authors: Satoshi Hayakawa, Harald Oberhauser, Terry Lyons

Abstract: We analyze the Nyström approximation of a positive definite kernel associated with a probability measure. We first prove an improved error bound for the conventional Nyström approximation with i.i.d. sampling and singular-value decomposition in the continuous regime; the proof techniques are borrowed from statistical learning theory. We further introduce a refined selection of subspaces in Nyström… ▽ More We analyze the Nyström approximation of a positive definite kernel associated with a probability measure. We first prove an improved error bound for the conventional Nyström approximation with i.i.d. sampling and singular-value decomposition in the continuous regime; the proof techniques are borrowed from statistical learning theory. We further introduce a refined selection of subspaces in Nyström approximation with theoretical guarantees that is applicable to non-i.i.d. landmark points. Finally, we discuss their application to convex kernel quadrature and give novel theoretical guarantees as well as numerical observations. △ Less

Submitted 22 May, 2023; v1 submitted 23 January, 2023; originally announced January 2023.

Comments: 22 pages, ICML 2023 camera-ready version. Typos fixed

arXiv:2206.04734 [pdf, other]

Fast Bayesian Inference with Batch Bayesian Quadrature via Kernel Recombination

Authors: Masaki Adachi, Satoshi Hayakawa, Martin Jørgensen, Harald Oberhauser, Michael A. Osborne

Abstract: Calculation of Bayesian posteriors and model evidences typically requires numerical integration. Bayesian quadrature (BQ), a surrogate-model-based approach to numerical integration, is capable of superb sample efficiency, but its lack of parallelisation has hindered its practical applications. In this work, we propose a parallelised (batch) BQ method, employing techniques from kernel quadrature, t… ▽ More Calculation of Bayesian posteriors and model evidences typically requires numerical integration. Bayesian quadrature (BQ), a surrogate-model-based approach to numerical integration, is capable of superb sample efficiency, but its lack of parallelisation has hindered its practical applications. In this work, we propose a parallelised (batch) BQ method, employing techniques from kernel quadrature, that possesses an empirically exponential convergence rate. Additionally, just as with Nested Sampling, our method permits simultaneous inference of both posteriors and model evidence. Samples from our BQ surrogate model are re-selected to give a sparse set of samples, via a kernel recombination algorithm, requiring negligible additional time to increase the batch size. Empirically, we find that our approach significantly outperforms the sampling efficiency of both state-of-the-art BQ techniques and Nested Sampling in various real-world datasets, including lithium-ion battery analytics. △ Less

Submitted 27 January, 2023; v1 submitted 9 June, 2022; originally announced June 2022.

Comments: 38 pages, 6 figures

MSC Class: 62C10; 62F15

Journal ref: NeurIPS 35, 16533--16547 (2022)

arXiv:2205.14092 [pdf, other]

Capturing Graphs with Hypo-Elliptic Diffusions

Authors: Csaba Toth, Darrick Lee, Celia Hacker, Harald Oberhauser

Abstract: Convolutional layers within graph neural networks operate by aggregating information about local neighbourhood structures; one common way to encode such substructures is through random walks. The distribution of these random walks evolves according to a diffusion equation defined using the graph Laplacian. We extend this approach by leveraging classic mathematical results about hypo-elliptic diffu… ▽ More Convolutional layers within graph neural networks operate by aggregating information about local neighbourhood structures; one common way to encode such substructures is through random walks. The distribution of these random walks evolves according to a diffusion equation defined using the graph Laplacian. We extend this approach by leveraging classic mathematical results about hypo-elliptic diffusions. This results in a novel tensor-valued graph operator, which we call the hypo-elliptic graph Laplacian. We provide theoretical guarantees and efficient low-rank approximation algorithms. In particular, this gives a structured approach to capture long-range dependencies on graphs that is robust to pooling. Besides the attractive theoretical properties, our experiments show that this method competes with graph transformers on datasets requiring long-range reasoning but scales only linearly in the number of edges as opposed to quadratically in nodes. △ Less

Submitted 27 May, 2022; originally announced May 2022.

Comments: 30 pages

arXiv:2110.06357 [pdf, other]

Tangent Space and Dimension Estimation with the Wasserstein Distance

Authors: Uzu Lim, Harald Oberhauser, Vidit Nanda

Abstract: Consider a set of points sampled independently near a smooth compact submanifold of Euclidean space. We provide mathematically rigorous bounds on the number of sample points required to estimate both the dimension and the tangent spaces of that manifold with high confidence. The algorithm for this estimation is Local PCA, a local version of principal component analysis. Our results accommodate for… ▽ More Consider a set of points sampled independently near a smooth compact submanifold of Euclidean space. We provide mathematically rigorous bounds on the number of sample points required to estimate both the dimension and the tangent spaces of that manifold with high confidence. The algorithm for this estimation is Local PCA, a local version of principal component analysis. Our results accommodate for noisy non-uniform data distribution with the noise that may vary across the manifold, and allow simultaneous estimation at multiple points. Crucially, all of the constants appearing in our bound are explicitly described. The proof uses a matrix concentration inequality to estimate covariance matrices and a Wasserstein distance bound for quantifying nonlinearity of the underlying manifold and non-uniformity of the probability measure. △ Less

Submitted 25 September, 2023; v1 submitted 12 October, 2021; originally announced October 2021.

Comments: Main theorems rewritten. Introduction is written more compactly

arXiv:2107.09597 [pdf, other]

Positively Weighted Kernel Quadrature via Subsampling

Authors: Satoshi Hayakawa, Harald Oberhauser, Terry Lyons

Abstract: We study kernel quadrature rules with convex weights. Our approach combines the spectral properties of the kernel with recombination results about point measures. This results in effective algorithms that construct convex quadrature rules using only access to i.i.d. samples from the underlying measure and evaluation of the kernel and that result in a small worst-case error. In addition to our theo… ▽ More We study kernel quadrature rules with convex weights. Our approach combines the spectral properties of the kernel with recombination results about point measures. This results in effective algorithms that construct convex quadrature rules using only access to i.i.d. samples from the underlying measure and evaluation of the kernel and that result in a small worst-case error. In addition to our theoretical results and the benefits resulting from convex weights, our experiments indicate that this construction can compete with the optimal bounds in well-known examples. △ Less

Submitted 11 October, 2022; v1 submitted 20 July, 2021; originally announced July 2021.

Comments: 29 pages, NeurIPS 2022 camera-ready version

arXiv:2104.14691 [pdf, other]

Grid-Free Computation of Probabilistic Safety with Malliavin Calculus

Authors: Francesco Cosentino, Harald Oberhauser, Alessandro Abate

Abstract: This work concerns continuous-time, continuous-space stochastic dynamical systems described by stochastic differential equations (SDE). It presents a new approach to compute probabilistic safety regions, namely sets of initial conditions of the SDE associated to trajectories that are safe with a probability larger than a given threshold. The approach introduces a functional that is minimised at th… ▽ More This work concerns continuous-time, continuous-space stochastic dynamical systems described by stochastic differential equations (SDE). It presents a new approach to compute probabilistic safety regions, namely sets of initial conditions of the SDE associated to trajectories that are safe with a probability larger than a given threshold. The approach introduces a functional that is minimised at the border of the probabilistic safety region, then solves an optimisation problem using techniques from Malliavin Calculus, which computes such region. Unlike existing results in the literature, the new approach allows one to compute probabilistic safety regions without gridding the state space of the SDE. △ Less

Submitted 10 January, 2023; v1 submitted 29 April, 2021; originally announced April 2021.

arXiv:2102.03657 [pdf, other]

Neural SDEs as Infinite-Dimensional GANs

Authors: Patrick Kidger, James Foster, Xuechen Li, Harald Oberhauser, Terry Lyons

Abstract: Stochastic differential equations (SDEs) are a staple of mathematical modelling of temporal dynamics. However, a fundamental limitation has been that such models have typically been relatively inflexible, which recent work introducing Neural SDEs has sought to solve. Here, we show that the current classical approach to fitting SDEs may be approached as a special case of (Wasserstein) GANs, and in… ▽ More Stochastic differential equations (SDEs) are a staple of mathematical modelling of temporal dynamics. However, a fundamental limitation has been that such models have typically been relatively inflexible, which recent work introducing Neural SDEs has sought to solve. Here, we show that the current classical approach to fitting SDEs may be approached as a special case of (Wasserstein) GANs, and in doing so the neural and classical regimes may be brought together. The input noise is Brownian motion, the output samples are time-evolving paths produced by a numerical solver, and by parameterising a discriminator as a Neural Controlled Differential Equation (CDE), we obtain Neural SDEs as (in modern machine learning parlance) continuous-time generative time series models. Unlike previous work on this problem, this is a direct extension of the classical approach without reference to either prespecified statistics or density functions. Arbitrary drift and diffusions are admissible, so as the Wasserstein loss has a unique global minima, in the infinite data limit any SDE may be learnt. Example code has been made available as part of the \texttt{torchsde} repository. △ Less

Submitted 11 May, 2021; v1 submitted 6 February, 2021; originally announced February 2021.

Comments: Published at ICML 2021

arXiv:2102.02876 [pdf, other]

Nonlinear Independent Component Analysis for Discrete-Time and Continuous-Time Signals

Authors: Alexander Schell, Harald Oberhauser

Abstract: We study the classical problem of recovering a multidimensional source signal from observations of nonlinear mixtures of this signal. We show that this recovery is possible (up to a permutation and monotone scaling of the source's original component signals) if the mixture is due to a sufficiently differentiable and invertible but otherwise arbitrarily nonlinear function and the component signals… ▽ More We study the classical problem of recovering a multidimensional source signal from observations of nonlinear mixtures of this signal. We show that this recovery is possible (up to a permutation and monotone scaling of the source's original component signals) if the mixture is due to a sufficiently differentiable and invertible but otherwise arbitrarily nonlinear function and the component signals of the source are statistically independent with 'non-degenerate' second-order statistics. The latter assumption requires the source signal to meet one of three regularity conditions which essentially ensure that the source is sufficiently far away from the non-recoverable extremes of being deterministic or constant in time. These assumptions, which cover many popular time series models and stochastic processes, allow us to reformulate the initial problem of nonlinear blind source separation as a simple-to-state problem of optimisation-based function approximation. We propose to solve this approximation problem by minimizing a novel type of objective function that efficiently quantifies the mutual statistical dependence between multiple stochastic processes via cumulant-like statistics. This yields a scalable and direct new method for nonlinear Independent Component Analysis with widely applicable theoretical guarantees and for which our experiments indicate good performance. △ Less

Submitted 15 January, 2023; v1 submitted 4 February, 2021; originally announced February 2021.

Comments: 89 pages, 10 figures; thoroughly revised presentation (including newly added Sections 2, 10, A.19). To appear in the Annals of Statistics

MSC Class: 62H25; 62M99; 62H05; 60L10; 62M45; 62R10

arXiv:2006.07027 [pdf, other]

Seq2Tens: An Efficient Representation of Sequences by Low-Rank Tensor Projections

Authors: Csaba Toth, Patric Bonnier, Harald Oberhauser

Abstract: Sequential data such as time series, video, or text can be challenging to analyse as the ordered structure gives rise to complex dependencies. At the heart of this is non-commutativity, in the sense that reordering the elements of a sequence can completely change its meaning. We use a classical mathematical object -- the tensor algebra -- to capture such dependencies. To address the innate computa… ▽ More Sequential data such as time series, video, or text can be challenging to analyse as the ordered structure gives rise to complex dependencies. At the heart of this is non-commutativity, in the sense that reordering the elements of a sequence can completely change its meaning. We use a classical mathematical object -- the tensor algebra -- to capture such dependencies. To address the innate computational complexity of high degree tensors, we use compositions of low-rank tensor projections. This yields modular and scalable building blocks for neural networks that give state-of-the-art performance on standard benchmarks such as multivariate time series classification and generative models for video. △ Less

Submitted 30 July, 2021; v1 submitted 12 June, 2020; originally announced June 2020.

Comments: 37 pages, 6 figures, 8 tables

arXiv:2006.01819 [pdf, other]

Carathéodory Sampling for Stochastic Gradient Descent

Authors: Francesco Cosentino, Harald Oberhauser, Alessandro Abate

Abstract: Many problems require to optimize empirical risk functions over large data sets. Gradient descent methods that calculate the full gradient in every descent step do not scale to such datasets. Various flavours of Stochastic Gradient Descent (SGD) replace the expensive summation that computes the full gradient by approximating it with a small sum over a randomly selected subsample of the data set th… ▽ More Many problems require to optimize empirical risk functions over large data sets. Gradient descent methods that calculate the full gradient in every descent step do not scale to such datasets. Various flavours of Stochastic Gradient Descent (SGD) replace the expensive summation that computes the full gradient by approximating it with a small sum over a randomly selected subsample of the data set that in turn suffers from a high variance. We present a different approach that is inspired by classical results of Tchakaloff and Carathéodory about measure reduction. These results allow to replace an empirical measure with another, carefully constructed probability measure that has a much smaller support, but can preserve certain statistics such as the expected gradient. To turn this into scalable algorithms we firstly, adaptively select the descent steps where the measure reduction is carried out; secondly, we combine this with Block Coordinate Descent so that measure reduction can be done very cheaply. This makes the resulting methods scalable to high-dimensional spaces. Finally, we provide an experimental validation and comparison. △ Less

Submitted 25 November, 2020; v1 submitted 2 June, 2020; originally announced June 2020.

arXiv:2006.01757 [pdf, other]

A Randomized Algorithm to Reduce the Support of Discrete Measures

Authors: Francesco Cosentino, Harald Oberhauser, Alessandro Abate

Abstract: Given a discrete probability measure supported on $N$ atoms and a set of $n$ real-valued functions, there exists a probability measure that is supported on a subset of $n+1$ of the original $N$ atoms and has the same mean when integrated against each of the $n$ functions. If $ N \gg n$ this results in a huge reduction of complexity. We give a simple geometric characterization of barycenters via ne… ▽ More Given a discrete probability measure supported on $N$ atoms and a set of $n$ real-valued functions, there exists a probability measure that is supported on a subset of $n+1$ of the original $N$ atoms and has the same mean when integrated against each of the $n$ functions. If $ N \gg n$ this results in a huge reduction of complexity. We give a simple geometric characterization of barycenters via negative cones and derive a randomized algorithm that computes this new measure by "greedy geometric sampling". We then study its properties, and benchmark it on synthetic and real-world data to show that it can be very beneficial in the $N\gg n$ regime. A Python implementation is available at \url{https://github.com/FraCose/Recombination_Random_Algos}. △ Less

Submitted 26 November, 2020; v1 submitted 2 June, 2020; originally announced June 2020.

Journal ref: 34th Conference on Advances in Neural Information Processing Systems, 2020

arXiv:1906.08215 [pdf, other]

Bayesian Learning from Sequential Data using Gaussian Processes with Signature Covariances

Authors: Csaba Toth, Harald Oberhauser

Abstract: We develop a Bayesian approach to learning from sequential data by using Gaussian processes (GPs) with so-called signature kernels as covariance functions. This allows to make sequences of different length comparable and to rely on strong theoretical results from stochastic analysis. Signatures capture sequential structure with tensors that can scale unfavourably in sequence length and state space… ▽ More We develop a Bayesian approach to learning from sequential data by using Gaussian processes (GPs) with so-called signature kernels as covariance functions. This allows to make sequences of different length comparable and to rely on strong theoretical results from stochastic analysis. Signatures capture sequential structure with tensors that can scale unfavourably in sequence length and state space dimension. To deal with this, we introduce a sparse variational approach with inducing tensors. We then combine the resulting GP with LSTMs and GRUs to build larger models that leverage the strengths of each of these approaches and benchmark the resulting GPs on multivariate time series (TS) classification datasets. Code available at https://github.com/tgcsaba/GPSig. △ Less

Submitted 6 July, 2020; v1 submitted 19 June, 2019; originally announced June 2019.

Comments: Near camera ready version for ICML 2020. Previous title: "Variational Gaussian Processes with Signature Covariances"

arXiv:1806.00381 [pdf, other]

doi 10.1109/TPAMI.2018.2885516

Persistence paths and signature features in topological data analysis

Authors: Ilya Chevyrev, Vidit Nanda, Harald Oberhauser

Abstract: We introduce a new feature map for barcodes that arise in persistent homology computation. The main idea is to first realize each barcode as a path in a convenient vector space, and to then compute its path signature which takes values in the tensor algebra of that vector space. The composition of these two operations - barcode to path, path to tensor series - results in a feature map that has sev… ▽ More We introduce a new feature map for barcodes that arise in persistent homology computation. The main idea is to first realize each barcode as a path in a convenient vector space, and to then compute its path signature which takes values in the tensor algebra of that vector space. The composition of these two operations - barcode to path, path to tensor series - results in a feature map that has several desirable properties for statistical learning, such as universality and characteristicness, and achieves state-of-the-art results on common classification benchmarks. △ Less

Submitted 12 December, 2018; v1 submitted 1 June, 2018; originally announced June 2018.

Comments: Additional experiment and further details. To appear in IEEE Transactions on Pattern Analysis and Machine Intelligence

Journal ref: IEEE TPAMI (2020) Volume: 42, Issue: 1, pp. 192 - 202

arXiv:1801.00753 [pdf, other]

Probabilistic supervised learning

Authors: Frithjof Gressmann, Franz J. Király, Bilal Mateen, Harald Oberhauser

Abstract: Predictive modelling and supervised learning are central to modern data science. With predictions from an ever-expanding number of supervised black-box strategies - e.g., kernel methods, random forests, deep learning aka neural networks - being employed as a basis for decision making processes, it is crucial to understand the statistical uncertainty associated with these predictions. As a genera… ▽ More Predictive modelling and supervised learning are central to modern data science. With predictions from an ever-expanding number of supervised black-box strategies - e.g., kernel methods, random forests, deep learning aka neural networks - being employed as a basis for decision making processes, it is crucial to understand the statistical uncertainty associated with these predictions. As a general means to approach the issue, we present an overarching framework for black-box prediction strategies that not only predict the target but also their own predictions' uncertainty. Moreover, the framework allows for fair assessment and comparison of disparate prediction strategies. For this, we formally consider strategies capable of predicting full distributions from feature variables, so-called probabilistic supervised learning strategies. Our work draws from prior work including Bayesian statistics, information theory, and modern supervised machine learning, and in a novel synthesis leads to (a) new theoretical insights such as a probabilistic bias-variance decomposition and an entropic formulation of prediction, as well as to (b) new algorithms and meta-algorithms, such as composite prediction strategies, probabilistic boosting and bagging, and a probabilistic predictive independence test. Our black-box formulation also leads (c) to a new modular interface view on probabilistic supervised learning and a modelling workflow API design, which we have implemented in the newly released skpro machine learning toolbox, extending the familiar modelling interface and meta-modelling functionality of sklearn. The skpro package provides interfaces for construction, composition, and tuning of probabilistic supervised learning strategies, together with orchestration features for validation and comparison of any such strategy - be it frequentist, Bayesian, or other. △ Less

Submitted 7 May, 2019; v1 submitted 2 January, 2018; originally announced January 2018.

arXiv:1708.09708 [pdf, ps, other]

Sketching the order of events

Authors: Terry Lyons, Harald Oberhauser

Abstract: We introduce features for massive data streams. These stream features can be thought of as "ordered moments" and generalize stream sketches from "moments of order one" to "ordered moments of arbitrary order". In analogy to classic moments, they have theoretical guarantees such as universality that are important for learning algorithms. We introduce features for massive data streams. These stream features can be thought of as "ordered moments" and generalize stream sketches from "moments of order one" to "ordered moments of arbitrary order". In analogy to classic moments, they have theoretical guarantees such as universality that are important for learning algorithms. △ Less

Submitted 31 August, 2017; originally announced August 2017.

arXiv:1601.08169 [pdf, ps, other]

Kernels for sequentially ordered data

Authors: Franz J Király, Harald Oberhauser

Abstract: We present a novel framework for kernel learning with sequential data of any kind, such as time series, sequences of graphs, or strings. Our approach is based on signature features which can be seen as an ordered variant of sample (cross-)moments; it allows to obtain a "sequentialized" version of any static kernel. The sequential kernels are efficiently computable for discrete sequences and are sh… ▽ More We present a novel framework for kernel learning with sequential data of any kind, such as time series, sequences of graphs, or strings. Our approach is based on signature features which can be seen as an ordered variant of sample (cross-)moments; it allows to obtain a "sequentialized" version of any static kernel. The sequential kernels are efficiently computable for discrete sequences and are shown to approximate a continuous moment form in a sampling sense. A number of known kernels for sequences arise as "sequentializations" of suitable static kernels: string kernels may be obtained as a special case, and alignment kernels are closely related up to a modification that resolves their open non-definiteness issue. Our experiments indicate that our signature-based sequential kernel framework may be a promising approach to learning with sequential data, such as time series, that allows to avoid extensive manual pre-processing. △ Less

Submitted 29 January, 2016; originally announced January 2016.

Showing 1–23 of 23 results for author: Oberhauser, H