-
Synthetic data generation for a longitudinal cohort study -- Evaluation, method extension and reproduction of published data analysis results
Authors:
Lisa Kühnel,
Julian Schneider,
Ines Perrar,
Tim Adams,
Fabian Prasser,
Ute Nöthlings,
Holger Fröhlich,
Juliane Fluck
Abstract:
Access to individual-level health data is essential for gaining new insights and advancing science. In particular, modern methods based on artificial intelligence rely on the availability of and access to large datasets. In the health sector, access to individual-level data is often challenging due to privacy concerns. A promising alternative is the generation of fully synthetic data, i.e. data generated through a randomised process that have similar statistical properties as the original data, but do not have a one-to-one correspondence with the original individual-level records. In this study, we use a state-of-the-art synthetic data generation method and perform in-depth quality analyses of the generated data for a specific use case in the field of nutrition. We demonstrate the need for careful analyses of synthetic data that go beyond descriptive statistics and provide valuable insights into how to realise the full potential of synthetic datasets. By extending the methods, but also by thoroughly analysing the effects of sampling from a trained model, we are able to largely reproduce significant real-world analysis results in the chosen use case.
Submitted 12 May, 2023;
originally announced May 2023.
-
BERT WEAVER: Using WEight AVERaging to enable lifelong learning for transformer-based models in biomedical semantic search engines
Authors:
Lisa Kühnel,
Alexander Schulz,
Barbara Hammer,
Juliane Fluck
Abstract:
Recent developments in transfer learning have boosted the advancements in natural language processing tasks. The performance is, however, dependent on high-quality, manually annotated training data. Especially in the biomedical domain, it has been shown that one training corpus is not enough to learn generic models that are able to efficiently predict on new data. Therefore, to be used in real-world applications, state-of-the-art models need the ability of lifelong learning to improve performance as soon as new data are available, without the need to retrain the whole model from scratch. We present WEAVER, a simple yet efficient post-processing method that infuses old knowledge into the new model, thereby reducing catastrophic forgetting. We show that applying WEAVER in a sequential manner results in word embedding distributions similar to those obtained by combined training on all data at once, while being computationally more efficient. Because there is no need for data sharing, the presented method is also easily applicable to federated learning settings and can, for example, be beneficial for the mining of electronic health records from different clinics.
Submitted 31 October, 2023; v1 submitted 21 February, 2022;
originally announced February 2022.
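The weight-averaging idea named in the WEAVER title above can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `average_weights`, the `old_share` mixing parameter, and the toy parameter dictionaries are all hypothetical, and how WEAVER actually chooses its weighting is not stated in the abstract. The sketch only shows a per-parameter convex combination of an old and a new model that share the same architecture.

```python
from typing import Dict, List

# Hypothetical toy representation: parameter name -> flat list of values.
Weights = Dict[str, List[float]]


def average_weights(old: Weights, new: Weights, old_share: float = 0.5) -> Weights:
    """Blend two parameter sets of identical shape, parameter by parameter.

    old_share is an assumed mixing coefficient in [0, 1]; it is not taken
    from the paper.
    """
    if not 0.0 <= old_share <= 1.0:
        raise ValueError("old_share must lie in [0, 1]")
    if old.keys() != new.keys():
        raise ValueError("both models must have the same parameter names")

    merged: Weights = {}
    for name in new:
        if len(old[name]) != len(new[name]):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # Convex combination of the old and the new value of each weight.
        merged[name] = [
            old_share * o + (1.0 - old_share) * n
            for o, n in zip(old[name], new[name])
        ]
    return merged


old_model = {"layer.weight": [1.0, 3.0], "layer.bias": [0.0]}
new_model = {"layer.weight": [3.0, 5.0], "layer.bias": [2.0]}
merged = average_weights(old_model, new_model, old_share=0.5)
print(merged)
```

Only parameter values are exchanged between the two models here, never training data, which is consistent with the abstract's remark that no data sharing is needed.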
-
Stochastic Image Deformation in Frequency Domain and Parameter Estimation using Moment Evolutions
Authors:
Line Kühnel,
Alexis Arnaudon,
Tom Fletcher,
Stefan Sommer
Abstract:
Modelling deformation of anatomical objects observed in medical images can help describe disease progression patterns and variations in anatomy across populations. We apply a stochastic generalisation of the Large Deformation Diffeomorphic Metric Mapping (LDDMM) framework to model differences in the evolution of anatomical objects detected in populations of image data. The computational challenges that are prevalent even in the deterministic LDDMM setting are handled by extending the FLASH LDDMM representation to the stochastic setting keeping a finite discretisation of the infinite dimensional space of image deformations. In this computationally efficient setting, we perform estimation to infer parameters for noise correlations and local variability in datasets of images. Fundamental for the optimisation procedure is using the finite dimensional Fourier representation to derive approximations of the evolution of moments for the stochastic warps. Particularly, the first moment allows us to infer deformation mean trajectories. The second moment encodes variation around the mean, and thus provides information on the noise correlation. We show on simulated datasets of 2D MR brain images that the estimation algorithm can successfully recover parameters of the stochastic model.
Submitted 13 December, 2018;
originally announced December 2018.
-
Latent Space Non-Linear Statistics
Authors:
Line Kuhnel,
Tom Fletcher,
Sarang Joshi,
Stefan Sommer
Abstract:
Given data, deep generative models, such as variational autoencoders (VAE) and generative adversarial networks (GAN), train a lower dimensional latent representation of the data space. The linear Euclidean geometry of data space pulls back to a nonlinear Riemannian geometry on the latent space. The latent space thus provides a low-dimensional nonlinear representation of data and classical linear statistical techniques are no longer applicable. In this paper we show how statistics of data in their latent space representation can be performed using techniques from the field of nonlinear manifold statistics. Nonlinear manifold statistics provide generalizations of Euclidean statistical notions including means, principal component analysis, and maximum likelihood fits of parametric probability distributions. We develop new techniques for maximum likelihood inference in latent space, and address the computational complexity of using geometric algorithms with high-dimensional data by training a separate neural network to approximate the Riemannian metric and cometric tensor capturing the shape of the learned data manifold.
Submitted 19 May, 2018;
originally announced May 2018.
-
Differential geometry and stochastic dynamics with deep learning numerics
Authors:
Line Kühnel,
Alexis Arnaudon,
Stefan Sommer
Abstract:
In this paper, we demonstrate how deterministic and stochastic dynamics on manifolds, as well as differential geometric constructions, can be implemented concisely and efficiently using modern computational frameworks that mix symbolic expressions with efficient numerical computations. In particular, we use the symbolic expression and automatic differentiation features of the Python library Theano, originally developed for high-performance computations in deep learning. We show how various aspects of differential geometry and Lie group theory, connections, metrics, curvature, left/right invariance, geodesics and parallel transport can be formulated with Theano using the automatic computation of derivatives of any order. We will also show how symbolic stochastic integrators and concepts from non-linear statistics can be formulated and optimized with only a few lines of code. We will then give explicit examples on low-dimensional classical manifolds for visualization and demonstrate how this approach allows both a concise implementation and efficient scaling to high dimensional problems.
Submitted 22 December, 2017;
originally announced December 2017.
-
Computational Anatomy in Theano
Authors:
Line Kühnel,
Stefan Sommer
Abstract:
To model deformation of anatomical shapes, non-linear statistics are required to take into account the non-linear structure of the data space. Computer implementations of non-linear statistics and differential geometry algorithms often lead to long and complex code sequences. The aim of the paper is to show how the Theano framework can be used for simple and concise implementation of complex differential geometry algorithms while being able to handle complex and high-dimensional data structures. We show how the Theano framework meets both of these requirements. The framework provides a symbolic language that allows mathematical equations to be directly translated into Theano code, and it is able to perform both fast CPU and GPU computations on high-dimensional data. We show how different concepts from non-linear statistics and differential geometry can be implemented in Theano, and give examples of the implemented theory visualized on landmark representations of Corpus Callosum shapes.
Submitted 15 June, 2017;
originally announced June 2017.
-
Bridge Simulation and Metric Estimation on Landmark Manifolds
Authors:
Stefan Sommer,
Alexis Arnaudon,
Line Kuhnel,
Sarang Joshi
Abstract:
We present an inference algorithm and connected Monte Carlo based estimation procedures for metric estimation from landmark configurations distributed according to the transition distribution of a Riemannian Brownian motion arising from the Large Deformation Diffeomorphic Metric Mapping (LDDMM) metric. The distribution possesses properties similar to the regular Euclidean normal distribution but its transition density is governed by a high-dimensional PDE with no closed-form solution in the nonlinear case. We show how the density can be numerically approximated by Monte Carlo sampling of conditioned Brownian bridges, and we use this to estimate parameters of the LDDMM kernel and thus the metric structure by maximum likelihood.
Submitted 31 May, 2017;
originally announced May 2017.
-
A Statistical Model for Simultaneous Template Estimation, Bias Correction, and Registration of 3D Brain Images
Authors:
Akshay Pai,
Stefan Sommer,
Lars Lau Raket,
Line Kühnel,
Sune Darkner,
Lauge Sørensen,
Mads Nielsen
Abstract:
Template estimation plays a crucial role in computational anatomy since it provides reference frames for performing statistical analysis of the underlying anatomical population variability. While building models for template estimation, variability in sites and image acquisition protocols needs to be accounted for. To account for such variability, we propose a generative template estimation model that makes simultaneous inference of bias fields in individual images, deformations for image registration, and variance hyperparameters. In contrast, existing maximum a posteriori based methods need to rely on either bias-invariant similarity measures or robust image normalization. Results on synthetic and real brain MRI images demonstrate the capability of the model to capture heterogeneity in intensities and provide a reliable template estimation from registration.
Submitted 1 May, 2017;
originally announced May 2017.
-
Stochastic Development Regression on Non-Linear Manifolds
Authors:
Line Kühnel,
Stefan Sommer
Abstract:
We introduce a regression model for data on non-linear manifolds. The model describes the relation between a set of manifold valued observations, such as shapes of anatomical objects, and Euclidean explanatory variables. The approach is based on stochastic development of Euclidean diffusion processes to the manifold. Defining the data distribution as the transition distribution of the mapped stochastic process, parameters of the model, the non-linear analogue of design matrix and intercept, are found via maximum likelihood. The model is intrinsically related to the geometry encoded in the connection of the manifold. We propose an estimation procedure which applies the Laplace approximation of the likelihood function. A simulation study of the performance of the model is performed and the model is applied to a real dataset of Corpus Callosum shapes.
Submitted 1 March, 2017;
originally announced March 2017.
-
Most Likely Separation of Intensity and Warping Effects in Image Registration
Authors:
Line Kühnel,
Stefan Sommer,
Akshay Pai,
Lars Lau Raket
Abstract:
This paper introduces a class of mixed-effects models for joint modeling of spatially correlated intensity variation and warping variation in 2D images. Spatially correlated intensity variation and warp variation are modeled as random effects, resulting in a nonlinear mixed-effects model that enables simultaneous estimation of template and model parameters by optimization of the likelihood function. We propose an algorithm for fitting the model which alternates estimation of variance parameters and image registration. This approach avoids the potential estimation bias in the template estimate that arises when treating registration as a preprocessing step. We apply the model to datasets of facial images and 2D brain magnetic resonance images to illustrate the simultaneous estimation and prediction of intensity and warp effects.
Submitted 15 March, 2017; v1 submitted 18 April, 2016;
originally announced April 2016.