Search | arXiv e-print repository

Identifiability of latent-variable and structural-equation models: from linear to nonlinear

Authors: Aapo Hyvärinen, Ilyes Khemakhem, Ricardo Monti

Abstract: An old problem in multivariate statistics is that linear Gaussian models are often unidentifiable, i.e. some parameters cannot be uniquely estimated. In factor (component) analysis, an orthogonal rotation of the factors is unidentifiable, while in linear regression, the direction of effect cannot be identified. For such linear models, non-Gaussianity of the (latent) variables has been shown to pro… ▽ More An old problem in multivariate statistics is that linear Gaussian models are often unidentifiable, i.e. some parameters cannot be uniquely estimated. In factor (component) analysis, an orthogonal rotation of the factors is unidentifiable, while in linear regression, the direction of effect cannot be identified. For such linear models, non-Gaussianity of the (latent) variables has been shown to provide identifiability. In the case of factor analysis, this leads to independent component analysis, while in the case of the direction of effect, non-Gaussian versions of structural equation modelling solve the problem. More recently, we have shown how even general nonparametric nonlinear versions of such models can be estimated. Non-Gaussianity is not enough in this case, but assuming we have time series, or that the distributions are suitably modulated by some observed auxiliary variables, the models are identifiable. This paper reviews the identifiability theory for the linear and nonlinear cases, considering both factor analytic models and structural equation models. △ Less

Submitted 3 May, 2023; v1 submitted 6 February, 2023; originally announced February 2023.

Comments: Revised final version of invited review to be published at Annals of the Institute of Statistical Mathematics

arXiv:2111.13424 [pdf, other]

ContIG: Self-supervised Multimodal Contrastive Learning for Medical Imaging with Genetics

Authors: Aiham Taleb, Matthias Kirchler, Remo Monti, Christoph Lippert

Abstract: High annotation costs are a substantial bottleneck in applying modern deep learning architectures to clinically relevant medical use cases, substantiating the need for novel algorithms to learn from unlabeled data. In this work, we propose ContIG, a self-supervised method that can learn from large datasets of unlabeled medical images and genetic data. Our approach aligns images and several genetic… ▽ More High annotation costs are a substantial bottleneck in applying modern deep learning architectures to clinically relevant medical use cases, substantiating the need for novel algorithms to learn from unlabeled data. In this work, we propose ContIG, a self-supervised method that can learn from large datasets of unlabeled medical images and genetic data. Our approach aligns images and several genetic modalities in the feature space using a contrastive loss. We design our method to integrate multiple modalities of each individual person in the same model end-to-end, even when the available modalities vary across individuals. Our procedure outperforms state-of-the-art self-supervised methods on all evaluated downstream benchmark tasks. We also adapt gradient-based explainability algorithms to better understand the learned cross-modal associations between the images and genetic modalities. Finally, we perform genome-wide association studies on the features learned by our models, uncovering interesting relationships between images and genetic data. △ Less

Submitted 26 November, 2021; originally announced November 2021.

arXiv:2107.05679 [pdf, other]

Teaching Design by Contract using Snap!

Authors: Marieke Huisman, Raúl E. Monti

Abstract: With the progress in deductive program verification research, new tools and techniques have become available to support design-by-contract reasoning about non-trivial programs written in widely-used programming languages. However, deductive program verification remains an activity for experts, with ample experience in programming, specification and verification. We would like to change this situat… ▽ More With the progress in deductive program verification research, new tools and techniques have become available to support design-by-contract reasoning about non-trivial programs written in widely-used programming languages. However, deductive program verification remains an activity for experts, with ample experience in programming, specification and verification. We would like to change this situation, by developing program verification techniques that are available to a larger audience. In this paper, we present how we developed prototypal program verification support for Snap!. Snap! is a visual programming language, aiming in particular at high school students. We added specification language constructs in a similar visual style, designed to make the intended semantics clear from the look and feel of the specification constructs. We provide support both for static and dynamic verification of Snap! programs. Special attention is given to the error messaging, to make this as intuitive as possible. △ Less

Submitted 12 July, 2021; originally announced July 2021.

arXiv:2011.02268 [pdf, other]

Causal Autoregressive Flows

Authors: Ilyes Khemakhem, Ricardo Pio Monti, Robert Leech, Aapo Hyvärinen

Abstract: Two apparently unrelated fields -- normalizing flows and causality -- have recently received considerable attention in the machine learning community. In this work, we highlight an intrinsic correspondence between a simple family of autoregressive normalizing flows and identifiable causal models. We exploit the fact that autoregressive flow architectures define an ordering over variables, analogou… ▽ More Two apparently unrelated fields -- normalizing flows and causality -- have recently received considerable attention in the machine learning community. In this work, we highlight an intrinsic correspondence between a simple family of autoregressive normalizing flows and identifiable causal models. We exploit the fact that autoregressive flow architectures define an ordering over variables, analogous to a causal ordering, to show that they are well-suited to performing a range of causal inference tasks, ranging from causal discovery to making interventional and counterfactual predictions. First, we show that causal models derived from both affine and additive autoregressive flows with fixed orderings over variables are identifiable, i.e. the true direction of causal influence can be recovered. This provides a generalization of the additive noise model well-known in causal discovery. Second, we derive a bivariate measure of causal direction based on likelihood ratios, leveraging the fact that flow models can estimate normalized log-densities of data. Third, we demonstrate that flows naturally allow for direct evaluation of both interventional and counterfactual queries, the latter case being possible due to the invertible nature of flows. Finally, throughout a series of experiments on synthetic and real data, the proposed method is shown to outperform current approaches for causal discovery as well as making accurate interventional and counterfactual predictions. △ Less

Submitted 24 February, 2021; v1 submitted 4 November, 2020; originally announced November 2020.

Comments: Published at AISTATS2021. Code available at https://github.com/piomonti/carefl

arXiv:2007.09989 [pdf, other]

Bayesian optimization for automatic design of face stimuli

Authors: Pedro F. da Costa, Romy Lorenz, Ricardo Pio Monti, Emily Jones, Robert Leech

Abstract: Investigating the cognitive and neural mechanisms involved with face processing is a fundamental task in modern neuroscience and psychology. To date, the majority of such studies have focused on the use of pre-selected stimuli. The absence of personalized stimuli presents a serious limitation as it fails to account for how each individual face processing system is tuned to cultural embeddings or h… ▽ More Investigating the cognitive and neural mechanisms involved with face processing is a fundamental task in modern neuroscience and psychology. To date, the majority of such studies have focused on the use of pre-selected stimuli. The absence of personalized stimuli presents a serious limitation as it fails to account for how each individual face processing system is tuned to cultural embeddings or how it is disrupted in disease. In this work, we propose a novel framework which combines generative adversarial networks (GANs) with Bayesian optimization to identify individual response patterns to many different faces. Formally, we employ Bayesian optimization to efficiently search the latent space of state-of-the-art GAN models, with the aim to automatically generate novel faces, to maximize an individual subject's response. We present results from a web-based proof-of-principle study, where participants rated images of themselves generated via performing Bayesian optimization over the latent space of a GAN. We show how the algorithm can efficiently locate an individual's optimal face while mapping out their response across different semantic transformations of a face; inter-individual analyses suggest how the approach can provide rich information about individual differences in face processing. △ Less

Submitted 20 July, 2020; originally announced July 2020.

Comments: Accepted at ICML2020 workshop track

arXiv:2007.09390 [pdf, other]

Autoregressive flow-based causal discovery and inference

Authors: Ricardo Pio Monti, Ilyes Khemakhem, Aapo Hyvarinen

Abstract: We posit that autoregressive flow models are well-suited to performing a range of causal inference tasks - ranging from causal discovery to making interventional and counterfactual predictions. In particular, we exploit the fact that autoregressive architectures define an ordering over variables, analogous to a causal ordering, in order to propose a single flow architecture to perform all three af… ▽ More We posit that autoregressive flow models are well-suited to performing a range of causal inference tasks - ranging from causal discovery to making interventional and counterfactual predictions. In particular, we exploit the fact that autoregressive architectures define an ordering over variables, analogous to a causal ordering, in order to propose a single flow architecture to perform all three aforementioned tasks. We first leverage the fact that flow models estimate normalized log-densities of data to derive a bivariate measure of causal direction based on likelihood ratios. Whilst traditional measures of causal direction often require restrictive assumptions on the nature of causal relationships (e.g., linearity),the flexibility of flow models allows for arbitrary causal dependencies. Our approach compares favourably against alternative methods on synthetic data as well as on the Cause-Effect Pairs bench-mark dataset. Subsequently, we demonstrate that the invertible nature of flows naturally allows for direct evaluation of both interventional and counterfactual predictions, which require marginalization and conditioning over latent variables respectively. We present examples over synthetic data where autoregressive flows, when trained under the correct causal ordering, are able to make accurate interventional and counterfactual predictions △ Less

Submitted 26 July, 2020; v1 submitted 18 July, 2020; originally announced July 2020.

Comments: 6 pages, 3 figures. Accepted at the 2nd ICML Workshop on Invertible Neural Networks, Normalizing Flows, and Explicit Likelihood Models

arXiv:2002.11537 [pdf, other]

ICE-BeeM: Identifiable Conditional Energy-Based Deep Models Based on Nonlinear ICA

Authors: Ilyes Khemakhem, Ricardo Pio Monti, Diederik P. Kingma, Aapo Hyvärinen

Abstract: We consider the identifiability theory of probabilistic models and establish sufficient conditions under which the representations learned by a very broad family of conditional energy-based models are unique in function space, up to a simple transformation. In our model family, the energy function is the dot-product between two feature extractors, one for the dependent variable, and one for the co… ▽ More We consider the identifiability theory of probabilistic models and establish sufficient conditions under which the representations learned by a very broad family of conditional energy-based models are unique in function space, up to a simple transformation. In our model family, the energy function is the dot-product between two feature extractors, one for the dependent variable, and one for the conditioning variable. We show that under mild conditions, the features are unique up to scaling and permutation. Our results extend recent developments in nonlinear ICA, and in fact, they lead to an important generalization of ICA models. In particular, we show that our model can be used for the estimation of the components in the framework of Independently Modulated Component Analysis (IMCA), a new generalization of nonlinear ICA that relaxes the independence assumption. A thorough empirical study shows that representations learned by our model from real-world image datasets are identifiable, and improve performance in transfer learning and semi-supervised learning tasks. △ Less

Submitted 26 October, 2020; v1 submitted 26 February, 2020; originally announced February 2020.

Comments: Accepted for publication at NeurIPS 2020

arXiv:1911.00265 [pdf, other]

Robust contrastive learning and nonlinear ICA in the presence of outliers

Authors: Hiroaki Sasaki, Takashi Takenouchi, Ricardo Monti, Aapo Hyvärinen

Abstract: Nonlinear independent component analysis (ICA) is a general framework for unsupervised representation learning, and aimed at recovering the latent variables in data. Recent practical methods perform nonlinear ICA by solving a series of classification problems based on logistic regression. However, it is well-known that logistic regression is vulnerable to outliers, and thus the performance can be… ▽ More Nonlinear independent component analysis (ICA) is a general framework for unsupervised representation learning, and aimed at recovering the latent variables in data. Recent practical methods perform nonlinear ICA by solving a series of classification problems based on logistic regression. However, it is well-known that logistic regression is vulnerable to outliers, and thus the performance can be strongly weakened by outliers. In this paper, we first theoretically analyze nonlinear ICA models in the presence of outliers. Our analysis implies that estimation in nonlinear ICA can be seriously hampered when outliers exist on the tails of the (noncontaminated) target density, which happens in a typical case of contamination by outliers. We develop two robust nonlinear ICA methods based on the γ-divergence, which is a robust alternative to the KL-divergence in logistic regression. The proposed methods are shown to have desired robustness properties in the context of nonlinear ICA. We also experimentally demonstrate that the proposed methods are very robust and outperform existing methods in the presence of outliers. Finally, the proposed method is applied to ICA-based causal discovery and shown to find a plausible causal relationship on fMRI data. △ Less

Submitted 1 November, 2019; originally announced November 2019.

arXiv:1910.11672 [pdf, other]

Rare Event Simulation for non-Markovian repairable Fault Trees

Authors: Carlos E. Budde, Marco Biagi, Raúl E. Monti, Pedro R. D'Argenio, Mariëlle Stoelinga

Abstract: Dynamic Fault Trees (DFT) are widely adopted in industry to assess the dependability of safety-critical equipment. Since many systems are too large to be studied numerically, DFTs dependability is often analysed using Monte Carlo simulation. A bottleneck here is that many simulation samples are required in the case of rare events, e.g. in highly reliable systems where components fail seldomly. Rar… ▽ More Dynamic Fault Trees (DFT) are widely adopted in industry to assess the dependability of safety-critical equipment. Since many systems are too large to be studied numerically, DFTs dependability is often analysed using Monte Carlo simulation. A bottleneck here is that many simulation samples are required in the case of rare events, e.g. in highly reliable systems where components fail seldomly. Rare Event Simulation (RES) provides techniques to reduce the number of samples in the case of rare events. We present a RES technique based on importance splitting, to study failures in highly reliable DFTs. Whereas RES usually requires meta-information from an expert, our method is fully automatic: by cleverly exploiting the fault tree structure we extract the so-called importance function. We handle DFTs with Markovian and non-Markovian failure and repair distributions (for which no numerical methods exist) and show the efficiency of our approach on several case studies. △ Less

Submitted 28 October, 2019; v1 submitted 23 October, 2019; originally announced October 2019.

ACM Class: D.2.4; I.6.1; I.6.8

arXiv:1910.10507 [pdf, other]

A compositional semantics for Repairable Fault Trees with general distributions

Authors: Raul E. Monti, Pedro R. D'Argenio, Carlos E. Budde

Abstract: Fault Tree Analysis (FTA) is a prominent technique in industrial and scientific risk assessment. Repairable Fault Trees (RFT) enhance the classical Fault Tree (FT) model by introducing the possibility to describe complex dependent repairs of system components. Usual frameworks for analyzing FTs such as BDD, SBDD, and Markov chains fail to assess the desired properties over RFT complex models, eith… ▽ More Fault Tree Analysis (FTA) is a prominent technique in industrial and scientific risk assessment. Repairable Fault Trees (RFT) enhance the classical Fault Tree (FT) model by introducing the possibility to describe complex dependent repairs of system components. Usual frameworks for analyzing FTs such as BDD, SBDD, and Markov chains fail to assess the desired properties over RFT complex models, either because these become too large, or due to cyclic behaviour introduced by dependent repairs. Simulation is another way to carry out this kind of analysis. In this paper we review the RFT model with Repair Boxes as introduced by Daniele Codetta-Raiteri. We present compositional semantics for this model in terms of Input/Output Stochastic Automata, which allows for the modelling of events occurring according to general continuous distribution. Moreover, we prove that the semantics generates (weakly) deterministic models, hence suitable for discrete event simulation, and prominently for Rare Event Simulation using the FIG tool. △ Less

Submitted 23 October, 2019; originally announced October 2019.

arXiv:1907.04809 [pdf, other]

Variational Autoencoders and Nonlinear ICA: A Unifying Framework

Authors: Ilyes Khemakhem, Diederik P. Kingma, Ricardo Pio Monti, Aapo Hyvärinen

Abstract: The framework of variational autoencoders allows us to efficiently learn deep latent-variable models, such that the model's marginal distribution over observed variables fits the data. Often, we're interested in going a step further, and want to approximate the true joint distribution over observed and latent variables, including the true prior and posterior distributions over latent variables. Th… ▽ More The framework of variational autoencoders allows us to efficiently learn deep latent-variable models, such that the model's marginal distribution over observed variables fits the data. Often, we're interested in going a step further, and want to approximate the true joint distribution over observed and latent variables, including the true prior and posterior distributions over latent variables. This is known to be generally impossible due to unidentifiability of the model. We address this issue by showing that for a broad family of deep latent-variable models, identification of the true joint distribution over observed and latent variables is actually possible up to very simple transformations, thus achieving a principled and powerful form of disentanglement. Our result requires a factorized prior distribution over the latent variables that is conditioned on an additionally observed variable, such as a class label or almost any other observation. We build on recent developments in nonlinear ICA, which we extend to the case with noisy, undercomplete or discrete observations, integrated in a maximum likelihood framework. The result also trivially contains identifiable flow-based generative models as a special case. △ Less

Submitted 21 December, 2020; v1 submitted 10 July, 2019; originally announced July 2019.

Comments: Accepted for publication at AISTATS 2020. This is a slightly updated version of the published manuscript; see Corrigendum at the end of the paper

Journal ref: Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, pages 2207-2217, year 2020

arXiv:1904.09096 [pdf, other]

Causal Discovery with General Non-Linear Relationships Using Non-Linear ICA

Authors: Ricardo Pio Monti, Kun Zhang, Aapo Hyvarinen

Abstract: We consider the problem of inferring causal relationships between two or more passively observed variables. While the problem of such causal discovery has been extensively studied especially in the bivariate setting, the majority of current methods assume a linear causal relationship, and the few methods which consider non-linear dependencies usually make the assumption of additive noise. Here, we… ▽ More We consider the problem of inferring causal relationships between two or more passively observed variables. While the problem of such causal discovery has been extensively studied especially in the bivariate setting, the majority of current methods assume a linear causal relationship, and the few methods which consider non-linear dependencies usually make the assumption of additive noise. Here, we propose a framework through which we can perform causal discovery in the presence of general non-linear relationships. The proposed method is based on recent progress in non-linear independent component analysis and exploits the non-stationarity of observations in order to recover the underlying sources or latent disturbances. We show rigorously that in the case of bivariate causal discovery, such non-linear ICA can be used to infer the causal direction via a series of independence tests. We further propose an alternative measure of causal direction based on asymptotic approximations to the likelihood ratio, as well as an extension to multivariate causal discovery. We demonstrate the capabilities of the proposed method via a series of simulation studies and conclude with an application to neuroimaging data. △ Less

Submitted 19 April, 2019; originally announced April 2019.

arXiv:1808.02777 [pdf, other]

Input/Output Stochastic Automata with Urgency: Confluence and weak determinism

Authors: Pedro R. D'Argenio, Raúl E. Monti

Abstract: In a previous work, we introduced an input/output variant of stochastic automata (IOSA) that, once the model is closed (i.e., all synchronizations are resolved), the resulting automaton is fully stochastic, that is, it does not contain non-deterministic choices. However, such variant is not sufficiently versatile for compositional modelling. In this article, we extend IOSA with urgent actions. Thi… ▽ More In a previous work, we introduced an input/output variant of stochastic automata (IOSA) that, once the model is closed (i.e., all synchronizations are resolved), the resulting automaton is fully stochastic, that is, it does not contain non-deterministic choices. However, such variant is not sufficiently versatile for compositional modelling. In this article, we extend IOSA with urgent actions. This extension greatly increases the modularization of the models, allowing to take better advantage on compositionality than its predecessor. However, this extension introduces non-determinism even in closed models. We first show that confluent models are weakly deterministic in the sense that, regardless the resolution of the non-determinism, the stochastic behaviour is the same. In addition, we provide sufficient conditions to ensure that a network of interacting IOSAs is confluent without the need to obtain the composed IOSA. △ Less

Submitted 17 August, 2018; v1 submitted 8 August, 2018; originally announced August 2018.

arXiv:1805.09567 [pdf, other]

A Unified Probabilistic Model for Learning Latent Factors and Their Connectivities from High-Dimensional Data

Authors: Ricardo Pio Monti, Aapo Hyvärinen

Abstract: Connectivity estimation is challenging in the context of high-dimensional data. A useful preprocessing step is to group variables into clusters, however, it is not always clear how to do so from the perspective of connectivity estimation. Another practical challenge is that we may have data from multiple related classes (e.g., multiple subjects or conditions) and wish to incorporate constraints on… ▽ More Connectivity estimation is challenging in the context of high-dimensional data. A useful preprocessing step is to group variables into clusters, however, it is not always clear how to do so from the perspective of connectivity estimation. Another practical challenge is that we may have data from multiple related classes (e.g., multiple subjects or conditions) and wish to incorporate constraints on the similarities across classes. We propose a probabilistic model which simultaneously performs both a grouping of variables (i.e., detecting community structure) and estimation of connectivities between the groups which correspond to latent variables. The model is essentially a factor analysis model where the factors are allowed to have arbitrary correlations, while the factor loading matrix is constrained to express a community structure. The model can be applied on multiple classes so that the connectivities can be different between the classes, while the community structure is the same for all classes. We propose an efficient estimation algorithm based on score matching, and prove the identifiability of the model. Finally, we present an extension to directed (causal) connectivities over latent variables. Simulations and experiments on fMRI data validate the practical utility of the method. △ Less

Submitted 24 May, 2018; originally announced May 2018.

Comments: 13 pages, 6 figures. To appear in UAI 2018

arXiv:1610.09127 [pdf, other]

Adaptive regularization for Lasso models in the context of non-stationary data streams

Authors: Ricardo Pio Monti, Christoforos Anagnostopoulos, Giovanni Montana

Abstract: Large scale, streaming datasets are ubiquitous in modern machine learning. Streaming algorithms must be scalable, amenable to incremental training and robust to the presence of non-stationarity. In this work consider the problem of learning $\ell_1$ regularized linear models in the context of streaming data. In particular, the focus of this work revolves around how to select the regularization par… ▽ More Large scale, streaming datasets are ubiquitous in modern machine learning. Streaming algorithms must be scalable, amenable to incremental training and robust to the presence of non-stationarity. In this work consider the problem of learning $\ell_1$ regularized linear models in the context of streaming data. In particular, the focus of this work revolves around how to select the regularization parameter when data arrives sequentially and the underlying distribution is non-stationary (implying the choice of optimal regularization parameter is itself time-varying). We propose a framework through which to infer an adaptive regularization parameter. Our approach employs an $\ell_1$ penalty constraint where the corresponding sparsity parameter is iteratively updated via stochastic gradient descent. This serves to reformulate the choice of regularization parameter in a principled framework for online learning. The proposed method is derived for linear regression and subsequently extended to generalized linear models. We validate our approach using simulated and real datasets and present an application to a neuroimaging dataset. △ Less

Submitted 14 December, 2017; v1 submitted 28 October, 2016; originally announced October 2016.

Comments: 20 pages, 3 figures. arXiv admin note: text overlap with arXiv:1511.02187

arXiv:1605.00223 [pdf, other]

Text-mining the NeuroSynth corpus using Deep Boltzmann Machines

Authors: Ricardo Pio Monti, Romy Lorenz, Robert Leech, Christoforos Anagnostopoulos, Giovanni Montana

Abstract: Large-scale automated meta-analysis of neuroimaging data has recently established itself as an important tool in advancing our understanding of human brain function. This research has been pioneered by NeuroSynth, a database collecting both brain activation coordinates and associated text across a large cohort of neuroimaging research papers. One of the fundamental aspects of such meta-analysis is… ▽ More Large-scale automated meta-analysis of neuroimaging data has recently established itself as an important tool in advancing our understanding of human brain function. This research has been pioneered by NeuroSynth, a database collecting both brain activation coordinates and associated text across a large cohort of neuroimaging research papers. One of the fundamental aspects of such meta-analysis is text-mining. To date, word counts and more sophisticated methods such as Latent Dirichlet Allocation have been proposed. In this work we present an unsupervised study of the NeuroSynth text corpus using Deep Boltzmann Machines (DBMs). The use of DBMs yields several advantages over the aforementioned methods, principal among which is the fact that it yields both word and document embeddings in a high-dimensional vector space. Such embeddings serve to facilitate the use of traditional machine learning techniques on the text corpus. The proposed DBM model is shown to learn embeddings with a clear semantic structure. △ Less

Submitted 1 May, 2016; originally announced May 2016.

Comments: 4 pages, 1 figure

Showing 1–16 of 16 results for author: Monti, R