
Peter Spirtes

The heart of the scientific enterprise is a rational effort to understand the causes behind the phenomena we observe. In large-scale complex dynamical systems such as the Earth system, real experiments are rarely feasible. However, a rapidly increasing amount of observational and simulated data opens up the use of novel data-driven causal methods beyond the commonly adopted correlation techniques. Here, we give an overview of causal inference frameworks and identify promising generic application cases common in Earth system sciences and beyond. We discuss challenges and initiate the benchmark platform causeme.net to close the gap between method users and developers.
DNA microarrays are perfectly suited for comparing gene expression in different populations of cells. An important application of microarray techniques is identifying genes that are activated by a particular drug of interest. This process will allow biologists to identify therapies targeted to particular diseases and, eventually, to gain more knowledge about the biological processes in organisms. Such an application is described in this paper. It focuses on diabetes and obesity, which are genetically heterogeneous diseases, meaning that multiple defective genes are responsible for them. The paper is divided into three parts, each dealing with a different problem addressed in our study. First, we validated the data from our microarray experiment. We identified significant systematic sources of variability that are potential issues for other microarray datasets. Second, we applied multiple hypothesis testing to identify differentially expressed genes. We found a set of gen...
Publisher Summary This chapter discusses the mathematical foundations of the TETRAD program. The TETRAD program uses a modification of a well-known path finding algorithm to compute the tetrad equations a model implies. It first calculates the open paths and the treks between each pair of variables in the graph. The program then calculates, for each pair u,v of measured variables, the trek sum for that pair. The products of the trek sums are then compared to determine whether or not they constitute an algebraic identity. If they do, the model implies the corresponding tetrad equation. To determine the sets of suggested trek additions to an initial model, the program proceeds through all foursomes of measured variables. For each foursome, it locates an appropriate subgraph of the initial model and determines whether or not to recommend the addition of a trek between any pair of variables in the subgraph. The program's recommendations do not distinguish between directed edges connecting two measured variables in either direction or the introduction of a new error variable connected to two measured variables. The program distinguishes, however, between these sorts of modifications and the addition of a directed edge between a latent variable and a measured variable.
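As a rough illustration of what it means for a model to imply a tetrad equation, the sketch below works through the simplest case: four measured variables that share a single latent common cause. This is not the TETRAD program's path-finding code. The function names, the loadings, and the restriction to one latent factor are assumptions made for the example. In this model the only trek between two measured variables runs through the latent variable, so each trek sum is a product of two loadings and the latent variance, and every tetrad difference vanishes identically.

```python
from fractions import Fraction
from itertools import combinations


def one_factor_covariances(loadings, latent_variance=Fraction(1)):
    """Off-diagonal covariances implied by a single latent common cause L.

    Assumed model: x_i = loadings[i] * L + independent error. The only trek
    between x_i and x_j passes through L, so the trek sum is
    loadings[i] * loadings[j] * var(L).
    """
    cov = {}
    for i, j in combinations(range(len(loadings)), 2):
        cov[(i, j)] = cov[(j, i)] = loadings[i] * loadings[j] * latent_variance
    return cov


def tetrad_differences(cov, i, j, k, l):
    """Return the three tetrad differences for the foursome (i, j, k, l)."""
    return (
        cov[(i, j)] * cov[(k, l)] - cov[(i, k)] * cov[(j, l)],
        cov[(i, j)] * cov[(k, l)] - cov[(i, l)] * cov[(j, k)],
        cov[(i, k)] * cov[(j, l)] - cov[(i, l)] * cov[(j, k)],
    )


# Hypothetical loadings; exact fractions avoid floating-point noise.
loadings = [Fraction(1, 2), Fraction(2), Fraction(3), Fraction(5, 4)]
cov = one_factor_covariances(loadings)
print(tetrad_differences(cov, 0, 1, 2, 3))
```

All three differences are zero for any choice of loadings, which is what "the model implies the tetrad equation" means here. Adding an extra source of covariance between two of the measured variables (for example a shared error term) changes one covariance and makes two of the three differences nonzero.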
Over the last two decades, a fundamental outline of a theory of causal inference has emerged. However, this theory does not consider the following problem. Sometimes two or more measured variables are deterministic functions of one another, not deliberately, but because of redundant measurements. In these cases, manipulation of an observed defined variable may actually be an ambiguous description of a manipulation of some underlying variables, although the manipulator does not know that this is the case. In this article we revisit the question of precisely characterizing conditions and assumptions under which reliable inference about the effects of manipulations is possible, even when the possibility of “ambiguous manipulations” is allowed.
This paper aims to give a broad coverage of central concepts and principles involved in automated causal inference and emerging approaches to causal discovery from i.i.d. data and from time series. After reviewing concepts including manipulations, causal models, sample predictive modeling, causal predictive modeling, and structural equation models, we present the constraint-based approach to causal discovery, which relies on the conditional independence relationships in the data, and discuss the assumptions underlying its validity. We then focus on causal discovery based on structural equation models, in which a key issue is the identifiability of the causal structure implied by appropriately defined structural equation models: in the two-variable case, under what conditions (and why) is the causal direction between the two variables identifiable? We show that the independence between the error term and causes, together with appropriate structural constraints on the structural equat...
The Trek Separation Theorem (Sullivant et al. 2010) states necessary and sufficient conditions for a linear directed acyclic graphical model to entail for all possible values of its linear coefficients that the rank of various sub-matrices of the covariance matrix is less than or equal to n, for any given n. In this paper, I extend the Trek Separation Theorem in two ways: I prove that the same necessary and sufficient conditions apply even when the generating model is partially non-linear and contains some cycles. This justifies application of constraint-based causal search algorithms to data generated by a wider class of causal models that may contain non-linear and cyclic relations among the latent variables.
Discovering causal structure from observational data in the presence of latent variables remains an active research area. Constraint-based causal discovery algorithms are relatively efficient at discovering such causal models from data using independence tests. Typically, however, they derive and output only one such model. In contrast, Bayesian methods can generate and probabilistically score multiple models, outputting the most probable one; however, they are often computationally infeasible to apply when modeling latent variables. We introduce a hybrid method that derives a Bayesian probability that the set of independence tests associated with a given causal model are jointly correct. Using this constraint-based scoring method, we are able to score multiple causal models, which possibly contain latent variables, and output the most probable one. The structure-discovery performance of the proposed method is compared to an existing constraint-based method (RFCI) using data generat...
We present an algorithm for estimating bounds on causal effects from observational data which combines graphical model search with simple linear regression. We assume that the underlying system can be represented by a linear structural equation model with no feedback, and we allow for the possibility of latent variables. Under assumptions standard in the causal search literature, we use conditional independence constraints to search for an equivalence class of ancestral graphs. Then, for each model in the equivalence class, we perform the appropriate regression (using causal structure information to determine which covariates to include in the regression) to estimate a set of possible causal effects. Our approach is based on the "IDA" procedure of Maathuis et al. (2009), which assumes that all relevant variables have been measured (i.e., no unmeasured confounders). We generalize their work by relaxing this assumption, which is often violated in applied contexts. We validat...
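The "one regression per candidate model" step can be sketched as follows. This is only an illustration of the IDA-style idea of collecting a set of possible effects. It does not implement the graph search, and it does not handle latent variables, which is the paper's actual contribution. The three-variable model, its coefficients, the helper name `effect_adjusting_for`, and the list of candidate adjustment sets are all assumptions made for the example.

```python
from fractions import Fraction


def effect_adjusting_for(cov, x, y, z=None):
    """Coefficient of x when y is regressed on x and, optionally, one covariate z.

    Works directly from a (population) covariance table, cov[a][b].
    """
    if z is None:
        return cov[x][y] / cov[x][x]
    numerator = cov[x][y] * cov[z][z] - cov[x][z] * cov[z][y]
    denominator = cov[x][x] * cov[z][z] - cov[x][z] ** 2
    return numerator / denominator


# Assumed generating model (hypothetical): Z -> X (1), X -> Y (2), Z -> Y (3),
# with independent unit-variance errors. Covariances are derived by hand.
F = Fraction
cov = {
    "Z": {"Z": F(1), "X": F(1), "Y": F(5)},
    "X": {"Z": F(1), "X": F(2), "Y": F(7)},
    "Y": {"Z": F(5), "X": F(7), "Y": F(30)},
}

# Hypothetical candidate adjustment sets for X; a real run would read these
# off the equivalence class returned by the search.
candidate_adjustments = [None, "Z"]
possible_effects = {effect_adjusting_for(cov, "X", "Y", z) for z in candidate_adjustments}
print(sorted(possible_effects))
```

Under these assumptions, adjusting for Z recovers the direct coefficient of X on Y, while the unadjusted regression is biased by the confounding path through Z. The set of both values plays the role of the "set of possible causal effects" when the search cannot tell the candidate graphs apart.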
Researchers routinely face the problem of inferring causal relationships from large amounts of data, sometimes involving hundreds of variables. Often, it is the causal relationships between "latent" (unmeasured) variables that are of primary interest. The problem is how causal relationships between unmeasured variables can be inferred from measured data. For example, naval manpower researchers have been asked to infer the causal relations among psychological traits such as job satisfaction and job challenge from a data base in ...
Worst case complexity analyses of algorithms are sometimes held to be less informative about the real difficulty of computation than are expected complexity analyses. We show that the two most common representations of problem solving in cognitive science each admit algorithms that have constant expected complexity, and for one of these representations we obtain constant expected complexity bounds under a variety of probability measures.
WW useful response. We plan to implement tests based on his derivation of sampling distributions for vanishing tetrad differences, and to check their behavior at the first opportunity. We find Bentler and Chou's sketch of an alternative search procedure using EQS interesting, and we hope they make it fully algorithmic and test it. If the procedure works we hope we have had some role in provoking its development. We also hope that we have provoked more statisticians to explicitly consider the design, the reliability, and the ...
For all the ferocity of the denunciation of sample based causal inference, it is hard to find any sober analysis that justifies the conviction that reliable inference of this kind is impossible. There are worst-case arguments that point out the unreliability of data based inference if the ...
Disjoint sets of vertices X and Y are d-separated given S in G if and only if every member of X is d-separated from every member of Y given S in G. If distribution P satisfies the Markov and Faithfulness Conditions, then for disjoint sets of vertices X, Y, and S, X is independent of Y ...
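The set-wise definition of d-separation above can be checked mechanically. The sketch below uses the standard moralized ancestral graph criterion rather than enumerating paths. The graph encoding (a dict from each vertex to its parents) and the function names are assumptions made for the example, and X, Y, and S are assumed to be disjoint.

```python
def ancestors(parents, nodes):
    """Return the given nodes together with all of their ancestors."""
    seen, stack = set(), list(nodes)
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(parents.get(v, ()))
    return seen


def d_separated(parents, xs, ys, s):
    """True if every member of xs is d-separated from every member of ys given s."""
    xs, ys, s = set(xs), set(ys), set(s)
    keep = ancestors(parents, xs | ys | s)

    # Moralize the ancestral subgraph: connect each node to its parents,
    # connect parents that share a child, and ignore edge directions.
    adj = {v: set() for v in keep}
    for child in keep:
        ps = [p for p in parents.get(child, ()) if p in keep]
        for p in ps:
            adj[child].add(p)
            adj[p].add(child)
        for a in ps:
            for b in ps:
                if a != b:
                    adj[a].add(b)

    # Search outward from xs, never passing through the conditioning set.
    seen, stack = set(), list(xs)
    while stack:
        v = stack.pop()
        if v in seen or v in s:
            continue
        seen.add(v)
        stack.extend(adj[v] - seen)
    return not (seen & ys)


# Hypothetical graph: A -> C <- B, C -> D.
parents = {"C": ["A", "B"], "D": ["C"]}
print(d_separated(parents, {"A"}, {"B"}, set()))   # collider C blocks the path
print(d_separated(parents, {"A"}, {"B"}, {"D"}))   # conditioning on a descendant of C
```

In this example A and B are d-separated given the empty set but not given C or its descendant D, while A and D are d-separated given C.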
Vector Autoregressions (VARs) are a class of time series models commonly used in econometrics to study the dynamic effect of exogenous shocks to the economy. While the estimation of a VAR is straightforward, there is a problem of finding the transformation of the estimated model consistent with the causal relations among the contemporaneous variables. Such problem, which is a version
Abstract In both linear and nonlinear multiple regression, when regressors are correlated the existence of an unmeasured common cause of regressor X_i and outcome variable Y may bias estimates of the influence of other regressors, X_k; variables having no influence on Y whatsoever may thereby be given significant regression coefficients. The bias may be quite large. Simulation studies show that standard regression model specification procedures make the same error. The strategy of regressing on a larger set of variables and checking ...
This Technical Report is brought to you for free and open access by the College of Humanities and Social Sciences at Research Showcase. It has been accepted for inclusion in Department of Philosophy by an authorized administrator of Research Showcase. For ...
We investigate the asymptotic consistency of causal inference procedures in the framework of directed acyclic graphs (DAGs) as developed by Spirtes, Glymour and Scheines (SGS) and Pearl and Verma (PV). We show that there exist "pointwise consistent" but not "uniformly consistent" procedures. These results have implications for making inferences based on finite sample sizes and for constructing valid confidence intervals for causal effects.
Logic, Methodology and Philosophy of Science IX, D. Prawitz, B. Skyrms and D. Westerståhl (Editors), 1994 Elsevier Science B.V. All rights reserved. 813 BUILDING CAUSAL GRAPHS FROM STATISTICAL DATA IN THE PRESENCE OF LATENT VARIABLES PETER ...
This article is reproduced from the previous edition, volume 12, pp. 8395–8400, © 2001, Elsevier Ltd. with an updated Bibliography section supplied by the Editor.
Recursive linear structural equation models can be represented by directed acyclic graphs. When represented in this way, they satisfy the Markov Condition. Hence it is possible to use graphical d-separation to determine what conditional independence relations are entailed by a given linear structural equation model. I prove in this paper that it is also possible to use graphical d-separation applied to a cyclic graph to determine what conditional independence relations are entailed by a given non-recursive linear structural equation model. I also give a causal interpretation to the linear coefficients in non-recursive structural equation models, and explore the relationships between cyclic graphs and undirected graphs, directed acyclic graphs with latent variables, and chain independence graphs.