Identifying patient-specific root causes of disease

Authors: Eric V. Strobl, Thomas A. Lasko

BCB '22: Proceedings of the 13th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics

Article No.: 18, Pages 1 - 10

https://doi.org/10.1145/3535508.3545553

Published: 07 August 2022

Abstract

Complex diseases are caused by a multitude of factors that may differ between patients. As a result, hypothesis tests comparing all patients to all healthy controls can detect many significant variables with inconsequential effect sizes. A few highly predictive root causes may nevertheless generate disease within each patient. In this paper, we define patient-specific root causes as variables subject to exogenous "shocks" which go on to perturb an otherwise healthy system and induce disease. In other words, the variables are associated with the exogenous errors of a structural equation model (SEM), and these errors predict a downstream diagnostic label. We quantify predictivity using sample-specific Shapley values. This derivation allows us to develop a fast algorithm called Root Causal Inference for identifying patient-specific root causes by extracting the error terms of a linear SEM and then computing the Shapley value associated with each error. Experiments highlight considerable improvements in accuracy because the method uncovers root causes that may have large effect sizes at the individual level but clinically insignificant effect sizes at the group level. An R implementation is available at github.com/ericstrobl/RCI.
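
The two steps named in the abstract (extract the error terms of a linear SEM, then attribute the diagnosis to each error with a Shapley value) can be illustrated with a minimal sketch. This is not the authors' R implementation. It assumes the causal order of the variables is already known rather than learned from data, it uses a hand-rolled logistic regression as the predictive model, and it relies on the closed-form Shapley value for a linear model with mutually independent inputs, `beta_i * (e_i - mean(e_i))` on the log-odds scale. All function names and the simulated three-variable chain are hypothetical.

```python
import numpy as np

def extract_errors(X, order):
    """Estimate SEM error terms as residuals of each variable regressed
    on its predecessors in an ASSUMED known causal order."""
    Xc = X - X.mean(axis=0)
    E = np.zeros_like(Xc)
    for k, j in enumerate(order):
        preds = list(order[:k])
        if not preds:                      # root vertex: error is the variable itself
            E[:, j] = Xc[:, j]
            continue
        A = Xc[:, preds]
        coef, *_ = np.linalg.lstsq(A, Xc[:, j], rcond=None)
        E[:, j] = Xc[:, j] - A @ coef
    return E

def fit_logistic(E, d, iters=50, ridge=1e-3):
    """Logistic regression of the label on the errors via Newton's method
    with a small ridge penalty (an illustrative choice of model)."""
    n, p = E.shape
    A = np.column_stack([np.ones(n), E])
    w = np.zeros(p + 1)
    for _ in range(iters):
        prob = 1.0 / (1.0 + np.exp(-(A @ w)))
        grad = A.T @ (prob - d) + ridge * w
        hess = (A * (prob * (1.0 - prob))[:, None]).T @ A + ridge * np.eye(p + 1)
        w -= np.linalg.solve(hess, grad)
    return w[0], w[1:]

def linear_shapley(E, beta):
    """Per-sample Shapley values of a linear model with independent inputs,
    expressed on the log-odds scale."""
    return beta * (E - E.mean(axis=0))

# Simulated chain X0 -> X1 -> X2 with non-Gaussian errors; the label depends on X2.
rng = np.random.default_rng(0)
n = 2000
E_true = rng.laplace(size=(n, 3))
X = np.empty((n, 3))
X[:, 0] = E_true[:, 0]
X[:, 1] = 0.8 * X[:, 0] + E_true[:, 1]
X[:, 2] = -0.5 * X[:, 1] + E_true[:, 2]
d = (rng.random(n) < 1.0 / (1.0 + np.exp(-X[:, 2]))).astype(float)

E_hat = extract_errors(X, order=[0, 1, 2])
b0, beta = fit_logistic(E_hat, d)
phi = linear_shapley(E_hat, beta)          # one row of attributions per patient

# Variable whose shock pushes each diagnosed patient most strongly toward disease.
top_root_cause = np.argmax(phi[d == 1], axis=1)
```

Each row of `phi` sums to that patient's predicted log-odds minus the average predicted log-odds, so a large positive entry marks an error term that pushes the patient toward the diagnosis even when the corresponding group-level effect is small.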

Published In

BCB '22: Proceedings of the 13th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics

August 2022, 549 pages

ISBN: 9781450393867

DOI: 10.1145/3535508

Copyright © 2022 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

SIGBIOM: ACM Special Interest Group on Biomedical Computing

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

BCB '22: 13th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics

August 7 - 10, 2022

Northbrook, Illinois

Sponsor: SIGBIOM

Acceptance Rates

Overall Acceptance Rate 254 of 885 submissions, 29%

Cited By

Lasko T, Strobl E, and Stead W. 2024. Why do probabilistic clinical models fail to transport between sites. npj Digital Medicine 7, 1. Online publication date: 1-Mar-2024. https://doi.org/10.1038/s41746-024-01037-4

Weilbach J, Gerwinn S, Barsim K, and Fränzle M. 2024. Counterfactual-Based Root Cause Analysis for Dynamical Systems. Machine Learning and Knowledge Discovery in Databases. Research Track, 303--319. Online publication date: 22-Aug-2024. https://doi.org/10.1007/978-3-031-70365-2_18

Schultheiss C and Bühlmann P. 2023. On the pitfalls of Gaussian likelihood scoring for causal discovery. Journal of Causal Inference 11, 1. Online publication date: 11-May-2023. https://doi.org/10.1515/jci-2022-0068

Strobl E, Wang M, and Yoon B. 2023. Root Causal Inference from Single Cell RNA Sequencing with the Negative Binomial. Proceedings of the 14th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics, 1--10. Online publication date: 3-Sep-2023. https://dl.acm.org/doi/10.1145/3584371.3612972

Hyvärinen A, Khemakhem I, and Monti R. 2023. Identifiability of latent-variable and structural-equation models: from linear to nonlinear. Annals of the Institute of Statistical Mathematics 76, 1, 1--33. Online publication date: 4-Nov-2023. https://doi.org/10.1007/s10463-023-00884-4

Kikuchi G. 2022. Differentiable Causal Discovery Under Heteroscedastic Noise. Neural Information Processing, 284--295. Online publication date: 22-Nov-2022. https://dl.acm.org/doi/10.1007/978-3-031-30105-6_24
