DOI: 10.1145/3535508.3545553
research-article

Identifying patient-specific root causes of disease

Published: 07 August 2022

Abstract

Complex diseases are caused by a multitude of factors that may differ between patients. As a result, hypothesis tests comparing all patients to all healthy controls can detect many significant variables with inconsequential effect sizes. A few highly predictive root causes may nevertheless generate disease within each patient. In this paper, we define patient-specific root causes as variables subject to exogenous "shocks" which go on to perturb an otherwise healthy system and induce disease. In other words, the variables are associated with the exogenous errors of a structural equation model (SEM), and these errors predict a downstream diagnostic label. We quantify predictivity using sample-specific Shapley values. This derivation allows us to develop a fast algorithm called Root Causal Inference for identifying patient-specific root causes by extracting the error terms of a linear SEM and then computing the Shapley value associated with each error. Experiments highlight considerable improvements in accuracy because the method uncovers root causes that may have large effect sizes at the individual level but clinically insignificant effect sizes at the group level. An R implementation is available at github.com/ericstrobl/RCI.
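The two steps named in the abstract (extract the exogenous errors of a linear SEM, then attribute a diagnostic prediction to each error with a sample-specific Shapley value) can be illustrated with a small Python sketch. It is not the authors' R implementation, and it rests on several assumptions: the SEM coefficient matrix `B` is taken as already estimated (the structure-learning step is omitted); the model is written as x = Bx + e, so that e = (I - B)x; the label is predicted by a hypothetical logistic model of the errors; and "absent" errors are replaced by a zero baseline when computing exact Shapley values by brute-force enumeration. The paper's own value function and estimators may differ. The names `errors_from_sem`, `shapley_values`, and `risk`, and all numbers, are illustrative only.

```python
import math
from itertools import combinations
from math import factorial


def errors_from_sem(B, x):
    """Recover the exogenous errors of one sample under a linear SEM x = B x + e.

    B[i][j] is the (assumed already estimated) coefficient of x_j in the
    structural equation for x_i. Because e = (I - B) x, each error is x_i
    minus the contribution of its parents.
    """
    p = len(x)
    return [x[i] - sum(B[i][j] * x[j] for j in range(p)) for i in range(p)]


def shapley_values(f, e, baseline):
    """Exact Shapley values of f at e; errors outside a coalition are set to baseline.

    Brute-force enumeration over subsets: p * 2^p calls to f, so this is
    only practical for a handful of variables.
    """
    p = len(e)

    def value(subset):
        z = [e[k] if k in subset else baseline[k] for k in range(p)]
        return f(z)

    phi = []
    for i in range(p):
        others = [k for k in range(p) if k != i]
        total = 0.0
        for size in range(p):  # size of coalition S drawn from `others`
            weight = factorial(size) * factorial(p - size - 1) / factorial(p)
            for s in combinations(others, size):
                coalition = set(s)
                total += weight * (value(coalition | {i}) - value(coalition))
        phi.append(total)
    return phi


# Hypothetical toy example: a chain X1 -> X2 -> X3 with assumed coefficients.
B = [[0.0, 0.0, 0.0],
     [0.5, 0.0, 0.0],
     [0.0, 0.25, 0.0]]
x_patient = [2.0, 3.0, 0.75]
e = errors_from_sem(B, x_patient)  # [2.0, 2.0, 0.0]

# Hypothetical logistic model of the diagnostic label given the errors (assumption).
w, b = [0.1, 1.0, 2.0], -3.0


def risk(z):
    return 1.0 / (1.0 + math.exp(-(sum(wi * zi for wi, zi in zip(w, z)) + b)))


baseline = [0.0, 0.0, 0.0]  # errors assumed to have mean zero
phi = shapley_values(risk, e, baseline)
print([round(v, 3) for v in phi])
```

In this toy patient, X3 carries the largest weight in the hypothetical risk model, yet its error is zero, so it receives no credit: it is not a root cause for this patient. X2 is elevated partly because of X1, but working with errors separates X2's own shock (e2 = 2.0) from the part it inherits from X1, and e2 receives by far the largest Shapley value. By the efficiency property, the values sum to the difference between the patient's predicted risk and the baseline risk. Because enumeration grows exponentially in the number of variables, a practical implementation would need a closed form or an approximation instead.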

Information

Published In

ACM Conferences
BCB '22: Proceedings of the 13th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics
August 2022
549 pages
ISBN:9781450393867
DOI:10.1145/3535508
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. causal inference
  2. observational data
  3. precision medicine
  4. root cause

Conference

BCB '22

Acceptance Rates

Overall Acceptance Rate 254 of 885 submissions, 29%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 33
  • Downloads (last 6 weeks): 3
Reflects downloads up to 09 Nov 2024

Citations

Cited By

  • (2024) Why do probabilistic clinical models fail to transport between sites. npj Digital Medicine 7, 1. DOI: 10.1038/s41746-024-01037-4. Online publication date: 1-Mar-2024.
  • (2024) Counterfactual-Based Root Cause Analysis for Dynamical Systems. Machine Learning and Knowledge Discovery in Databases. Research Track, 303-319. DOI: 10.1007/978-3-031-70365-2_18. Online publication date: 22-Aug-2024.
  • (2023) On the pitfalls of Gaussian likelihood scoring for causal discovery. Journal of Causal Inference 11, 1. DOI: 10.1515/jci-2022-0068. Online publication date: 11-May-2023.
  • (2023) Root Causal Inference from Single Cell RNA Sequencing with the Negative Binomial. Proceedings of the 14th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics, 1-10. DOI: 10.1145/3584371.3612972. Online publication date: 3-Sep-2023.
  • (2023) Identifiability of latent-variable and structural-equation models: from linear to nonlinear. Annals of the Institute of Statistical Mathematics 76, 1, 1-33. DOI: 10.1007/s10463-023-00884-4. Online publication date: 4-Nov-2023.
  • (2022) Differentiable Causal Discovery Under Heteroscedastic Noise. Neural Information Processing, 284-295. DOI: 10.1007/978-3-031-30105-6_24. Online publication date: 22-Nov-2022.
