Abstract
Causal discovery is a primary focus in many fields, and various methods have been developed to mine causal relationships from observational data. However, most of these methods can identify only individual causes and do not consider how those causes interact. In real life, many effects arise from multiple factors that interact with each other, so detecting the interactions between causal factors is essential for understanding the underlying causal mechanisms. So far, there are no efficient data-driven approaches to discovering causal interactions, especially in large data sets. In this paper, we propose a general data-driven framework and develop four algorithms instantiated from the framework to detect causal interactions directly from data. Extensive experiments on both synthetic and real-world data show that the proposed framework and algorithms are effective and efficient for causal interaction discovery.
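To make the idea of an interaction between two causal factors concrete, here is a minimal sketch of one standard epidemiological measure, the additive interaction contrast IC = R11 - R10 - R01 + R00. Here Rab denotes the outcome risk when factor A = a and factor B = b. This is an illustration only, not the framework or any of the four algorithms proposed in the paper. It assumes binary factors and a binary outcome, uses raw cell frequencies, and does no adjustment for confounders, so it measures an association rather than a causal effect. The function name `interaction_contrast` and the toy data are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Tuple


def interaction_contrast(rows: Iterable[Tuple[int, int, int]]) -> float:
    """Additive interaction contrast IC = R11 - R10 - R01 + R00.

    Each row is (a, b, y) with binary factors a, b and a binary outcome y.
    Risks are raw outcome frequencies per (a, b) cell. No confounder
    adjustment is done (assumption: illustration only, not causal).
    """
    counts = defaultdict(lambda: [0, 0])  # (a, b) -> [sum of y, number of rows]
    for a, b, y in rows:
        cell = counts[(a, b)]
        cell[0] += y
        cell[1] += 1

    risk = {}
    for key in [(0, 0), (0, 1), (1, 0), (1, 1)]:
        total, n = counts[key]
        if n == 0:
            # Every combination of the two factors must be observed.
            raise ValueError(f"no rows with (a, b) = {key}")
        risk[key] = total / n
    return risk[(1, 1)] - risk[(1, 0)] - risk[(0, 1)] + risk[(0, 0)]


# Hypothetical toy data: each (a, b) combination appears exactly once.
cells = [(a, b) for a in (0, 1) for b in (0, 1)]
joint = [(a, b, a & b) for a, b in cells]   # outcome only when both factors are present
either = [(a, b, a | b) for a, b in cells]  # either factor alone produces the outcome
single = [(a, b, a) for a, b in cells]      # only factor A matters

print(interaction_contrast(joint))   # 1.0: super-additive (positive) interaction
print(interaction_contrast(either))  # -1.0: sub-additive (negative) interaction
print(interaction_contrast(single))  # 0.0: no additive interaction
```

A nonzero contrast means the joint effect of the two factors differs from the sum of their separate effects. That departure from additivity is one common way of defining an interaction between causes.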



Acknowledgements
This work has been partially supported by Australian Research Council (ARC) Discovery grant DP140103617 and ARC Discovery grant DP170101306.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Ma, S., Liu, L., Li, J. et al. Data-driven discovery of causal interactions. Int J Data Sci Anal 8, 285–297 (2019). https://doi.org/10.1007/s41060-018-0168-0
DOI: https://doi.org/10.1007/s41060-018-0168-0